Universal data pipeline

Description

TECHNICAL FIELD

The disclosed technologies relate generally to data pipeline computer systems and, more particularly, to a data pipeline computer system with methodology for preserving history of datasets.

BACKGROUND

Computers are very powerful tools for processing data. A computerized data pipeline is a useful mechanism for processing large amounts of data. A typical data pipeline is an ad-hoc collection of computer software scripts and programs for processing data extracted from “data sources” and for providing the processed data to “data sinks”. As an example, a data pipeline for a large insurance company that has recently acquired a number of smaller insurance companies may extract policy and claim data from the individual database systems of the smaller insurance companies, transform and validate the insurance data in some way, and provide validated and transformed data to various analytical platforms for assessing risk management, compliance with regulations, fraud, etc.

Between the data sources and the data sinks, a data pipeline system is typically provided as a software platform to automate the movement and transformation of data from the data sources to the data sinks. In essence, the data pipeline system shields the data sinks from having to interface with the data sources or even being configured to process data in the particular formats provided by the data sources. Typically, data from the data sources received by the data sinks is processed by the data pipeline system in some way. For example, a data sink may receive data from the data pipeline system that is a combination (e.g., a join) of data from multiple data sources, all without the data sink being configured to process the individual constituent data formats.

One purpose of a data pipeline system is to execute data transformation steps on data obtained from data sources to provide the data in the format expected by the data sinks. A data transformation step may be defined as a set of computer commands or instructions which, when executed by the data pipeline system, transforms one or more input datasets to produce one or more output or “target” datasets. Data that passes through the data pipeline system may undergo multiple data transformation steps. Such a step can have dependencies on the step or steps that precede it. One example of a computer system for carrying out data transformation steps in a data pipeline is the well-known MapReduce system. See, e.g., Dean, Jeffrey, et al., “MapReduce: Simplified Data Processing on Large Clusters”, Google, Inc., 2004.

Often, data pipeline systems are maintained “by hand”. That is, a software engineer or system administrator is responsible for configuring the system so that data transformation steps are executed in the proper order and on the correct datasets. If a data transformation step needs to be added, removed, or changed, the engineer or administrator typically must reconfigure the system by manually editing control scripts or other software programs. Similar editing tasks may be needed before the pipeline can process new datasets. Overall, current approaches for maintaining existing data pipeline systems may require significant human resources.

Another problem with existing data pipeline systems is the lack of dataset versioning. In these systems, when a dataset needs to be updated with new data, the data transformation step typically overwrites the old version of the dataset with the new version. This can be problematic if it is suspected or discovered thereafter that the old version of the dataset contained incorrect data that the new version does not contain. For example, the old version of the dataset may have been imported into an analytical software program which generated anomalous results based on the incorrect data. In this case, since the old version is lost when the new version is generated, it can be difficult to track down the source of the incorrect data.

Given the increasing amount of data collected by businesses and other organizations, processing data of all sorts through data pipeline systems can only be expected to increase. This trend is coupled with a need for a more automated way to maintain such systems and for the ability to trace and track data, including old versions of the data, as it moves through the data pipeline from data sources to data sinks.

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a very general block diagram of an example computing device which may be used for implementing the disclosed technologies.

FIG. 2 is a block diagram of an example software system for controlling the operation of the computing device of FIG. 1.

FIG. 3 is a block diagram of an example distributed computing environment in which the disclosed technologies may be implemented.

FIG. 4 is a block diagram of a history preserving data pipeline system that implements the disclosed technologies, according to an embodiment of the present invention.

FIG. 5 is a block diagram of a build catalog entry, according to an embodiment of the present invention.

FIG. 6 is a block diagram of a derivation program entry, according to an embodiment of the present invention.

FIG. 7 is a block diagram of a transaction entry, according to an embodiment of the present invention.

FIG. 8 illustrates a simple example of a build dependency graph, according to an embodiment of the present invention.

FIG. 9 is an interaction diagram of a transaction protocol facilitated by a transaction service, according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating steps of a computer-implemented process for preserving history of a derived dataset, according to an embodiment of the present invention.

DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed technologies. It will be apparent, however, that the disclosed technologies can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the disclosed technologies. As to the flowcharts, each block within the flowcharts represents both a method step and an apparatus element for performing the method step. Depending upon the requirements of the particular implementation at hand, the corresponding apparatus element may be configured in hardware, software, firmware or combinations thereof.

Overview

Given the deficiencies of current manual and ad-hoc approaches for implementing and managing a data pipeline system, a more automated and integrated approach would clearly be preferable. In accordance with an embodiment of the disclosed technologies, a history preserving data pipeline system is provided.

In one aspect, the history preserving data pipeline system improves on existing data pipeline technologies to provide “immutable” and “versioned” datasets. A dataset may be defined as a named collection of data. The datasets are “immutable” in the sense that it is not necessary to overwrite existing dataset data in order to modify the dataset. The datasets are “versioned” in the sense that modifications to a dataset, including historical modifications, are separately identifiable.

Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset. More generally, the history preserving data pipeline system improves on existing data pipeline systems by providing the ability to trace dataset data to the data source data from which the dataset data was derived or obtained, even if the dataset data is no longer in the current version of the dataset and even if the data source data is no longer available from the data source.
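By way of non-limiting illustration only, the immutable and versioned behavior described above might be sketched as follows. The class name VersionedDataset, its methods, and the use of integer commit times are assumptions made for this sketch and do not appear elsewhere in this document; an actual embodiment may store and identify versions differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VersionedDataset:
    """Hypothetical append-only dataset: each commit adds a version, none is overwritten."""

    name: str
    # Each entry is (commit_time, immutable snapshot of the records).
    _versions: List[Tuple[int, Tuple[str, ...]]] = field(default_factory=list)

    def commit(self, commit_time: int, records: List[str]) -> int:
        """Store a new version and return its 1-based version number."""
        if self._versions and commit_time <= self._versions[-1][0]:
            raise ValueError("commit times must be strictly increasing")
        self._versions.append((commit_time, tuple(records)))
        return len(self._versions)

    def current(self) -> Tuple[str, ...]:
        """Return the records of the latest (most recent) version."""
        if not self._versions:
            raise LookupError(f"dataset {self.name!r} has no versions")
        return self._versions[-1][1]

    def as_of(self, point_in_time: int) -> Optional[Tuple[str, ...]]:
        """Return the latest version committed at or before point_in_time, if any."""
        result = None
        for commit_time, records in self._versions:
            if commit_time > point_in_time:
                break
            result = records
        return result
```

In this sketch, a record that was present in an earlier version remains retrievable through as_of even after a later commit omits it, because committing a new version never alters a stored snapshot.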

In another aspect, the history preserving data pipeline system improves on existing data pipeline technologies to provide immutable and versioned “derived” datasets. A derived dataset may be defined as a dataset that is generated (built) by executing a “derivation program”, potentially providing one or more other datasets as input to the derivation program. When executed, the derivation program may perform one or more operations on the input dataset(s). For example, the derivation program may transform the data in the input dataset(s) in some way to produce the derived dataset. For example, a derivation program may produce a derived dataset by filtering records in an input dataset to those comprising a particular value or set of values, or by joining together two related input datasets, or by replacing references in an input dataset to values in another input dataset with actual data referenced. Because derived datasets, like datasets generally, are immutable and versioned in the system, it is possible to trace dataset data to the data source data from which the dataset data was derived or obtained, even if the dataset data is no longer in the current version of the derived dataset and even if the data source data is no longer available from the data source.
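As a minimal, non-limiting sketch of the filtering and joining examples above, a derivation program might be expressed as ordinary functions over lists of records. The function names, the record representation (a dictionary per record), and the field names used in the usage note are hypothetical and are chosen only for illustration.

```python
from typing import Dict, List, Set

Record = Dict[str, object]


def filter_records(dataset: List[Record], field_name: str, allowed: Set[object]) -> List[Record]:
    """Keep only the records whose field_name holds one of the allowed values."""
    return [record for record in dataset if record.get(field_name) in allowed]


def join_datasets(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner-join two related datasets on a shared key field."""
    # Index the right-hand dataset by key so each left record is matched in one lookup.
    index: Dict[object, List[Record]] = {}
    for record in right:
        index.setdefault(record[key], []).append(record)
    return [
        {**left_record, **right_record}
        for left_record in left
        for right_record in index.get(left_record[key], [])
    ]
```

For example, joining a hypothetical policies dataset to a hypothetical claims dataset on a policy_id field would yield one derived record per matching claim. Neither function modifies its input datasets, which is consistent with the input datasets being immutable.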

In yet another aspect, the history preserving data pipeline system improves on existing data pipeline systems by versioning derivation programs. By doing so, not only does the system provide the ability to trace dataset data to the data source data the dataset data is based on, but also, if the dataset is a derived dataset, to the version of the derivation program used to build the derived dataset. This is useful for tracking down errors in dataset data caused by errors or “bugs” (i.e., programming errors) in the version of the derivation program that was executed to build the dataset.
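One hypothetical way to make such tracing possible is to record, for each build of a derived dataset, the version of the derivation program that was executed and the versions of the input datasets that were provided. The sketch below is an illustration under that assumption; the names BuildRecord and DerivedDataset and their fields are invented for this example and are not the entries described with reference to the figures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical record of which program version and inputs produced a dataset version."""

    dataset_version: int
    program_version: int
    input_versions: Tuple[Tuple[str, int], ...]


class DerivedDataset:
    def __init__(self, name: str) -> None:
        self.name = name
        self._programs: List[Callable[..., list]] = []  # every program version is kept
        self._builds: List[BuildRecord] = []

    def update_program(self, program: Callable[..., list]) -> int:
        """Register a new derivation program version and return its version number."""
        self._programs.append(program)
        return len(self._programs)

    def build(self, inputs: Dict[str, Tuple[int, list]]) -> list:
        """Run the current program on inputs given as name -> (version, data)."""
        if not self._programs:
            raise LookupError("no derivation program registered")
        program = self._programs[-1]
        output = program(*(data for _version, data in inputs.values()))
        self._builds.append(BuildRecord(
            dataset_version=len(self._builds) + 1,
            program_version=len(self._programs),
            input_versions=tuple((name, version) for name, (version, _data) in inputs.items()),
        ))
        return output

    def trace(self, dataset_version: int) -> BuildRecord:
        """Look up how a given version of this derived dataset was built."""
        if not 1 <= dataset_version <= len(self._builds):
            raise LookupError(f"unknown version {dataset_version} of {self.name!r}")
        return self._builds[dataset_version - 1]
```

With such records, an anomalous value in an old version of a derived dataset can be attributed either to its input data or to the particular program version that produced it.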

In yet another aspect, the history preserving data pipeline system improves on existing data pipeline systems by maintaining “build dependency data”. The build dependency data represents one or more directed acyclic graphs of build dependencies. From the build dependency data, the system can determine, for a given dataset, the order in which to build other datasets before the given dataset can be built. By doing so, human engineers are alleviated from some manual tasks required by existing data pipeline systems related to maintaining and determining dataset build dependencies.
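As a non-limiting sketch, the build order for a given dataset can be obtained from a directed acyclic graph of build dependencies by a depth-first traversal that emits each dataset only after the datasets it depends on. The function name build_order and the dictionary representation of the graph are assumptions made for this illustration.

```python
from typing import Dict, List


def build_order(target: str, depends_on: Dict[str, List[str]]) -> List[str]:
    """Return the datasets to build, in order, ending with the target itself.

    depends_on maps a dataset name to the names of the datasets it is built from.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # dataset name -> "visiting" or "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            # A dataset reached again while still being visited means the graph is not acyclic.
            raise ValueError(f"cycle detected at dataset {name!r}")
        state[name] = "visiting"
        for dependency in depends_on.get(name, []):
            visit(dependency)
        state[name] = "done"
        order.append(name)

    visit(target)
    return order
```

Only datasets reachable from the target appear in the result, so unrelated datasets in the same graph are not rebuilt.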

These and other aspects of the history preserving data pipeline system are described in greater detail elsewhere in this document. First, however, an example of the basic underlying computer components that may be employed for implementing the disclosed technologies is described.

Basic Computing Environment

The disclosed technologies may be implemented on one or more computing devices. Such a computing device may be implemented in various forms including, but not limited to, a client, a server, a network device, a mobile device, a cell phone, a smart phone, a laptop computer, a desktop computer, a workstation computer, a personal digital assistant, a blade server, a mainframe computer, and other types of computers. The computing device described below and its components, including their connections, relationships, and functions, is meant to be exemplary only, and not meant to limit implementations of the disclosed technologies described in this specification. Other computing devices suitable for implementing the disclosed technologies may have different components, including components with different connections, relationships, and functions.

Basic Computing Device

FIG. 1 is a block diagram that illustrates an example of a computing device 100 suitable for implementing the disclosed technologies. Computing device 100 includes bus 102 or other communication mechanism for addressing main memory 106 and for transferring data between and among the various components of device 100. Computing device 100 also includes one or more hardware processors 104 coupled with bus 102 for processing information. A hardware processor 104 may be a general purpose microprocessor, a system on a chip (SoC), or other processor suitable for implementing the described technologies.

Main memory 106, such as a random access memory (RAM) or other dynamic storage device, is coupled to bus 102 for storing information and instructions to be executed by processor(s) 104. Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor(s) 104. Such instructions, when stored in non-transitory storage media accessible to processor(s) 104, render computing device 100 into a special-purpose computing device that is customized to perform the operations specified in the instructions.

Computing device 100 further includes read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor(s) 104.

One or more mass storage devices 110 are coupled to bus 102 for persistently storing information and instructions on fixed or removable media, such as magnetic, optical, solid-state, magnetic-optical, flash memory, or any other available mass storage technology. The mass storage may be shared on a network, or it may be dedicated mass storage. Typically, at least one of the mass storage devices 110 (e.g., the main hard disk for the device) stores a body of program and data for directing operation of the computing device, including an operating system, user application programs, driver and other support files, as well as other data files of all sorts.

Computing device 100 may be coupled via bus 102 to display 112, such as a liquid crystal display (LCD) or other electronic visual display, for displaying information to a computer user. Display 112 may also be a touch-sensitive display for communicating touch gesture (e.g., finger or stylus) input to processor(s) 104.

An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104.

Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

Computing device 100 may implement the methods described herein using customized hard-wired logic, one or more application-specific integrated circuits (ASICs), one or more field-programmable gate arrays (FPGAs), firmware, or program logic which, in combination with the computing device, causes or programs computing device 100 to be a special-purpose machine.

Methods disclosed herein may also be performed by computing device 100 in response to processor(s) 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another storage medium, such as storage device(s) 110. Execution of the sequences of instructions contained in main memory 106 causes processor(s) 104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a computing device to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as storage device 110. Volatile media includes dynamic memory, such as main memory 106. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.

Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor(s) 104 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computing device 100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 102. Bus 102 carries the data to main memory 106, from which processor(s) 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device(s) 110 either before or after execution by processor(s) 104.

Computing device 100 also includes one or more communication interface(s) 118 coupled to bus 102. A communication interface 118 provides a two-way data communication coupling to a wired or wireless network link 120 that is connected to a local network 122 (e.g., Ethernet network, Wireless Local Area Network, cellular phone network, Bluetooth wireless network, or the like). Communication interface 118 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information. For example, communication interface 118 may be a wired network interface card, a wireless network interface card with an integrated radio antenna, or a modem (e.g., ISDN, DSL, or cable modem).

Network link(s) 120 typically provide data communication through one or more networks to other data devices. For example, a network link 120 may provide a connection through a local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 128. Local network(s) 122 and Internet 128 use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link(s) 120 and through communication interface(s) 118, which carry the digital data to and from computing device 100, are example forms of transmission media.

Computing device 100 can send messages and receive data, including program code, through the network(s), network link(s) 120 and communication interface(s) 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network(s) 122 and communication interface(s) 118.

The received code may be executed by processor 104 as it is received, and/or stored in storage device 110, or other non-volatile storage for later execution.

Basic Software System

FIG. 2 is a block diagram of a software system for controlling the operation of computing device 100. As shown, a computer software system 200 is provided for directing the operation of the computing device 100. Software system 200, which is stored in system memory (RAM) 106 and on fixed storage (e.g., hard disk) 110, includes a kernel or operating system (OS) 210. The OS 210 manages low-level aspects of computer operation, including managing execution of processes, memory allocation, file input and output (I/O), and device I/O. One or more application programs, such as client application software or “programs” 202 (e.g., 202A, 202B, 202C . . . 202N) may be “loaded” (i.e., transferred from fixed storage 110 into memory 106) for execution by the system 200. The applications or other software intended for use on the device 100 may also be stored as a set of downloadable computer-executable instructions, for example, for downloading and installation from an Internet location (e.g., Web server).

Software system 200 may include a graphical user interface (GUI) 215, for receiving user commands and data in a graphical (e.g., “point-and-click” or “touch gesture”) fashion. These inputs, in turn, may be acted upon by the system 200 in accordance with instructions from operating system 210 and/or client application module(s) 202. The GUI 215 also serves to display the results of operation from the OS 210 and application(s) 202, whereupon the user may supply additional inputs or terminate the session (e.g., log off).

The OS 210 can execute directly on the bare hardware (e.g., processor(s) 104) 220 of device 100. Alternatively, a hypervisor or virtual machine monitor (VMM) 230 may be interposed between the bare hardware 220 and the OS 210. In this configuration, VMM 230 acts as a software “cushion” or virtualization layer between the OS 210 and the bare hardware 220 of the device 100.

VMM 230 instantiates and runs virtual machine instances (“guest machines”). Each guest machine comprises a “guest” operating system, such as OS 210, and one or more applications, such as applications 202, designed to execute on the guest operating system. The VMM 230 presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems. In some instances, the VMM 230 may allow a guest operating system to run as though it is running on the bare hardware 220 of the device 100 directly. In these instances, the same version of the guest operating system configured to execute on the bare hardware 220 directly may also be able to execute on VMM 230 without modification or reconfiguration. In other words, VMM 230 may provide full hardware and CPU virtualization to a guest operating system in some instances. In other instances, a guest operating system may be specially designed or configured to execute on VMM 230 for efficiency. In these instances, the guest operating system is “aware” that it executes on a virtual machine monitor. In other words, VMM 230 may provide para-virtualization to a guest operating system in some instances.

The above-described computer hardware and software are presented for purpose of illustrating basic underlying computer components that may be employed for implementing the disclosed technologies. The disclosed technologies, however, are not limited to any particular computing environment or computing device configuration. Instead, the disclosed technologies may be implemented in any type of system architecture or processing environment capable of supporting the disclosed technologies presented in detail below.

Distributed Computing Environment

While the disclosed technologies may operate within a single standalone computing device (e.g., device 100 of FIG. 1), the disclosed technologies may be implemented in a distributed computing environment. FIG. 3 is a block diagram of a distributed computing environment 300 in which the disclosed technologies may be implemented.

As shown, environment 300 comprises a history preserving data pipeline system 310 that implements one or more embodiments of the disclosed technologies, one or more data sources 320 (e.g., 320A, 320B, 320C . . . 320N) that provide data to the pipeline system 310, and one or more data sinks 330 (e.g., 330A, 330B, 330C . . . 330N) that consume data from the pipeline system 310.

In general, the data sources 320 provide data to the pipeline system 310 and the data sinks 330 consume data from the pipeline system 310. The pipeline system 310 stores data it obtains from the data sources 320 and data it provides to data sinks 330 in datasets, which are named collections of data. As described in greater detail elsewhere in this document, datasets are immutable and versioned to facilitate tracing of dataset data through the data pipeline 310 including historical (i.e., not current) versions of dataset data. In an embodiment, the current version of a dataset is the latest (most recent) version of the dataset.

The pipeline system 310 also manages aspects of building derived datasets, which are datasets that are generated by executing the current version of an associated derivation program.

In an embodiment, the current version of a derivation program is the latest (most recent) version of the derivation program. The derivation program may generate the data in a derived dataset it creates based on data in one or more other datasets. Alternatively, the derivation program may generate derived dataset data independently of any input datasets. For example, a derivation program may obtain data from one or more data sources 320 directly and use the obtained data to generate data of a derived dataset. It is also possible for a derivation program to generate derived dataset data in this way where the derivation program also accepts one or more other datasets as input used for generating the derived dataset.

In many cases, data provided by a data source 320 to the pipeline system 310 that is consumed by a data sink 330 from the pipeline system 310 is not consumed by the data sink 330 in the same data format in which it was provided. In other words, the data pipeline 310 may transform data provided by a data source 320 in one or more data transformation steps before it is provided to a data sink 330. More specifically, derivation programs may transform data in datasets when generating (building) derived datasets in one or more data transformation steps before the derived datasets are provided to data sinks 330.

A data transformation step generally involves converting data in a “source” data format to data in a “target” data format. Such a data transformation step may involve mapping data elements of the data in the source data format to data elements in the target data format. Such mapping can be one-to-one, one-to-many, many-to-one, or many-to-many. In an embodiment, a data transformation step on dataset data is carried out, at least in part, with a data analytics cluster computing instance such as, for example, an APACHE SPARK instance, an APACHE HIVE instance, or the like. For example, a derivation program may contain one or more SPARK SQL, HIVEQL, or GROOVY commands which, when executed by the data pipeline system 310, carry out one or more data transformation steps on dataset data.
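The mapping of source data elements to target data elements may be illustrated with the following minimal sketch, which is written in plain Python rather than with any cluster computing instance. The function name and all field names (policy_id, first_name, last_name, dob, and the target fields) are hypothetical, and the date of birth is assumed to be a YYYY-MM-DD string.

```python
from typing import Dict


def transform_record(source: Dict[str, object]) -> Dict[str, object]:
    """Map one record from a hypothetical source format to a hypothetical target format."""
    # Assumption: dob is formatted as YYYY-MM-DD.
    year, month, _day = str(source["dob"]).split("-")
    return {
        # One-to-one: the identifier is carried over unchanged.
        "policy_id": source["policy_id"],
        # Many-to-one: two source elements are combined into one target element.
        "full_name": f'{source["first_name"]} {source["last_name"]}',
        # One-to-many: a single source element feeds two target elements.
        "birth_year": int(year),
        "birth_month": int(month),
    }
```

The function returns a new record and leaves the source record unchanged, so the same source data can be transformed again by a later version of the step.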

Data Sources

A data source 320 (e.g., 320B) is any source of data provided to the data pipeline system 310 for storing in one or more datasets. A dataset may be defined as a named collection of data. From the perspective of a data source 320 (e.g., 320C), data provided by the data source to the pipeline system 310 can be structured, semi-structured, or unstructured data.

Structured data includes, but is not limited to, data that conforms to a well-defined data model. Examples of structured data include, but are not limited to, data stored in a relational database and spreadsheet data.

Semi-structured data includes, but is not limited to, data that may not necessarily conform to a well-defined data model but nonetheless includes self-describing structure. Such self-describing structure may be in the form of tags, markup elements, or other syntactic elements that separate semantic elements from each other within the data and enforce hierarchical relationships between semantic elements. Examples of semi-structured data include, but are not limited to, eXtensible Markup Language (XML) data and JavaScript Object Notation (JSON) data.

Unstructured data includes, but is not limited to, data that does not conform to a data model and does not contain self-describing structure. Examples of unstructured data include, but are not limited to, HyperText Markup Language (HTML) data (e.g., web pages) and other text data.

A data source 320 (e.g., 320A) typically comprises one or more non-volatile data storage devices (e.g., one or more hard disks, solid state drives, or the like) on which the provided data is physically stored. Typically, the data is physically stored in one or more data containers such as, for example, in one or more file system files or in one or more other suitable data containers (e.g., a disk block). The one or more data storage devices (and hence the data source) may be embodied in a single computing device or distributed across multiple computing devices.

A data source 320 (e.g., 320A) typically also comprises a data access mechanism that a data requesting mechanism can use to obtain data from the data source. Typically, the data access mechanism of a data source comprises one or more executing software programs (e.g., application program 202A) for reading data from one or more data containers of one or more data storage devices of the data source in response to a request for the data from a data requesting mechanism and for providing the requested data to the data requesting mechanism in response to the request.

Typically, the data requesting mechanism also comprises one or more executing software programs (e.g., application program 202B). The data requesting mechanism may be a component of or a component separate from a data source 320 from which it requests data. Non-limiting examples of a data access mechanism include a database management system server, a network file server, a web server, or other server. Examples of a data requesting mechanism include, but are not limited to, a client application or other application for requesting data from a server.

The request for data from a data requesting mechanism to the data access mechanism of a data source 320 (e.g., 320N) may be made according to a well-known inter-process communication protocol such as, for example, a well-known networking protocol such as, for example, the HyperText Transfer Protocol (HTTP), the Structured Query Language (SQL) or other database query language networking protocol, a Remote Procedure Call (RPC) protocol (e.g., the Simple Object Access Protocol (SOAP)), a Network File System (NFS) protocol, and so forth. The network request may also be cryptographically secured according to a cryptographic protocol (e.g., Transport Layer Security/Secure Sockets Layer (TLS/SSL)).

In some instances, a data requesting mechanism may not use an inter-process communication mechanism such as a networking protocol to request data from a data access mechanism of a data source 320 (e.g., 320B). For example, if the data source 320 (e.g., 320B) is one or more file system files, then a data requesting mechanism may use an operating system application programming interface (API) to read data from the file(s). In this example, the operating system is considered to be the data access mechanism.

The distributed computing environment 300 may have tens, hundreds, or even thousands or more data sources 320. Each of the data sources 320 may provide different data, possibly even in different data formats. As just one simple example, one data source 320 (e.g., 320A) may be a relational database server that provides rows of data, another data source 320 (e.g., 320B) may be a log file that stores log entries as lines of character data, and another data source 320 (e.g., 320C) may be a web service that provides data in one or more Simple Object Access Protocol (SOAP) messages. Overall, the data pipeline system 310 may be provided with heterogeneous data from multiple heterogeneous data sources 320.
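As a non-limiting sketch of how heterogeneous data might be obtained from two of the example data sources above, the following uses a SQLite database (via the Python standard library sqlite3 module) to stand in for a relational database server and a list of strings to stand in for a log file. The function names, the claims table and its columns, and the assumed log line layout of "timestamp level message" are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def rows_from_database(conn: sqlite3.Connection) -> Iterator[Record]:
    """Yield one record per row of a hypothetical claims table."""
    query = "SELECT claim_id, amount FROM claims ORDER BY claim_id"
    for claim_id, amount in conn.execute(query):
        yield {"source": "database", "claim_id": claim_id, "amount": amount}


def entries_from_log(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one record per non-empty log line, assuming 'timestamp level message'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        timestamp, level, message = line.split(" ", 2)
        yield {"source": "log", "timestamp": timestamp, "level": level, "message": message}


def collect(*providers: Iterable[Record]) -> List[Record]:
    """Gather records from several sources into one list, preserving provider order."""
    return [record for provider in providers for record in provider]
```

Each record is tagged with the source it came from, which is one simple way to keep data from different sources distinguishable after it has been gathered.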

A data requesting mechanism that provides data obtained from a data source 320 (e.g., 320B) to the history preserving data pipeline system 310 is referred to herein as a “data provider”. The environment 300 may comprise multiple data providers. For example, there could be a separate data provider for each data source 320 that is to provide data to the data pipeline system 310. As described in greater detail elsewhere in this document, a data provider can use a transaction service 318 to provide data to the data pipeline system 310.

Data Sinks

A data sink 330 (e.g., 330B) is any consumer of dataset data from the data pipeline system 310. From the perspective of a data sink 330 (e.g., 330C), the consumed data can be structured, semi-structured, or unstructured data.

A data sink 330 (e.g., 330A) typically comprises a data analysis mechanism for processing data obtained from the data pipeline system 310 in some particular way. Typically, the data analysis mechanism comprises one or more executing software programs (e.g., application program 202A) for analyzing, organizing, or otherwise processing data and presenting the results of data processing to a user. Examples of a data analysis mechanism include, but are not limited to, a graphical analysis software application or other software application for generating graphical charts, reports, or other graphical analysis of data in a graphical user interface. Another example of a data analysis mechanism is a text-based search engine that parses and indexes text data to provide a full-text searching service to users of the search engine.

The distributed computing environment 300 may have tens, hundreds, or even thousands or more data sinks 330. Each of the data sinks 330 may consume different data, possibly even in different data formats. Further, a data sink 330 (e.g., 330B) may consume data provided by one or more data sources 320. In other words, a data sink 330 may consume data obtained by the data pipeline system 310 from one data source 320 (e.g., 320A) or more than one data source 320 (e.g., 320A and 320B). Accordingly, a function of the data pipeline system 310 may be to combine data from multiple data sources 320 into a format that is consumable by a data sink 330. This is just one example of a possible function performed by the data pipeline system 310.

Overall, the environment 300 may comprise N data sources 320 and M data sinks 330 where N is equal to or different from M. Further, data the pipeline system 310 obtains from a data source 320 (e.g., 320B) may be provided by the pipeline system 310 to one or more data sinks 330 (e.g., one or more of 330A, 330B, 330C . . . 330N). Similarly, the pipeline system 310 may combine data obtained from multiple data sources 320 (e.g., 320A and 320B) and provide the combined data to one or more data sinks 330 (e.g., one or more of 330A, 330B, 330C . . . 330N). As data moves through the pipeline system 310 from the data sources 320 to the data sinks 330, a number of data transformation steps may be performed on the data to prepare the data obtained from the data sources 320 for consumption by the data sinks 330.

Environment 300 may include one or more data consuming mechanisms (“data consumers”) for consuming (obtaining) dataset data from the data pipeline system 310 and providing the obtained data to one or more data sinks 330. Typically, a data consumer comprises one or more executing software programs (e.g., application program 202C). The data consumer may be a component of or a component separate from a data sink 330 to which it provides data. A data consumer may provide data obtained from the data pipeline system 310 in any manner that is suitable to a data sink 330 to which it is providing the data. For example, the data consumer may store the obtained data in a database or in a file system file or send the obtained data to a data sink 330 over a network (e.g., in one or more Internet Protocol (IP) packets). As described in greater detail elsewhere in this document, a data consumer can use the transaction service 318 of the history preserving data pipeline system 310 to consume (obtain) dataset data from the pipeline system 310.

History Preserving Data Pipeline System

A history preserving data pipeline system 310 comprises a storage plane 312 and a logic plane 316.

The storage plane 312 may be implemented with one or more non-volatile data storage devices, which may be distributed across one or more computing devices (e.g., device 100) on one or more data networks. The storage plane 312 comprises data lake 313, build database 314, and transaction database 315.

The data lake 313 is where datasets are stored. In an exemplary embodiment, the data lake 313 comprises a distributed file system implemented with commodity computing devices. For example, the data lake 313 may comprise the APACHE HADOOP DISTRIBUTED FILE SYSTEM (HDFS) or other distributed file system built on commodity computing hardware. The data lake 313 may also comprise archive storage for storing older dataset versions and/or to serve as a backup for a primary storage system of the data lake 313 (e.g., a distributed file system). In one exemplary embodiment, the AMAZON GLACIER archive storage service is used for archiving older versions of datasets.

The build database 314 and the transaction database 315 store metadata supporting functionality provided by the logic plane 316 of the history preserving data pipeline system 310 including metadata for supporting immutable and versioned datasets and for determining dataset build dependencies. The metadata stored and maintained in the build database 314 and the transaction database 315 by the logic plane 316 is described in greater detail elsewhere in this document with respect to FIG. 4.

The build database 314 and the transaction database 315 may be implemented with one or more conventional database systems that store data in one or more tables. The build database 314 and the transaction database 315 may be managed by the same database system or different database systems. At a minimum, the implementing database system should support atomic row updates. However, support for multi-row transactions is not required. In an exemplary embodiment, the APACHE HBASE database system is used to implement the build database 314 and the transaction database 315. In another exemplary embodiment, the APACHE CASSANDRA database system is used to implement the build database 314 and the transaction database 315. Another possible database system that may be used to implement the build database 314 and the transaction database 315 is the POSTGRES (also known as POSTGRESQL) open source database system.

Logic plane 316 may be implemented as one or more software programs (e.g., one or more application programs 202) that are configured to execute on one or more computing devices (e.g., device 100). Logic plane 316 comprises two services: a build service 317 and a transaction service 318.

The transaction service 318 provides support for atomically creating and updating immutable and versioned datasets in the context of transactions. Data providers may use the transaction service 318 to create and update datasets in the data lake 313 with data obtained from data sources 320 in the context of transactions. Data consumers may use the transaction service 318 to read, in the context of transactions, data from datasets in the data lake 313 that is then provided to the data sinks 330. In some embodiments, the transaction service 318 ensures that the data that can be read from a dataset is only data that has already been committed to the dataset by a previously successful transaction.

The build service 317 leverages the transaction service 318 to provide immutable and versioned derived datasets. A derived dataset may be defined as a dataset that is generated (built) by applying a derivation program (or one or more sets of computer-executable instructions) to one or more other datasets. Thus, it can be said that a derived dataset has a dependency on at least one other “base” dataset. A base dataset may accordingly be defined as a dataset on which at least one derived dataset has a dependency.

According to some embodiments, a derivation program may be defined as a set of instructions associated with a derived dataset and which, when executed, uses the transaction service 318 to read data from the base dataset(s) in the context of a transaction, transforms and/or validates the data in some way, and uses the transaction service 318 to write the transformed and/or validated data to the derived dataset in the context of a transaction. Each transaction that modifies a dataset is assigned a transaction identifier by the transaction service 318 that is unique to at least that dataset. The transaction service 318 records the transaction identifier in the transaction database 315. By doing so, each transaction that modifies a dataset is separately identifiable by its assigned transaction identifier. In addition, the transaction service 318 orders transactions on a dataset by the time that they are committed with corresponding transaction commit identifiers.

In order to increase automation of the pipeline, the build service 317 may maintain build dependency data that represents one or more directed acyclic graphs of dataset build dependencies. From the build dependency data, the build service 317 can determine for a given derived dataset the order in which to build other derived datasets before the given derived dataset can be built. As a result, it is no longer necessary for a human engineer to determine the order in which datasets need to be built.
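By way of illustration only, the following Python sketch shows one way a build order could be derived from build dependency data. The dictionary layout, the dataset names, and the function name build_order are assumptions made for this example and are not taken from the disclosed system. The sketch relies on the TopologicalSorter class of the Python standard library (Python 3.9 or later) as one possible ordering technique.

```python
from graphlib import TopologicalSorter

# Hypothetical build dependency data: each derived dataset maps to the
# datasets it is built from. Names that never appear as keys are treated
# as datasets that are not themselves derived.
DEPENDENCIES = {
    "claims_clean": {"claims_raw"},
    "policies_clean": {"policies_raw"},
    "risk_report": {"claims_clean", "policies_clean"},
}


def build_order(target, dependencies):
    """Return the derived datasets to build, in order, ending with target."""
    # Collect only the part of the graph reachable from the target.
    reachable, stack = {}, [target]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable[name] = dependencies.get(name, set())
        stack.extend(reachable[name])
    # static_order() yields each dataset after all of its dependencies and
    # raises graphlib.CycleError if the data is not acyclic.
    ordered = TopologicalSorter(reachable).static_order()
    return [name for name in ordered if name in dependencies]


order = build_order("risk_report", DEPENDENCIES)
print(order)
```

In this sketch, the two intermediate datasets may appear in either order, but both always precede the dataset that depends on them.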

When a new version of a derived dataset is built, the build service 317 may create a build catalog entry (e.g., a row or record) in the build database 314. The build catalog entry identifies the version(s) of the base dataset(s) from which the new version of the derived dataset was built. By doing so, it can be determined for any given version of a derived dataset, including historical versions, the version(s) of the base dataset(s) from which the version of the derived dataset was built. Further, because datasets, including derived datasets, are immutable, data of a historical version of a derived dataset can be traced to the data from which it was derived, even if that data is also historical.
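As a non-limiting sketch of the tracing described above, the following Python example walks a set of build catalog entries from a given version of a derived dataset back to the versions of the datasets it was built from. The in-memory dictionary standing in for the build database 314, the dataset names, the integer versions, and the function name lineage are all assumptions made for illustration.

```python
# Hypothetical build catalog: (dataset name, version) maps to the name and
# version of each base dataset that version was built from.
CATALOG = {
    ("report", 1): {"claims": 1},
    ("report", 2): {"claims": 2},
    ("claims", 1): {"raw": 1},
    ("claims", 2): {"raw": 3},
}


def lineage(name, version, catalog):
    """Return every (dataset, version) the given version was derived from."""
    found = []
    for base, base_version in catalog.get((name, version), {}).items():
        found.append((base, base_version))
        # A base dataset may itself be derived, so keep tracing.
        found.extend(lineage(base, base_version, catalog))
    return found


print(lineage("report", 1, CATALOG))  # [('claims', 1), ('raw', 1)]
```

Because no entry is ever overwritten in this sketch, the lineage of a historical version remains available after newer versions are added.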

The build service 317 may also version derivation programs for tracing and tracking purposes. In this case, the build catalog entry may also contain the version of the derivation program that was executed by the build service 317 to build the new version of the derived dataset.

The functionality of the build service 317 and the transaction service 318 is described in greater detail elsewhere in this document.

The build service 317 and the transaction service 318 may each provide an interface by which users and/or other software programs can invoke the services thereof by issuing one or more commands thereto and/or requests thereof. For example, the interface may be a graphical user interface, a command line interface, a networking interface, or an application programming interface (API).

History Preserving Data Pipeline System Operation

FIG. 4 is a block diagram illustrating the history preserving data pipeline system 310 in greater detail. As shown, there are at least three computer “users” of the system 310: a dataset builder, one or more data providers, and one or more data consumers.

Dataset Builder

The dataset builder periodically invokes the build service 317 to build derived datasets. For example, the dataset builder may send a network request to or otherwise invoke the build service 317 to build one or more specifically identified datasets or to build all datasets.

In an embodiment, the dataset builder issues a “build all” command to the build service 317 on a periodic basis (e.g., once a day). The build service 317 interprets the build all command as a command to build all known derived datasets that are “out-of-date”. Known datasets are those specified in the build dependency data 406. Generally, a derived dataset is out-of-date if no version of the derived dataset exists in the data lake 313 or the current version of the derived dataset in the data lake 313 is out-of-date.

The build dependency data 406 represents one or more directed acyclic graphs (each also referred to herein as a “build dependency graph”). There may be multiple such graphs if, for example, none of the datasets represented by a graph has a build dependency on a dataset represented by another graph. Each graph comprises nodes and one or more directed edges connecting the nodes. A leaf node of a graph corresponds to a dataset that does not have any build dependencies on another dataset. A non-leaf node of a graph corresponds to a dataset that has a build dependency on at least one other dataset. A root node of a graph is a non-leaf node representing a dataset on which no other dataset has a build dependency. A graph may have only one root node or may have multiple root nodes. A directed edge connecting two nodes in a graph represents a build dependency between two datasets. A graph may be represented in a computer memory as an N-ary tree data structure or other suitable data structure.

To illustrate a build dependency graph by a simple example, consider graph 800 of FIG. 8. Each circle of graph 800 represents a node of the build dependency graph and each arrow connecting two circles of graph 800 represents a directed edge of the build dependency graph. The letter in each circle of graph 800 represents the name of the dataset represented by the corresponding node. As shown, datasets F and A are represented by root nodes of the build dependency graph, datasets C, D, and E are represented by leaf nodes of the build dependency graph, and dataset B is represented by a non-leaf node of the build dependency graph. Also shown, dataset F has a build dependency on dataset C, dataset B has build dependencies on datasets C and D, and dataset A has build dependencies on datasets B, C, D, and E. Dataset A's build dependency on dataset C is transitive by way of dataset B. Datasets F and B may be considered the “parent” datasets of dataset C (and dataset C the “child” of datasets F and B), datasets B and A the parent datasets of dataset D (and dataset D the child of datasets B and A), and dataset A the parent dataset of datasets B, D, and E (and datasets B, D, and E the children of dataset A). However, dataset A is not considered a parent of dataset C and dataset C is not considered a child of dataset A.
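For illustration, the relationships described for graph 800 can be written down as a mapping from each dataset to the datasets it directly depends on. The Python representation below, including the names GRAPH_800, parents_of, leaves, and roots, is an assumption made for this sketch and is only one of many suitable data structures.

```python
# Direct build dependencies of graph 800 as described in the text: each key
# depends directly on the datasets listed in its value.
GRAPH_800 = {
    "A": ["B", "D", "E"],
    "B": ["C", "D"],
    "F": ["C"],
    "C": [],
    "D": [],
    "E": [],
}


def parents_of(dataset, graph):
    """Datasets that depend directly on the given dataset."""
    return sorted(p for p, children in graph.items() if dataset in children)


def leaves(graph):
    """Datasets with no build dependencies on another dataset."""
    return sorted(n for n, children in graph.items() if not children)


def roots(graph):
    """Non-leaf datasets that no other dataset depends on."""
    depended_on = {c for children in graph.values() for c in children}
    return sorted(n for n, ch in graph.items() if ch and n not in depended_on)


print(parents_of("C", GRAPH_800))  # ['B', 'F']
print(roots(GRAPH_800))            # ['A', 'F']
print(leaves(GRAPH_800))           # ['C', 'D', 'E']
```

Dataset A does not appear among the parents of dataset C because only direct dependencies are recorded as edges; the dependency of A on C is reached through B.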

Referring once again to FIG. 4, the dataset builder may be implemented as one or more computer programs or computer control scripts (i.e., one or more sets of computer-executable instructions). The dataset builder may execute as part of the build service 317 and/or the transaction service 318 (i.e., in the same process space). Alternatively, the dataset builder may execute as a separate process from the process(es) of the build service 317 and the transaction service 318.

In an embodiment, the dataset builder implements a message queue between the transaction service 318 and the build service 317. When a new version of a dataset in the data lake 313 is created or updated in the context of a committed transaction, the transaction service 318 adds a message to the tail of the message queue specifying the name of the created or updated dataset and a version identifier for the new version of the dataset. In an embodiment, the version identifier for the new version of the dataset is a transaction identifier (e.g., 704) of the transaction that successfully committed the new version of the dataset.

The build service 317 removes messages from the head of the message queue. For each such message removed from the message queue, the build service 317 determines from build dependency data 406 any datasets that directly depend on the dataset named in the message. The datasets that directly depend on the named dataset can be identified in the build dependency data 406 from any parent node(s) of the node corresponding to the named dataset in a build dependency graph, assuming each node in the build dependency graph is associated in the build dependency data 406 with the name or identifier of the dataset the node represents.

In some embodiments, the build service 317 then builds new version(s) of the dataset(s) that directly depend on the named dataset with the aid of the transaction service 318. Assuming the new version(s) of the dataset(s) are successfully committed to the data lake 313, this causes the transaction service 318 to add message(s) to the message queue for the new version(s) of the derived dataset(s) that directly depend on the named dataset. The build service 317 continuously removes messages from the head of the message queue and builds new versions of datasets in this way until the message queue becomes empty (e.g., after a dataset that has no dependencies on it is built).
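The queue-driven behavior described above may be sketched, purely as an example, with an in-memory queue. In the following Python sketch the PARENTS mapping, the drain function, and the fake_build stand-in (which merely returns a new version number instead of running a derivation program and committing a transaction) are hypothetical names introduced for illustration. The sketch also enqueues the follow-up message itself, whereas in the described embodiment the transaction service 318 adds that message upon a successful commit.

```python
from collections import deque
import itertools

# Hypothetical build dependency data: each dataset maps to the datasets
# that directly depend on it (its parents).
PARENTS = {"C": ["B", "F"], "D": ["A", "B"], "E": ["A"], "B": ["A"]}

_versions = itertools.count(100)


def fake_build(name):
    """Stand-in for building and committing a new version of a dataset."""
    return next(_versions)


def drain(queue, parents, build):
    """Remove messages from the head of the queue until it is empty."""
    built = []
    while queue:
        name, _version = queue.popleft()
        for parent in parents.get(name, []):
            new_version = build(parent)
            built.append((parent, new_version))
            # The new version is itself announced at the tail of the queue.
            queue.append((parent, new_version))
    return built


queue = deque([("C", 1)])  # a new version of dataset C was committed
built = drain(queue, PARENTS, fake_build)
print([name for name, _ in built])  # ['B', 'F', 'A']
```

The loop ends once the datasets that nothing depends on (here A and F) have been built, because their messages produce no further work.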

In some embodiments, the build service 317 only builds a new version of a given dataset that depends on (i.e., is a parent of) a dataset named in a message obtained from the message queue if the current version of the given dataset is out-of-date with respect to the named dataset. As explained in greater detail elsewhere in this document, to determine whether the current version of a dataset is out-of-date with respect to a child dataset, the build service 317 consults build catalog entries 404 stored in the build database 314.

The build service 317 can receive a command from the dataset builder to build a specifically named derived dataset. Alternatively, the build service 317 can receive a command from the dataset builder to build all derived datasets. In the latter case, the build service 317 may treat the command to build all derived datasets as one or more commands to build each derived dataset corresponding to a root node in the build dependency data 406. In both cases, the build service 317 may rebuild a given derived dataset only if the dataset is out-of-date with respect to its build dependencies.

To determine whether a given derived dataset is out-of-date with respect to its build dependencies, the build service 317 traverses the build dependency graph starting at the node corresponding to the given derived dataset and visits at least every non-leaf node in the graph sub-tree that is rooted at the node corresponding to the given derived dataset. During the traversal, nodes are visited in post-order according to a depth-first traversal algorithm. For example, referring briefly to FIG. 8, if the given dataset is A, then a post-order depth-first recursive traversal of graph 800 would visit the node for dataset C and the node for dataset D before visiting the node for dataset B and would visit the node for dataset D and the node for dataset E before visiting the node for dataset A.
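A post-order depth-first traversal of this kind may be sketched as follows. The GRAPH_800 mapping and the post_order function are illustrative assumptions; in particular, the sketch visits a node shared by several parents (such as dataset D) only once, which is a simplification chosen for this example.

```python
# Direct build dependencies of graph 800 as described in the text.
GRAPH_800 = {
    "A": ["B", "D", "E"],
    "B": ["C", "D"],
    "F": ["C"],
    "C": [],
    "D": [],
    "E": [],
}


def post_order(root, graph):
    """Visit every node reachable from root, children before parents."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for child in graph[node]:
            visit(child)
        order.append(node)  # appended only after all children are done

    visit(root)
    return order


print(post_order("A", GRAPH_800))  # ['C', 'D', 'B', 'E', 'A']
```

Visiting children first means that, by the time a derived dataset is examined, every dataset it depends on has already been examined and, if necessary, rebuilt.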

For each non-leaf node visited during the traversal, a determination is made whether the current version of the derived dataset corresponding to the visited non-leaf node is out-of-date with respect to any of its child datasets. As described in greater detail elsewhere in this document, to determine whether the current version of a dataset is out-of-date with respect to a child dataset, the build service 317 consults build catalog entries 404 stored in the build database 314. If the current version of the derived dataset is out-of-date with respect to any of its child datasets, then the build service 317 executes the current version of the derivation program for the derived dataset to generate a new version of the derived dataset. After the new version of the derived dataset has been generated, the build service 317 adds a new build catalog entry (e.g., 404A) to the build database 314 reflecting the new version of the derived dataset. In an embodiment, datasets are recursively rebuilt if dependencies of the dataset to be rebuilt are also out-of-date.

Build Catalog Entries

In an embodiment, as exemplified in FIG. 5, a build catalog entry (e.g., 404A) corresponding to a non-leaf node in the build dependency data 406 may comprise a dataset name 502, a dataset version 504, and build dependency information 506. Build service 317 adds a new build catalog entry (e.g., 404A) to build database 314 each time a new version of a derived dataset is built and committed to the data lake 313 in the context of a transaction facilitated by the transaction service 318. Thus, build database 314 may store a build catalog entry for each version of a derived dataset, including the current version of the derived dataset and any historical (prior) versions of the derived dataset.

The dataset name 502 is a unique identifier of a derived dataset. The dataset name 502 may be used to identify the derived dataset across all versions of the derived dataset. In other words, the dataset name 502 may be the same in all build catalog entries 404 for all versions of the derived dataset.

The dataset version 504 is a unique identifier of a version of the derived dataset. Typically, the dataset version 504 is an ordinal or other information that can be used to determine whether the version of the derived dataset represented by the dataset version 504 happened before or happened after other versions of the derived dataset represented by other build catalog entries 404 in the build database 314 with the same dataset name 502. In an embodiment, the dataset version 504 is an identifier (e.g., a transaction commit identifier) assigned by the transaction service 318 to a commit of a transaction that stored the version 504 of the derived dataset to the data lake 313.

The build dependency information 506 may comprise a list of one or more dataset build dependencies 508 and a derivation program build dependency 510. The list of dataset build dependencies 508 corresponds to any child datasets input to the version of the derivation program used to build the version 504 of the derived dataset. If no such datasets were input, then the list of dataset build dependencies 508 may be an empty list.

In an embodiment, each dataset build dependency (e.g., 508A) specifies the name and the version of a dataset that the version 504 of the derived dataset was built (generated) from. For example, the name and the version of a dataset build dependency (e.g., 508B) may correspond to a dataset name 502 and a dataset version 504 of a build catalog entry (e.g., 404A) for a version of a dataset that the version 504 of the derived dataset was generated (built) from.

In an embodiment, the derivation program build dependency 510 specifies the name and the version of a derivation program that the build service 317 executed to generate (build) the version 504 of the derived dataset. For example, the name and the version of the derivation program dependency 510 may correspond to a derivation program entry (e.g., 408A) for the version of the derivation program that was executed by the build service 317 to generate (build) the version 504 of the derived dataset.

In an embodiment, the build service 317 identifies the current version of a derived dataset by querying build catalog entries 404 for the build catalog entry (e.g., 404A) comprising the latest (most recent) dataset version 504 and having a dataset name 502 matching a name for the derived dataset specified as a query parameter.

In an embodiment, the build service 317 determines whether the current version of a given dataset is out-of-date based on the build catalog entry (e.g., 404A) for the current version of the given dataset. The current version of the given dataset may be considered out-of-date for any one of a number of reasons including because: 1) there is a version of the derivation program that is newer than the version used to build the current version of the given dataset, 2) there is a version of a child dataset that is newer than the version of the child dataset from which the current version of the given dataset was built, or 3) a dependency of the given dataset on another dataset was added or removed.
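The three reasons listed above can be expressed as a simple comparison against the build catalog entry for the current version. The following Python sketch is illustrative only: the BuildCatalogEntry class, its field names, the use of integers as ordered versions, and the is_out_of_date function are assumptions and do not describe a required schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildCatalogEntry:
    dataset_name: str            # corresponds to dataset name 502
    dataset_version: int         # corresponds to dataset version 504
    dataset_dependencies: dict   # 508: child dataset name -> version built from
    program_version: int         # version recorded in dependency 510


def is_out_of_date(entry, latest_program_version, latest_child_versions):
    """latest_child_versions maps each current direct dependency to its
    latest version; entry is None when no version has been built yet."""
    if entry is None:
        return True
    # Reason 1: a newer version of the derivation program exists.
    if latest_program_version > entry.program_version:
        return True
    # Reason 3: a dependency was added or removed since the last build.
    if set(latest_child_versions) != set(entry.dataset_dependencies):
        return True
    # Reason 2: some child dataset has a newer version than the one used.
    return any(
        latest_child_versions[child] > used
        for child, used in entry.dataset_dependencies.items()
    )


entry = BuildCatalogEntry("A", 7, {"B": 3, "C": 5}, 2)
print(is_out_of_date(entry, 2, {"B": 3, "C": 5}))  # False
print(is_out_of_date(entry, 2, {"B": 4, "C": 5}))  # True
```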

Derivation Program Entries

As shown in FIG. 6, a derivation program entry (e.g., 408A) in the build database 314 may comprise a derivation program name or other identifier 602, a derivation program version 604, a list 606 of dataset dependencies 608, and the executable code 610 of the version 604 of the derivation program itself.

The derivation program name 602 is a unique identifier of a derivation program. The derivation program name 602 may be used to identify the derivation program across all versions of the derivation program. In other words, the derivation program name 602 may be the same in all derivation program entries (e.g., 408A) for all versions of the derivation program.

The derivation program version 604 is a unique identifier of a version of the derivation program. Typically, the derivation program version 604 is an ordinal or other information that can be used to determine whether the version of the derivation program represented by the derivation program version 604 happened before or happened after other versions of the derivation program represented by other derivation program entries 408 in the build database 314 with the same derivation program name 602. For example, if there are three versions of a derivation program, then three derivation program entries 408 may be stored in build database 314 all with the same program name 602 and with different derivation program versions 604. For example, the derivation program version 604 in the three derivation program entries could be 1, 2, and 3, respectively.

The derivation program entry 408A may also comprise a list 606 of one or more dataset dependencies 608. The list 606 of dataset dependencies 608 corresponds to any datasets that the version 604 of the derivation program depends on. If the version 604 of the derivation program does not depend on any other datasets, then the list 606 of dataset dependencies 608 may be an empty list.

In an embodiment, each dataset dependency (e.g., 608A) specifies the name of a dataset that the version 604 of the derivation program depends on. For example, the name of a dataset dependency (e.g., 608B) may correspond to a dataset name 502 of one or more build catalog entries 404 in the build database 314.

The derivation program code 610 comprises the actual computer-executable instructions of the version 604 of the derivation program. Alternatively, the derivation program code 610 comprises a pointer or address to a storage location of the actual computer-executable instructions.

In an embodiment, a dataset in build dependency data 406 is associated with a derivation program the dataset depends on. Such association can be made in the data 406 between the name (e.g., 502) of the dataset and the name (e.g., 602) of the derivation program.

In an embodiment, when a new derivation program entry (e.g., 408A) is added to the build database 314 for a new version of the derivation program, the direct dependencies in the build dependency data 406 for any datasets that depend on the derivation program are updated based on the list 606 of dataset dependencies 608 in the new derivation program entry.

For example, consider the following sequence of events: 1) build dependency data 406 indicates that dataset A has direct dependencies on datasets B and C and on derivation program P, and 2) a new derivation program entry is added to the build database 314 for a new version of the derivation program P, the new derivation program entry having a list 606 of dataset dependencies 608 indicating datasets B, C, and D. In response to the new derivation program entry for derivation program P being added to build database 314, the build dependency data 406 may be updated to indicate that dataset A now has direct dependencies on datasets B, C, and D.
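The sequence of events in this example may be sketched as follows. The dictionary layout used for the build dependency data and the function name apply_new_program_entry are hypothetical and chosen only to mirror the example of dataset A and derivation program P.

```python
def apply_new_program_entry(dependency_data, program_name, dataset_dependencies):
    """Return a copy of the dependency data in which every dataset that
    depends on the named derivation program takes its direct dataset
    dependencies from the new derivation program entry."""
    updated = {}
    for dataset, info in dependency_data.items():
        if info["program"] == program_name:
            info = {
                "program": program_name,
                "datasets": sorted(dataset_dependencies),
            }
        updated[dataset] = info
    return updated


# 1) Dataset A depends on datasets B and C and on derivation program P.
build_dependency_data = {"A": {"program": "P", "datasets": ["B", "C"]}}

# 2) A new entry for program P lists dataset dependencies B, C, and D.
updated = apply_new_program_entry(build_dependency_data, "P", ["B", "C", "D"])
print(updated["A"]["datasets"])  # ['B', 'C', 'D']
```

Returning a copy rather than modifying the input is a design choice of this sketch; it keeps the earlier dependency data available for comparison.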

In an embodiment, the build service 317 identifies the current version of a derivation program by querying derivation program entries 408 for the derivation program entry (e.g., 408A) comprising the latest (most recent) derivation program version 604 and having a derivation program name 602 matching a name for the derivation program specified as a query parameter.

Transaction Service

As mentioned, data providers provide data to the data pipeline system 310 obtained from data sources 320 and data consumers obtain data from the data pipeline system 310 and provide it to data sinks 330. To do so, the data providers and the data consumers may invoke the services of the transaction service 318.

The transaction service 318 facilitates writing immutable and versioned datasets in the context of transactions. To do so, the transaction service 318 implements a transaction protocol that the data providers and data consumers can invoke to carry out a transaction on a dataset.

As shown in FIG. 9, the transaction protocol for conducting a write transaction 900 on a dataset comprises a start transaction command 902, one or more write dataset commands 904, and a commit command 908.

The transaction commands are issued by a client of the transaction service 318. The client may issue the commands to the transaction service 318 via an interface offered to the client by the transaction service 318. The interface may be, for example, an application programming interface accessible (invoke-able) over a network or from within a process. In an embodiment, the client is one of the build service 317, a data provider, or a data consumer. At any given time, the transaction service 318 may be facilitating transactions on multiple datasets on behalf of multiple clients. For example, one client may write to a dataset in the context of a transaction while another client is reading from the dataset in the context of a transaction.

A transaction on a dataset is initiated by a client issuing a start transaction command 902 providing the name of the dataset. In response to receiving the start transaction command 902, the transaction service 318 assigns a transaction identifier to the transaction. The transaction identifier uniquely identifies the transaction at least for the dataset. After assigning a transaction identifier to the transaction on the dataset, the transaction identifier is returned to the client.

Once a transaction has been started, the client can perform a number of write operations on the dataset.

For a write command 904, the client provides the name of the dataset, the transaction identifier, and the data to write to the dataset. In response, the transaction service 318 writes the data to a container 402 in the data lake 313. The container 402 may be a file in a distributed file system, for example. To support immutable datasets, the transaction service 318 does not overwrite or otherwise delete or remove existing data from the dataset. In some embodiments, this is accomplished by storing differences between dataset data. For example, the data of a first version of a dataset may be stored in a first container 402 in the data lake 313 and the differences or deltas between the first version of the dataset and a second version of the dataset may be stored in a second container 402 in the data lake 313. This delta encoding approach can be more space-efficient in terms of space consumed in the data lake 313 when compared to an approach where all data of each version of a dataset is stored in a separate container 402. If the write to the data lake 313 is successful, the transaction service 318 returns an acknowledgement of the success to the client. Otherwise, the acknowledgement may indicate that the write failed in which case the client may abort the transaction.
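As one non-limiting way to picture the delta encoding approach, the following Python sketch stores the first version of a dataset in one container and only the differences for the next version in a second container. Treating a dataset as a set of rows, the "added"/"removed" layout of a delta, and the function names make_delta and materialize are assumptions made for this example.

```python
def make_delta(old_rows, new_rows):
    """Describe how to get from one version's rows to the next version's."""
    old, new = set(old_rows), set(new_rows)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def materialize(containers, version):
    """Rebuild the rows of a version (1-based) by replaying the containers
    in order; no container is ever modified."""
    rows = set()
    for delta in containers[:version]:
        rows -= set(delta["removed"])
        rows |= set(delta["added"])
    return sorted(rows)


containers = []                            # stand-in for containers 402
v1 = ["a", "b"]
containers.append(make_delta([], v1))      # first container: all of version 1
v2 = ["b", "c"]
containers.append(make_delta(v1, v2))      # second container: differences only

print(materialize(containers, 1))  # ['a', 'b']
print(materialize(containers, 2))  # ['b', 'c']
```

Because earlier containers are only read and never rewritten, the first version remains retrievable after the second version is written.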

Once the client has finished writing to the dataset, the client may commit any writes to the dataset by issuing a commit command 908 providing the dataset name and the transaction identifier. To commit the transaction, the transaction service 318 assigns a commit identifier 710 to the transaction and automatically updates a transaction entry (e.g., 410A) for the transaction in the transaction database 315. If the transaction is successfully committed, the transaction service 318 returns an acknowledgement to the client indicating so. Otherwise, the acknowledgement indicates that the commit operation 908 was not successful.
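The start, write, and commit commands of the write transaction protocol may be sketched, for illustration only, with an in-memory stand-in for the transaction service 318. The class name TransactionService, its method names, the dictionary-based transaction entries, and the choice to create the transaction entry when the transaction starts are assumptions of this sketch rather than features of the described interface.

```python
import itertools


class TransactionService:
    """In-memory stand-in used only to illustrate the command sequence."""

    def __init__(self):
        self._txn_ids = itertools.count(1)
        self._commit_ids = itertools.count(1)
        self.entries = {}     # stand-in for transaction database 315
        self.containers = {}  # stand-in for containers 402 in data lake 313

    def start(self, dataset):
        """Start transaction command: assign and return an identifier."""
        txn_id = next(self._txn_ids)
        self.entries[(dataset, txn_id)] = {"committed": False, "commit_id": None}
        self.containers[(dataset, txn_id)] = []
        return txn_id

    def write(self, dataset, txn_id, data):
        """Write command: append data; returns True on success."""
        entry = self.entries.get((dataset, txn_id))
        if entry is None or entry["committed"]:
            return False  # committed versions are never modified
        self.containers[(dataset, txn_id)].append(data)
        return True

    def commit(self, dataset, txn_id):
        """Commit command: assign a commit identifier and mark committed."""
        entry = self.entries.get((dataset, txn_id))
        if entry is None or entry["committed"]:
            return False
        entry["commit_id"] = next(self._commit_ids)
        entry["committed"] = True
        return True


service = TransactionService()
txn = service.start("claims")
service.write("claims", txn, "row-1")
ok = service.commit("claims", txn)
print(ok)  # True
```

In this sketch each acknowledgement is reduced to a boolean; a client that receives False from a write could abandon the transaction without committing it.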

While the transaction service 318 may be used to write data to a dataset in the context of a transaction, the transaction service 318 may also facilitate reading committed data from a dataset version. To do so, a client may issue a read command to the transaction service 318. In the read command, the client may specify the name and the version of the dataset to read data from. In response to receiving the read command, the transaction service 318 may consult (read) the transaction entry in the transaction database 315 for the dataset name and version specified in the read command, if one exists. To identify this transaction entry, the transaction service 318 may query the transaction database 315 for a transaction entry having a dataset name (e.g., 702) equal to the dataset name specified in the read command and having a transaction commit identifier (e.g., 710) equal to the dataset version specified in the read command. The query may also exclude any transaction entries that do not have a value for the transaction committed flag (e.g., 708) that indicates that the corresponding transaction was successfully committed. Alternatively, the query may include only transaction entries that have a value for the transaction committed flag (e.g., 708) that indicates that the corresponding transaction was successfully committed.

If a transaction entry exists for a transaction that successfully committed the dataset name and version specified in the read command, then the transaction service 318 may provide data from the dataset version to the client or otherwise provide access to the client to data from the dataset version. If the transaction was not successfully committed, then the transaction service 318 may not provide data from the dataset version to the client. In this case, the transaction service 318 may also return an error or other indication that the dataset version was not successfully committed or that the read command failed.

In an embodiment, a read command from a client specifies a dataset name but does not specify any particular dataset version. The transaction service 318 may interpret this read command as a command to read data from the latest (most recent) successfully committed version of the dataset identified by the dataset name specified in the read command. The transaction service 318 can identify the latest version of the dataset by identifying the transaction entry in the transaction database 315 having a dataset name (e.g., 702) equal to the dataset name specified in the read command that has a value for the transaction committed flag (e.g., 708) that indicates the transaction represented by the transaction entry was successfully committed and that has the highest transaction commit identifier (e.g., 710) among all transactions successfully committed for the dataset.
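The read resolution described above, for both an explicit version and the latest committed version, might be sketched as follows. The in-memory list standing in for the transaction database 315, the `TransactionEntry` class, and the `find_entry` function are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TransactionEntry:
    dataset_name: str             # corresponds to 702
    committed: bool               # corresponds to 708
    commit_id: Optional[int]      # corresponds to 710; None until committed


def find_entry(
    entries: Iterable[TransactionEntry],
    dataset_name: str,
    version: Optional[int] = None,
) -> Optional[TransactionEntry]:
    """Resolve a read command to a committed transaction entry, if any."""
    # Only successfully committed transactions on the named dataset qualify.
    candidates = [
        e for e in entries
        if e.dataset_name == dataset_name and e.committed and e.commit_id is not None
    ]
    if version is not None:
        # Explicit version: the commit identifier must match exactly.
        return next((e for e in candidates if e.commit_id == version), None)
    # No version given: take the highest commit identifier.
    return max(candidates, key=lambda e: e.commit_id, default=None)


entries = [
    TransactionEntry("claims", True, 1),
    TransactionEntry("claims", True, 2),
    TransactionEntry("claims", False, None),   # started but never committed
    TransactionEntry("policies", True, 7),
]
latest = find_entry(entries, "claims")
first = find_entry(entries, "claims", version=1)
```

A `None` result corresponds to the case where the transaction service 318 returns an error or other indication that the read command failed.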

Transaction Entries

In an embodiment, as shown in FIG. 7, a transaction entry (e.g., 410A) comprises a dataset name 702, a transaction identifier 704, a transaction start timestamp 706, a transaction committed flag 708, a transaction commit identifier 710, and a list 712 of data lake container identifiers 714. In other embodiments, a transaction entry comprises more or less information than is shown in FIG. 7. For example, a transaction entry may also have a transaction commit timestamp in addition to the transaction start timestamp 706.

A transaction entry (e.g., 410A) for a transaction on a dataset may be created at a first time and updated at a second time. The first time corresponds to when the transaction is started and the second time corresponds to when the transaction is committed. For example, a transaction entry (e.g., 410A) may be created in response to a start transaction command 902 and then subsequently updated in response to a commit transaction command 908.

When a transaction entry (e.g., 410A) is created in the transaction database 315 in response to a start transaction command 902, the dataset name 702, the transaction identifier 704 and the transaction start timestamp 706 may be populated in the entry. The transaction start timestamp 706 may be a system clock time corresponding to when the transaction was started. For example, the transaction service 318 may obtain a system clock time in response to receiving a start transaction command 902 to use to populate the transaction start timestamp 706 in the created entry. The transaction committed flag 708 may also be set when the entry is created to indicate that the transaction has not yet committed. To indicate this, the flag 708 can be a predefined value (e.g., N or 0) or left blank (NULL). The flag 708 may be checked to determine whether the transaction was committed. For example, if the flag 708 is present in an entry (e.g., 410A) for a transaction and has a certain predefined value (e.g., Y, TRUE, or 1) that indicates that the transaction was successfully committed, then the transaction is considered to have been successfully committed.
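As a minimal sketch of the entry creation just described, the fields of FIG. 7 can be modeled as a record whose commit-related fields stay empty until commit time. The dictionary standing in for the transaction database 315, the use of a random UUID as the transaction identifier, and the function names are illustrative assumptions.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TransactionEntry:
    dataset_name: str                       # 702
    transaction_id: str                     # 704
    start_timestamp: float                  # 706, system clock time at start
    committed: bool = False                 # 708, not yet committed
    commit_id: Optional[int] = None         # 710, assigned at commit
    container_ids: List[str] = field(default_factory=list)  # 712 of 714


def start_transaction(database: Dict[str, TransactionEntry], dataset_name: str) -> TransactionEntry:
    """Create an entry in response to a start transaction command."""
    entry = TransactionEntry(
        dataset_name=dataset_name,
        transaction_id=uuid.uuid4().hex,    # assumption: any unique id would do
        start_timestamp=time.time(),
    )
    database[entry.transaction_id] = entry
    return entry


def is_committed(entry: TransactionEntry) -> bool:
    """Check the committed flag to decide whether the transaction committed."""
    return entry.committed is True


transaction_database: Dict[str, TransactionEntry] = {}
entry = start_transaction(transaction_database, "claims")
```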

When a transaction entry (e.g., 410A) is updated in the transaction database 315 in response to a commit transaction command 908, the transaction committed flag 708, the transaction commit identifier 710, and the list 712 of data lake container identifiers 714 may be updated in the entry. The update to the entry to indicate that the transaction has been committed is preferably performed atomically to avoid putting the transaction database 315 in an incomplete or inconsistent state. For example, the transaction service 318 may attempt to update a transaction entry in the transaction database 315 with a put-if-absent operation.
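One way a put-if-absent operation could make the commit atomic is sketched below. This is an interpretation, not the described implementation: it assumes commit records are keyed by dataset name and commit identifier, so that two concurrent commits cannot claim the same identifier. The `CommitLog` class, its lock-based store, and the retry loop are all hypothetical.

```python
import threading
from typing import Dict, List, Optional, Tuple


class CommitLog:
    """Hypothetical key-value store offering an atomic put-if-absent."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._records: Dict[Tuple[str, int], dict] = {}

    def put_if_absent(self, key: Tuple[str, int], value: dict) -> bool:
        """Store `value` under `key` only if the key is unused."""
        with self._lock:
            if key in self._records:
                return False
            self._records[key] = value
            return True

    def latest_commit_id(self, dataset_name: str) -> int:
        """Return the highest commit identifier for the dataset, or 0."""
        with self._lock:
            ids = [cid for (name, cid) in self._records if name == dataset_name]
        return max(ids, default=0)


def commit(
    log: CommitLog,
    dataset_name: str,
    transaction_id: str,
    container_ids: List[str],
    max_attempts: int = 10,
) -> Optional[int]:
    """Try to commit; return the assigned commit identifier or None."""
    for _ in range(max_attempts):
        candidate = log.latest_commit_id(dataset_name) + 1
        record = {
            "transaction_id": transaction_id,
            "committed": True,
            "container_ids": list(container_ids),
        }
        # If another commit took this identifier first, retry with the next one.
        if log.put_if_absent((dataset_name, candidate), record):
            return candidate
    return None


log = CommitLog()
first_id = commit(log, "claims", "tx-a", ["container-0"])
second_id = commit(log, "claims", "tx-b", ["container-1"])
```

Because each identifier can be claimed only once, the identifiers assigned for a given dataset are distinct and increasing, which is the property a total ordering of committed transactions needs.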

As mentioned, the flag 708 may be updated to a predefined value that indicates that the transaction was committed.

The transaction commit identifier 710 provides a total ordering of all committed transactions on the dataset identified by the dataset name 702 of the entry. The transaction service 318 may assign a transaction commit identifier 710 to a transaction in response to a command (e.g., 908) to commit the transaction. For example, the transaction commit identifier 710 may be an integer or any other type of value (e.g., a timestamp) that can be used for total ordering of transactions on a dataset.

The list 712 of data lake container identifiers 714 identifies one or more data containers 402 in the data lake 313 in which any data written by the transaction is stored. The one or more data containers 402 may contain just the data written by the transaction, for example, in the form of differences or deltas to prior version(s) of the dataset. Alternatively, the one or more data containers 402 may contain all data of the version of the dataset resulting from the transaction.

Method for Preserving History of Derived Datasets

The following description presents method steps that may be implemented using computer-executable instructions, for directing operation of a device under processor control. The computer-executable instructions may be stored on a computer-readable storage medium, such as a CD, DVD, hard disk, flash memory, or the like. The computer-executable instructions may also be stored as a set of downloadable computer-executable instructions, for example, for downloading and installation from an Internet location (e.g., a Web server).

Turning now to FIG. 10, it illustrates an example process 1000 performed by history preserving data pipeline system 310 for preserving history of a derived dataset.

The example process 1000 illustrates immutable and versioned derived datasets. Because the derived datasets, like datasets generally, are immutable and versioned in the system 310, it is possible to trace dataset data to the data source data from which the dataset data was derived or obtained, even if the dataset data is no longer in the current version of the derived dataset and even if the data source data is no longer available from the data source.

The example process 1000 also illustrates how the history preserving data pipeline system 310 improves on existing data pipeline systems by providing the ability to trace dataset data to the data source data from which the dataset data was derived or obtained, even if the dataset data is no longer in the current version of the dataset and even if the data source data is no longer available from the data source.

The example process 1000 also illustrates how the system 310 provides the ability to trace dataset data not only to the data source data the dataset data is based on, but also, if the dataset is a derived dataset, to the version of the derivation program used to build the derived dataset, which can be useful for tracking down errors in dataset data caused by errors or “bugs” (i.e., programming errors) in the version of the derivation program that was executed to build the dataset.

The example process 1000 also illustrates how the system 310 relieves human engineers of some manual tasks required by existing data pipeline systems related to maintaining and determining dataset build dependencies.

At step 1002, the data lake 313 stores a first version of a derived dataset in one or more containers 402. At the same time, a first transaction entry for a first transaction that committed the first version of the derived dataset to the data lake 313 is stored in the transaction database 315. The first transaction entry comprises the name 702 of the derived dataset, the identifier 704 of the first transaction, a timestamp 706 indicating when the first transaction was started, a flag 708 indicating that the first transaction was successfully committed, a transaction commit identifier 710 indicating when the first transaction was committed, and a list 712 of one or more data lake container identifiers 714 identifying one or more containers 402 in the data lake 313 containing data of the first version of the derived dataset.

At step 1004, in response to the first version of the derived dataset being successfully committed to the data lake 313, the build service 317 stores a first build catalog entry in the build database 314. The first build catalog entry comprises the name 502 of the derived dataset, a version identifier 504 for the first version of the derived dataset which can be, for example, the transaction commit identifier 710 stored in the first transaction entry for the first version of the derived dataset, and build dependencies 506 reflecting any dataset dependencies 508 the first version of the derived dataset has on other datasets. For example, the first version of the derived dataset may depend on (i.e., may have been built based on) at least a first version of another dataset and this dependency may be reflected in the build dependencies 506 of the first build catalog entry. The build dependencies 506 of the first build catalog entry may also reflect, through the derivation program build dependency 510, a first version of a derivation program used to build the first version of the derived dataset.
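The build catalog entry stored at step 1004 might be modeled as in the following sketch. The list standing in for the build database 314, the `BuildCatalogEntry` class, the `record_build` function, and the sample dataset and program names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class BuildCatalogEntry:
    dataset_name: str                                   # 502
    version_id: int                                     # 504, e.g. commit identifier 710
    dataset_dependencies: Tuple[Tuple[str, int], ...]   # 508: (dataset name, version)
    derivation_program_version: str                     # 510


def record_build(
    build_database: List[BuildCatalogEntry],
    dataset_name: str,
    commit_id: int,
    input_versions: Iterable[Tuple[str, int]],
    program_version: str,
) -> BuildCatalogEntry:
    """Store a catalog entry once a derived dataset version has committed."""
    entry = BuildCatalogEntry(
        dataset_name=dataset_name,
        version_id=commit_id,
        dataset_dependencies=tuple(input_versions),
        derivation_program_version=program_version,
    )
    build_database.append(entry)
    return entry


build_database: List[BuildCatalogEntry] = []
first_entry = record_build(
    build_database,
    dataset_name="risk_scores",            # hypothetical derived dataset
    commit_id=1,
    input_versions=[("claims", 1)],        # hypothetical input dataset version
    program_version="score_risk@v1",       # hypothetical derivation program version
)
```

Making the entry immutable (`frozen=True`) is a design choice of this sketch that matches the history-preserving intent: a later build adds a new entry rather than editing an old one.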

At step 1006, the transaction service 318 updates the other dataset (i.e., a dataset the first version of the derived dataset depends on) to produce a second version of the other dataset resulting in the data lake 313 storing the second version of the other dataset.

At step 1008, the data lake 313 stores a second version of the derived dataset in one or more containers 402. At the same time, a second transaction entry for a second transaction that committed the second version of the derived dataset to the data lake 313 is stored in the transaction database 315. The second transaction entry comprises the name 702 of the derived dataset, the identifier 704 of the second transaction, a timestamp 706 indicating when the second transaction was started, a flag 708 indicating that the second transaction was successfully committed, a transaction commit identifier 710 indicating when the second transaction was committed, and a list 712 of one or more data lake container identifiers 714 identifying one or more containers 402 in the data lake 313 containing data for the second version of the derived dataset.

At step 1010, in response to the second version of the derived dataset being successfully committed to the data lake 313, the build service 317 stores a second build catalog entry in the build database 314. The second build catalog entry comprises the name 502 of the derived dataset, a version identifier 504 for the second version of the derived dataset which can be, for example, the transaction commit identifier 710 stored in the second transaction entry for the second version of the derived dataset, and build dependencies 506 reflecting any dataset dependencies 508 the second version of the derived dataset has on other datasets. For example, the second version of the derived dataset may depend on (i.e., may have been built based on) at least the second version of the other dataset and this dependency may be reflected in the build dependencies 506 of the second build catalog entry. The build dependencies 506 of the second build catalog entry may also reflect, through the derivation program build dependency 510, the first version of the derivation program used to build the second version of the derived dataset.
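The state of the build catalog after steps 1002 through 1010 can be sketched as follows, together with a lookup that traces a derived dataset version back to the input versions it was built from. The dictionary layout, the dataset names `derived` and `other`, the program label, and the `trace_inputs` function are hypothetical.

```python
from typing import Dict, Set, Tuple

DatasetVersion = Tuple[str, int]   # (dataset name, version identifier)

# Assumed catalog contents after process 1000: two versions of the derived
# dataset, each built from a different version of the other dataset by the
# same version of the derivation program.
build_catalog: Dict[DatasetVersion, dict] = {
    ("derived", 1): {"inputs": [("other", 1)], "program": "derive@v1"},
    ("derived", 2): {"inputs": [("other", 2)], "program": "derive@v1"},
}


def trace_inputs(catalog: Dict[DatasetVersion, dict], target: DatasetVersion) -> Set[DatasetVersion]:
    """Collect every upstream dataset version that `target` was built from."""
    seen: Set[DatasetVersion] = set()
    stack = [target]
    while stack:
        current = stack.pop()
        entry = catalog.get(current)
        if entry is None:
            continue                # no catalog entry: not a derived version
        for dependency in entry["inputs"]:
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen


inputs_of_v1 = trace_inputs(build_catalog, ("derived", 1))
inputs_of_v2 = trace_inputs(build_catalog, ("derived", 2))
```

Because the entry for the first version is retained after the second version is built, the first version remains traceable to the first version of the other dataset even though it is no longer current.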

Extensions and Alternatives

While the invention is described in some detail with specific reference to a single preferred embodiment and certain alternatives, there is no intent to limit the invention to that particular embodiment or those specific alternatives. Therefore, those skilled in the art will appreciate that modifications may be made to the preferred embodiment without departing from the teachings of the present invention.

Claims

1. A method comprising: at one or more computing devices comprising one or more processors and one or more storage media storing one or more computer programs executed by the one or more processors to perform the method, performing operations comprising: maintaining a build catalog comprising a plurality of build catalog entries; wherein each build catalog entry, of the plurality of build catalog entries, comprises: an identifier of a version of a derived dataset corresponding to the build catalog entry, one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and a derivation program build dependency of the version of the derived dataset corresponding to the build catalog entry, the derivation program build dependency comprising an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry; creating a new version of a particular derived dataset in context of a successful transaction; adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, the identifier of the new version of the particular derived dataset being a transaction commit identifier assigned to the successful transaction; wherein the creating the new version of the particular derived dataset is based on executing a particular version of a particular derivation program; wherein the new build catalog entry comprises an identifier of the particular version of the particular derivation program; wherein the creating the new version of the particular derived dataset is based on providing one or more particular child dataset versions as input to the executing the particular version of the particular derivation program; and wherein the new build catalog entry comprises an identifier of each of the one or more particular child dataset versions.
2. One or more non-transitory storage media storing one or more computer programs, the one or more computer programs comprising instructions for performing operations comprising: maintaining a build catalog comprising a plurality of build catalog entries; wherein each build catalog entry, of the plurality of build catalog entries, comprises: an identifier of a version of a derived dataset corresponding to the build catalog entry, one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and a derivation program build dependency of the version of the derived dataset corresponding to the build catalog entry, the derivation program build dependency comprising an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry; creating a new version of a particular derived dataset in context of a successful transaction; and adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, the identifier of the new version of the particular derived dataset being a transaction commit identifier assigned to the successful transaction; wherein the creating the new version of the particular derived dataset is based on executing a particular version of a particular derivation program; wherein the new build catalog entry comprises an identifier of the particular version of the particular derivation program; wherein the creating the new version of the particular derived dataset is based on providing one or more particular child dataset versions as input to the executing the particular version of the particular derivation program; and wherein the new build catalog entry comprises an identifier of each of the one or more particular child dataset versions.
3. A system comprising: one or more hardware processors; one or more computer programs; and one or more storage media storing the one or more computer programs for execution by the one or more hardware processors, the one or more computer programs comprising instructions for performing operations comprising: maintaining a build catalog comprising a plurality of build catalog entries; wherein each build catalog entry, of the plurality of build catalog entries, comprises: an identifier of a version of a derived dataset corresponding to the build catalog entry, one or more dataset build dependencies of the version of the derived dataset corresponding to the build catalog entry, each of the one or more dataset build dependencies comprising an identifier of a version of a child dataset from which the version of the derived dataset corresponding to the build catalog entry is derived, and a derivation program build dependency of the version of the derived dataset corresponding to the build catalog entry, the derivation program build dependency comprising an identifier of a version of a derivation program executed to generate the version of the derived dataset corresponding to the build catalog entry; creating a new version of a particular derived dataset in context of a successful transaction; adding a new build catalog entry to the build catalog, the new build catalog entry comprising an identifier of the new version of the particular derived dataset, the identifier of the new version of the particular derived dataset being a transaction commit identifier assigned to the successful transaction; wherein the creating the new version of the particular derived dataset is based on executing a particular version of a particular derivation program; wherein the new build catalog entry comprises an identifier of the particular version of the particular derivation program; wherein the creating the new version of the particular derived dataset is based on providing one or more particular child dataset versions as input to the executing the particular version of the particular derivation program; and wherein the new build catalog entry comprises an identifier of each of the one or more particular child dataset versions.

PRIORITY CLAIM

This application is a continuation of U.S. patent application Ser. No. 14/879,916, filed Oct. 9, 2015, which is a continuation of U.S. patent application Ser. No. 14/533,433, filed Nov. 5, 2014, now U.S. Pat. No. 9,229,952 issued Jan. 5, 2016, the entire contents of each of which is hereby incorporated by reference for all purposes, as if fully set forth herein. The applicant(s) hereby rescind any disclaimer of claim scope in the parent application(s) or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent application(s).

20080255973	El Wade et al.	Oct 2008	A1
20080263468	Cappione et al.	Oct 2008	A1
20080267107	Rosenberg	Oct 2008	A1
20080276167	Michael	Nov 2008	A1
20080278311	Grange et al.	Nov 2008	A1
20080281580	Zabokritski	Nov 2008	A1
20080288306	MacIntyre et al.	Nov 2008	A1
20080301643	Appleton et al.	Dec 2008	A1
20080313132	Hao et al.	Dec 2008	A1
20090002492	Velipasalar et al.	Jan 2009	A1
20090027418	Maru et al.	Jan 2009	A1
20090030915	Winter et al.	Jan 2009	A1
20090037417	Shankar et al.	Feb 2009	A1
20090055251	Shah et al.	Feb 2009	A1
20090076845	Bellin et al.	Mar 2009	A1
20090088964	Schaaf et al.	Apr 2009	A1
20090094166	Aymeloglu et al.	Apr 2009	A1
20090106178	Chu	Apr 2009	A1
20090112745	Stefanescu	Apr 2009	A1
20090119309	Gibson et al.	May 2009	A1
20090125359	Knapic	May 2009	A1
20090125369	Kloosstra et al.	May 2009	A1
20090125459	Norton et al.	May 2009	A1
20090132921	Hwangbo et al.	May 2009	A1
20090132953	Reed et al.	May 2009	A1
20090143052	Bates et al.	Jun 2009	A1
20090144262	White et al.	Jun 2009	A1
20090144274	Fraleigh et al.	Jun 2009	A1
20090150854	Elaasar et al.	Jun 2009	A1
20090164934	Bhattiprolu et al.	Jun 2009	A1
20090171939	Athsani et al.	Jul 2009	A1
20090172511	Decherd et al.	Jul 2009	A1
20090172669	Bobak et al.	Jul 2009	A1
20090172821	Daira et al.	Jul 2009	A1
20090177962	Gusmorino et al.	Jul 2009	A1
20090179892	Tsuda et al.	Jul 2009	A1
20090187464	Bai et al.	Jul 2009	A1
20090187546	Whyte et al.	Jul 2009	A1
20090187548	Ji et al.	Jul 2009	A1
20090199047	Vaitheeswaran et al.	Aug 2009	A1
20090222400	Kupershmidt et al.	Sep 2009	A1
20090222760	Halverson et al.	Sep 2009	A1
20090228507	Jain et al.	Sep 2009	A1
20090234720	George et al.	Sep 2009	A1
20090240664	Dinker et al.	Sep 2009	A1
20090249244	Robinson et al.	Oct 2009	A1
20090254970	Agarwal et al.	Oct 2009	A1
20090254971	Herz	Oct 2009	A1
20090271343	Vaiciulis et al.	Oct 2009	A1
20090281839	Lynn et al.	Nov 2009	A1
20090282097	Alberti et al.	Nov 2009	A1
20090287470	Farnsworth et al.	Nov 2009	A1
20090292626	Oxford	Nov 2009	A1
20090307049	Elliott et al.	Dec 2009	A1
20090310816	Freire et al.	Dec 2009	A1
20090313463	Pang et al.	Dec 2009	A1
20090319418	Herz	Dec 2009	A1
20090319891	MacKinlay	Dec 2009	A1
20090327208	Bittner et al.	Dec 2009	A1
20100011282	Dollard et al.	Jan 2010	A1
20100030722	Goodson et al.	Feb 2010	A1
20100031141	Summers et al.	Feb 2010	A1
20100042922	Bradateanu et al.	Feb 2010	A1
20100057622	Faith et al.	Mar 2010	A1
20100057716	Stefik et al.	Mar 2010	A1
20100070489	Aymeloglu et al.	Mar 2010	A1
20100070523	Delgo et al.	Mar 2010	A1
20100070842	Aymeloglu et al.	Mar 2010	A1
20100070845	Facemire et al.	Mar 2010	A1
20100070897	Aymeloglu et al.	Mar 2010	A1
20100082532	Shaik et al.	Apr 2010	A1
20100098318	Anderson	Apr 2010	A1
20100100963	Mahaffey	Apr 2010	A1
20100103124	Kruzeniski et al.	Apr 2010	A1
20100114629	Adler et al.	May 2010	A1
20100114887	Conway et al.	May 2010	A1
20100122152	Chamberlain et al.	May 2010	A1
20100125470	Chisholm	May 2010	A1
20100131457	Heimendinger	May 2010	A1
20100131502	Fordham	May 2010	A1
20100161735	Sharma	Jun 2010	A1
20100162176	Dunton	Jun 2010	A1
20100191563	Schlaifer et al.	Jul 2010	A1
20100198684	Eraker et al.	Aug 2010	A1
20100199225	Coleman et al.	Aug 2010	A1
20100204983	Chung et al.	Aug 2010	A1
20100211550	Daniello et al.	Aug 2010	A1
20100228786	Torok	Sep 2010	A1
20100228812	Uomini	Sep 2010	A1
20100235915	Memon et al.	Sep 2010	A1
20100250412	Wagner	Sep 2010	A1
20100257015	Molander	Oct 2010	A1
20100257515	Bates	Oct 2010	A1
20100262688	Hussain et al.	Oct 2010	A1
20100280857	Liu et al.	Nov 2010	A1
20100293174	Bennett et al.	Nov 2010	A1
20100306285	Shah et al.	Dec 2010	A1
20100306713	Geisner et al.	Dec 2010	A1
20100312837	Bodapati et al.	Dec 2010	A1
20100313119	Baldwin et al.	Dec 2010	A1
20100318838	Katano et al.	Dec 2010	A1
20100318924	Frankel et al.	Dec 2010	A1
20100321399	Ellren et al.	Dec 2010	A1
20100325526	Ellis et al.	Dec 2010	A1
20100325581	Finkelstein et al.	Dec 2010	A1
20100330801	Rouh	Dec 2010	A1
20110004498	Readshaw	Jan 2011	A1
20110029526	Knight et al.	Feb 2011	A1
20110047159	Baid et al.	Feb 2011	A1
20110047540	Williams et al.	Feb 2011	A1
20110060753	Shaked et al.	Mar 2011	A1
20110061013	Bilicki et al.	Mar 2011	A1
20110074811	Hanson et al.	Mar 2011	A1
20110078055	Faribault et al.	Mar 2011	A1
20110078173	Seligmann et al.	Mar 2011	A1
20110093327	Fordyce, III et al.	Apr 2011	A1
20110099133	Chang et al.	Apr 2011	A1
20110117878	Barash et al.	May 2011	A1
20110119100	Ruhl et al.	May 2011	A1
20110131547	Elaasar	Jun 2011	A1
20110137766	Rasmussen et al.	Jun 2011	A1
20110153384	Horne et al.	Jun 2011	A1
20110153592	DeMarcken	Jun 2011	A1
20110161096	Buehler et al.	Jun 2011	A1
20110161132	Goel et al.	Jun 2011	A1
20110170799	Carrino et al.	Jul 2011	A1
20110173032	Payne et al.	Jul 2011	A1
20110173093	Psota et al.	Jul 2011	A1
20110181598	O'Neall et al.	Jul 2011	A1
20110185316	Reid et al.	Jul 2011	A1
20110208565	Ross et al.	Aug 2011	A1
20110208724	Jones et al.	Aug 2011	A1
20110213655	Henkin	Sep 2011	A1
20110213791	Jain et al.	Sep 2011	A1
20110218934	Elser	Sep 2011	A1
20110218955	Tang	Sep 2011	A1
20110219321	Gonzalez et al.	Sep 2011	A1
20110219450	McDougal et al.	Sep 2011	A1
20110225198	Edwards et al.	Sep 2011	A1
20110238553	Raj et al.	Sep 2011	A1
20110258158	Resende et al.	Oct 2011	A1
20110258216	Supakkul et al.	Oct 2011	A1
20110270604	Qi et al.	Nov 2011	A1
20110270705	Parker	Nov 2011	A1
20110270834	Sokolan et al.	Nov 2011	A1
20110289397	Eastmond et al.	Nov 2011	A1
20110289407	Naik et al.	Nov 2011	A1
20110289420	Morioka et al.	Nov 2011	A1
20110291851	Whisenant	Dec 2011	A1
20110295649	Fine	Dec 2011	A1
20110295795	Venkatasubramanian et al.	Dec 2011	A1
20110310005	Chen et al.	Dec 2011	A1
20110314007	Dassa et al.	Dec 2011	A1
20110314024	Chang et al.	Dec 2011	A1
20120004904	Shin et al.	Jan 2012	A1
20120011238	Rathod	Jan 2012	A1
20120011245	Gillette et al.	Jan 2012	A1
20120019559	Siler et al.	Jan 2012	A1
20120022945	Falkenborg et al.	Jan 2012	A1
20120036013	Neuhaus et al.	Feb 2012	A1
20120036434	Oberstein	Feb 2012	A1
20120050293	Carlhian et al.	Mar 2012	A1
20120054284	Rakshit	Mar 2012	A1
20120059853	Jagota	Mar 2012	A1
20120066166	Curbera et al.	Mar 2012	A1
20120066296	Appleton et al.	Mar 2012	A1
20120072825	Sherkin et al.	Mar 2012	A1
20120075324	Cardno et al.	Mar 2012	A1
20120079363	Folting et al.	Mar 2012	A1
20120084117	Tavares et al.	Apr 2012	A1
20120084118	Bai et al.	Apr 2012	A1
20120084287	Lakshminarayan et al.	Apr 2012	A1
20120102006	Larson	Apr 2012	A1
20120106801	Jackson	May 2012	A1
20120117082	Koperda et al.	May 2012	A1
20120123989	Yu et al.	May 2012	A1
20120124179	Cappio et al.	May 2012	A1
20120131512	Takeuchi et al.	May 2012	A1
20120137235	Ts et al.	May 2012	A1
20120144335	Abeln et al.	Jun 2012	A1
20120159307	Chung et al.	Jun 2012	A1
20120159362	Brown et al.	Jun 2012	A1
20120159399	Bastide et al.	Jun 2012	A1
20120170847	Tsukidate	Jul 2012	A1
20120173381	Smith	Jul 2012	A1
20120173985	Peppel	Jul 2012	A1
20120191446	Binsztok et al.	Jul 2012	A1
20120196557	Reich et al.	Aug 2012	A1
20120196558	Reich et al.	Aug 2012	A1
20120197651	Robinson et al.	Aug 2012	A1
20120203708	Psota et al.	Aug 2012	A1
20120208636	Feige	Aug 2012	A1
20120215784	King et al.	Aug 2012	A1
20120221511	Gibson et al.	Aug 2012	A1
20120221553	Wittmer et al.	Aug 2012	A1
20120221580	Barney	Aug 2012	A1
20120226523	Weiss	Sep 2012	A1
20120245976	Kumar et al.	Sep 2012	A1
20120246148	Dror	Sep 2012	A1
20120254129	Wheeler et al.	Oct 2012	A1
20120284345	Costenaro et al.	Nov 2012	A1
20120290527	Yalamanchilli	Nov 2012	A1
20120290879	Shibuya et al.	Nov 2012	A1
20120296907	Long et al.	Nov 2012	A1
20120304150	Leithead et al.	Nov 2012	A1
20120311684	Paulsen et al.	Dec 2012	A1
20120323888	Osann, Jr.	Dec 2012	A1
20120330973	Ghuneim et al.	Dec 2012	A1
20130006426	Healey et al.	Jan 2013	A1
20130006725	Simanek et al.	Jan 2013	A1
20130006916	McBride et al.	Jan 2013	A1
20130006947	Akinyemi et al.	Jan 2013	A1
20130016106	Yip et al.	Jan 2013	A1
20130018796	Kolhatkar et al.	Jan 2013	A1
20130024268	Manickavelu	Jan 2013	A1
20130024731	Shochat et al.	Jan 2013	A1
20130046635	Grigg et al.	Feb 2013	A1
20130046842	Muntz et al.	Feb 2013	A1
20130050217	Armitage	Feb 2013	A1
20130054306	Bhalla	Feb 2013	A1
20130057551	Ebert et al.	Mar 2013	A1
20130060742	Chang et al.	Mar 2013	A1
20130060786	Serrano et al.	Mar 2013	A1
20130061169	Pearcy et al.	Mar 2013	A1
20130073377	Heath	Mar 2013	A1
20130073454	Busch	Mar 2013	A1
20130078943	Biage et al.	Mar 2013	A1
20130086482	Parsons	Apr 2013	A1
20130091084	Lee	Apr 2013	A1
20130096988	Grossman et al.	Apr 2013	A1
20130097130	Bingo et al.	Apr 2013	A1
20130097482	Marantz et al.	Apr 2013	A1
20130101159	Chao et al.	Apr 2013	A1
20130110746	Ahn	May 2013	A1
20130110822	Ikeda et al.	May 2013	A1
20130110877	Bonham et al.	May 2013	A1
20130111320	Campbell et al.	May 2013	A1
20130117011	Ahmed et al.	May 2013	A1
20130117651	Waldman et al.	May 2013	A1
20130124193	Holmberg	May 2013	A1
20130150004	Rosen	Jun 2013	A1
20130151148	Parundekar et al.	Jun 2013	A1
20130151388	Falkenborg et al.	Jun 2013	A1
20130151453	Bhanot et al.	Jun 2013	A1
20130157234	Gulli et al.	Jun 2013	A1
20130166348	Scotto	Jun 2013	A1
20130166480	Popescu et al.	Jun 2013	A1
20130166550	Buchmann et al.	Jun 2013	A1
20130176321	Mitchell et al.	Jul 2013	A1
20130179420	Park et al.	Jul 2013	A1
20130185245	Anderson	Jul 2013	A1
20130185307	El-Yaniv et al.	Jul 2013	A1
20130198565	Mancoridis et al.	Aug 2013	A1
20130224696	Wolfe et al.	Aug 2013	A1
20130225212	Khan	Aug 2013	A1
20130226318	Procyk	Aug 2013	A1
20130226879	Talukder et al.	Aug 2013	A1
20130226953	Markovich et al.	Aug 2013	A1
20130238616	Rose et al.	Sep 2013	A1
20130246170	Gross et al.	Sep 2013	A1
20130246316	Zhao et al.	Sep 2013	A1
20130246537	Gaddala	Sep 2013	A1
20130246560	Feng et al.	Sep 2013	A1
20130246597	Iizawa et al.	Sep 2013	A1
20130251233	Yang et al.	Sep 2013	A1
20130262403	Milousheff	Oct 2013	A1
20130262527	Hunter et al.	Oct 2013	A1
20130263019	Castellanos et al.	Oct 2013	A1
20130267207	Hao et al.	Oct 2013	A1
20130268520	Fisher et al.	Oct 2013	A1
20130275446	Jain et al.	Oct 2013	A1
20130279757	Kephart	Oct 2013	A1
20130282696	John et al.	Oct 2013	A1
20130290011	Lynn et al.	Oct 2013	A1
20130290825	Arndt et al.	Oct 2013	A1
20130297619	Chandrasekaran et al.	Nov 2013	A1
20130304770	Boero et al.	Nov 2013	A1
20130311375	Priebatsch	Nov 2013	A1
20140012796	Petersen et al.	Jan 2014	A1
20140019423	Leinsberger et al.	Jan 2014	A1
20140019936	Cohanoff	Jan 2014	A1
20140032506	Hoey et al.	Jan 2014	A1
20140033010	Richardt et al.	Jan 2014	A1
20140040371	Gurevich et al.	Feb 2014	A1
20140047319	Eberlein	Feb 2014	A1
20140047357	Alfaro et al.	Feb 2014	A1
20140058914	Song et al.	Feb 2014	A1
20140059038	McPherson et al.	Feb 2014	A1
20140067611	Adachi et al.	Mar 2014	A1
20140068487	Steiger et al.	Mar 2014	A1
20140095273	Tang et al.	Apr 2014	A1
20140095509	Patton	Apr 2014	A1
20140108068	Williams	Apr 2014	A1
20140108380	Gotz et al.	Apr 2014	A1
20140108985	Scott et al.	Apr 2014	A1
20140123279	Bishop et al.	May 2014	A1
20140129261	Bothwell et al.	May 2014	A1
20140136285	Carvalho	May 2014	A1
20140143009	Brice et al.	May 2014	A1
20140149436	Bahrami et al.	May 2014	A1
20140156527	Grigg et al.	Jun 2014	A1
20140156617	Tomkins	Jun 2014	A1
20140157172	Peery et al.	Jun 2014	A1
20140164502	Khodorenko et al.	Jun 2014	A1
20140181833	Bird et al.	Jun 2014	A1
20140189536	Lange et al.	Jul 2014	A1
20140195515	Baker et al.	Jul 2014	A1
20140195887	Ellis et al.	Jul 2014	A1
20140222521	Chait	Aug 2014	A1
20140222793	Sadkin et al.	Aug 2014	A1
20140229554	Grunin et al.	Aug 2014	A1
20140244388	Manouchehri et al.	Aug 2014	A1
20140258246	Lo Faro et al.	Sep 2014	A1
20140267294	Ma	Sep 2014	A1
20140267295	Sharma	Sep 2014	A1
20140279824	Tamayo	Sep 2014	A1
20140279979	Yost et al.	Sep 2014	A1
20140310266	Greenfield	Oct 2014	A1
20140316911	Gross	Oct 2014	A1
20140324876	Konik et al.	Oct 2014	A1
20140324929	Mason	Oct 2014	A1
20140333651	Cervelli et al.	Nov 2014	A1
20140337772	Cervelli et al.	Nov 2014	A1
20140344230	Krause et al.	Nov 2014	A1
20140351070	Christner et al.	Nov 2014	A1
20140358829	Hurwitz	Dec 2014	A1
20140366132	Stiansen et al.	Dec 2014	A1
20150012509	Kirn	Jan 2015	A1
20150019394	Unser et al.	Jan 2015	A1
20150039886	Kahol et al.	Feb 2015	A1
20150046481	Elliot	Feb 2015	A1
20150046870	Goldenberg et al.	Feb 2015	A1
20150073929	Psota et al.	Mar 2015	A1
20150073954	Braff	Mar 2015	A1
20150089353	Folkening	Mar 2015	A1
20150089424	Duffield et al.	Mar 2015	A1
20150095773	Gonsalves et al.	Apr 2015	A1
20150100559	Nassar	Apr 2015	A1
20150100897	Sun et al.	Apr 2015	A1
20150100907	Erenrich et al.	Apr 2015	A1
20150106379	Elliot et al.	Apr 2015	A1
20150112641	Faraj	Apr 2015	A1
20150112998	Shankar	Apr 2015	A1
20150134666	Gattiker et al.	May 2015	A1
20150135256	Hoy et al.	May 2015	A1
20150142766	Jain et al.	May 2015	A1
20150169709	Kara et al.	Jun 2015	A1
20150169726	Kara et al.	Jun 2015	A1
20150170077	Kara et al.	Jun 2015	A1
20150178877	Bogomolov et al.	Jun 2015	A1
20150186821	Wang et al.	Jul 2015	A1
20150187036	Wang et al.	Jul 2015	A1
20150188715	Castelluci et al.	Jul 2015	A1
20150188872	White	Jul 2015	A1
20150212663	Papale et al.	Jul 2015	A1
20150213043	Ishii et al.	Jul 2015	A1
20150213134	Nie et al.	Jul 2015	A1
20150242397	Zhuang	Aug 2015	A1
20150261817	Harris et al.	Sep 2015	A1
20150261847	Ducott et al.	Sep 2015	A1
20150324868	Kaftan et al.	Nov 2015	A1
20150338233	Cervelli et al.	Nov 2015	A1
20150341467	Lim et al.	Nov 2015	A1
20150347903	Saxena et al.	Dec 2015	A1
20150378996	Kesin et al.	Dec 2015	A1
20150379413	Robertson et al.	Dec 2015	A1
20160004667	Chakerian et al.	Jan 2016	A1
20160004764	Chakerian et al.	Jan 2016	A1
20160034545	Shankar et al.	Feb 2016	A1
20160062555	Ward et al.	Mar 2016	A1
20160098173	Slawinski et al.	Apr 2016	A1
20160147730	Cicerone	May 2016	A1
20160179828	Ellis	Jun 2016	A1
20170039253	Bond	Feb 2017	A1
20170068698	Tolnay et al.	Mar 2017	A1
20170083595	Tolnay et al.	Mar 2017	A1

Foreign Referenced Citations (79)

Number	Date	Country
2014206155	Dec 2015	AU
2014250678	Feb 2016	AU
2666364	Jan 2015	CA
102546446	Jul 2012	CN
103167093	Jun 2013	CN
102054015	May 2014	CN
102014103482	Sep 2014	DE
102014204827	Sep 2014	DE
102014204830	Sep 2014	DE
102014204834	Sep 2014	DE
102014204840	Sep 2014	DE
102014213036	Jan 2015	DE
102014215621	Feb 2015	DE
0652513	May 1995	EP
1566758	Aug 2005	EP
1672527	Jun 2006	EP
1962222	Aug 2008	EP
2221725	Aug 2010	EP
2487610	Aug 2012	EP
2551799	Jan 2013	EP
2560134	Feb 2013	EP
2778913	Sep 2014	EP
2778914	Sep 2014	EP
2778977	Sep 2014	EP
2778986	Sep 2014	EP
2835745	Feb 2015	EP
2835770	Feb 2015	EP
2838039	Feb 2015	EP
2846241	Mar 2015	EP
2851852	Mar 2015	EP
2858014	Apr 2015	EP
2858018	Apr 2015	EP
2863326	Apr 2015	EP
2863346	Apr 2015	EP
2869211	May 2015	EP
2881868	Jun 2015	EP
2884439	Jun 2015	EP
2884440	Jun 2015	EP
2889814	Jul 2015	EP
2891992	Jul 2015	EP
2892197	Jul 2015	EP
2897051	Jul 2015	EP
2911078	Aug 2015	EP
2963595	Jan 2016	EP
2993595	Mar 2016	EP
29993595	Mar 2016	EP
3018553	May 2016	EP
3128447	Feb 2017	EP
3142027	Mar 2017	EP
2366498	Mar 2002	GB
2513007	Oct 2014	GB
2516155	Jan 2015	GB
2517582	Feb 2015	GB
2518745	Apr 2015	GB
2012778	Nov 2014	NL
2013134	Jan 2015	NL
2013306	Feb 2015	NL
2011642	Aug 2015	NL
624557	Dec 2014	NZ
WO 2000009529	Feb 2000	WO
WO 2002035376	May 2002	WO
WO 2002065353	Aug 2002	WO
WO 2003060751	Jul 2003	WO
WO 2005010685	Feb 2005	WO
WO 2005104736	Nov 2005	WO
WO 2005116851	Dec 2005	WO
WO 2008064207	May 2008	WO
WO 2009061501	May 2009	WO
WO 2010000014	Jan 2010	WO
WO 2010030913	Mar 2010	WO
WO 2010098958	Sep 2010	WO
WO 2011017289	May 2011	WO
WO 2011071833	Jun 2011	WO
WO 2012025915	Mar 2012	WO
WO 2012079836	Jun 2012	WO
WO2012079836	Jun 2012	WO
WO 2013010157	Jan 2013	WO
WO 2013067077	May 2013	WO
WO 2013102892	Jul 2013	WO

Non-Patent Literature Citations (395)

Entry
U.S. Appl. No. 13/196,788, filed Aug. 2, 2011, Notice of Allowance, dated Nov. 25, 2015.
U.S. Appl. No. 14/746,671, filed Jun. 22, 2015, First Office Action Interview, dated Sep. 28, 2015.
U.S. Appl. No. 14/734,772, filed Jun. 9, 2015, First Office Action Interview, dated Oct. 30, 2015.
U.S. Appl. No. 14/278,963, filed May 15, 2014, Notice of Allowance, dated Sep. 2, 2015.
U.S. Appl. No. 14/578,389, filed Dec. 20, 2014, Office Action, dated Apr. 22, 2016.
U.S. Appl. No. 14/734,772, filed Jun. 9, 2015, Notice of Allowance, dated Apr. 27, 2016.
U.S. Appl. No. 14/746,671, filed Jun. 22, 2015, Notice of Allowance, dated Jan. 21, 2016.
U.S. Appl. No. 13/196,788, filed Aug. 2, 2011, Notice of Allowance, dated Dec. 18, 2015.
U.S. Appl. No. 14/996,179, filed Jan. 14, 2016, First Office Action Interview, dated May 20, 2016.
U.S. Appl. No. 14/961,830, filed Dec. 7, 2015, Office Action, dated May 20, 2016.
U.S. Appl. No. 14/849,454, filed Sep. 9, 2015, Notice of Allowance, dated May 25, 2016.
U.S. Appl. No. 14/841,338, filed Aug. 31, 2015, Office Action, dated Feb. 18, 2016.
U.S. Appl. No. 14/726,211, filed May 29, 2015, Office Action, dated Apr. 5, 2016.
Official Communication for European Patent Application No. 15183721.8 dated Nov. 23, 2015.
Official Communication for European Patent Application No. 14159629.6 dated Jul. 31, 2014.
Official Communication for Australian Patent Application No. 2014201580 dated Feb. 27, 2015.
Sirotkin et al., “Chapter 13: The Processing of Biological Sequence Data at NCBI,” The NCBI Handbook, Oct. 2002, pp. 1-11.
Official Communication for New Zealand Patent Application No. 622414 dated Mar. 24, 2014.
Delcher et al., “Identifying Bacterial Genes and Endosymbiont DNA with Glimmer,” BioInformatics, vol. 23, No. 6, 2007, pp. 673-679.
Madden, Tom, “Chapter 16: The BLAST Sequence Analysis Tool,” The NCBI Handbook, Oct. 2002, pp. 1-15.
Kahan et al., “Annotea: An Open RDF Infrastructure for Shared Web Annotations,” Computer Networks, Elsevier Science Publishers B.V., vol. 39, No. 5, dated Aug. 5, 2002.
Mizrachi, Ilene, “Chapter 1: GenBank: The Nucleotide Sequence Database,” The NCBI Handbook, Oct. 2002, pp. 1-14.
“A Tour of Pinboard,” <http://pinboard.in/tour> as printed May 15, 2014 in 6 pages.
Official Communication for Great Britain Patent Application No. 1404574.4 dated Dec. 18, 2014.
Official Communication for New Zealand Patent Application No. 622484 dated Apr. 2, 2014.
Kitts, Paul, “Chapter 14: Genome Assembly and Annotation Process,” The NCBI Handbook, Oct. 2002, pp. 1-21.
“A Quick Guide to UniProtKB Swiss-Prot & TrEMBL,” Sep. 2011, pp. 2.
“The FASTA Program Package,” fasta-36.3.4, Mar. 25, 2011, pp. 29.
Wollrath et al., “A Distributed Object Model for the Java System,” Proceedings of the 2nd Conference on USENIX, Conference on Object-Oriented Technologies (COOTS), 17.
Official Communication for European Patent Application No. 14180321.3 dated Apr. 17, 2015.
Official Communication for European Patent Application No. 14187996.5 dated Feb. 12, 2015.
Official Communication for New Zealand Patent Application No. 628263 dated Aug. 12, 2014.
Palantir, “Extracting and Transforming Data with Kite,” Palantir Technologies, Inc., Copyright 2010, pp. 38.
Palermo, Christopher J., “Memorandum,” [Disclosure relating to U.S. Appl. No. 13/916,447, filed Jun. 12, 2013, and related applications], Jan. 31, 2014 in 3 pages.
Official Communication for European Patent Application No. 14158958.0 dated Jun. 3, 2014.
Palmas et al., “An Edge-Bundling Layout for Interactive Parallel Coordinates,” 2014 IEEE Pacific Visualization Symposium, pp. 57-64.
Vose et al., “Help File for ModelRisk Version 5,” 2007, Vose Software, pp. 349-353. [Uploaded in 2 Parts].
Official Communication for New Zealand Patent Application No. 622473 dated Jun. 19, 2014.
Manske, “File Saving Dialogs,” <http://www.mozilla.org/editor/ui_specs/FileSaveDialogs.html>, Jan. 20, 1999, pp. 7.
Johnson, Maggie, “Introduction to YACC and Bison.”
Gorr et al., “Crime Hot Spot Forecasting: Modeling and Comparative Evaluation,” Grant 98-1J-CX-K005, May 6, 2002, 37 pages.
Official Communication for European Patent Application No. 15192965.0 dated Mar. 17, 2016.
Apsalar, “Data Powered Mobile Advertising,” “Free Mobile App Analytics” and various analytics related screen shots <http://apsalar.com> Printed Jul. 18, 2013 in 8 pages.
Official Communication for Great Britain Patent Application No. 1404499.4 dated Jun. 11, 2015.
Chen et al., “Bringing Order to the Web: Automatically Categorizing Search Results,” CHI 2000, Proceedings of the SIGCHI conference on Human Factors in Computing Systems, Apr. 1-6, 2000, The Hague, The Netherlands, pp. 145-152.
Keylines.com, “An Introduction to KeyLines and Network Visualization,” Mar. 2014, <http://keylines.com/wp-content/uploads/2014/03/KeyLines-White-Paper.pdf> downloaded May 12, 2014 in 8 pages.
Morrison et al., “Converting Users to Testers: An Alternative Approach to Load Test Script Creation, Parameterization and Data Correlation,” CCSC: Southeastern Conference, JCSC 28, Dec. 2, 2012, pp. 188-196.
Yang et al., “HTML Page Analysis Based on Visual Cues,” A129, pp. 859-864, 2001.
Li et al., “Interactive Multimodal Visual Search on Mobile Device,” IEEE Transactions on Multimedia, vol. 15, No. 3, Apr. 1, 2013, pp. 594-607.
Official Communication for Netherlands Patent Application No. 2012421 dated Sep. 18, 2015.
Gesher, Ari, “Palantir Screenshots in the Wild: Swing Sightings,” The Palantir Blog, Sep. 11, 2007, pp. 1-12, retrieved from the internet https://www.palantir.com/2007/09/palantir-screenshots/ retrieved on Aug. 18, 2015.
Official Communication for European Patent Application No. 14158958.0 dated Apr. 16, 2015.
Official Communication for Australian Patent Application No. 2014210604 dated Jun. 5, 2015.
Official Communication for European Patent Application No. 16182336.4 dated Dec. 23, 2016.
Official Communication for European Patent Application No. 15181419.1 dated Sep. 29, 2015.
Official Communication in New Zealand Application No. 627962 dated Aug. 5, 2014.
Official Communication for European Patent Application No. 14158977.0 dated Apr. 16, 2015.
Official Communication for Great Britain Patent Application No. 1404479.6 dated Aug. 12, 2014.
Olanoff, Drew, “Deep Dive with the New Google Maps for Desktop with Google Earth Integration, It's More than Just a Utility,” May 15, 2013, pp. 1-6, retrieved from the internet: http://web.archive.org/web/20130515230641/http://techcrunch.com/2013/05/15/deep-dive-with-the-new-google-maps-for-desktop-with-google-earth-integration-its-more-than-just-a-utility/.
Official Communication for European Patent Application No. 14187739.9 dated Jul. 6, 2015.
Official Communication for European Patent Application No. 15184764.7 dated Dec. 14, 2015.
Griffith, Daniel A., “A Generalized Huff Model,” Geographical Analysis, Apr. 1982, vol. 14, No. 2, pp. 135-144.
Official Communication for European Patent Application No. 14197938.5 dated Apr. 28, 2015.
Map of San Jose, CA. Retrieved Oct. 2, 2013 from http://maps.bing.com.
Official Communication for Great Britain Patent Application No. 1404489.5 dated May 21, 2015.
Niepert et al., “A Dynamic Ontology for a Dynamic Reference Work,” Joint Conference on Digital Libraries, Jun. 17-22, 2007, Vancouver, British Columbia, Canada, pp. 1-10.
Official Communication for European Patent Application No. 14200246.8 dated May 29, 2015.
Official Communication for Australian Patent Application No. 2014201506 dated Feb. 27, 2015.
Palantir Technologies, “Palantir Labs—Timeline,” Oct. 1, 2010, retrieved from the internet https://www.youtube.com/watch?v=JCgDW5bru9M retrieved on Aug. 19, 2015.
Official Communication for Australian Patent Application No. 2014213553 dated May 7, 2015.
Official Communication in New Zealand Application No. 628840 dated Aug. 28, 2014.
Official Communication for European Patent Application No. 14158977.0 dated Jun. 10, 2014.
Valentini et al., “Ensembles of Learning Machines,” M. Marinaro and R. Tagliaferri (Eds.): WIRN Vietri 2002, LNCS 2486, pp. 3-20.
Anonymous, “BackTult—JD Edwards One World Version Control System,” printed Jul. 23, 2007 in 1 page.
Ananiev et al., “The New Modality API,” http://web.archive.org/web/20061211011958/http://java.sun.com/developer/technicalArticles/J2SE/Desktop/javase6/modality/ Jan. 21, 2006, pp. 8.
Chung, Chin-Wan, “Dataplex: An Access to Heterogeneous Distributed Databases,” Communications of the ACM, Association for Computing Machinery, Inc., vol. 33, No. 1, Jan. 1, 1990, pp. 70-80.
“Potential Money Laundering Warning Signs,” snapshot taken 2003, https://web.archive.org/web/20030816090055/http:/finsolinc.com/ANTI-MONEY%20LAUNDERING%20TRAINING%20GUIDES.pdf.
Official Communication for Great Britain Application No. 1404457.2 dated Aug. 14, 2014.
GIS-NET 3 Public_Department of Regional Planning. Planning & Zoning Information for Unincorporated LA County. Retrieved Oct. 2, 2013 from http://gis.planning.lacounty.gov/GIS-NET3_Public/Viewer.html.
Official Communication for Australian Patent Application No. 2014203669 dated May 29, 2015.
Delicious, <http://delicious.com/> as printed May 15, 2014 in 1 page.
Definition “Overlay”, downloaded Jan. 22, 2015, 1 page.
Official Communication for Great Britain Patent Application No. 1413935.6 dated Dec. 21, 2015.
Manno et al., “Introducing Collaboration in Single-user Applications through the Centralized Control Architecture,” 2010, pp. 10.
Kokossi et al., “D7-Dynamic Ontology Management System (Design),” Information Societies Technology Programme, pp. 1-27.
Official Communication for Netherlands Patent Application No. 2013134 dated Apr. 20, 2015.
Wang et al., “Research on a Clustering Data De-Duplication Mechanism Based on Bloom Filter,” IEEE 2010, 5 pages.
Pythagoras Communications Ltd., “Microsoft CRM Duplicate Detection,” Sep. 13, 2011, https://www.youtube.com/watch?v=j-7Qis0D0Kc.
Celik, Tantek, “CSS Basic User Interface Module Level 3 (CSS3 UI),” Section 8 Resizing and Overflow, Jan. 17, 2012, retrieved from internet http://www.w3.org/TR/2012/WD-css3-ui-20120117/#resizing-amp-overflow retrieved on May 18, 2015.
Liu, Tianshun, “Combining GIS and the Huff Model to Analyze Suitable Locations for a New Asian Supermarket in the Minneapolis and St. Paul, Minnesota USA,” Papers in Resource Analysis, 2012, vol. 14, pp. 8.
Official Communication for New Zealand Patent Application No. 622517 dated Apr. 3, 2014.
Keylines.com, “Visualizing Threats: Improved Cyber Security Through Network Visualization,” Apr. 2014, <http://keylines.com/wp-content/uploads/2014/04/Visualizing-Threats1.pdf> downloaded May 12, 2014 in 10 pages.
Official Communication for Great Britain Patent Application No. 1404499.4 dated Sep. 29, 2014.
Keylines.com, “KeyLines Datasheet,” Mar. 2014, <http://keylines.com/wp-content/uploads/2014/03/KeyLines-datasheet.pdf> downloaded May 12, 2014 in 2 pages.
“HunchLab: Heat Map and Kernel Density Calculation for Crime Analysis,” Azavea Journal, printed from www.azavea.com/blogs/newsletter/v4i4/kernel-density-capabilities-added-to-hunchlab/ on Sep. 9, 2014, 2 pages.
Thompson, Mick, “Getting Started with GEO,” Getting Started with GEO, Jul. 26, 2011.
Official Communication for European Patent Application No. 14180432.8 dated Jun. 23, 2015.
Official Communication for New Zealand Patent Application No. 628495 dated Aug. 19, 2014.
Umagandhi et al., “Search Query Recommendations Using Hybrid User Profile with Query Logs,” International Journal of Computer Applications, vol. 80, No. 10, Oct. 1, 2013, pp. 7-18.
Official Communication for New Zealand Patent Application No. 622513 dated Apr. 3, 2014.
Official Communication for Great Britain Patent Application No. 1404486.1 dated Aug. 27, 2014.
Hogue et al., “Thresher: Automating the Unwrapping of Semantic Content from the World Wide Web,” 14th International Conference on World Wide Web, WWW 2005: Chiba, Japan, May 10-14, 2005, pp. 86-95.
Notice of Acceptance for Australian Patent Application No. 2014203669 dated Jan. 21, 2016.
Official Communication for Great Britain Patent Application No. 1404499.4 dated Aug. 20, 2014.
Official Communication for Great Britain Patent Application No. 1404489.5 dated Oct. 6, 2014.
Wright et al., “Palantir Technologies VAST 2010 Challenge Text Records—Investigations into Arms Dealing,” Oct. 29, 2010, pp. 1-10, retrieved from the internet http://hcil2.cs.umd.edu/newvarepository/VAST%20Challenge%202010/challenges/MC1%20-%20Investigations%20into%20Arms%20Dealing/entries/Palantir%20Technologies/ retrieved on Aug. 20, 2015.
Hibbert et al., “Prediction of Shopping Behavior Using a Huff Model Within a GIS Framework,” Healthy Eating in Context, Mar. 18, 2011, pp. 16.
Microsoft Office—Visio, “Add and glue connectors with the Connector tool,” <http://office.microsoft.com/en-us/visio-help/add-and-glue-connectors-with-the-connector-tool-HA010048532.aspx?CTT=1> printed Aug. 4, 2011 in 1 page.
Symantec Corporation, “E-Security Begins with Sound Security Policies,” Announcement Symantec, Jun. 14, 2001.
Official Communication for New Zealand Patent Application No. 622473 dated Mar. 27, 2014.
Google Analytics Official Website—Web Analytics & Reporting, <http://www.google.com/analytics.index.html> Printed Jul. 18, 2013 in 22 pages.
Official Communication for Great Britain Patent Application No. 1408025.3 dated Nov. 6, 2014.
“A First Look: Predicting Market Demand for Food Retail using a Huff Analysis,” TRF Policy Solutions, Jul. 2012, pp. 30.
Miklau et al., “Securing History: Privacy and Accountability in Database Systems,” 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Jan. 7-10, 2007, Asilomar, California, pp. 387-396.
Wikipedia, “Multimap,” Jan. 1, 2013, https://en.wikipedia.org/w/index.php?title=Multimap&oldid=530800748.
Piwik—Free Web Analytics Software. <http://piwik.org/> Printed Jul. 19, 2013 in 18 pages.
Official Communication for Australian Patent Application No. 2014201511 dated Feb. 27, 2015.
Official Communication for European Patent Application No. 14197879.1 dated Apr. 28, 2015.
Glaab et al., “EnrichNet: Network-Based Gene Set Enrichment Analysis,” Bioinformatics 28.18 (2012): pp. i451-i457.
Official Communication for Great Britain Patent Application No. 1404457.2 dated Aug. 14, 2014.
Hansen et al., “Analyzing Social Media Networks with NodeXL: Insights from a Connected World”, Chapter 4, pp. 53-67 and Chapter 10, pp. 143-164, published Sep. 2010.
Amnet, “5 Great Tools for Visualizing Your Twitter Followers,” posted Aug. 4, 2010, http://www.amnetblog.com/component/content/article/115-5-grate-tools-for-visualizing-your-twitter-followers.html.
Official Communication for Netherlands Patent Application No. 2012438 dated Sep. 21, 2015.
Official Communication for European Patent Application No. 14199180.2 dated Jun. 22, 2015.
Official Communication for Netherlands Patent Application No. 2012436 dated Nov. 6, 2015.
Official Communication for New Zealand Patent Application No. 622404 dated Mar. 20, 2014.
Geiger, Jonathan G., “Data Quality Management, The Most Critical Initiative You Can Implement”, Data Warehousing, Management and Quality, Paper 098-29, SUGI 29, Intelligent Solutions, Inc., Boulder, CO, pp. 14, accessed Oct. 3, 2013.
Definition “Identify”, downloaded Jan. 22, 2015, 1 page.
Official Communication for European Patent Application No. 14180142.3 dated Feb. 6, 2015.
Microsoft—Developer Network, “Getting Started with VBA in Word 2010,” Apr. 2010, <http://msdn.microsoft.com/en-us/library/ff604039%28v=office.14%29.aspx> as printed Apr. 4, 2014 in 17 pages.
Official Communication for Israel Patent Application No. 198253 dated Nov. 24, 2014.
Official Communication for European Patent Application No. 14159464.8 dated Feb. 18, 2016.
Bugzilla@Mozilla, “Bug 18726—[feature] Long-click means of invoking contextual menus not supported,” http://bugzilla.mozilla.org/show_bug.cgi?id=18726 printed Jun. 13, 2013 in 11 pages.
Official Communication for European Patent Application No. 14199182.8 dated Mar. 13, 2015.
Official Communication for Netherlands Patent Application No. 2013306 dated Apr. 24, 2015.
Jelen, Bill, “Excel 2013 in Depth, Video Enhanced Edition,” Jan. 25, 2013.
About 80 Minutes, “Palantir in a Number of Parts—Part 6—Graph,” Mar. 21, 2013, pp. 1-6, retrieved from the internet.
Localytics—Mobile App Marketing & Analytics, <http://www.localytics.com/> Printed Jul. 18, 2013 in 12 pages.
Hua et al., “A Multi-attribute Data Structure with Parallel Bloom Filters for Network Services”, HiPC 2006, LNCS 4297, pp. 277-288, 2006.
Official Communication for European Patent Application No. 16188060.4 dated Feb. 6, 2017.
Dramowicz, Ela, “Retail Trade Area Analysis Using the Huff Model,” Directions Magazine, Jul. 2, 2005 in 10 pages, http://www.directionsmag.com/articles/retail-trade-area-analysis-using-the-huff-model/123411.
Official Communication for European Patent Application No. 14158861.6 dated Jun. 16, 2014.
Quest, “Toad for Oracle 11.6—Guide to Using Toad”, pp. 1-162, Sep. 24, 2012.
Official Communication for Australian Patent Application No. 2014210614 dated Jun. 5, 2015.
Capptain—Pilot Your Apps, <http://www.capptain.com> Printed Jul. 18, 2013 in 6 pages.
Palantir, “Kite Data-Integration Process Overview,” Palantir Technologies, Inc., Copyright 2010, pp. 48.
Johnson, Steve, “Access 2013 on demand,” Access 2013 on Demand, May 9, 2013, Que Publishing.
Official Communication for European Patent Application No. 14159464.8 dated Jul. 31, 2014.
Official Communication for European Patent Application No. 14191540.5 dated May 27, 2015.
Rouse, Margaret, “OLAP Cube,” <http://searchdatamanagement.techtarget.com/definition/OLAP-cube>, Apr. 28, 2012.
Goswami, Gautam, “Quite Writly Said!,” One Brick at a Time, Aug. 21, 2005, pp. 7.
Conner, Nancy, “Google Apps: The Missing Manual,” May 1, 2008, pp. 15.
StatCounter—Free Invisible Web Tracker, Hit Counter and Web Stats, <http://statcounter.com/> Printed Jul. 19, 2013 in 17 pages.
Official Communication for European Patent Application No. 15166137.8 dated Sep. 14, 2015.
Official Communication for European Patent Application No. 14159464.8 dated Aug. 20, 2014.
Official Communication for Netherlands Patent Application No. 2012434 dated Jan. 8, 2016.
Official Communication for European Patent Application No. 15165244.3 dated Aug. 27, 2015.
UserMetrix, <http://usermetrix.com/android-analytics> printed Jul. 18, 2013 in 3 pages.
Official Communication for New Zealand Patent Application No. 628840 dated Aug. 28, 2014.
Official Communication for Great Britain Patent Application No. 1413935.6 dated Jan. 27, 2015.
Official Communication for European Patent Application No. 14159464.8 dated Sep. 22, 2014.
Official Communication for Great Britain Patent Application No. 1404489.5 dated Aug. 27, 2014.
Official Communication for Australian Patent Application No. 2014250678 dated Jun. 17, 2015.
Palantir, https://docs.palantir.com/gotham/3.11.1.0/dataguide/baggage/KiteSchema.xsd printed Apr. 4, 2014 in 4 pages.
Hardesty, “Privacy Challenges: Analysis: It's Surprisingly Easy to Identify Individuals from Credit-Card Metadata,” MIT News on Campus and Around the World, MIT News Office.
Official Communication for Netherlands Patent Application No. 2012417 dated Sep. 18, 2015.
Nierman, “Evaluating Structural Similarity in XML Documents”, 6 pages, 2002.
Official Communication for European Patent Application No. 14186225.0 dated Feb. 13, 2015.
Cohn, et al., “Semi-supervised clustering with user feedback,” Constrained Clustering: Advances in Algorithms, Theory, and Applications 4.1 (2003): 17-32.
Official Communication for Canadian Patent Application No. 2666364 dated Jun. 4, 2012.
“A Word About Banks and the Laundering of Drug Money,” Aug. 18, 2012, http://www.golemxiv.co.uk/2012/08/a-word-about-banks-and-the-laundering-of-drug-money/.
Official Communication for Israel Patent Application No. 198253 dated Jan. 12, 2016.
Official Communication for New Zealand Patent Application No. 622389 dated Mar. 20, 2014.
Klemmer et al., “Where Do Web Sites Come From? Capturing and Interacting with Design History,” Association for Computing Machinery, CHI 2002, Apr. 20-25, 2002, Minneapolis, MN, pp. 8.
Microsoft Office—Visio, “About connecting shapes,” <http://office.microsoft.com/en-us/visio-help/about-connecting-shapes-HP085050369.aspx> printed Aug. 4, 2011 in 6 pages.
Map of San Jose, CA. Retrieved Oct. 2, 2013 from http://maps.yahoo.com.
Trak.io, <http://trak.io/> printed Jul. 18, 2013 in 3 pages.
Boyce, Jim, “Microsoft Outlook 2010 Inside Out,” Aug. 1, 2010, retrieved from the internet https://capdtron.files.wordpress.com/2013/01/outlook-2010-inside_out.pdf.
Official Communication for Great Britain Patent Application No. 1404486.1 dated May 21, 2015.
Official Communication for New Zealand Patent Application No. 628585 dated Aug. 26, 2014.
Official Communication for European Patent Application No. 14189344.6 dated Feb. 20, 2015.
Official Communication for European Patent Application No. 14199180.2 dated Aug. 31, 2015.
Official Communication for New Zealand Patent Application No. 622497 dated Jun. 19, 2014.
Official Communication for European Patent Application No. 14158977.0 dated Mar. 11, 2016.
Wikipedia, “Federated Database System,” Sep. 7, 2013, retrieved from the internet on Jan. 27, 2015 http://en.wikipedia.org/w/index.php?title=Federated_database_system&oldid=571954221.
Gu et al., “Record Linkage: Current Practice and Future Directions,” Jan. 15, 2004, pp. 32.
Palantir, “Kite,” https://docs.palantir.com/gotham/3.11.1.0/adminreference/datasources.11 printed Aug. 30, 2013 in 2 pages.
Official Communication for New Zealand Patent Application No. 627061 dated Jul. 14, 2014.
Hur et al., “SciMiner: web-based literature mining tool for target identification and functional enrichment analysis,” Bioinformatics 25.6 (2009): pp. 838-840.
Flurry Analytics, <http://www.flurry.com/> Printed Jul. 18, 2013 in 14 pages.
Nivas, Tuli, “Test Harness and Script Design Principles for Automated Testing of non-GUI or Web Based Applications,” Performance Lab, Jun. 2011, pp. 30-37.
Huff et al., “Calibrating the Huff Model Using ArcGIS Business Analyst,” ESRI, Sep. 2008, pp. 33.
Official Communication for New Zealand Patent Application No. 624557 dated May 14, 2014.
Acklen, Laura, “Absolute Beginner's Guide to Microsoft Word 2003,” Dec. 24, 2003, pp. 15-18, 34-41, 308-316.
Official Communication for European Patent Application No. 14200298.9 dated May 13, 2015.
Official Communication for European Patent Application No. 14158958.0 dated Mar. 11, 2016.
TestFlight—Beta Testing on the Fly, <http://testflightapp.com/> Printed Jul. 18, 2013 in 3 pages.
DISTIMO—App Analytics, <http://www.distimo.com/app-analytics> Printed Jul. 18, 2013 in 5 pages.
Kontagent Mobile Analytics, <http://www.kontagent.com/> Printed Jul. 18, 2013 in 9 pages.
Official Communication for European Patent Application No. 14189347.9 dated Mar. 4, 2015.
Official Communication for Australian Patent Application No. 2014201507 dated Feb. 27, 2015.
Chaudhuri et al., “An Overview of Business Intelligence Technology,” Communications of the ACM, Aug. 2011, vol. 54, No. 8.
Official Communication for New Zealand Patent Application No. 628161 dated Aug. 25, 2014.
Palantir, “The Repository Element,” https://docs.palantir.com/gotham/3.11.1.0/dataguide/kite_config_file.04 printed Aug. 30, 2013 in 2 pages.
European Claims in application No. 16182336.4-1952, dated Dec. 2016, 3 pages.
European Claims in application No. 16194936.7-1871, dated Mar. 3, 2017, 3 pages.
European Patent Office, “Search Report” in application No. 16182336.4-1952, dated December.
European Patent Office, “Search Report” in application No. 16194936.7-1871, dated Mar. 9.
Huff, David L., “Parameter Estimation in the Huff Model,” ESRI, ArcUser, Oct.-Dec. 2003, pp. 34-36.
Palantir, “Kite Operations,” Palantir Technologies, Inc., Copyright 2010, p. 1.
Official Communication for European Patent Application No. 15155845.9 dated Oct. 6, 2015.
Palantir, https://docs.palantir.com/gotham/3.11.1.0/dataguide/baggage/KiteSchema printed Aug. 30, 2013 in 1 page.
“Refresh CSS Ellipsis When Resizing Container—Stack Overflow,” Jul. 31, 2013, retrieved from internet http://stackoverflow.com/questions/17964681/refresh-css-ellipsis-when-.
Bluttman et al., “Excel Formulas and Functions for Dummies,” 2005, Wiley Publishing, Inc., pp. 280, 284-286.
Mixpanel—Mobile Analytics, <https://mixpanel.com/> Printed Jul. 18, 2013 in 13 pages.
Open Web Analytics (OWA), <http://www.openwebanalytics.com/> Printed Jul. 19, 2013 in 5 pages.
Zheng et al., “GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis,” Nucleic acids research 36.suppl 2 (2008): pp. W358-W363.
Official Communication for New Zealand Patent Application No. 622497 dated Mar. 26, 2014.
Official Communication for Great Britain Patent Application No. 1404479.6 dated Jul. 9, 2015.
Official Communication for European Patent Application No. 14180281.9 dated Jan. 26, 2015.
Map of San Jose, CA. Retrieved Oct. 2, 2013 from http://maps.google.com.
Palantir, “Write a Kite Configuration File in Eclipse,” Palantir Technologies, Inc., Copyright 2010, pp. 2.
Countly Mobile Analytics, <http://count.ly/> Printed Jul. 18, 2013 in 9 pages.
Appacts, “Smart Thinking for Super Apps,” <http://www.appacts.com> Printed Jul. 18, 2013 in 4 pages.
Sigrist, et al., “PROSITE, a Protein Domain Database for Functional Characterization and Annotation,” Nucleic Acids Research, 2010, vol. 38, pp. D161-D166.
U.S. Appl. No. 14/025,653, filed Sep. 12, 2013, Office Action Interview, dated Oct. 6, 2015.
U.S. Appl. No. 14/134,558, filed Dec. 19, 2013, Office Action, dated Oct. 7, 2015.
U.S. Appl. No. 13/831,791, filed Mar. 15, 2013, Office Action, dated Mar. 4, 2015.
U.S. Appl. No. 14/025,653, filed Sep. 12, 2013, Interview Summary, dated Mar. 3, 2016.
U.S. Appl. No. 13/839,026, filed Mar. 15, 2013, Restriction Requirement, dated Apr. 2, 2015.
U.S. Appl. No. 13/827,491, filed Mar. 14, 2013, Office Action, dated Oct. 9, 2015.
U.S. Appl. No. 14/319,765, filed Jun. 30, 2014, Advisory Action, dated Sep. 10, 2015.
U.S. Appl. No. 14/306,147, filed Jun. 16, 2014, Final Office Action, dated Feb. 19, 2015.
U.S. Appl. No. 14/225,006, filed Mar. 25, 2014, First Office Action Interview, dated Feb. 27, 2015.
U.S. Appl. No. 14/306,154, filed Jun. 16, 2014, Advisory Action, dated May 15, 2015.
U.S. Appl. No. 14/451,221, filed Aug. 4, 2014, Office Action, dated Oct. 21, 2014.
U.S. Appl. No. 14/306,138, filed Jun. 16, 2014, Office Action, dated May 26, 2015.
U.S. Appl. No. 14/225,160, filed Mar. 25, 2014, Final Office Action, dated Feb. 11, 2015.
U.S. Appl. No. 14/473,552, filed Aug. 29, 2014, Notice of Allowance, dated Jul. 24, 2015.
U.S. Appl. No. 14/874,690, filed Oct. 5, 2015, First Action Interview, dated Dec. 21, 2015.
U.S. Appl. No. 14/877,229, filed Oct. 7, 2015, Office Action, dated Mar. 22, 2016.
U.S. Appl. No. 14/533,433, filed Nov. 5, 2014, Notice of Allowance, dated Sep. 1, 2015.
U.S. Appl. No. 14/508,696, filed Oct. 7, 2014, Office Action, dated Mar. 2, 2015.
U.S. Appl. No. 14/148,568, filed Jan. 6, 2014, Office Action, dated Mar. 26, 2015.
U.S. Appl. No. 14/225,160, filed Mar. 25, 2014, First Office Action Interview, dated Jul. 29, 2014.
U.S. Appl. No. 14/504,103, filed Oct. 1, 2014, First Office Action Interview, dated Feb. 5, 2015.
U.S. Appl. No. 14/533,433, filed Nov. 5, 2014, Office Action, dated Feb. 26, 2015.
U.S. Appl. No. 14/306,147, filed Jun. 16, 2014, First Office Action Interview, dated Sep. 9, 2014.
U.S. Appl. No. 14/639,606, filed Mar. 5, 2015, First Office Action Interview, dated Jul. 24, 2015.
U.S. Appl. No. 14/319,765, filed Jun. 30, 2014, Notice of Allowance, dated Nov. 25, 2014.
U.S. Appl. No. 14/571,098, filed Dec. 15, 2014, First Office Action Interview, dated Aug. 5, 2015.
U.S. Appl. No. 14/323,935, filed Jul. 3, 2014, Office Action, dated Jun. 22, 2015.
U.S. Appl. No. 14/616,080, filed Feb. 6, 2015, Notice of Allowance, dated Apr. 2, 2015.
U.S. Appl. No. 14/225,006, filed Mar. 25, 2014, First Office Action Interview, dated Sep. 10, 2014.
U.S. Appl. No. 14/306,147, filed Jun. 16, 2014, Office Action, dated Aug. 7, 2015.
U.S. Appl. No. 14/879,916, filed Oct. 9, 2015, Notice of Allowance, dated Jun. 22, 2016.
U.S. Appl. No. 14/948,009, filed Nov. 20, 2015, Notice of Allowance, dated May 6, 2016.
U.S. Appl. No. 14/225,084, filed Mar. 25, 2014, First Office Action Interview, dated Sep. 2, 2014.
U.S. Appl. No. 14/486,991, filed Sep. 15, 2014, Notice of Allowance, dated May 1, 2015.
U.S. Appl. No. 14/196,814, filed Mar. 4, 2014, Office Action, dated May 5, 2015.
U.S. Appl. No. 14/225,084, filed Mar. 25, 2014, Office Action, dated Sep. 11, 2015.
U.S. Appl. No. 14/483,527, filed Sep. 11, 2014, First Office Action Interview, dated Jan. 28, 2015.
U.S. Appl. No. 12/556,318, filed Jun. 16, 2014, Office Action, dated Jul. 2, 2015.
U.S. Appl. No. 14/526,066, filed Mar. 25, 2014, Final Office Action, dated May 6, 2016.
U.S. Appl. No. 14/326,738, filed Jul. 9, 2014, First Office Action Interview, dated Mar. 31, 2015.
U.S. Appl. No. 14/306,154, filed Jun. 16, 2014, Final Office Action, dated Mar. 11, 2015.
U.S. Appl. No. 14/268,964, filed May 2, 2014, First Office Action, dated Sep. 3, 2014.
U.S. Appl. No. 13/922,437, filed Jun. 20, 2013, Notice of Allowance, dated Jul. 3, 2014.
U.S. Appl. No. 13/835,688, filed Mar. 15, 2013, First Office Action Interview, dated Jun. 17, 2015.
U.S. Appl. No. 14/746,671, filed Jun. 22, 2015, First Office Action Interview, dated Nov. 12, 2015.
U.S. Appl. No. 14/323,935, filed Jul. 3, 2014, First Office Action Interview, dated Nov. 28, 2014.
U.S. Appl. No. 14/319,765, filed Jun. 30, 2014, First Office Action Interview, dated Feb. 4, 2015.
U.S. Appl. No. 14/294,098, filed Jun. 2, 2014, Final Office Action, dated Nov. 6, 2014.
U.S. Appl. No. 14/225,084, filed Mar. 25, 2014, Notice of Allowance, dated May 4, 2015.
U.S. Appl. No. 14/134,558, filed Dec. 19, 2013, Final Office Action, dated May 16, 2016.
U.S. Appl. No. 14/579,752, filed Dec. 22, 2014, First Office Action Interview, dated May 26, 2015.
U.S. Appl. No. 13/827,491, filed Mar. 14, 2013, Office Action, dated Dec. 1, 2014.
U.S. Appl. No. 14/225,160, filed Mar. 25, 2014, Office Action, dated Aug. 12, 2015.
U.S. Appl. No. 14/225,084, filed Mar. 25, 2014, Interview Summary, dated Jan. 4, 2016.
U.S. Appl. No. 14/094,418, filed Dec. 2, 2013, Notice of Allowance, dated Jan. 25, 2016.
U.S. Appl. No. 14/225,084, filed Mar. 25, 2014, First Office Action Interview, dated Feb. 20, 2015.
U.S. Appl. No. 14/562,524, filed Dec. 5, 2014, First Office Action Interview, dated Sep. 14, 2015.
U.S. Appl. No. 14/473,860, filed Aug. 29, 2014, Notice of Allowance, dated Jan. 5, 2015.
U.S. Appl. No. 14/842,734, filed Sep. 1, 2015, First Office Action Interview, dated Nov. 19, 2015.
U.S. Appl. No. 14/294,098, filed Jun. 2, 2014, First Office Action Interview, dated Aug. 15, 2014.
U.S. Appl. No. 14/473,552, filed Aug. 29, 2014, Interview Summary, dated Feb. 24, 2015.
U.S. Appl. No. 14/326,738, filed Jul. 9, 2014, First Office Action Interview, dated Dec. 2, 2014.
U.S. Appl. No. 14/562,524, filed Dec. 5, 2014, First Office Action Interview, dated Nov. 10, 2015.
U.S. Appl. No. 14/874,690, filed Oct. 5, 2015, Office Action, dated Jun. 1, 2016.
U.S. Appl. No. 14/849,545, filed Sep. 9, 2015, Office Action, dated Jan. 29, 2016.
U.S. Appl. No. 14/813,749, filed Jul. 30, 2015, Office Action, dated Sep. 28, 2015.
U.S. Appl. No. 14/504,103, filed Oct. 1, 2014, Notice of Allowance, dated May 18, 2015.
U.S. Appl. No. 14/326,738, filed Jul. 9, 2014, Notice of Allowance, dated Nov. 18, 2015.
U.S. Appl. No. 14/579,752, filed Dec. 22, 2014, Final Office Action, dated Aug. 19, 2015.
U.S. Appl. No. 14/580,218, filed Dec. 23, 2014, Office Action, dated Jun. 7, 2016.
U.S. Appl. No. 14/306,154, filed Jun. 16, 2014, Office Action, dated Mar. 17, 2016.
U.S. Appl. No. 14/504,103, filed Oct. 1, 2014, Notice of Allowance, dated Sep. 9, 2014.
U.S. Appl. No. 13/247,987, filed Sep. 28, 2011, Notice of Allowance, dated Mar. 17, 2016.
U.S. Appl. No. 14/849,454, filed Sep. 9, 2015, Interview Summary, dated Feb. 24, 2016.
U.S. Appl. No. 14/490,612, filed Sep. 18, 2014, First Office Action Interview, dated Jan. 27, 2015.
U.S. Appl. No. 14/319,161, filed Jun. 30, 2014, Office Action, dated Sep. 25, 2014.
U.S. Appl. No. 14/676,621, filed Apr. 1, 2015, Final Office Action, dated Oct. 29, 2015.
U.S. Appl. No. 14/306,154, filed Jun. 16, 2014, Final Office Action, dated Nov. 16, 2015.
U.S. Appl. No. 14/326,738, filed Jul. 9, 2014, First Office Action, dated Dec. 2, 2014.
U.S. Appl. No. 13/247,987, filed Sep. 28, 2011, Office Action, dated Apr. 2, 2015.
U.S. Appl. No. 14/849,454, filed Sep. 9, 2015, Notice of Allowance, dated Nov. 3, 2015.
U.S. Appl. No. 14/552,336, filed Nov. 24, 2014, Notice of Allowance, dated Nov. 3, 2015.
U.S. Appl. No. 14/639,606, filed Mar. 5, 2015, First Office Action, dated May 18, 2015.
U.S. Appl. No. 14/306,138, filed Jun. 16, 2014, First Office Action, dated Sep. 23, 2015.
U.S. Appl. No. 14/463,615, filed Aug. 19, 2014, Advisory Action, dated Sep. 10, 2015.
U.S. Appl. No. 14/225,160, filed Mar. 25, 2014, First Office Action Interview, dated Oct. 22, 2014.
U.S. Appl. No. 14/225,160, filed Mar. 25, 2014, Advisory Action, dated May 20, 2015.
U.S. Appl. No. 14/508,696, filed Oct. 7, 2014, Notice of Allowance, dated Jul. 27, 2015.
U.S. Appl. No. 14/223,918, filed Mar. 24, 2014, Notice of Allowance, dated Jan. 6, 2016.
U.S. Appl. No. 14/486,991, filed Sep. 15, 2014, Office Action, dated Mar. 10, 2015.
U.S. Appl. No. 14/526,066, filed Oct. 28, 2014, Office Action, dated Jan. 21, 2016.
U.S. Appl. No. 13/839,026, filed Mar. 15, 2013, Office Action, dated Aug. 4, 2015.
U.S. Appl. No. 14/504,103, filed Oct. 1, 2014, First Office Action Interview, dated Mar. 31, 2015.
U.S. Appl. No. 14/306,138, filed Jun. 16, 2014, Office Action, dated Mar. 17, 2016.
U.S. Appl. No. 14/319,765, filed Jun. 30, 2014, Final Office Action, dated Jun. 16, 2015.
U.S. Appl. No. 14/323,935, filed Jul. 3, 2014, First Office Action Interview, dated Mar. 31, 2015.
U.S. Appl. No. 14/148,568, filed Jan. 6, 2014, Notice of Allowance, dated Aug. 26, 2015.
U.S. Appl. No. 14/879,916, filed Oct. 9, 2015, First Office Action Interview, dated Apr. 15, 2016.
U.S. Appl. No. 14/954,680, filed Nov. 30, 2015, Office Action, dated May 12, 2016.
U.S. Appl. No. 14/289,596, filed May 28, 2014, Advisory Action, dated Apr. 30, 2015.
U.S. Appl. No. 14/289,596, filed May 28, 2014, Final Office Action, dated Jan. 26, 2015.
U.S. Appl. No. 14/571,098, filed Dec. 15, 2014, First Office Action, dated Nov. 10, 2015.
U.S. Appl. No. 14/463,615, filed Aug. 19, 2014, Final Office Action, dated May 21, 2015.
U.S. Appl. No. 14/225,160, filed Mar. 25, 2014, First Office Action, dated Oct. 22, 2014.
U.S. Appl. No. 14/631,633, filed Feb. 25, 2015, First Office Action, dated Sep. 10, 2015.
U.S. Appl. No. 14/800,447, filed Jul. 15, 2012, First Office Action, dated Dec. 10, 2010.
U.S. Appl. No. 14/102,394, filed Dec. 10, 2013, Notice of Allowance, dated Aug. 25, 2014.
U.S. Appl. No. 14/044,800, filed Oct. 2, 2013, Notice of Allowance, dated Sep. 2, 2014.
U.S. Appl. No. 14/948,009, filed Nov. 20, 2015, First Action Interview, dated Feb. 25, 2016.
U.S. Appl. No. 14/645,304, filed Mar. 11, 2015, Office Action, dated Jan. 25, 2016.
U.S. Appl. No. 14/294,098, filed Jun. 2, 2014, Notice of Allowance, dated Dec. 24, 2015.
U.S. Appl. No. 14/306,147, filed Jun. 16, 2014, Final Office Action, dated Dec. 24, 2015.
U.S. Appl. No. 14/479,863, filed Sep. 8, 2014, First Office Action Interview, dated Dec. 26, 2014.
U.S. Appl. No. 14/289,599, filed May 28, 2014, First Office Action Interview, dated Jul. 22, 2014.
U.S. Appl. No. 14/463,615, filed Aug. 19, 2014, First Office Action Interview, dated Nov. 13, 2014.
U.S. Appl. No. 14/148,568, filed Jan. 6, 2014, Final Office Action, dated Oct. 22, 2014.
U.S. Appl. No. 14/483,527, filed Sep. 11, 2014, Final Office Action, dated Jun. 22, 2015.
U.S. Appl. No. 14/326,738, filed Jul. 9, 2014, First Office Action, dated Mar. 31, 2015.
U.S. Appl. No. 13/557,100, filed Jul. 24, 2012, Final Office Action, dated Apr. 7, 2016.
U.S. Appl. No. 14/306,138, filed Jun. 16, 2014, Final Office Action, dated Sep. 14, 2015.
U.S. Appl. No. 14/225,006, filed Mar. 25, 2014, Advisory Action, dated Dec. 21, 2015.
U.S. Appl. No. 14/479,863, filed Sep. 8, 2014, Notice of Allowance, dated Mar. 31, 2015.
U.S. Appl. No. 14/289,596, filed May 28, 2014, First Office Action Interview, dated Jul. 18, 2014.
U.S. Appl. No. 14/192,767, filed Feb. 27, 2014, Notice of Allowance, dated Dec. 16, 2014.
U.S. Appl. No. 14/319,161, filed Jun. 30, 2014, Notice of Allowance, dated May 4, 2015.
U.S. Appl. No. 14/306,138, filed Jun. 16, 2014, Interview Summary, dated Dec. 3, 2015.
U.S. Appl. No. 14/306,154, filed Jun. 16, 2014, Office Action, dated Jul. 6, 2015.
U.S. Appl. No. 14/571,098, filed Dec. 15, 2014, First Office Action Interview, dated Mar. 11, 2015.
U.S. Appl. No. 14/323,935, filed Jul. 3, 2014, Notice of Allowance, dated Oct. 1, 2015.
U.S. Appl. No. 14/108,187, filed Dec. 16, 2013, Notice of Allowance, dated Aug. 29, 2014.
U.S. Appl. No. 14/268,964, filed May 2, 2014, Notice of Allowance, dated Dec. 3, 2014.
U.S. Appl. No. 14/306,138, filed Jun. 16, 2014, Final Office Action, dated Feb. 18, 2015.
U.S. Appl. No. 14/578,389, filed Dec. 20, 2014, Office Action, dated Oct. 21, 2015.
U.S. Appl. No. 14/552,336, filed Nov. 24, 2014, First Office Action, dated Jul. 20, 2015.
U.S. Appl. No. 14/571,098, filed Dec. 15, 2014, First Office Action Interview, dated Aug. 24, 2015.
U.S. Appl. No. 13/196,788, filed Aug. 2, 2011, Office Action, dated Oct. 23, 2015.
U.S. Appl. No. 14/631,633, filed Feb. 25, 2015, First Office Action Interview, dated Feb. 3, 2016.
U.S. Appl. No. 14/463,615, filed Aug. 19, 2014, First Office Action Interview, dated Jan. 28, 2015.
U.S. Appl. No. 14/306,154, filed Jun. 16, 2014, First Office Action Interview, dated Sep. 9, 2014.
U.S. Appl. No. 13/827,491, filed Mar. 14, 2013, Final Office Action, dated Jun. 22, 2015.
U.S. Appl. No. 14/319,161, filed Jun. 30, 2014, Final Office Action, dated Jan. 23, 2015.
U.S. Appl. No. 14/676,621, filed Apr. 1, 2015, First Office Action Interview, dated Jul. 30, 2015.
U.S. Appl. No. 14/319,765, filed Jun. 30, 2014, Office Action, dated Feb. 1, 2016.
U.S. Appl. No. 15/262,207, filed Sep. 12, 2016, Final Office Action, dated Jun. 8, 2017.
U.S. Appl. No. 15/262,207, filed Sep. 12, 2016, Office Action, dated Feb. 21, 2017.
Canese et al., “Chapter 2: PubMed: The Bibliographic Database,” The NCBI Handbook, Oct. 2002, pp. 1-10.
Official Communication for Australian Patent Application No. 2014202442 dated Mar. 19, 2015.
Wollrath et al., “A Distributed Object Model for the Java System”, Conference on Object-Oriented Technologies and Systems, pp. 219-231, Jun. 17-21, 1996.
Zaharia et al., “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing” dated 2012, 14 pages.
Official Communication for European Patent Application No. 14189802.3 dated May 11, 2015.
Dean et al., “MapReduce: Simplified Data Processing on Large Clusters”, OSDI 2004, 13 pages.
Official Communication for European Patent Application No. 14197895.7 dated Apr. 28, 2015.
Official Communication for Great Britain Patent Application No. 1411984.6 dated Jan. 8, 2016.
Osterweil et al., “Capturing, Visualizing and Querying Scientific Data Provenance”, http://www.mtholyoke.edu/-blerner/dataprovenance/ddg.html, dated May 20, 2015, 3 pages.
Official Communication for New Zealand Patent Application No. 627962 dated Aug. 5, 2014.
Official Communication for Great Britain Patent Application No. 1411984.6 dated Dec. 22, 2014.
Official Communication for Australian Patent Application No. 2014201506 dated Feb. 27, 2015.
European Claims in application No. 16 182 336.4-1222, dated Jan. 2018, 4 pages.
European Patent Office, “Search Report” in application No. 16 182 336.4-1222, dated Jan. 11, 2018, 12 pages.
U.S. Appl. No. 14/874,690, filed Oct. 5, 2015, Notice of Allowance, dated Oct. 5, 2016.
U.S. Appl. No. 14/225,006, filed Mar. 25, 2014, Final Office Action, dated Sep. 2, 2015.
U.S. Appl. No. 13/922,437, filed Jun. 20, 2013, Notice of Allowance, dated May 8, 2014.
U.S. Appl. No. 14/483,527, filed Sep. 11, 2014, Office Action, dated Oct. 28, 2015.
U.S. Appl. No. 14/135,289, filed Oct. 5, 2015, Notice of Allowance, dated Oct. 14, 2014.
U.S. Appl. No. 14/816,264, filed Aug. 3, 2015, Pre Office Action Interview, dated Oct. 19, 2017.
U.S. Appl. No. 14/326,738, filed Jul. 9, 2014, Final Office Action, dated Jul. 31, 2015.
U.S. Appl. No. 14/490,612, filed Sep. 18, 2014, Pre Office Action Interview, dated Jan. 27, 2015.
U.S. Appl. No. 14/141,252, filed Oct. 8, 2015, Office Action, dated Oct. 8, 2015.
U.S. Appl. No. 14/816,264, filed Aug. 3, 2015, Notice of Allowance, dated Jan. 30, 2018.

Related Publications (1)

	Number	Date	Country
	20170097950 A1	Apr 2017	US

Continuations (2)

	Number	Date	Country
Parent	14879916	Oct 2015	US
Child	15287715		US
Parent	14533433	Nov 2014	US
Child	14879916		US


Abstract