Processing a database query using a shared metadata store

Description

FIELD OF THE INVENTION

This invention relates generally to parallel processing databases, and more particularly to systems and methods for processing queries in parallel using a shared metadata store.

BACKGROUND OF THE INVENTION

Database systems are used to house digital information for a variety of applications and users. These systems may house thousands of terabytes or petabytes of information, all of which may need to be quickly searched and analyzed at a user's request. Occasionally, these search and analysis requests may be computationally intensive for a single machine, and the query tasks may be distributed among multiple nodes in a cluster.

Massively parallel processing (“MPP”) databases may be used to execute complex database queries in parallel by distributing the queries to nodes in a cluster. Each node may receive a portion of the query and execute it using a local metadata store. Occasionally, data may be replicated between the nodes in a cluster, thereby reducing consistency and increasing maintenance costs.

There is a need, therefore, for an improved method, article of manufacture, and apparatus for performing queries on a distributed database system.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:

FIG. 1 illustrates a parallel processing database architecture in accordance with some embodiments of the present disclosure.

FIG. 2 illustrates a parallel processing database having a shared metadata catalog in accordance with some embodiments of the present invention.

FIG. 3 is a flowchart of a method for executing a query using a shared metadata catalog in accordance with some embodiments of the present invention.

FIG. 4 illustrates a flowchart of a method for executing a query in parallel on a parallel processing database using a shared metadata catalog in accordance with some embodiments of the present invention.

FIG. 5 illustrates a system architecture for locating execution metadata using a tree structure in accordance with some embodiments of the present invention.

FIG. 6 illustrates a flowchart of a method for locating execution metadata using a tree structure in accordance with some embodiments of the present invention.

DETAILED DESCRIPTION

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. While the invention is described in conjunction with such embodiment(s), it should be understood that the invention is not limited to any one embodiment. On the contrary, the scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example, and the present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.

It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium such as a computer readable storage medium or a computer network wherein computer program instructions are sent over optical or electronic communication links. Applications may take the form of software executing on a general purpose computer or be hardwired or hard coded in hardware. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.

An embodiment of the invention will be described with reference to a data storage system in the form of a storage system configured to store files, but it should be understood that the principles of the invention are not limited to this configuration. Rather, they are applicable to any system capable of storing and handling various types of objects, in analog, digital, or other form. Although terms such as document, file, object, etc. may be used by way of example, the principles of the invention are not limited to any particular form of representing and storing data or other information; rather, they are equally applicable to any object capable of representing information.

With reference to FIG. 1, a parallel processing database architecture consistent with an embodiment of the present disclosure is discussed. Client 100 may submit a query, such as an SQL database query, to master node 102. Master node 102 may comprise processor 104 and non-transitory computer readable medium 106. Master node 102 may derive one or more query plans based on the query received from client 100, and thereafter transmit the query plans to worker nodes 108. A query plan may be, for example, a set of instructions for performing a data operation on a database. In an embodiment, worker nodes 108 may include processors 112 and non-transitory computer readable storage mediums 114. Worker nodes 108 may process the query plans in parallel, and then return their results to master node 102. Master node 102 may compile all the received results, and return a final query result to client 100.
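The flow described for FIG. 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `PARTITIONS`, `derive_query_plans`, `execute_plan`, and `run_query` are hypothetical, the "query" is reduced to a single `SUM` operation, and threads in one process stand in for worker nodes 108.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions, one per worker node.
PARTITIONS = [[3, 8, 1], [7, 2], [9, 4, 6]]

def derive_query_plans(query, num_workers):
    """Master side: derive one plan per worker (here, a trivial plan dict)."""
    return [{"op": query, "partition": i} for i in range(num_workers)]

def execute_plan(plan):
    """Worker side: run the plan against the local partition."""
    rows = PARTITIONS[plan["partition"]]
    if plan["op"] == "SUM":
        return sum(rows)
    raise ValueError(f"unsupported operation: {plan['op']}")

def run_query(query):
    """Master side: dispatch plans in parallel, then compile the results."""
    plans = derive_query_plans(query, len(PARTITIONS))
    with ThreadPoolExecutor(max_workers=len(plans)) as pool:
        partial_results = list(pool.map(execute_plan, plans))
    return sum(partial_results)

final_result = run_query("SUM")
```

Each worker returns a partial sum, and the master combines the partial sums into the final query result returned to the client.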

In some embodiments, worker nodes may need query metadata to execute the received query plans. Query metadata may include, for example, database table definitions, user or system defined database functions, database views, and/or database indexes. In some embodiments, this metadata may be maintained by catalogs on every worker node in the system. For example, these catalogs may be stored in non-transitory computer readable mediums 114.

While maintaining the catalogs on every node may be manageable on smaller systems, such as a system with one or two machines, such maintenance may not be scalable as the database cluster grows. For example, if a database cluster comprises ten thousand nodes, and if each node contains a local metadata catalog, maintaining those catalogs may be unwieldy or impossible. Even a minor change may need to be replicated among the ten thousand different nodes, and each replication presents a risk of error. As the cluster size grows, this risk increases. Further, storing the catalog on every node in the system may not be an efficient use of storage resources. Even if the catalog only consumes a small amount of storage space, this storage space may be significant when aggregated over thousands of nodes.

In order to address these challenges, a database system may use the master node/worker node architecture shown in FIG. 2. Client 200 may submit a query, such as an SQL query, to master node 202. Master node 202 may develop a query plan from the query, and forward that plan to worker node 204 for execution. In an embodiment, client 200 may be similar to client 100, master node 202 may be similar to master node 102, and worker node 204 may be substantially similar to worker nodes 108. While only one worker node is shown in FIG. 2, any number of nodes may be used in the database cluster.

The query from client 200 may be received by query dispatcher 206. In an embodiment, query dispatcher 206 develops query plans from the received query. Query dispatcher 206 may also determine what metadata may be necessary for the execution of the query plans, and retrieve that metadata from database catalog server 208 (the “metadata catalog” or “database catalog”). This metadata may be identified while interpreting the received query and developing the query plans. In an embodiment, the database catalog may be stored on a non-transitory computer readable medium, such as storage 210. Query dispatcher 206 may then transmit both the retrieved metadata and the query plan to worker node 204.

Transmitting the metadata along with the query plan from master node 202 allows the database catalog to be maintained at a single location; namely, master node 202. Since worker node 204 receives the query plan along with the metadata, it does not need to maintain a local metadata catalog. When a change is made to the catalog, it may be made at a single location and may not need to be propagated to other nodes in the cluster. This may decrease maintenance costs, improve reliability, increase the amount of available space in the cluster, and improve scalability.

In an embodiment, the query plan is annotated to include the metadata, and the plan and metadata are transmitted at the same time. Additionally or alternatively, the query plan and metadata may be transmitted separately. For example, the metadata may be transmitted to worker node 204 before or after the query plan.
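One way to picture an annotated query plan is as a plan object that carries a metadata field filled in by the dispatcher. The sketch below is an assumption made for illustration only: `CATALOG`, `QueryPlan`, `required_objects`, and `annotate_plan` are hypothetical names, and the catalog entries are made-up placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical catalog held only on the master node.
CATALOG = {
    "table:orders": {"columns": ["id", "amount"]},
    "table:users": {"columns": ["id", "name"]},
    "function:tax": {"args": ["amount"], "returns": "numeric"},
}

@dataclass
class QueryPlan:
    steps: list
    required_objects: list
    metadata: dict = field(default_factory=dict)  # annotation carried with the plan

def annotate_plan(plan, catalog):
    """Attach only the catalog entries the plan is known to need."""
    plan.metadata = {name: catalog[name] for name in plan.required_objects}
    return plan

plan = QueryPlan(
    steps=["scan orders", "apply tax"],
    required_objects=["table:orders", "function:tax"],
)
annotated = annotate_plan(plan, CATALOG)
```

Because the plan and its metadata travel as one object, the worker node receives both at the same time and needs no local catalog of its own.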

Once worker node 204 has received the plan and the metadata, query executor 212 may execute the query plan. In some embodiments, this execution may involve performing a data operation on data 214. Data 214 may be stored on a computer readable medium, such as medium 114. In some embodiments, the metadata received from master node 202 may not be sufficient to fully execute the query plan. Should query executor 212 need additional metadata, it may send a request back to database catalog server 208. Catalog server 208 may retrieve the additional metadata, transmit it back to query executor 212, and the query executor may complete the query.

In an embodiment, a separate catalog server session is established for each query request. For example, when a request is received a catalog server session may be initiated, where that server session includes a snapshot of the metadata catalog. In an embodiment, this snapshot is taken when the query is executed. The metadata initially transmitted to the worker nodes may be retrieved from that session, and any incoming request for additional metadata may retrieve the additional metadata from the same session. This may ensure that the metadata remains consistent throughout query execution. For example, if a session is not used, the query dispatcher may distribute query plans with the metadata, the metadata may then change on the database catalog server or computer readable medium, and a worker node may make a request for additional metadata. In response, the catalog server may distribute the modified metadata which is not consistent with the original query. Initiating separate catalog server processes may alleviate this problem.
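The consistency argument can be illustrated with a small sketch in which each session holds its own copy of the catalog. The class names `CatalogServer` and `CatalogSession` are hypothetical, and the sketch assumes the snapshot is taken when the session is opened; a deep copy is used here purely for simplicity, not as a statement about how a real catalog server would implement snapshots.

```python
import copy

class CatalogServer:
    """Hypothetical catalog server that hands out per-query sessions."""

    def __init__(self, catalog):
        self.catalog = catalog

    def open_session(self):
        # Deep copy so later catalog changes do not leak into the session.
        return CatalogSession(copy.deepcopy(self.catalog))

class CatalogSession:
    """Read-only view of the catalog as it existed when the session opened."""

    def __init__(self, snapshot):
        self._snapshot = snapshot

    def get(self, name):
        return self._snapshot[name]

server = CatalogServer({"table:orders": {"columns": ["id", "amount"]}})
session = server.open_session()

# The catalog changes while the first query is still running.
server.catalog["table:orders"]["columns"].append("currency")

# A request for additional metadata in the same session sees the old definition.
during_query = session.get("table:orders")

# A session opened for a later query sees the modified catalog.
later_session = server.open_session()
```

The first session keeps returning the table definition that the original query plan was built against, while the later session reflects the change.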

Turning now to FIG. 3, a method for executing a query on a system substantially similar to FIG. 2 is discussed. At 300, a query is received at a master node. This query could be received, for example, from client 100. The master node may comprise a database catalog which includes metadata defining database objects. This database catalog may be managed by database catalog server 208, and stored on storage 210. In an embodiment, the metadata may include database table definitions, user or system defined database functions, database views, and/or database indexes.

At 302, a query plan and query metadata are transmitted to a worker node for execution. The query plan may be based on the received query, and may comprise an execution strategy for completing all or a portion of the query. The query metadata may include metadata needed for executing the query plan. For example, if the query plan involves a user defined function, that function may be included in the transmitted metadata.

At 304, the metadata may be stored in a local cache on the worker node. This cache could exist, for example, in a memory such as random access memory (“RAM”). Storing the metadata in cache allows for rapid retrieval during the execution process and reduces the number of call backs from the worker node to the metadata catalog on the master node.

At block 306, the query plan is executed on the worker node. The query execution may require use of metadata, and that metadata may be retrieved from the worker cache.

At 308, the worker may determine that it needs additional metadata to execute the query, and may transmit a request for that metadata back to the master node. In some embodiments, this transmission may be received by a catalog server, such as metadata catalog server 208. Additionally or alternatively, the transmission may be received and processed by a catalog server session as discussed above.

At 310, the additional requested metadata may be transmitted from the master to the worker, and the query execution may continue. At block 312, once the execution is complete, the cache may be cleared and the query result may be returned to the master node.
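Steps 304 through 312 can be sketched as a worker that caches the transmitted metadata, calls back to the master only on a cache miss, and clears the cache when execution finishes. The `Worker` class, its `fetch_from_master` callback, and the sample catalog entries are hypothetical; "executing" the plan is reduced to resolving the metadata for each object the plan touches.

```python
class Worker:
    """Hypothetical worker that caches metadata for the life of one query."""

    def __init__(self, fetch_from_master):
        self.fetch_from_master = fetch_from_master  # callback to the catalog
        self.cache = {}
        self.callbacks = 0

    def lookup(self, name):
        if name not in self.cache:              # 308: metadata is missing
            self.callbacks += 1
            self.cache[name] = self.fetch_from_master(name)  # 310: master replies
        return self.cache[name]

    def run(self, plan, metadata):
        self.cache.update(metadata)             # 304: store in local cache
        try:
            # 306: "execute" by resolving every object the plan touches.
            return [self.lookup(name) for name in plan]
        finally:
            self.cache.clear()                  # 312: clear the cache

master_catalog = {"t1": "def-t1", "f1": "def-f1", "v1": "def-v1"}
worker = Worker(master_catalog.__getitem__)
result = worker.run(plan=["t1", "f1", "v1", "f1"], metadata={"t1": "def-t1"})
```

In this run the worker makes two call backs (for `f1` and `v1`); the second use of `f1` is served from the cache.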

FIG. 4 depicts a method similar to FIG. 3 for executing query plans in parallel on a database cluster. At 400 a query is received at a master node, where the master node comprises a metadata catalog. The catalog may include metadata defining database objects, as discussed above.

At block 402, the master node may generate a plurality of query plans. These plans could be generated, for example, using query dispatcher 206. At 404, these plans may be distributed to a plurality of worker nodes in the database cluster, and at 406 the plans may be executed.

Turning now to FIG. 5, a system for locating query metadata is shown. As previously discussed, a worker node may transmit a request to the master node when the worker does not have all the necessary metadata for executing a query plan. When there are only a few nodes in the cluster, this may be an efficient way of obtaining the missing metadata. As the cluster size increases, however, this approach may become more costly. For example, if there are ten thousand nodes in a cluster, additional metadata may be requested from up to ten thousand locations. The system may not have sufficient bandwidth, and the master node may not have enough processing resources, to handle this number of consecutive connections.

The architecture shown in FIG. 5 may help overcome these issues. Worker nodes 504, 505, 506, 508, and 510 may be configured in a tree structure, and master node 502 may be a root node. Master node 502 may receive a query from client 500, may develop query plans for that query, and may distribute the query plans and needed metadata to the worker nodes. This process may be substantially similar to the processes discussed above. In an embodiment, master node 502 may distribute the query plans and metadata directly to each worker node. In other words, master node 502 has a connection to each worker node and may transmit the query plans and metadata without using the shown tree structure.

Once a worker node has received a query plan and some associated metadata, that node may begin processing the plan. In an embodiment, a worker node may need additional metadata that was not included in the original transmission from the master node. As discussed above, the worker node may send a request to the master node for the additional metadata. This may, however, result in an unmanageable number of connections to the master node if multiple worker nodes make similar requests.

In some embodiments, rather than transmitting a request directly to the master node, the worker node may request additional metadata from a parent in the tree structure. Since the master node distributes metadata to all the nodes in the cluster, a parent of the present worker node may have the additional metadata stored in cache. If the immediate parent does not have the additional metadata, the successive parents may be queried until the metadata is found or the master node is reached. Once the additional metadata is found, whether on an ancestor or the master node, it may be transmitted back to the requesting worker node. This may allow a very large number of nodes in a cluster to request additional metadata, without opening an unmanageable number of connections to the master node.

For example, master node 502 may transmit a query plan and some metadata to worker node 505. Worker node 505 may determine that additional metadata is necessary to execute the query plan. Rather than requesting the additional metadata directly from master node 502 (which contains the metadata catalog), worker node 505 may request the metadata from its parent worker node 508. Worker node 508 may check its cache and return the additional metadata to node 505 if the metadata is found. If the additional metadata is not found, worker node 508 may forward the request to the next parent, which is master node 502. Master node 502 may retrieve the additional metadata from the metadata catalog and transmit it to the original requesting worker node 505.
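The example above can be sketched as a chain of nodes in which each miss is forwarded to the parent. The `Node` class and its `find_metadata` method are hypothetical, the node names simply mirror the reference numerals, and the cached entries are made-up placeholders. The sketch also returns the list of nodes visited so the path of the request is visible.

```python
class Node:
    """Hypothetical node in the metadata lookup tree."""

    def __init__(self, name, parent=None, cache=None, catalog=None):
        self.name = name
        self.parent = parent          # None for the master (root) node
        self.cache = cache or {}      # metadata received with earlier plans
        self.catalog = catalog        # only the master holds the full catalog

    def find_metadata(self, key, path=None):
        """Return (value, nodes visited), forwarding misses to the parent."""
        path = (path or []) + [self.name]
        if key in self.cache:
            return self.cache[key], path
        if self.parent is None:       # reached the master: consult the catalog
            return self.catalog[key], path
        return self.parent.find_metadata(key, path)

master_502 = Node("master_502", catalog={"idx_a": "index def", "fn_b": "function def"})
worker_508 = Node("worker_508", parent=master_502, cache={"idx_a": "index def"})
worker_505 = Node("worker_505", parent=worker_508)

hit = worker_505.find_metadata("idx_a")   # served from the cache on worker 508
miss = worker_505.find_metadata("fn_b")   # forwarded up to master 502
```

Only the second request reaches the master node; the first is answered by the parent's cache.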

In some embodiments, requests for additional metadata may be forwarded up the tree structure as just described. Each node may know its parent, and if the metadata is not found in the local cache the node may forward the request to that parent. The tree structure may be particularly beneficial because new nodes can be added or removed without updating information on every node in the cluster. In some embodiments, however, each worker node may be responsible for maintaining its own ancestry. For example, worker node 505 may know its parents are worker node 508 and master node 502. If a request for additional metadata is sent to worker node 508 and the metadata is not found, worker node 505 may submit the request to master node 502 directly rather than having the request forwarded by worker node 508.
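The second variant, in which the requesting worker tracks its own ancestry and contacts each ancestor directly, might look like the following sketch. `CACHES`, `CATALOG`, `find_via_ancestry`, and the ancestry list are hypothetical names and sample data chosen for illustration.

```python
# Hypothetical per-node caches; the master is represented by the full catalog.
CACHES = {"worker_508": {"idx_a": "index def"}}
CATALOG = {"idx_a": "index def", "fn_b": "function def"}

def find_via_ancestry(key, ancestors):
    """Query each known ancestor directly, nearest first; the master is last."""
    *worker_ancestors, master = ancestors
    for node in worker_ancestors:
        if key in CACHES.get(node, {}):
            return CACHES[node][key], node
    return CATALOG[key], master

ancestry_505 = ["worker_508", "master_502"]
from_parent = find_via_ancestry("idx_a", ancestry_505)
from_master = find_via_ancestry("fn_b", ancestry_505)
```

Here the requester, not the parent, decides where to send the next request after a miss.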

Additionally or alternatively, no tree structure may be used. Each worker node may maintain a list or directory of other worker nodes. If additional metadata is needed, the worker node may iterate through this list and make calls to the other worker nodes. The master node may only be called once the list is exhausted without locating the additional metadata. The requests may be sent to the nodes on the list one at a time, or a request may be sent to all the nodes at the same time.
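A sketch of the list-based alternative, querying peers one at a time, is shown below. The function `find_in_peer_list`, the `ask_master` callback, and the peer caches are hypothetical; the `master_calls` list exists only to show how often the master is contacted.

```python
def find_in_peer_list(key, peer_caches, ask_master):
    """Try each listed peer in turn; call the master only if all of them miss."""
    for peer_name, cache in peer_caches:
        if key in cache:
            return cache[key], peer_name
    return ask_master(key), "master"

master_calls = []

def ask_master(key):
    # Record the call so the number of master requests can be inspected.
    master_calls.append(key)
    return {"view_c": "view def", "fn_b": "function def"}[key]

peers = [("worker_504", {}), ("worker_506", {"view_c": "view def"})]
found = find_in_peer_list("view_c", peers, ask_master)
fallback = find_in_peer_list("fn_b", peers, ask_master)
```

The master is called once, for the entry that none of the listed peers holds.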

In some embodiments, requests for additional metadata may be transmitted throughout the system as a multicast request. In such an embodiment, a request may only be made to the master node if no other node responds within a defined time frame.
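The timeout behavior can be simulated in a single process, with threads standing in for peers that receive the multicast request. This is not a network multicast implementation: `multicast_lookup`, the peer caches, and the 0.2 second timeout are assumptions made for illustration.

```python
import queue
import threading

def multicast_lookup(key, peer_caches, ask_master, timeout=0.2):
    """Ask all peers at once; fall back to the master if nobody answers in time."""
    responses = queue.Queue()

    def peer(cache):
        if key in cache:               # peers without the metadata stay silent
            responses.put(cache[key])

    threads = [threading.Thread(target=peer, args=(c,)) for c in peer_caches]
    for t in threads:
        t.start()
    try:
        return responses.get(timeout=timeout), "peer"
    except queue.Empty:
        return ask_master(key), "master"

catalog = {"idx_a": "index def", "fn_b": "function def"}
peer_caches = [{}, {"idx_a": "index def"}, {}]

answered_by_peer = multicast_lookup("idx_a", peer_caches, catalog.__getitem__)
answered_by_master = multicast_lookup("fn_b", peer_caches, catalog.__getitem__)
```

The first request is answered by a peer; the second times out and is then sent to the master.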

Turning now to FIG. 6, a method for locating additional metadata using a tree structure is discussed. At block 600, a query is received at a master node. The master node may comprise a database catalog that includes metadata defining database objects. The master node may be the root node in a tree structure, and in an embodiment may be substantially similar to master node 502.

At 602 a plurality of query plans may be derived from the query, and at 604 these plans may be distributed to a plurality of worker nodes. In an embodiment, the worker nodes may be similar to worker nodes 504, 505, 506, 508, and 510. Query metadata may be distributed with the plans, where the query metadata includes metadata necessary for executing the plans.

At 606, one or more of the worker nodes may determine they need additional metadata to execute the query plan, and at 608 this worker node may query a parent for the additional metadata. In an embodiment, this parent node may be another worker node, and may comprise a metadata cache. This metadata cache may be substantially similar to the cache discussed in reference to FIG. 3.

At 610, the cache on the parent node is checked for the metadata. If the metadata is found, it may be transmitted to the worker node making the request. If the metadata is not found, successive ancestor nodes may be queried at 612 until the additional metadata is found in a parent worker node's cache, or the master node is reached.

For the sake of clarity, the processes and methods herein have been illustrated with a specific flow, but it should be understood that other sequences may be possible and that some may be performed in parallel, without departing from the spirit of the invention. Additionally, steps may be subdivided or combined. As disclosed herein, software written in accordance with the present invention may be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor.

All references cited herein are intended to be incorporated by reference. Although the present invention has been described above in terms of specific embodiments, it is anticipated that alterations and modifications to this invention will no doubt become apparent to those skilled in the art and may be practiced within the scope and equivalents of the appended claims. More than one computer may be used, such as by using multiple computers in a parallel or load-sharing arrangement or distributing tasks across multiple computers such that, as a whole, they perform the functions of the components identified herein; i.e. they take the place of a single computer. Various functions described above may be performed by a single process or groups of processes, on a single computer or distributed over several computers. Processes may invoke other processes to handle certain tasks. A single storage device may be used, or several may be used to take the place of a single storage device. The disclosed embodiments are illustrative and not restrictive, and the invention is not to be limited to the details given herein. There are many alternative ways of implementing the invention. It is therefore intended that the disclosure and following claims be interpreted as covering all such alterations and modifications as fall within the true spirit and scope of the invention.

Claims

1. A method, comprising:
receiving a query at a master node, the master node having access to a database catalog that comprises metadata defining database objects;
in response to receiving the query, initiating a catalog server session, taking a snapshot of the metadata, and associating the snapshot of the metadata with the catalog server session, wherein a separate catalog server session is initiated and a separate snapshot of the metadata is taken for a separate query that is not the same as the query received at the master node;
communicating a query plan and query metadata to a worker node, wherein the query plan is generated based at least in part on the query, the query metadata includes metadata to be used in connection with execution of the query plan, the query metadata is obtained based at least in part on the snapshot of the metadata associated with the catalog server session, and the query metadata is communicated to the worker node contemporaneous with respect to the query plan;
in response to receiving the query plan and the query metadata, determining, by the worker node, that additional metadata is required for the worker node to execute the query plan;
in response to determining that additional metadata is required, requesting, by the worker node, the additional metadata, wherein the worker node queries a parent in a tree structure of a plurality of worker nodes in a parallel processing database system for the additional metadata, and the parent node is a node between the master node and the worker node in relation to the tree structure;
receiving the additional metadata from another worker node, wherein the additional metadata is retrieved from a same session as the catalog server session corresponding to the query;
executing the query plan on the worker node; and
returning, to the master node, a result associated with the execution of the query plan on the worker node.
2. The method of claim 1, wherein the query metadata includes database table definitions that define database objects.
3. The method of claim 1, further comprising generating a plurality of query plans based at least in part on the query, the plurality of query plans comprising the query plan that is communicated to the worker node.
4. The method of claim 3, further comprising transmitting the plurality of query plans to the plurality of worker nodes.
5. The method of claim 4, further comprising executing the plurality of query plans in parallel.
6. The method of claim 1, further comprising storing the query metadata in a cache on the worker node.
7. The method of claim 6, further comprising clearing the cache after executing the query plan.
8. The method of claim 6, further comprising retrieving the query metadata from the cache while executing the query plan.
9. The method of claim 8, wherein the querying a parent in the tree structure of the plurality of worker nodes comprises successively querying one or more parent nodes for the additional metadata before querying the master node for the additional metadata.
10. The method of claim 9, wherein the successively querying one or more parent nodes for the additional metadata before querying the master node for the additional metadata comprises at least one of the one or more parent nodes forwarding a request for the additional metadata to another of the one or more parent nodes.
11. The method of claim 8, wherein the query metadata includes one or more of a user defined database function, a system defined database function, a database view, and a database index.
12. The method of claim 8, further comprising: receiving another query at the master node; in response to receiving the other query, initiating another catalog server session, taking another snapshot of the metadata as the metadata existed when the other catalog server session is initiated, and associating the other snapshot of the metadata with the other catalog server session; and transmitting another query plan based on the other query and other query metadata to the worker node, wherein the other query metadata is retrieved from the other snapshot of the metadata associated with the other catalog server session.
13. The method of claim 8, further comprising: compiling the result associated with the execution of the query plan on the worker node with another result associated with the query; and returning the compiled result and other result as a final query result to a client.
14. The method of claim 8, wherein the query plan comprises the query metadata.
15. The method of claim 8, wherein the request for the additional metadata by the worker node is transmitted as a multicast request.
16. The method of claim 8, wherein the worker node maintains a list of other worker nodes, and the another worker node is selected from the list.
17. A computer program product for executing queries in a parallel processing database system, comprising a non-transitory computer readable medium having program instructions embodied therein for:
receiving a query at a master node, the master node having access to a database catalog that comprises metadata defining database objects;
in response to receiving the query, initiating a catalog server session, taking a snapshot of the metadata, and associating the snapshot of the metadata with the catalog server session, wherein a separate catalog server session is initiated and a separate snapshot of the metadata is taken for a separate query that is not the same as the query received at the master node;
communicating a query plan and query metadata to a worker node, wherein the query plan is generated based at least in part on the query, the query metadata includes metadata to be used in connection with execution of the query plan, the query metadata is obtained based at least in part on the snapshot of the metadata associated with the catalog server session, and the query metadata is communicated to the worker node contemporaneous with respect to the query plan;
in response to receiving the query plan and the query metadata, determining, by the worker node, that additional metadata is required for the worker node to execute the query plan;
in response to determining that additional metadata is required, requesting, by the worker node, the additional metadata, wherein the worker node queries a parent in a tree structure of a plurality of worker nodes in a parallel processing database system for the additional metadata, and the parent node is a node between the master node and the worker node in relation to the tree structure;
receiving the additional metadata from another worker node, wherein the additional metadata is retrieved from a same session as the catalog server session corresponding to the query;
executing the query plan on the worker node; and
returning, to the master node, a result associated with the execution of the query plan on the worker node.
18. A system for executing queries in a parallel processing database, comprising a non-transitory computer readable medium and a processor configured to:
receive a query at a master node, the master node having access to a database catalog that comprises metadata defining database objects;
in response to receiving the query, initiate a catalog server session, taking a snapshot of the metadata, and associating the snapshot of the metadata with the catalog server session, wherein a separate catalog server session is initiated and a separate snapshot of the metadata is taken for a separate query that is not the same as the query received at the master node;
communicate a query plan and query metadata to a worker node, wherein the query plan is generated based at least in part on the query, the query metadata includes metadata to be used in connection with execution of the query plan, the query metadata is obtained based at least in part on the snapshot of the metadata associated with the catalog server session, and the query metadata is communicated to the worker node contemporaneous with respect to the query plan;
in response to receiving the query plan and the query metadata, determining, by the worker node, that additional metadata is required for the worker node to execute the query plan;
in response to determining that additional metadata is required, requesting, by the worker node, the additional metadata, wherein the worker node queries a parent in a tree structure of a plurality of worker nodes in a parallel processing database system for the additional metadata, and the parent node is a node between the master node and the worker node in relation to the tree structure;
receiving the additional metadata from another worker node, wherein the additional metadata is retrieved from a same session as the catalog server session corresponding to the query;
executing the query plan on the worker node; and
returning, to the master node, a result associated with the execution of the query plan on the worker node.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. patent application Ser. No. 13/838,955, entitled PROCESSING A DATABASE QUERY USING A SHARED METADATA STORE, filed Mar. 15, 2013, which is incorporated herein by reference for all purposes, and which claims priority to U.S. Provisional Application No. 61/769,043, entitled INTEGRATION OF MASSIVELY PARALLEL PROCESSING WITH A DATA INTENSIVE SOFTWARE FRAMEWORK, filed Feb. 25, 2013, which is incorporated herein by reference for all purposes.

Related Publications (1)

	Number	Date	Country
	20190005093 A1	Jan 2019	US

Provisional Applications (1)

	Number	Date	Country
	61769043	Feb 2013	US

Continuations (1)

	Number	Date	Country
Parent	13838955	Mar 2013	US
Child	16123981		US
