Pluggable storage system for parallel query engines

Description

FIELD OF THE INVENTION

This invention relates generally to databases, and more particularly to systems and methods for managing datasets in databases.

BACKGROUND OF THE INVENTION

With the large amounts of data generated in recent years, data mining and machine learning are playing an increasingly important role in today's computing environment. For example, businesses may utilize either data mining or machine learning to predict the behavior of users. This predicted behavior may then be used by businesses to determine which plan to proceed with, or how to grow the business.

The data used in data mining and analytics is typically not stored in a uniform data storage system. Many data storage systems utilize different file systems, and those different file systems are typically not compatible with each other. Further, the data may reside in geographically diverse locations.

One conventional method to performing data analytics across different databases includes copying data from one databatase to a central database, and performing the data analytics on the central database. However, this results in an inefficient use of storage space, and creates issues with data consistency between the two databases.

There is a need, therefore, for an improved method, article of manufacture, and apparatus for managing data.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:

FIG. 1 illustrates a database system in accordance with some embodiments.

FIG. 2 is a flowchart of a method to manage data in accordance with some embodiments.

FIG. 3 is a flowchart of a method to manage data in accordance with some embodiments.

DETAILED DESCRIPTION

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. While the invention is described in conjunction with such embodiment(s), it should be understood that the invention is not limited to any one embodiment. On the contrary, the scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example, and the present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.

It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium such as a computer readable storage medium or a computer network wherein computer program instructions are sent over optical or electronic communication links. Applications may take the form of software executing on a general purpose computer or be hardwired or hard coded in hardware. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.

An embodiment of the invention will be described with reference to a data storage system in the form of a storage system configured to store files, but it should be understood that the principles of the invention are not limited to this configuration. Rather, they are applicable to any system capable of storing and handling various types of objects, in analog, digital, or other form. Although terms such as document, file, object, etc. may be used by way of example, the principles of the invention are not limited to any particular form of representing and storing data or other information; rather, they are equally applicable to any object capable of representing information.

FIG. 1 illustrates a database system in accordance with some embodiments. Client 100 connects to Universal Namenode 102 when attempting to access data. Universal Namenode 102 access Catalog Service 104 to determine where the data is, and what type of file system the data resides in. Universal Namenode 102 may natively understand the hadoop file system (HDFS), and may readily communicate to HDFS Namenode 106. HDFS Namenode 106 is connected to Datanodes 108. Universal Namenode 102 may have several Universal Protocols 110, which allow Universal Namenode 102 to access storage systems utilizing different File Systems 112. File Systems 112 may be a diverse set of storage systems, which may utilize different file systems, such as the HDFS and NFS, among others.

Catalog 104, in some embodiments, may be a table that includes a file name and file location. For example, a simple table may include:

File A
nfs://a/b/File A

File B
hdfs://xyz/File B

This may be stored as a text file, a spreadsheet file, or any other data object capable of storing data in tabular form.

In some embodiments, each datanode, Hadoop datanode or otherwise, also includes a data node job tracker (not shown in FIG. 1). Data node job trackers track the jobs that are run as part of a query on their specific data node. These data node job trackers report back to a universal job tracker in the Universal Namenode 102 (not shown in FIG. 1). The universal job tracker tracks the status of all jobs for a given query received from Client 100. For example, suppose a client wants to sort all word files by creation date. To the client, all the word files may look as if they were stored in one location. However, unbeknownst to the client, the word files are scattered across different storage systems. Some word files which are accessed frequently may be stored in a high speed storage system. Some word files which are rarely accessed may be stored on a slower storage system. Suppose there are three storage systems: A, B, and C. The word files reside in storage systems A and B. The universal namenode would receive the client's sort query, and then ask the catalog to determine the location of the word files. Having determined that the word files are in A and B, the universal namenode would send out the jobs to the appropriate storage systems. Storage systems A and B would each run a sort job, and each would have a job tracker to keep track of the progress of the job.

By utilizing a Universal Node 102, Client 100 has a unified view across all data sources from a single namespace. In some embodiments, this namespace may be uss://. This is also helpful if Client 100 wants to perform Hadoop jobs on data that is not stored in HDFS. Instead of copying data from a non-HDFS to a HDFS storage system and running the Hadoop job, the data can remain on their respective storage systems, and the jobs will run on the storage system. The universal protocols allow the universal namenode to connect with different file systems. In some embodiments, the universal protocol may be stored in universal namenode. Following the above example, suppose storage system A runs file system A, and storage system B runs file system B. In order to interact with both file systems, universal namenode may have a protocol plugin A for file system A, and a protocol plugin B for file system B. These two plugins allow the universal namenode to communicate with the two different file systems.

As long as the universal namenode has the correct universal protocol plugin, any type of file system may be added to the system. Following the above example, suppose a storage system D with file system D was added. As long as the universal namenode has a universal protocol plugin for file system D, the storage system D can be added and used.

FIG. 2 is a flowchart of a method to manage data in accordance with some embodiments. In step 200, a query is received from a client. In step 202, a catalog is analyzed for location information based on the received query. In step 204, a first storage system, an associated first filing system, and an associated first protocol are determined based on the analysis. In step 206, the first protocol is used to communicate with the first storage system. In step 208, at least a portion of the query is performed on the first storage system.

Having a diverse array of storage systems allows for a system with multiple tiers of file storage. Although the client only sees one namespace (the universal namenode), many namespaces may reside under the universal namenode. These different namespaces may correspond to different types of storage systems—some with very high performance file systems, and some with low performance file systems. In some embodiments, it may be preferable to have multiple tiers of storage systems. For example, frequently accessed files may be stored on high performance file systems. Less frequently accessed files may be stored on file systems that are more optimized for storage and less for performance.

The level of activity may change for files. Frequently accessed files may be less frequently accessed, and vice versa. For example, a Q2 end report might be accessed very frequently during Q2 and Q3, but the report might not be accessed at all in Q4. In such cases, it may be preferable to move the file from one higher tier to a lower tier. With the universal namenode and catalog, moving the file from one tier to another is transparent to the client. Once the file has been moved, the catalog changes the location of the file. Previously, the location for the file may have been high_file_system://FileA. After the move, the location for the file may be low_file_system://FileA. The catalog only changes the location entry for the file. No other changes are necessary. The next time the client wants to access the file, the client will still use uss://FileA (the universal namespace), but the universal namenode will look at the catalog and determine that FileA is in the low_file_system namespace. The client does not need to keep track of which namespace the file is in.

In some embodiments, it may be preferable to copy some of the data from one storage system to another, even though the copy is not necessary to perform the query. For example, suppose storage system A and storage system B have some data that is required to run a query. Storage system A is connected via a high speed network connection and is also a high speed storage device. Storage system B is connected via a slower network connection, and is also a slower storage device. If the client wanted to perform the query as fast as possible, in may be preferable to temporarily copy some of the data on storage system B to storage system A. After the query has finished, the copied data may be removed from storage system A.

The usage of files may also be used to determine when and where to move data. For example, suppose File 1 is always accessed at 1 pm every Tuesday. Otherwise, it is never used. In some embodiments, this may constitute an inactive file, so File 1 is stored in a low performance storage system. However, File 1 may also be very large. When it is accessed at 1 pm every Tuesday, it takes a significant amount of time for the query to finish. With this statistic, it may be preferable to move File 1 to a high performance storage system at 12:30 pm every Tuesday, and move the file back to the low performance storage system after the query is complete. After the move, the catalog updates the location with the new location, and the universal namenode will now point to the new location. Similarly, after the query is complete, the catalog updates the location with the original location, and the universal namenode will now point to the original location. Since the client doesn't have to keep track of where the file is (e.g. what namespace to use), it makes no difference to the client running the query whether or not the file is moved.

FIG. 3 is a flowchart of a method to manage data in accordance with some embodiments. In step 300, the usage level of a file is determined, wherein the file is stored in a first storage system. In step 302, the file is moved to a second storage system based on the determined usage level of the file. In step 304, location information in a catalog is updated based on the movement of the file. In step 306, at least a portion of the query is performed after updating location information in the catalog.

For the sake of clarity, the processes and methods herein have been illustrated with a specific flow, but it should be understood that other sequences may be possible and that some may be performed in parallel, without departing from the spirit of the invention. Further, though the techniques herein teach creating one SwR sample in parallel, those with ordinary skill in the art will readily appreciate that the techniques are easily extendable to generate many SwR samples. Additionally, steps may be subdivided or combined. As disclosed herein, software written in accordance with the present invention may be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor.

All references cited herein are intended to be incorporated by reference. Although the present invention has been described above in terms of specific embodiments, it is anticipated that alterations and modifications to this invention will no doubt become apparent to those skilled in the art and may be practiced within the scope and equivalents of the appended claims. More than one computer may be used, such as by using multiple computers in a parallel or load-sharing arrangement or distributing tasks across multiple computers such that, as a whole, they perform the functions of the components identified herein; i.e. they take the place of a single computer. Various functions described above may be performed by a single process or groups of processes, on a single computer or distributed over several computers. Processes may invoke other processes to handle certain tasks. A single storage device may be used, or several may be used to take the place of a single storage device. The disclosed embodiments are illustrative and not restrictive, and the invention is not to be limited to the details given herein. There are many alternative ways of implementing the invention. It is therefore intended that the disclosure and following claims be interpreted as covering all such alterations and modifications as fall within the true spirit and scope of the invention.

Claims

1. A method for managing data, comprising: receiving, by a universal namenode, a query from a client;based at least in part on the received query, accessing a catalog service and searching a catalog provided by the catalog service for location information of one or more files responsive to the query;based at least in part on a result of the search of the catalog, determining to move at least one of one or more files from a second storage system to a first storage system, and determining to communicate with the first storage system in connection with the one or more files responsive to the queries;in response to a determination to move the at least one of the one or more files from the second storage system to the first storage system, moving the at least one of the one or more of files to the first storage system;performing at least a portion of the query on the first storage system;providing, to the client, results of the query such that in response to a determination that various portions of the results correspond to query results stored on a set of a plurality of storage systems, the results of the query are presented in a unified view across the set of the plurality of storage systems and appear, from a perspective of the client, to exist from a single namespace, wherein the plurality of storage systems comprises at least the first storage system and the second storage system; andin response to a determination that performing the query has finished, moving the at least one of the one or more files from the first storage system to the second storage system.
2. The method of claim 1, further comprising determining a first protocol for communication with the storage system, wherein the universal namenode communicates with the first storage system using the first protocol.
3. The method of claim 1, wherein the catalog stores a mapping of a plurality of files stored on a plurality of storage systems to a location at which the plurality of files are respectively stored on the corresponding plurality of storage systems.
4. The method of claim 1, wherein a first protocol for communication with the first storage system is determined, based at least in part, on a first file system associated with the first storage system.
5. The method of claim 1, further comprising: determining whether to move a first file that is stored on the first storage system to another storage system based at least in part on a frequency with which the first file is accessed;in response to a determination that the first file is determined to be moved from the first storage system to the other storage system, moving the first file to the other storage system; andupdating location information associated with the first file stored in the catalog to point from a location at which the first file was stored in the first storage system to a location at which the first file is stored in the other storage system, wherein, from the perspective of the client, the a location of the first file is the same in relation to the singe namespace regardless of whether the first file is moved from the first storage system to the other storage system.
6. The method of claim 5, wherein the updating of the location information comprises updating the mapping of the one or more files stored on the plurality of storage systems to the location at which the one or more files so as to update a record corresponding to the first file.
7. The method of claim 1, wherein the query received from the client is a query to identify one or more files stored across a distributed storage system comprising the plurality of storage systems.
8. The method of claim 1, wherein the determining to move the at least one of the one or more files from the second storage system to the first storage system based at least in part on the search of the catalog comprises: determining to move the at least one of the one or more files in response to the receiving the query from the client.
9. The method of claim 1, further comprising: tracking, by a universal job tracker of the universal namenode, a status of one or more portions of the query being respectively performed by corresponding ones of the plurality of storage systems.
10. The method of claim 1, further comprising the first storage system reporting back the results of at least a portion of the query.
11. The method of claim 1, further comprising adding a new storage system.
12. The method of claim 11, further comprising determining a new protocol for communication with the new storage system.
13. The method of claim 12, wherein a new file system associated with the new storage system is the same as a first file system associated with the first storage system.
14. The method of claim 12, wherein a new file system associated with the new storage system is not the same as a first file system associated with the first storage system.
15. The method of claim 12, further comprising updating the catalog based on data stored in the new storage system.
16. The method of claim 1, further comprising determining to communicate with the second storage system in connection with the one or more files responsive to the queries, and determining a second protocol for communication with the second storage system.
17. The method of claim 1, wherein the determining to move at least one of the one or more files is further based at least in part on information pertaining to a historical usage of files stored among the plurality of storage systems.
18. The method of claim 1, wherein the universal namenode has one or more protocols configured to connect with the plurality of storage systems.
19. The method of claim 1, wherein the moving the at least the one of the one or more files from the second storage system to the first storage system is transparent to the client.
20. A system for managing data, comprising one or more processors configured to: receive a query from a client;based at least in part on the received query, accessing a catalog service and searching a catalog provided by the catalog service for location information of one or more files responsive to the query;based at least in part on a result of the search of the catalog, determine to move at least one of the one or more files from a second storage system to a first storage system, and determine to communicate with the first storage system in connection with the one or more files responsive to the queries;in response to a determination to move the at least one of the one or more files from the second storage system to the first storage system, move the at least one of the one or more files to the first storage system;perform at least a portion of the query on the first storage system;provide, to the client, results of the query such that in response to a determination that various portions of the results correspond to query results stored on a set of a plurality of storage systems, the results of the query are presented in a unified view across the set of the plurality of storage systems and appear, from a perspective of the client, to exist from a single namespace, wherein the plurality of storage systems comprises at least the first storage system and the second storage system; andin response to a determination that performing the query has finished, move the at least one of the one or more files from the first storage system to the second storage system.
21. A computer program product for processing data, comprising a non-transitory computer readable medium having program instructions embodied therein for: receiving a query from a client;based at least in part on the received query, accessing a catalog service and searching a catalog provided by the catalog service for location information of one or more files responsive to the query;based at least in part on a result of the search of the catalog, determining to move at least one of the one or more files from a second storage system to a first storage system, and determining to communicate with the first storage system in connection with the one or more files responsive to the queries;in response to a determination to move the at least one of the one or more files from the second storage system to the first storage system, moving the at least one of the one or more files to the first storage system;performing at least a portion of the query on the first storage system;providing, to the client, results of the query such that in response to a determination that various portions of the results correspond to query results stored on a set of a plurality of storage systems, the results of the query are presented in a unified view across the set of the plurality of storage systems and appear, from a perspective of the client, to exist from a single namespace, wherein the plurality of storage systems comprises at least the first storage system and the second storage system; andin response to a determination that performing the query has finished, moving the at least one of the one or more files from the first storage system to the second storage system.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of co-pending U.S. patent application Ser. No. 13/842,816, entitled PLUGGABLE STORAGE SYSTEM FOR PARALLEL QUERY ENGINES filed Mar. 15, 2013 which is incorporated herein by reference for all purposes, which claims priority to U.S. Provisional Application No. 61/769,043, entitled INTEGRATION OF MASSIVELY PARALLEL PROCESSING WITH A DATA INTENSIVE SOFTWARE FRAMEWORK filed Feb. 25, 2013 which is incorporated herein by reference for all purposes.

US Referenced Citations (189)

Number	Name	Date	Kind
5191611	Lang	Mar 1993	A
5495607	Pisello	Feb 1996	A
5655116	Kirk	Aug 1997	A
5706514	Bonola	Jan 1998	A
5922030	Shank	Jul 1999	A
6266682	Lamarca	Jul 2001	B1
6269380	Terry	Jul 2001	B1
6564252	Hickman	May 2003	B1
6718372	Bober	Apr 2004	B1
6745385	Lupu	Jun 2004	B1
6907414	Parnell	Jun 2005	B1
6912482	Kaiser	Jun 2005	B2
6947925	Newcombe	Sep 2005	B2
6996582	Daniels	Feb 2006	B2
7035931	Zayas	Apr 2006	B1
7069421	Yates, Jr.	Jun 2006	B1
7177823	Lam	Feb 2007	B2
7194513	Sharif	Mar 2007	B2
7254636	O'Toole, Jr.	Aug 2007	B1
7313512	Traut	Dec 2007	B1
7346751	Prahlad	Mar 2008	B2
7415038	Ullmann	Aug 2008	B2
7475199	Bobbitt	Jan 2009	B1
7493311	Cutsinger	Feb 2009	B1
7574443	Bahar	Aug 2009	B2
7593938	Lemar	Sep 2009	B2
7613947	Coatney	Nov 2009	B1
7653699	Colgrove	Jan 2010	B1
7689535	Bernard	Mar 2010	B2
7689609	Lango	Mar 2010	B2
7702625	Peterson	Apr 2010	B2
7716261	Black	May 2010	B2
7720841	Gu	May 2010	B2
7739316	Thompson	Jun 2010	B2
7761678	Bodmer	Jul 2010	B1
7774335	Scofield	Aug 2010	B1
7827201	Gordon	Nov 2010	B1
7949693	Mason	May 2011	B1
7958303	Shuster	Jun 2011	B2
7978544	Bernard	Jul 2011	B2
7984043	Waas	Jul 2011	B1
8010738	Chilton	Aug 2011	B1
8028290	Rymarczyk	Sep 2011	B2
8041735	Lacapra	Oct 2011	B1
8051113	Shekar	Nov 2011	B1
8131739	Wu	Mar 2012	B2
8180813	Goodson	May 2012	B1
8185488	Chakravarty	May 2012	B2
8195769	Miloushev	Jun 2012	B2
8200723	Sears	Jun 2012	B1
8219681	Glade	Jul 2012	B1
8255430	Dutton	Aug 2012	B2
8255550	Becher	Aug 2012	B1
8301822	Pinto	Oct 2012	B2
8312037	Bacthavachalu	Nov 2012	B1
8332526	Kruse	Dec 2012	B2
8352429	Mamidi	Jan 2013	B1
8392400	Ransil	Mar 2013	B1
8417681	Miloushev	Apr 2013	B1
8452821	Shankar	May 2013	B2
8484259	Makkar	Jul 2013	B1
8533183	Hokanson	Sep 2013	B2
8543564	Conrad	Sep 2013	B2
8543596	Kostamaa	Sep 2013	B1
8577911	Stepinski	Nov 2013	B1
8578096	Malige	Nov 2013	B2
8595237	Chaudhary	Nov 2013	B1
8682853	Zane	Mar 2014	B2
8682922	Figueiredo	Mar 2014	B2
8700875	Barron	Apr 2014	B1
8751533	Dhavale	Jun 2014	B1
8762330	Kick	Jun 2014	B1
8825752	Madhavan	Sep 2014	B1
8832154	Srinivasan	Sep 2014	B1
8856286	Barsness	Oct 2014	B2
8971916	Joyce	Mar 2015	B1
9026559	Bernbo	May 2015	B2
9118697	Kishore	Aug 2015	B1
9146766	Shaikh	Sep 2015	B2
9323758	Stacey	Apr 2016	B1
9449007	Wood	Sep 2016	B1
9628438	Hardin	Apr 2017	B2
9674294	Gonthier	Jun 2017	B1
9684571	Modukuri	Jun 2017	B2
9727588	Ostapovicz	Aug 2017	B1
9805053	Tiwari	Oct 2017	B1
9886217	Tsuchiya	Feb 2018	B2
9984083	Tiwari	May 2018	B1
20020002638	Obara	Jan 2002	A1
20020049782	Herzenberg	Apr 2002	A1
20020133810	Giles	Sep 2002	A1
20020146035	Tyndall	Oct 2002	A1
20020156840	Ulrich	Oct 2002	A1
20030126120	Faybishenko	Jul 2003	A1
20030172094	Lauria	Sep 2003	A1
20030191745	Jiang	Oct 2003	A1
20030229637	Baxter	Dec 2003	A1
20040054748	Ackaouy	Mar 2004	A1
20040078467	Grosner	Apr 2004	A1
20040088282	Xu	May 2004	A1
20040098415	Bone	May 2004	A1
20040143571	Bjornson	Jul 2004	A1
20040205143	Uemura	Oct 2004	A1
20050071338	Sugioka	Mar 2005	A1
20050091287	Sedlar	Apr 2005	A1
20050144254	Kameda	Jun 2005	A1
20050165777	Hurst-Hiller	Jul 2005	A1
20050198401	Chron	Sep 2005	A1
20050216788	Mani-Meitav	Sep 2005	A1
20060005188	Vega	Jan 2006	A1
20060010433	Neil	Jan 2006	A1
20060037069	Fisher	Feb 2006	A1
20060080465	Conzola	Apr 2006	A1
20060136653	Traut	Jun 2006	A1
20060146057	Blythe	Jul 2006	A1
20060149793	Kushwah	Jul 2006	A1
20060173751	Schwarze	Aug 2006	A1
20060212457	Pearce	Sep 2006	A1
20060248528	Oney	Nov 2006	A1
20070282951	Selimis	Dec 2007	A1
20080059746	Fisher	Mar 2008	A1
20080172281	Probst	Jul 2008	A1
20080281802	Peterson	Nov 2008	A1
20080313183	Cunningham	Dec 2008	A1
20080320151	McCanne	Dec 2008	A1
20090007105	Fries	Jan 2009	A1
20090007161	Sheehan	Jan 2009	A1
20090089344	Brown	Apr 2009	A1
20090106255	Lacapra	Apr 2009	A1
20090132609	Barsness	May 2009	A1
20090210431	Marinkovic	Aug 2009	A1
20090222569	Frick	Sep 2009	A1
20090222896	Ichikawa	Sep 2009	A1
20090254916	Bose	Oct 2009	A1
20090259665	Howe	Oct 2009	A1
20090265400	Pudipeddi	Oct 2009	A1
20090328225	Chambers	Dec 2009	A1
20100036840	Pitts	Feb 2010	A1
20100042655	Tse	Feb 2010	A1
20100145917	Bone	Jun 2010	A1
20100241673	Wu	Sep 2010	A1
20100274772	Samuels	Oct 2010	A1
20100287170	Liu	Nov 2010	A1
20110113052	Hornkvist	May 2011	A1
20110137966	Srinivasan	Jun 2011	A1
20110153662	Stanfill	Jun 2011	A1
20110153697	Nickolov	Jun 2011	A1
20110179250	Matsuzawa	Jul 2011	A1
20110213928	Grube	Sep 2011	A1
20110238814	Pitts	Sep 2011	A1
20110302583	Abadi	Dec 2011	A1
20110313973	Srivas	Dec 2011	A1
20120023145	Brannon	Jan 2012	A1
20120036107	Miloushev	Feb 2012	A1
20120066274	Stephenson	Mar 2012	A1
20120089470	Barnes, Jr.	Apr 2012	A1
20120095952	Archambeau	Apr 2012	A1
20120095992	Cutting	Apr 2012	A1
20120101991	Srivas	Apr 2012	A1
20120166483	Choudhary	Jun 2012	A1
20120185913	Martinez	Jul 2012	A1
20120278471	Labowicz	Nov 2012	A1
20120310916	Abadi	Dec 2012	A1
20120311572	Falls	Dec 2012	A1
20120317388	Driever	Dec 2012	A1
20120323844	Chatley	Dec 2012	A1
20130007091	Rao	Jan 2013	A1
20130036272	Nelson	Feb 2013	A1
20130091094	Nelke	Apr 2013	A1
20130132967	Soundararajan	May 2013	A1
20130151884	Hsu	Jun 2013	A1
20130166543	MacDonald	Jun 2013	A1
20130185735	Farrell	Jul 2013	A1
20130198716	Huang	Aug 2013	A1
20130246347	Sorenson	Sep 2013	A1
20130262443	Leida	Oct 2013	A1
20130275653	Ranade	Oct 2013	A1
20130282650	Zhang	Oct 2013	A1
20140059310	Du	Feb 2014	A1
20140136483	Chaudhary	May 2014	A1
20140136779	Guha	May 2014	A1
20140149392	Wang	May 2014	A1
20140188825	Muthukkaruppan	Jul 2014	A1
20140188845	Ah-Soon	Jul 2014	A1
20140195558	Murthy	Jul 2014	A1
20140201234	Lee	Jul 2014	A1
20140337323	Soep	Nov 2014	A1
20150120711	Liensberger	Apr 2015	A1
20160150019	Klinkner	May 2016	A1

Non-Patent Literature Citations (2)

Entry
Jiong Xie et al., “Improving MapReduce performance through data placement in heterogeneous Hadoop clusters,” 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), Atlanta, GA, 2010, pp. 1-9. (Year: 2021).
Liao et al. Multi-dimensional Index on Hadoop Distributed File System, 2010, Fifth IEEE International Conference on Networking, Architecture, and Storage, pp. 240-249.

Related Publications (1)

	Number	Date	Country
	20180025024 A1	Jan 2018	US

Provisional Applications (1)

	Number	Date	Country
	61769043	Feb 2013	US

Continuations (1)

	Number	Date	Country
Parent	13842816	Mar 2013	US
Child	15714651		US

Pluggable storage system for parallel query engines

Information

Patent Number

Date Filed

Date Issued

Inventors

Original Assignees

Examiners

Agents

CPC

Field of Search

CPC

International Classifications

Disclaimer

Term Extension

Abstract