This invention relates generally to distributed file systems, and more particularly to systems and methods for accessing distributed file systems using content delivery networks.
Distributed file systems manage files and folders spread across multiple computers. They may serve a similar function as traditional file systems, but are designed to provide file/folder storage and controlled access over local and wide area networks. Some individuals and/or enterprises may rely on distributed file systems to manage their personal and/or organizational data.
There is a need, therefore, for an improved method, article of manufacture, and apparatus for accessing a distributed file system.
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. While the invention is described in conjunction with such embodiment(s), it should be understood that the invention is not limited to any one embodiment. On the contrary, the scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example, and the present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.
It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium such as a computer readable storage medium or a computer network wherein computer program instructions are sent over optical or electronic communication links. Applications may take the form of software executing on a general purpose computer or be hardwired or hard coded in hardware. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.
An embodiment of the invention will be described with reference to a data storage system in the form of a storage system configured to store files, but it should be understood that the principles of the invention are not limited to this configuration. Rather, they are applicable to any system capable of storing and handling various types of objects, in analog, digital, or other form. Although terms such as document, file, object, etc. may be used by way of example, the principles of the invention are not limited to any particular form of representing and storing data or other information; rather, they are equally applicable to any object capable of representing information.
Embodiments of the present disclosure enable accessing a distributed file system operating in a cloud environment using a content delivery network (“CDN”). Distributed file systems may be used to manage files, folders, and other data spread across multiple computing systems. They may be presented to users, applications, or other clients as traditional file systems, but may actually provide access to data over local and wide area networks. For example, the data could be stored in cloud based object stores, such as Amazon S3, Microsoft Azure, Google Drive, a private object store, and/or a hybrid object store. Access to the data on these object stores may be managed by a metadata server (“MDS”), which could be local to or remote from the client.
While cloud based object stores may create the appearance of a single object store, the data may actually be physically stored across multiple datacenters that are geographically separate. For example, portions of data may be stored at datacenters in both California and Arizona, while still being part of the same logical object store. A client wishing to access the data may therefore need to access both datacenters. If a client is physically located in California, however, the client may wish to read as much data as possible from the California datacenter to achieve optimal performance. CDNs may help provide these performance benefits.
In an embodiment, a CDN is a distributed system of servers deployed in multiple datacenters in a cloud environment. In the above example, the CDN may comprise servers deployed in both the California and Arizona datacenters. Additionally or alternatively, a service provider unrelated to the cloud service provider may provide CDN servers. In some embodiments, CDN servers operate as both a proxy and a cache in the distributed file system. If a client reads data from an object store, a CDN server may first process the read request. If the data is on the CDN server, it may be returned to the client without accessing the remote datacenter. If the data is not on the CDN server, the request may be forwarded to the remote datacenter and the data may be returned through the CDN to the client. In an embodiment, when the data is returned to the client it may also be stored on the CDN server and/or a local datacenter associated with the CDN for future access. Similarly, a client may attempt to write data to the CDN. The CDN may dynamically determine the optimal datacenter for the client, and forward the write request to that datacenter.
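By way of illustration only, the proxy-and-cache read behavior described above might be sketched as follows. The class name `CdnServer`, the in-memory dictionaries standing in for the remote datacenter and the cache, and the `origin_reads` counter are assumptions made for this sketch and do not appear in the disclosure.

```python
from typing import Dict, Optional


class CdnServer:
    """Hypothetical CDN server acting as both a proxy and a cache."""

    def __init__(self, origin: Dict[str, bytes]) -> None:
        self.origin = origin                # stands in for the remote datacenter
        self.cache: Dict[str, bytes] = {}   # data held on the CDN server
        self.origin_reads = 0               # requests forwarded to the origin

    def read(self, object_id: str) -> Optional[bytes]:
        # If the data is on the CDN server, return it without touching the origin.
        if object_id in self.cache:
            return self.cache[object_id]
        # Otherwise forward the request to the remote datacenter.
        self.origin_reads += 1
        data = self.origin.get(object_id)
        if data is not None:
            # Keep a copy on the CDN server for future access.
            self.cache[object_id] = data
        return data


origin = {"abc123": b"hello"}
cdn = CdnServer(origin)
first = cdn.read("abc123")    # forwarded to the origin, then cached
second = cdn.read("abc123")   # served from the CDN server's cache
```

In this sketch only the first read reaches the stand-in for the remote datacenter; the second is answered from the cache.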
Client 100 may be any general purpose computing device. For example, client 100 may be a personal computer, workstation, handheld computer, smart phone, and/or tablet computer. Additionally or alternatively, client 100 may be a software module or application running on a general purpose computing device. Client 100 may be in communication with an MDS 102 and object store 104 over a network connection, such as a local area network (“LAN”) or wide area network (“WAN”), or via any other form of communication. Client 100 may interact with the distributed file system as it would with a traditional file system, such as by writing data to and reading data from the distributed file system.
MDS 102 may be a general purpose computing device managing distributed file system metadata. This metadata could include, for example, the location of data stored in the distributed file system. MDS 102 may be a physical or a virtual machine, and may operate in an environment local to or remote from client 100. For example, MDS 102 may be a virtual machine operating in the same datacenter as client 100. Additionally or alternatively, MDS 102 may operate in a third party cloud environment, such as Amazon Web Services (“AWS”). In some embodiments, MDS 102 may operate in the same third party cloud environment as object store 104.
Object store 104 may comprise a storage location for storing data in the distributed file system. Object store 104 may be a private, public, or hybrid cloud environment capable of storing data. A private cloud may be an object store only available to clients belonging to a particular enterprise. For example, a private cloud may be a Microsoft Azure install operating in a datacenter completely under the control of an enterprise. The install, including the associated data and services, may not be accessible to anyone outside of the enterprise. A public cloud may be any object store accessible to the public that requires authentication to access certain data. For example, Amazon S3 is available to members of the public but data stored in the object store is only accessible by authorized clients. A hybrid cloud may be a combination of a private and public cloud, such that some data is stored in the private cloud and other data is stored in the public cloud.
In some embodiments, client 100 may transmit communications to and receive responses from MDS 102. Similarly, client 100 may transmit communications to and receive responses from object store 104. Typically these communications will be IO requests and responses, such as read/write communications, though any other type of communication is consistent with the present disclosure.
For example, client 100 may decide to read data from the distributed file system. Client 100 may first mount the distributed file system by transmitting a mount request and/or intent to MDS 102. Similarly, if the distributed file system has already been mounted, client 100 may transmit a change location/directory request to MDS 102. In response, MDS 102 may consult a metadata table to determine data objects located at the root of the mount or in the new location, and transmit information related to the data back to client 100. This data could be, for example, a list of files and/or directories located at the root or new location. The data may also include a unique identifier for each data object, such as a hash and/or path of the object.
Once client 100 has a list of files and/or directories, client 100 may select a data object to read. Client 100 may transmit a read request identifying the desired data object back to MDS 102. In some embodiments, this read request may include a path or hash identifier for the data object the client desires. Once MDS 102 receives the request, it may attempt to locate the data object on the distributed file system.
In an embodiment, MDS 102 maintains location data for all of the data objects in the distributed file system. This location data may be maintained with other data object metadata in a database on MDS 102. For example, the database may comprise a table mapping a data object to one or more object store locations. These object store locations could reside, for example, on object store 104.
In response to the read request received from client 100, MDS 102 may consult the database table to determine the object location. MDS 102 may then return the object location back to client 100. In an embodiment, the object location returned might be a URL the client may use to access all or part of the data object. For example, the URL may comprise “http://<object store domain>/<container identifier>/<object identifier>”, where <object store domain> is the domain of the object store, <container identifier> is an identifier for the distributed file system, and <object identifier> identifies the object to be read. In an embodiment, the object identifier is a hash of the object and/or a hash of a version of the object.
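A minimal sketch of how such a URL might be assembled is given below. The use of SHA-256 as the hash, the function names, and the example domain and container identifier are assumptions for illustration; the disclosure specifies only that the object identifier may be a hash.

```python
import hashlib


def object_identifier(data: bytes) -> str:
    # Assumption: SHA-256 is used here; the text only says "a hash of the object".
    return hashlib.sha256(data).hexdigest()


def object_url(object_store_domain: str, container_id: str, object_id: str) -> str:
    # Mirrors http://<object store domain>/<container identifier>/<object identifier>
    return f"http://{object_store_domain}/{container_id}/{object_id}"


url = object_url(
    "objects.example.com",          # hypothetical object store domain
    "dfs-container-1",              # hypothetical container identifier
    object_identifier(b"example"),  # hash-based object identifier
)
```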
Client 100 may attempt to access the data object once it receives the data object location from MDS 102. If the data object location is a URL, the client may issue an HTTP GET to the URL. For example, the client may issue a GET to object store 104 and/or the cloud service provider holding the data object. In response, object store 104 may return the requested data object to client 100.
The present system may also be used to write data objects to the distributed file system. This process may be similar to reading data objects, as discussed above. Once the distributed file system is mounted and client 100 has identified the file system location where it wishes to write the data, client 100 may transmit a write intent to MDS 102. This write intent may include the identified file system location and an object identifier for the data object client 100 intends to write. In some embodiments, this object identifier may be a hash of the data object.
Upon receiving the intent, MDS 102 may consult a database table to determine if the data object has already been placed in an object store, such as object store 104. If the data object already exists, there is no need to write it to the object store a second time. MDS 102 may perform this check by comparing the provided object identifier to all of the object identifiers in the table. If there is a match, the data object exists. If there is not a match, the data object does not exist.
If the data object already exists in object store 104, client 100 may not need to transmit the data object to the store a second time. Instead, MDS 102 may create a new entry in the table comprising the object identifier and the location client 100 wishes to write the data. MDS 102 may then transmit a write complete notification to client 100, and the write process may terminate. Should client 100 issue a subsequent read for the object, MDS 102 may provide a URL to the data object on object store 104 as discussed above. This process provides an inherent form of data deduplication by ensuring a data object is not written to the same object store multiple times.
If MDS 102 determines object store 104 does not have a copy of the data object (i.e. the object identifier is not found in the table), it may create a new entry for the object as discussed above. MDS 102 may additionally provide an object location back to client 100, and associate this object location with the new table entry. In some embodiments the object location is a URL constructed in the same manner as the URL generated during the read process.
Once client 100 receives the object location it may write the data object to that location. If the object location is a URL identifying an object store, such as object store 104, client 100 may write the data to that location using an HTTP POST or PUT. The POST or PUT request may include the data object client 100 wishes to store on object store 104. Client 100 may wait for a confirmation from object store 104 before determining the write was successful.
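The write-intent handling described above, including the deduplication check, might be sketched as follows. The class `MetadataServer`, its two dictionaries, and the convention of returning `None` to signal a write complete notification are assumptions made for this sketch only.

```python
from typing import Dict, Optional


class MetadataServer:
    """Hypothetical MDS tracking object locations and file system entries."""

    def __init__(self, store_domain: str, container_id: str) -> None:
        self.store_domain = store_domain
        self.container_id = container_id
        self.object_locations: Dict[str, str] = {}  # object identifier -> URL
        self.namespace: Dict[str, str] = {}         # file system path -> object identifier

    def write_intent(self, path: str, object_id: str) -> Optional[str]:
        """Return a write URL if the object must be uploaded, else None."""
        already_stored = object_id in self.object_locations
        if not already_stored:
            # Associate a new object location with the new table entry.
            self.object_locations[object_id] = (
                f"http://{self.store_domain}/{self.container_id}/{object_id}"
            )
        # In both cases, record the location the client wishes to write to.
        self.namespace[path] = object_id
        # None stands in for a "write complete" notification.
        return None if already_stored else self.object_locations[object_id]


mds = MetadataServer("objects.example.com", "dfs-container-1")
first = mds.write_intent("/docs/a.txt", "hash-1")          # new object: URL returned
second = mds.write_intent("/docs/copy-of-a.txt", "hash-1")  # duplicate: no upload needed
```

In this sketch the second write intent creates a new file system entry but returns no URL, because the object identifier is already present in the table.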
While the above examples discuss reading and writing data objects as individual units, other configurations may exist. For example, individual data objects may be broken into a set of data chunks. Each of these data chunks may be stored and accessed on the object store in the same manner as the individual data objects discussed above. The data chunks may be uniquely addressable and immutable, meaning they are not changed on the object store once they have been written. When a client wishes to read a data object, the client may submit identifiers for all the data object's constituent chunks to the MDS and receive a URL for each. Similarly, for writes the client may submit identifiers for all the data object's constituent chunks to the MDS. In response, the MDS may only provide write URLs for the chunks that do not already exist on the object store. If the chunks already exist the MDS may simply update the metadata table; there is no need to write the chunks a second time.
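As one possible illustration of chunking, the sketch below splits a data object into chunks and derives an identifier for each. Fixed-size chunking, the four-byte chunk size, and SHA-256 are assumptions chosen to keep the example small; the disclosure does not prescribe a chunking scheme or hash function.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4  # deliberately tiny, for illustration only


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    # Assumption: fixed-size chunking; other schemes are equally possible.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_identifiers(chunks: List[bytes]) -> List[str]:
    # Each chunk is addressed by a hash of its content.
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


original = b"aaaabbbbcccc"
edited = b"aaaaXXXXcccc"   # only the middle chunk differs

original_ids = chunk_identifiers(split_into_chunks(original))
edited_ids = chunk_identifiers(split_into_chunks(edited))

# Chunks already on the object store need not be written again.
stored = set(original_ids)
to_upload = [cid for cid in edited_ids if cid not in stored]
```

Because identifiers are derived from chunk content, editing one region of the object changes only the identifier of the affected chunk, and only that chunk would need to be written.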
Turning now to
At block 200, an IO request may be received at a metadata server. This IO request may be transmitted to the metadata server from a client, and may be a read and/or a write request. The IO request may be for a data object and/or a portion of a data object. In some embodiments, this data object is a file and/or a folder stored on the distributed file system.
At 202, an object identifier may be determined. The object identifier may be any piece of information capable of uniquely identifying the requested object, or a portion thereof, on the distributed file system. For example, it may be a path and/or a hash of the object. In some embodiments, this identifier is provided to the metadata server from the client as part of the IO request.
Block 202 may also determine a CDN domain. The CDN domain may be a domain for the content delivery network, and may be used to access data on the object store through the CDN. In some embodiments, the CDN domain is stored on the MDS and associated with a particular object store and/or a container on that object store. For example, Amazon S3, Microsoft Azure, and a private cloud may be associated with different CDN domains. The MDS may determine which object store holds the requested data object, such as by consulting a metadata table as discussed above, and then identify the CDN domain associated with that object store.
At block 204, a URL may be generated from both the object identifier and the CDN domain. In some embodiments, this URL may be substantially similar to that discussed above. Rather than an object store domain, however, the URL may use the CDN domain discussed in reference to block 202. For example, the URL may be “http://<CDN domain>/<container identifier>/<object identifier>” where CDN domain is the CDN domain associated with the object store. As a result, clients accessing the URL may be directed to the CDN rather than directly to the object store itself. This may provide the benefits of the CDN, as discussed above, to the distributed file system.
Finally, at block 206, the URL may be returned to the client. In some embodiments multiple URLs may be returned to the client, as discussed below. The client may then use the URL to access the object store via the CDN for the desired IO operations.
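The process of blocks 200 through 206 might be sketched as shown below. The table names, store names, domains, and the function `handle_io_request` are hypothetical and serve only to illustrate looking up the CDN domain associated with the object store that holds the requested object.

```python
from typing import Dict, Tuple

# Hypothetical mapping of object store -> CDN domain kept on the MDS.
CDN_DOMAINS: Dict[str, str] = {
    "store-a": "cdn-a.example.net",
    "store-b": "cdn-b.example.net",
}

# Hypothetical metadata table: object identifier -> (object store, container identifier).
OBJECT_TABLE: Dict[str, Tuple[str, str]] = {
    "hash-1": ("store-a", "container-1"),
    "hash-2": ("store-b", "container-7"),
}


def handle_io_request(object_id: str) -> str:
    # Block 202: determine which object store holds the object and its CDN domain.
    store, container_id = OBJECT_TABLE[object_id]
    cdn_domain = CDN_DOMAINS[store]
    # Block 204: generate the URL from the CDN domain and the object identifier.
    # Block 206: the URL is returned to the client.
    return f"http://{cdn_domain}/{container_id}/{object_id}"


url = handle_io_request("hash-1")
```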
At block 300, a read request may be received at an MDS. This read request may be received from a client, and in an embodiment identifies one or more data objects, or portions thereof, the client wishes to read.
At block 302, segment identifiers for a plurality of data segments may be identified. A single data object, such as a file, may be divided into a plurality of constituent segments prior to storing the data object to the object store. This division may occur at the client and/or at the MDS. These data segments may each be associated with an identifier, such as a hash, used to read the data segment in the future. When a client wishes to read a data object it may therefore be necessary to identify all the data segments that make up that object. The segment identifiers for the segments may be identified in multiple ways. For example, the segment identifiers may be provided to the MDS from the client, where the segment identifiers collectively make the object identifier. Additionally or alternatively, the MDS may comprise records of segment identifiers mapped to an object identifier and may determine the segment identifiers by consulting those records.
At block 304, a plurality of URLs may be generated using the segment identifiers and a CDN domain. This CDN domain may be determined in a manner substantially similar to that discussed above. In some embodiments, the URLs may be similar to those discussed in reference to
Finally, at block 306 the URLs may be returned to the client. The client may thereafter use these URLs to retrieve the data segments from the CDN and/or the object store, and reconstruct the data object locally.
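A sketch of the read process of blocks 300 through 306, together with client-side reconstruction, is given below. The segment records, the CDN domain, and the dictionary `FAKE_CDN` that stands in for HTTP GET requests are assumptions made so the example runs without a network.

```python
from typing import Dict, List

CDN_DOMAIN = "cdn.example.net"   # hypothetical CDN domain
CONTAINER_ID = "container-1"     # hypothetical container identifier

# Hypothetical MDS records: object identifier -> ordered segment identifiers.
OBJECT_SEGMENTS: Dict[str, List[str]] = {
    "/docs/report.txt": ["seg-1", "seg-2", "seg-3"],
}


def segment_url(segment_id: str) -> str:
    return f"http://{CDN_DOMAIN}/{CONTAINER_ID}/{segment_id}"


def handle_read_request(object_id: str) -> List[str]:
    segment_ids = OBJECT_SEGMENTS[object_id]           # block 302
    return [segment_url(sid) for sid in segment_ids]   # blocks 304 and 306


# Stand-in for the CDN: maps each URL to the segment data an HTTP GET would return.
FAKE_CDN: Dict[str, bytes] = {
    segment_url("seg-1"): b"Hello, ",
    segment_url("seg-2"): b"distributed ",
    segment_url("seg-3"): b"world",
}


def fetch(url: str) -> bytes:
    return FAKE_CDN[url]


# Client side: retrieve each segment and reconstruct the data object locally.
urls = handle_read_request("/docs/report.txt")
reconstructed = b"".join(fetch(url) for url in urls)
```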
Turning now to
At block 400, a write request may be received at an MDS. This write request may be received from a client, and may indicate the client has data it wishes to write to the distributed file system. In some embodiments, this data may be a new data object, such as a new file, or it may be an update to an existing data object.
At block 402, a plurality of segment identifiers for the data may be determined. In an embodiment, data objects may be divided into multiple data segments, and each segment may be associated with an identifier. The client could, for example, provide these segment identifiers to the MDS. This allows the MDS to process the write request without actually seeing the data to be written to the object store.
At block 404, an additional check is made to determine whether the object store already contains the data segments associated with the segment identifiers. For example, if the write request is an update to an existing file, much of the data may already exist in the object store. The MDS may compare each of the data segment identifiers to a metadata table to see if they are already associated with an object store. If they are, that data segment identifier may be removed from the list of data segment identifiers because it does not need additional processing. If the metadata does not contain a particular data segment identifier, that data segment may need to be written to the object store.
Finally, at block 406, a write URL for each data segment that does not exist in the object store may be generated. In some embodiments the URL comprises both the CDN domain and a segment identifier. The URL could be, for example, “http://<CDN domain>/<container identifier>/<segment identifier>”. Generating URLs only for segments (i.e. chunks) that do not exist in the object store may be particularly helpful when only a portion of a file on the distributed file system is edited. Rather than changing the segment as it already exists, a new segment is stored.
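The write process of blocks 400 through 406 might be sketched as follows. The function name `handle_write_request`, the set of stored segment identifiers, and the example domain and identifiers are hypothetical; the sketch shows only the filtering of already-stored segments and the generation of write URLs for the remainder.

```python
from typing import Dict, Iterable, Set


def handle_write_request(
    segment_ids: Iterable[str],
    stored_segments: Set[str],
    cdn_domain: str,
    container_id: str,
) -> Dict[str, str]:
    """Return a write URL for each segment not already in the object store."""
    write_urls: Dict[str, str] = {}
    for segment_id in segment_ids:
        # Block 404: skip segments the metadata already associates with the store.
        if segment_id in stored_segments:
            continue
        # Block 406: generate a write URL from the CDN domain and segment identifier.
        write_urls[segment_id] = f"http://{cdn_domain}/{container_id}/{segment_id}"
    return write_urls


# An edited file whose first and last segments are unchanged.
stored = {"seg-1", "seg-3"}
urls = handle_write_request(
    ["seg-1", "seg-2b", "seg-3"], stored, "cdn.example.net", "container-1"
)
```

Here only the new segment receives a write URL, consistent with storing a new segment rather than modifying an existing one.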
For the sake of clarity, the processes and methods herein have been illustrated with a specific flow, but it should be understood that other sequences may be possible and that some may be performed in parallel, without departing from the spirit of the invention. Additionally, steps may be subdivided or combined. As disclosed herein, software written in accordance with the present invention may be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor.
All references cited herein are intended to be incorporated by reference. Although the present invention has been described above in terms of specific embodiments, it is anticipated that alterations and modifications to this invention will no doubt become apparent to those skilled in the art and may be practiced within the scope and equivalents of the appended claims. More than one computer may be used, such as by using multiple computers in a parallel or load-sharing arrangement or distributing tasks across multiple computers such that, as a whole, they perform the functions of the components identified herein; i.e. they take the place of a single computer. Various functions described above may be performed by a single process or groups of processes, on a single computer or distributed over several computers. Processes may invoke other processes to handle certain tasks. A single storage device may be used, or several may be used to take the place of a single storage device. The disclosed embodiments are illustrative and not restrictive, and the invention is not to be limited to the details given herein. There are many alternative ways of implementing the invention. It is therefore intended that the disclosure and following claims be interpreted as covering all such alterations and modifications as fall within the true spirit and scope of the invention.
This application is a Continuation of U.S. application Ser. No. 14/671,675 filed Mar. 27, 2015 which claims priority to U.S. Provisional Patent Application 62/088,427, filed Dec. 5, 2014, which applications are incorporated herein by reference for all purposes.
Number | Date | Country |
---|---|---|
20190340158 A1 | Nov 2019 | US |
Number | Date | Country |
---|---|---|
62088427 | Dec 2014 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16031775 | Jul 2018 | US |
Child | 16511252 | | US |
Parent | 14671675 | Mar 2015 | US |
Child | 16031775 | | US |