The present invention generally relates to a method for content delivery in a Content Distribution Network, or CDN, comprising using buckets as logical containers for content files and associating file system meta-data to said buckets, and more particularly to a method further comprising associating content distribution meta-data to the buckets for managing the content delivery through a CDN service.
Next, some definitions are given that are useful for understanding the terminology used for both the prior art disclosures and the present invention.
PoP: A point-of-presence is an artificial demarcation or interface point between two communication entities. It is an access point to the Internet that houses servers, switches, routers and call aggregators. ISPs typically have multiple PoPs.
Content Delivery Network (CDN): This refers to a system of nodes (or computers) that contain copies of customer content that is stored and placed at various points in a network (or public Internet). When content is replicated at various points in the network, bandwidth is better utilized throughout the network and users have faster access times to content. This way, the origin server that holds the original copy of the content is not a bottleneck.
ISP DNS Resolver: Residential users connect to an ISP. Any request to resolve an address is sent to a DNS resolver maintained by the ISP. The ISP DNS resolver will send the DNS request to one or more DNS servers within the ISP's administrative domain.
URL: Uniform Resource Locator (URL) is the address of a web page on the world-wide web. Every URL is unique: if two URLs are identical, they point to the same resource.
URL (or HTTP) Redirection: URL redirection is also known as URL forwarding. A page may need redirection if its domain name has changed, to create meaningful aliases for long or frequently changing URLs, to catch spelling errors made by users when typing a domain name, to manipulate visitors, etc. In this case, a typical redirection service is one that redirects users to the desired content. A redirection link can be used as a permanent address for content that frequently changes hosts (much like DNS).
ARL (Alternate Resource Locator): ARL is really a URL with CDN specific data embedded. ARL is a subset of URLs and it is used to direct requests to CDN content servers.
Bucket: A bucket is a logical container for a customer that holds the CDN customer's content. A bucket either makes a link between origin server URL and CDN URL or it may contain the content itself (that is uploaded into the bucket at the entry point). An end point will replicate files from the origin server to files in the bucket. Each file in a bucket may be mapped to exactly one file in the origin server. A bucket has several attributes associated with it—time from and time until the content is valid, geo-blocking of content. Mechanisms are also in place to ensure that new versions of the content at the origin server get pushed to the bucket at the endpoints and old versions are removed.
A customer may create as many buckets as he wants. A bucket is really a directory that contains content files. A bucket may contain sub-directories and content files within each of those sub-directories.
Geo-location: It is the identification of real-world geographic location of an Internet connected device. The device may be a computer, mobile device or an appliance that allows for connection to the Internet for an end user. The IP-address geo-location data can include information such as country, region, city, zip code, latitude/longitude of a user.
Consistent Hashing: This method provides hash-table functionality in such a way that adding or removing a slot does not significantly alter the mapping of keys to slots. Consistent hashing is a way of distributing requests among a large and changing population of web servers. The addition or removal of a web server does not significantly alter the load on the other servers.
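The definition of consistent hashing above can be illustrated with a minimal sketch. It is not taken from the described CDN: the class name HashRing, the use of SHA-1 to place points on the ring, the number of virtual points per server and the server names are all assumptions made only for illustration.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    # Map a string to a 64-bit position on the ring (SHA-1 is an assumption).
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping keys to servers."""

    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual points per server
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            p = _point(f"{server}#{i}")
            if p in self._owner:
                continue  # extremely unlikely collision: keep the first owner
            bisect.insort(self._points, p)
            self._owner[p] = server

    def remove(self, server: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key, wrapping around.
        i = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[i]]
```

In this sketch, adding a server moves keys only onto the new server, and removing it restores the previous mapping, which is the property the definition refers to.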
Using a bucket, or container, as an abstraction of a folder to store content is well understood. However, merely using buckets as folders with access controls is archaic.
In Amazon, an S3 bucket [2], [3] merely serves as a folder that holds content files. An S3 bucket is created in exactly one region (a physical region US, EU or APAC is associated with a bucket). An object stored in a region resides only in that region but may be accessed from anywhere by an end user. Amazon S3 does not copy or move an object to another region.
In Amazon's S3, a bucket created has the following properties (or meta-data) [3], [4]: Bucket Name, Creation date of Bucket, Bucket Location or region where created, Owner name, Owner id, Versioning Status, Total virtual folder in selected bucket, Total number of files in selected bucket, Total size of bucket, Total number of objects.
These properties (or meta-data) are similar to file-system properties under Unix. In addition, access to an S3 object is governed by policies that must be explicitly defined via an access control list (ACL). The ACL is designed to allow read and write permissions for everyone, for authenticated users and for the owner (or creator) of the bucket and of the objects in the bucket.
Amazon serves content using Amazon cloudfront (Amazon's version of Content Distribution Network, or CDN). In order to serve content from an S3 bucket, the CDN customer creates a bucket, creates a distribution (which is equivalent to getting URLs for the content that need to be served by the CDN). This interaction is via REST API and using the CDN customer's credentials. The cloudfront infrastructure copies the requested content from the S3 bucket to the edge location and serves the content to the requesting end user.
Decisions on geo-blocking, how long the content in a bucket is valid for distribution by cloudfront are implemented as policies. An example of a policy request is: Deny all requests that originate from USA. Policies are evaluated before the request is made to an S3 bucket. Policies together with ACLs control access to objects in cloudfront.
Amazon cloudfront distribution supports only HTTP objects or streaming distributions (RTMP). In addition, RTMP variants (RTMPE, RTMPT, RTMPTE) are also supported. Currently, cloudfront does not support live-streaming of content.
Both Amazon cloudfront and Akamai use validity of content as part of the URL (Akamai refers to it as ARL—Akamai Resource Locator [1] and Cloudfront generates this when creating a distribution). So, the buckets are merely folders with access controls.
Currently, several companies including Amazon [2] and Akamai [1] use the notion of a bucket as a folder to store data. By associating access control over the buckets, they allow for data in a bucket to be shared among a selected group of users or make the bucket public.
When used for content delivery, merely associating access control at the bucket level is insufficient since any content delivery infrastructure needs a lot of additional information about the bucket before it can serve content from a bucket.
At the present time, the state of the art for associating meta-data with content is based on one or more of the following concepts:
Meta-data is part of the request string itself. While this is useful in quickly resolving the servers that contain the content (since all of a customer's data resides in one bucket), the number of meta-data fields is limited since the resource locator (e.g. Akamai Resource Locator (ARL), a URL to locate CDN content) can be only so long. Further, this scheme is inflexible should the CDN customer want to associate new meta-data with the content or change any of the meta-data (the CDN administrator would have to change the URL for every change in meta-data). Consider the following example of such an ARL: <cdn-endhost>/<customer_id>/<meta-data>/content. Here, content may be the name of the jpg or video file.
The meta-data is received from the origin server in response to a request for the content. This implies that a subsequent request has to be made for the content, which requires another level of indirection. For cloudfront, every request to a new edge implies that the edge gets content from the same S3 bucket. The content of an S3 bucket resides only in the region in which the bucket is created. The S3 bucket in Amazon has file-system like meta-data.
A meta-data configuration file is created per-bucket. While this is useful, it is not sufficiently granular to address several issues. Two files from the same customer may need to be served at different rates depending on content (High Definition content will need to be served at a higher rate), and content from a customer may be served to end users in certain geographic areas of the world and not others.
Meta-data configuration files are distributed to CDN content servers. While this may appear to be a simple solution, it has the overhead of every server maintaining several configuration files and synchronizing them to ensure consistency. Again, these configuration files are per-customer.
U.S. Pat. No. 7,240,100 discloses a method for associating metadata to a given piece of content to be delivered through a CDN, said meta-data being located in a metadata configuration file distributed to CDN servers, or in a per-customer metadata configuration file. The meta-data associated by the method of U.S. Pat. No. 7,240,100 is only of a file-system type.
U.S. Pat. No. 7,647,329 and U.S. Pat. No. 7,739,239, disclose storing data as objects within buckets, each of said objects being comprised of a file and optionally any metadata that describes that file. To store an object according to said patents, one must upload the file he wants to store to a bucket. When one uploads a file, he can set permissions on the object as well as any metadata. For each bucket, one can control access to the bucket (who can create, delete, list objects, etc.). U.S. Pat. No. 7,647,329 and U.S. Pat. No. 7,739,239 only disclose associating file-system meta-data.
All the above schemes are very rigid in terms of how they treat meta-data. The meta-data is not sufficiently granular (it is per-customer or per-bucket rather than per-file), cumbersome to work with for a CDN customer and prone to maintenance overhead. Also, the associated meta-data of said disclosures is only of a file-system kind.
It is necessary to offer an alternative to the state of the art which covers the gaps found therein, particularly those related to the rigidity with which meta-data associated to buckets is treated in the above cited disclosures.
To that end, the present invention provides a method for content delivery in a CDN, comprising using buckets as logical containers for content files, and associating file system meta-data to said buckets, wherein, differently from the known proposals, the method further comprises, in a characteristic manner, associating content distribution meta-data to said buckets, said content distribution meta-data including attributes or properties of specific use for a CDN system, and using said content distribution meta-data for managing the content delivery through a CDN service.
The buckets created and used according to the method of the invention are named in the present description as intelligent buckets.
For an embodiment, the method comprises generating automatically said file-system meta-data when a bucket or a content file in a bucket is created.
While said file-system meta-data are similar to file attributes of any file system (such as an operating system), the content distribution meta-data are attributes or properties of specific use for a CDN system, i.e. they are an inherent property of the buckets and hence, they give such buckets, intelligence for use in a CDN.
Depending on the embodiment, the method comprises associating said file system meta-data and said content distribution meta-data with each file of each bucket, including said content files, and/or with each bucket.
The method comprises, preferably, creating said intelligent buckets with content distribution as the sole application.
In general, the method comprises carrying out said content delivery through said CDN, to an end user, using said associated meta-data to guide said content delivery, thus giving to the CDN customer, by means of associating meta-data for content delivery for a bucket and with each file in a bucket, flexibility in how to treat meta-data for each file.
Other embodiments of the invention are described in appended claims 7 to 17, and in a subsequent section related to the detailed description of several embodiments.
For the present invention, in the service provider's CDN, an intelligent bucket is a logical container for a customer to store data in a CDN.
The previous and other advantages and features will be more fully understood from the following detailed description of embodiments, with reference to the attached drawing, which must be considered in an illustrative and non-limiting manner, in which:
First, a brief description is given of each component of the service provider's CDN system illustrated by the attached drawing.
Entry Point (Publishing point): Any CDN customer may interact with the CDN infrastructure solely via the entry point. The entry point runs a web services interface with users of registered accounts to create/delete and update buckets.
A customer has two options for uploading content. The customer can either upload files into the bucket or give URLs of the content files that reside at the CDN customer's website. Once content is downloaded by the CDN infrastructure, the files are moved to another directory for post-processing. The post-processing steps involve checking the files for consistency and any errors. Only then is the downloaded file moved to the origin server. The origin server contains the master copy of the data.
CDN Manager: The CDN manager hosts the Content Manager API, the DNS API and the Network Topology API (all APIs are on this server). There is one instance of the CDN manager for the entire CDN. The CDN manager may reside at one of the entry points (publishing points) or in a separate server.
End Point: An end point is the entity that manages communication between end users and the CDN infrastructure. It is essentially a custom HTTP server.
Tracker: The tracker is the key entity that enables intelligence and coordination of the CDN infrastructure.
Origin Server: This is the server(s) in the CDN infrastructure that contains the master copy of the data. Any end point that does not have a copy of the data can request it from the origin server. The CDN customer does not have access to the origin server. Telefonica's CDN infrastructure moves data from the ftp server to the origin server after performing sanity-checks on the downloaded data.
The service provider's CDN infrastructure uses the intelligent bucket as an abstraction for storing and delivering a CDN customer's content. At the service provider's CDN, all buckets are of type managed. Content for buckets of type managed is controlled entirely by the CDN. For the managed bucket type, the service provider's CDN allows for the creation of two kinds of buckets (although the method is not limited to only said two kinds of buckets): VoD and Live Streaming. Both kinds of bucket have associated with them file-system type meta-data and meta-data that controls content delivery.
A VoD bucket by definition is on demand and may store any kind of content (the format of the file may be of any type). An end point serves the content in the bucket only if that end point can serve the kind of content specified by the file types in the bucket. So, a VoD bucket may serve HTTP objects, RTSP, RTMP, MMS etc. The VoD bucket may also use a variety of delivery mechanisms including RTMP, smooth streaming and iphone streaming. A VoD bucket does not place any restriction on the kind of media file or the delivery mechanism for the file.
A live bucket is created to stream live content. A live bucket may serve any live media stream over any delivery mechanism.
The CDN end points serve the content requested by end users. However, the rest of the CDN infrastructure, the entry point where the intelligent bucket is created and the tracker that synchronizes the bucket meta-data with the entry point periodically only serve as a proxy to the bucket for the CDN infrastructure.
In the service provider's CDN, a CDN customer may interact with the CDN infrastructure only at the entry point. It is first described how to set the properties of a bucket; add content to a bucket and finally, how to associate meta-data with files in a bucket that makes the bucket intelligent for use in content delivery in the CDN infrastructure.
Any CDN customer may create a bucket at the entry point. The bucket has meta-data that is both of file-system type and for content-delivery.
The fields in the file-system meta-data are: bucket_id, name, enabled, date_created, last_modified and last_accessed. The rest of the parameters are associated with content distribution.
The following parameters may be associated with a bucket when the bucket is created. What is listed is the name of each parameter, its type and whether it needs to be defined at bucket creation time. A number of fields are optional. Some of the fields may be assigned a default value when a bucket is created.
Having created a bucket, the CDN customer can upload content to the bucket. Next, it is disclosed how to define a live bucket and the additional fields that may be needed.
If the content in the bucket is live, i.e. is a live streaming bucket, the bucket has additional meta-data associated with it.
All of the other fields like whitelist, blacklist, geoloc, startdate, enddate and authentication may be defined just as in a VoD bucket.
A live bucket is treated differently from a VoD bucket. A live splitter in the CDN infrastructure gets the live stream from the content owner. A segmenter at the live splitter breaks the stream into segments. The live splitter then builds a playlist from these segments and forwards the playlist to the end point. The live splitter periodically updates the playlist. Thus, the end point infers the playlist as content arriving from an origin server in the CDN infrastructure. This content is then served to requesting end users.
Once a bucket is created, its meta-data may be updated/modified via a REST interface.
Next, a detailed description of the association of meta-data with files in a VoD bucket is given. Said description is also valid for other embodiments in which the data to be delivered is other than video data. It is also explained how a CDN customer may add files into a VoD bucket. Every file in a VoD bucket has additional file-level parameters that need to be defined. Next, such fields will be defined.
A customer has two options for uploading content. The customer can either upload files into the bucket or give URLs of the content files that reside at the CDN customer's website. Once content is downloaded by the CDN infrastructure, the files are moved to another directory for post-processing. The post-processing steps involve checking the files for consistency and any errors. Only then is the downloaded file moved to the origin server. The origin server contains the master copy of the data.
At the entry point, once content has been downloaded, a requests.xml file is automatically created. This file has meta-data associated with every file that a CDN customer puts into the bucket. A monitoring process looks for the existence of requests.xml file for the post-processing steps discussed above. A CDN customer can overwrite any of the bucket parameters for a file by calling the file API using a REST interface (or using a user-friendly interface) at the time of uploading a file or at any time thereafter. There are additional file parameters that need to be defined.
Once the monitoring process detects that all files referenced in the xml file are present (and have the right size), the files are moved to another directory for post-processing. This step involves checking the files for consistency and any errors.
When files are uploaded to a bucket, the following meta-data fields inherited from the bucket must also be modified as needed: enabled, startdate, enddate, geoloc, deliverytype, bandwidth, blacklist, whitelist, and authentication.
This allows the content owner to have full control over the geographic area where the content is distributed, the mode of delivery and also the bandwidth at which the file must be delivered. The only requirement is that the bandwidth at the bucket level must be greater than or equal to the bandwidth set at the file level.
In addition, the geoloc field may be modified at the file level, provided that the countries in which a file is valid are a subset of the countries in which the bucket is valid. So, if a bucket is valid in [es, br, us, uk], every file in such a bucket must be valid in a subset of [es, br, us, uk].
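The two constraints just stated (file-level bandwidth not above the bucket-level bandwidth, and file-level geoloc a subset of the bucket-level geoloc) can be sketched as a simple check. The names BucketMeta and check_file_overrides, the integer representation of bandwidth and the example values are assumptions for illustration only; the description does not specify how the entry point validates these fields.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class BucketMeta:
    """Hypothetical subset of the bucket-level content distribution meta-data."""
    bandwidth: int           # unit is not specified in the description
    geoloc: FrozenSet[str]   # country codes in which the bucket is valid


def check_file_overrides(
    bucket: BucketMeta,
    bandwidth: Optional[int] = None,
    geoloc: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the overrides are acceptable.

    A value of None means the file simply inherits the bucket-level setting.
    """
    problems = []
    if bandwidth is not None and bandwidth > bucket.bandwidth:
        problems.append("bandwidth exceeds bucket bandwidth")
    if geoloc is not None:
        extra = sorted(set(geoloc) - bucket.geoloc)
        if extra:
            problems.append("geoloc not allowed by bucket: " + ", ".join(extra))
    return problems
```

For a bucket valid in [es, br, us, uk], a file restricted to [es, uk] passes this check, while a file listing a country outside the bucket's list is reported.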
The file-based interaction mechanism proceeds as follows. The file is a collection of <cdnrequest> xml blocks. Each block has one value for method=“add_file|remove_file|update_file”. A user logs into the FTP account and uploads a file called requests.xml. We first consider the case that the CDN customer uploads requests.xml and the content file. The format of requests.xml file is:
Once the uploaded content is processed, it is assigned a CDN URL. In the above example, if the bucket id of the user was 87, the content URL is http://87.t-cdn.net/87/demo-output.flv. Once this occurs, a callback is executed to the URL http://cdncustomer.com/callmehere/result?reqid=100&name=demo-output.flv&result=0.
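The two URLs in this example can be reproduced with a short sketch. The URL pattern is inferred from the single example above and may not hold in general, and the helper names cdn_url and callback_url are hypothetical.

```python
from urllib.parse import quote, urlencode


def cdn_url(bucket_id: int, file_name: str) -> str:
    # Pattern inferred from the single example in the text (an assumption):
    # http://<bucket_id>.t-cdn.net/<bucket_id>/<file_name>
    return f"http://{bucket_id}.t-cdn.net/{bucket_id}/{quote(file_name)}"


def callback_url(base: str, reqid: int, name: str, result: int) -> str:
    # Appends the reqid, name and result query parameters shown in the example.
    query = urlencode({"reqid": reqid, "name": name, "result": result})
    return f"{base}?{query}"
```

Calling cdn_url(87, "demo-output.flv") and callback_url("http://cdncustomer.com/callmehere/result", 100, "demo-output.flv", 0) yields the two URLs given above.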
Next, the inventor considers the case that the user FTPs requests.xml file alone into the bucket. In such a case, the requests.xml file will be written as:
The entry point will then download the file from the CDN customer's web server. The entry point will build the URL from the parameters fileurl and the fileName. Once the entry point has downloaded the file, it is processed and assigned a CDN URL as before. The processing of the file (after it is downloaded) generates hashes for each file at the block level (1 Kbyte) so that content integrity at the block level can be verified on any end points prior to distributing the content. These hashes (at the block level) are also stored in the CDN customer's bucket.
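The block-level hashing described above can be sketched as follows. The description fixes the block size at 1 Kbyte but does not name the hash function used at the block level; SHA-1 is assumed here, and the function names are hypothetical.

```python
import hashlib
from typing import List, Optional

BLOCK_SIZE = 1024  # 1 Kbyte blocks, as stated in the description


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each consecutive block of the content (SHA-1 is an assumption)."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def first_bad_block(data: bytes, expected: List[str]) -> Optional[int]:
    """Return the index of the first block that fails verification, or None."""
    actual = block_hashes(data)
    for i, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return i
    if len(actual) != len(expected):
        # Content is shorter or longer than the stored hash list.
        return min(len(actual), len(expected))
    return None
```

An end point holding the stored hash list could run such a check on the blocks it has received before distributing the content.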
The origin server at the CDN has all the files that are part of requests.xml. Two methods by which a client can download content to the CDN infrastructure where the CDN provider manages the buckets have been described above.
The methods remove_file and update_file are defined as follows.
Here, the file1.flv is to be removed from the bucket. Once the removal of the file from the bucket succeeds, a callback is executed at the end point with the following URL: http://cdncustomer.com/callmehere/result?reqid=101&name=file1.flv&result=0.
Finally, regarding the format of the update_file method, it has to be noted that the version of the content itself cannot be updated. Rather, this method is a way of updating the optional parameters of a file. It may be used, for example, if the content can be shown in other countries or for a longer or shorter duration. Note that in this example the validity of demo-output.flv was changed from 2010-12-31 to 2011-12-31.
After processing the files, the monitoring process generates a responses.xml file in the same directory.
There are additional file parameters that need to be defined. These parameters are:
The CDN customer can download the responses.xml file when it is available. It has to be noted that the response id is the same as the id in the requests.xml file to indicate that the responses.xml file is in response to the requests.xml.
Different components of the CDN, namely the tracker and the entry point, serve as a proxy for the meta-data of a bucket and of the files in a bucket. The meta-data that guides the content delivery comes into play at the end points. Here are described the steps shown on the
The flexibility of the method of the invention allows setting bucket level parameters per customer. It can be set:
A CDN customer at any location may create a content delivery bucket. Once a bucket is created and content uploaded to the CDN infrastructure's origin server, the meta-data is associated with the bucket. This, however, has no meaning until an end user requests content that is in the service provider's CDN. Once an end point comes up, the content meta-data is proxied to the end point. When an end point gets a content request from an end user, the end point first gets the content from the origin server (and in some cases, from its neighbours in the same datacenter). The end point then serves the content request. The content served is guided by the meta-data of the bucket and the requested file.
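How the meta-data might guide a request at an end point can be sketched as a single decision function. Only the field names enabled, startdate, enddate and geoloc come from the description; their exact semantics (for example, that enddate is inclusive and that geoloc is a list of permitted country codes) and the names FileMeta and may_serve are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical effective meta-data of a file (bucket values plus overrides)."""
    enabled: bool
    startdate: date
    enddate: date
    geoloc: FrozenSet[str]  # country codes in which the file may be served


def may_serve(meta: FileMeta, today: date, country: str) -> bool:
    """Decide whether a request from `country` on `today` may be served."""
    if not meta.enabled:
        return False
    if not (meta.startdate <= today <= meta.enddate):  # enddate assumed inclusive
        return False
    return country in meta.geoloc
```

The country of the requesting end user would come from IP-address geo-location, which is outside the scope of this sketch.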
Any change in meta-data of the bucket by a CDN customer is reflected at the end points within a few seconds.
Bucket Size as an Indicator for Caching:
The size of the bucket is a very important parameter in determining how the CDN infrastructure treats a bucket.
The CDN customer may designate a bucket as small. In this case, the bucket object is less than one megabyte in size.
A CDN customer may also designate a bucket as being large. Large buckets have a bucket object that is more than a megabyte in size.
Typically, large bucket objects are delivered via HTTP download while small objects may be cached by the CDN infrastructure.
The monitor process at the entry point is responsible for maintaining a filelist.xml file that is associated with every bucket. This file contains name, size and SHA1 of each file in the bucket. Since the tracker synchronizes with the entry point frequently, it also maintains the filelist.xml file for each bucket. Since the end points also synchronize with the tracker regularly, they also have a copy of the xml file.
If an end user makes a request for bogus content, the end point first checks if this content is part of the CDN infrastructure. If it is not, the request is terminated. This ensures that requests for content do not affect the end points that are serving content to other end users. This protects the CDN infrastructure against such attacks.
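A minimal sketch of a per-bucket file list and of the check for bogus content follows. The description states only that filelist.xml contains the name, size and SHA1 of each file; the element and attribute names used here, and the idea that the end point answers the check from its copy of filelist.xml, are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, Set


def build_filelist(files: Dict[str, bytes]) -> str:
    """Build a filelist document with name, size and SHA1 for each file.

    The <filelist>/<file> layout is hypothetical.
    """
    root = ET.Element("filelist")
    for name in sorted(files):
        data = files[name]
        ET.SubElement(root, "file", {
            "name": name,
            "size": str(len(data)),
            "sha1": hashlib.sha1(data).hexdigest(),
        })
    return ET.tostring(root, encoding="unicode")


def known_files(filelist_xml: str) -> Set[str]:
    """Return the set of file names listed in the document."""
    return {f.get("name") for f in ET.fromstring(filelist_xml).iter("file")}


def is_known(filelist_xml: str, requested: str) -> bool:
    """True if the requested file is part of the bucket; otherwise the
    request would be terminated without contacting the origin server."""
    return requested in known_files(filelist_xml)
```

Because such a check needs only the locally synchronized list, a request for unknown content can be rejected without loading other components.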
The invention ensures content integrity both at the block and file level.
In summary, it has been seen how meta-data is associated with a bucket when a bucket is created. It has also been seen how a managed customer bucket can get the content files either via FTP or by allowing the CDN infrastructure to download the content. Meta-data associated with each file can overwrite meta-data at the bucket level, giving the CDN customer fine-grained control over how files in a bucket should be handled.
By providing intelligence to the buckets, customers can use it in a wide variety of ways in the CDN infrastructure, setting QoS, caching, security, geo-location, and validity (time duration for which content is valid) of content, rate at which each content file may be served.
This invention provides the following advantages:
This invention provides a mechanism within the meta-data that allows the service provider's CDN infrastructure to make intelligent decisions about distributing content from a customer's bucket.
A person skilled in the art could introduce changes and modifications in the embodiments described without departing from the scope of the invention as it is defined in the attached claims.
ADSL Asymmetric Digital Subscriber Line
CDN Content Distribution Network
DNS Domain Name Service
PoP Point of Presence
TLD Top Level Domain
FTP File Transfer Protocol
HTTP HyperText Transfer Protocol
ARL Akamai Resource Locator
Number | Date | Country | Kind |
---|---|---|---|
P201130755 | May 2011 | ES | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2012/058397 | 5/7/2012 | WO | 00 | 2/4/2014 |