The present invention relates to a data storage system, for example in a communications network.
There is an increasing need to store ever larger amounts of data in a structured and readily accessible form, such as is provided by a database. However, a conventional database may be unsuitable for accommodating very large data objects, such as video or audio files. A database is conventionally expanded by enhancing the implementing hardware, for example by increasing storage (RAM or SSD). Such hardware enhancements are expensive and lead to “downtime” during each upgrade. There is a need for a way to increase the effective storage capacity of a database while avoiding the expense and disruption of a hardware upgrade.
The invention allows the effective storage capacity of a database to be increased by storing data, such as character large object (CLOB) and binary large object (BLOB) data objects, in a data object storage system, such as cloud storage, while storing associated non-content data (such as metadata) in a database. According to an embodiment of the invention, the data object storage system may be remote from the database, for example remote cloud storage. By using a data object storage system in this way, data storage may be expanded flexibly and transparently, at reduced cost and without downtime, while the benefits of structured data storage are retained.
The invention accordingly provides, in a first aspect, a method for use in a system comprising a data system interface, a database system and a data object storage interface, in which the method comprises, at the data system interface:
receiving a request issued by a requester to retrieve content data from the database system;
forwarding details of the request to the database system;
receiving from the database system a response comprising non-content data relating to a data object stored in a data object storage;
forwarding the non-content data and details of the request to the data object storage interface;
receiving from the data object storage interface a response comprising the content data; and
forwarding the content data to the requester.
The invention accordingly provides, in a second aspect, a method for use in a system comprising a data system interface, a database system and a data object storage interface, in which the method comprises, at the data object storage interface:
receiving from the data system interface details of a request issued by a requester to retrieve content data from the database system, together with non-content data relating to a data object stored in a data object storage, in which the non-content data are retrieved from the database system;
forwarding the details of the request and the non-content data to the data object storage;
receiving from the data object storage a response, in which the response comprises a data object comprising the content data; and
forwarding the content data to the data system interface.
According to an embodiment, the invention provides a method comprising the above two methods.
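By way of example only, the retrieval flow of the first and second aspects may be sketched in Python as follows. The sketch assumes in-memory dictionaries in place of database system 130 and data object storage 150; all class, method and key names are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryDatabase:
    """Hypothetical stand-in for database system 130: record id -> non-content data."""
    rows: dict = field(default_factory=dict)

    def query(self, record_id):
        return self.rows[record_id]


@dataclass
class InMemoryObjectStorage:
    """Hypothetical stand-in for data object storage 150: reference key -> content data."""
    objects: dict = field(default_factory=dict)

    def get(self, reference_key):
        return self.objects[reference_key]


class DataObjectStorageInterface:
    def __init__(self, storage):
        self.storage = storage

    def retrieve(self, request_details, non_content):
        # Second aspect: receive request details plus non-content data, fetch the
        # data object and return only its content data. This sketch locates the
        # object by its reference key; request_details is accepted but unused.
        return self.storage.get(non_content["reference_key"])


class DataSystemInterface:
    def __init__(self, database, object_interface):
        self.database = database
        self.object_interface = object_interface

    def handle_retrieve(self, request):
        # First aspect, step 1: forward request details to the database system
        # and receive non-content data relating to the data object.
        non_content = self.database.query(request["record_id"])
        # Step 2: forward the non-content data and request details onwards.
        content = self.object_interface.retrieve(request, non_content)
        # Step 3: forward the content data to the requester.
        return content


database = InMemoryDatabase({1: {"name": "photo.jpg", "reference_key": "person/1/photo"}})
storage = InMemoryObjectStorage({"person/1/photo": b"\x89PNG..."})
interface = DataSystemInterface(database, DataObjectStorageInterface(storage))
content = interface.handle_retrieve({"record_id": 1})
```

The requester only ever addresses the data system interface; the split between database and object storage stays hidden behind it.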
According to an embodiment, the request issued by the requester is interpretable by the database system as a query statement indicating that the requested data is held in data object storage.
According to an embodiment, the request issued by the requester is interpretable by the database system as a query statement comprising the non-content data.
According to an embodiment, the non-content data indicates a location of the data object in the data object storage.
According to an embodiment, the non-content data comprises at least one of version and time non-content data relating to at least one data object stored in the data object storage.
According to an embodiment, the method further comprises, at the data object storage interface, using the non-content data to ensure absolute consistency of at least one data object stored in the data object storage.
According to an embodiment, the method further comprises, at the data object storage interface, initiating a comparison between the non-content data received with the request and the non-content data for the data object received from the object storage, in which the non-content data comprise at least one of version and time data.
According to an embodiment, where the comparison indicates that the non-content data received with the request and the non-content data for the data object received from the object storage do not match, the object storage interface provides an instruction to apply a lock on a record associated with the data object in the database system.
According to an embodiment, the lock is removed once non-content data for the data object received from the object storage are found to match the non-content data received with the request.
According to an embodiment, the method further comprises, at the data system interface:
receiving from a requester, a request to store content data in the database system; forwarding content data associated with the request to the data object storage interface for storing in the data object storage; and forwarding non-content data associated with the request to the database system.
According to an embodiment, the method further comprises storing in the database system non-content data defining an association between the non-content data stored in the database system and a location in the data object storage at which the content data are stored.
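The storing embodiment above may be illustrated, under stated assumptions, with a short Python sketch. Plain dictionaries stand in for the database system and the data object storage, and the reference-key scheme (field name plus a random UUID) is hypothetical; the described system may generate keys differently.

```python
import uuid


def store_record(record, cloud_fields, database, object_storage):
    """Store content fields in object storage and everything else in the database.

    `database` and `object_storage` are plain dicts standing in for database
    system 130 and data object storage 150 (hypothetical, for illustration).
    """
    row = {}
    for name, value in record.items():
        if name in cloud_fields:
            # Hypothetical key scheme: field name plus a random UUID.
            reference_key = f"{name}/{uuid.uuid4().hex}"
            object_storage[reference_key] = value          # content data
            row[name] = {"reference_key": reference_key}   # association kept in the database
        else:
            row[name] = value                              # non-content data
    database[record["id"]] = row
    return row


database, object_storage = {}, {}
row = store_record({"id": 7, "name": "Ada", "photo": b"<jpeg bytes>"}, {"photo"}, database, object_storage)
```

Only the small association record lands in the database; the photo bytes exist solely in object storage and are reachable through the stored reference key.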
The present invention accordingly provides, in a third aspect, a computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of the method set out above.
The invention also provides in a fourth aspect, a system comprising a data system interface in which the data system interface comprises:
a first interface for communicating with a requester device; in which the first interface is configured to receive from the requester device, a request to retrieve a data object from the database system;
a second interface for communicating with a database system; in which the second interface is configured to forward details of the request to the database system; and to receive a response from the database system, in which the response comprises metadata relating to a data object stored in a data object storage; and
a third interface for communicating with a data object storage through an object storage interface; in which the third interface is configured to forward to the object storage interface details of the request and the metadata.
The invention provides in a fifth aspect, a system comprising:
a data object storage interface in which the data object storage interface comprises: a first interface for communicating with a data system interface and through the data system interface with a requester and a database system; in which the first interface is configured to receive from the data system interface, a request, issued by a requester, to retrieve content data from the database system; receive from the data system interface, non-content data relating to a data object comprising the requested content data stored in a data object storage; and forward to the data system interface for delivery to the requester, content data comprised in a data object received from the data object storage;
a second interface for communicating with the data object storage; in which the second interface is configured to forward to the data object storage details of the request and the non-content data; and to receive from the data object storage, in response, a data object.
According to an embodiment, the invention also provides a system comprising both the above two systems.
According to an embodiment, the non-content data is derived from the database system. According to an embodiment, the data object storage is a cloud storage.
The present invention accordingly provides, in a sixth aspect, a computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer to perform the steps of the method set out above.
In order that the present invention may be better understood, embodiments thereof will now be described, by way of example only, with reference to the accompanying drawings in which:
The invention provides for management of data objects, which may comprise a large amount of content data, by storing a relatively small amount of non-content data (e.g. metadata) relating to each data object in a database held in block storage, while storing the large amount of content data in data object storage. According to an embodiment, the data object storage used is cloud storage. The invention provides advantages similar to those that would be gained by storing the data objects in the database itself, while avoiding the need to support a database large enough to hold large quantities of content data. The invention allows a requester to use sophisticated database techniques to find and manipulate large data objects stored in data object storage as if they were all held in the database.
Cloud storage provides massive scale (10s to 100s of petabytes and billions of data objects) and direct access over HTTP and is approximately ten times less expensive per byte than block storage. A problem with cloud storage is that it does not support an operating system or a database structure. Despite this, the invention effectively extends a database controlled by a database management system (DBMS) to include cloud storage. This is done, according to the invention, by using data segregation. Data is segregated between those data, such as non-content data, that need to be stored in a block storage system and those data, such as content data, that do not. The association between the non-content data and the content data can be maintained when they are stored separately by means of a reference key. According to the invention, no database structure is necessary in the data object storage system. A database structure is implemented in block storage, which stores version- and time-related information relating to data objects stored in the data object storage system, effectively creating a virtual database across the database and the data object storage system.
Hence, although cloud storage does not support a database structure, the database structure embodied by the DBMS may be used to access and manipulate data objects stored in cloud storage through the use of reference keys. An example of a suitable reference key is: http://cms-backamaze.gb.storage.cloud.bt.com/Locfilename∥12345, where “http://cms-backamaze.gb.storage.cloud.bt.com” is a reference to a cloud storage system, in which “cms-backamaze” is the name of a container where all files are stored and organized and “gb.storage.cloud.bt.com” is specific to a cloud provider. Also in this example reference key, “Locfilename” is the name of a file and “12345” is the version number of the file.
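To make the structure of the example reference key concrete, a minimal Python parser is sketched below. It assumes that the first label of the host name is the container, that the remainder of the host name is the provider-specific part, and that the “∥” character (U+2225) separates the file name from the version number; these are readings of the single example above, not a defined key format.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

SEPARATOR = "\u2225"  # the "∥" in the example key; assumed to separate file name and version


class ReferenceKey(NamedTuple):
    container: str
    provider_domain: str
    filename: str
    version: str


def parse_reference_key(key: str) -> ReferenceKey:
    parts = urlsplit(key)
    if not parts.hostname:
        raise ValueError(f"no host in {key!r}")
    # First host label is taken to be the container, the rest the provider domain.
    container, _, provider_domain = parts.hostname.partition(".")
    filename, sep, version = parts.path.lstrip("/").rpartition(SEPARATOR)
    if not sep:
        raise ValueError(f"no version separator in {key!r}")
    return ReferenceKey(container, provider_domain, filename, version)


key = parse_reference_key("http://cms-backamaze.gb.storage.cloud.bt.com/Locfilename\u222512345")
```

Here `key.container` is `"cms-backamaze"` and `key.version` is `"12345"`. Keeping the version inside the key means every database copy that holds the key also pins the exact object version it refers to.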
The requester device 110 may comprise a computer, smart phone, tablet device, or other device that comprises a processor and a computer network interface (e.g., Wi-Fi or a wired network interface card). A suitable system is shown in
According to an embodiment, cloud storage 150 is located remotely from data system interface 120 and is typically accessed over the internet 142. In practice, cloud storage 150 may be located in data centres anywhere in the world. According to an embodiment, each of requester device 110, data system interface 120, database system 130, data object storage interface 140 and data object storage 150 is controlled by program code executed by a processor. An exemplary processor and associated processing circuit is shown in
We now provide a more detailed description of certain embodiments.
Storing Data
With reference to
Consider, for example, a database table called “person” which comprises an ID (as a number), a person's name, a photo of the person and their curriculum vitae (CV). Since large documents, such as images and CVs, can occupy a large amount of memory, it is advantageous to store them in cloud storage. According to an embodiment of the invention, to facilitate this segregated storage in a way that allows the documents to be retrieved using a simple database request, data definition language (DDL) SQL statements containing the key word “CLOUD” may be used. The key word “CLOUD” in a DDL statement may be used to maintain internal reference keys in database system 130 and as an identifier that large documents are stored in cloud storage across one or multiple servers. A suitable table creation statement is provided in Table 1:
The data system interface 120 identifies which fields are to be stored in block memory of the database system 130 and which are to be stored in cloud storage 150. The latter are distinguished by having the key word “CLOUD” associated with them in the database. Advantageously, to an end user or requester, the resulting segregated storage appears to be a single database running on a single machine.
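The field segregation performed by the data system interface can be sketched in Python as follows. The DDL string used here is a hypothetical example, since Table 1 is not reproduced; the sketch assumes only that a column destined for cloud storage carries the key word CLOUD somewhere after its name. It is a minimal illustration, not a full SQL parser.

```python
from __future__ import annotations


def split_columns(ddl: str) -> list[str]:
    """Return the column definitions between the outermost parentheses."""
    body = ddl[ddl.index("(") + 1 : ddl.rindex(")")]
    columns, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends one column definition.
            columns.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    columns.append("".join(current).strip())
    return columns


def segregate_columns(ddl: str) -> tuple[list[str], list[str]]:
    """Split column names into (block-storage columns, CLOUD columns)."""
    database_cols, cloud_cols = [], []
    for definition in split_columns(ddl):
        name, *attributes = definition.split()
        if any(token.upper() == "CLOUD" for token in attributes):
            cloud_cols.append(name)
        else:
            database_cols.append(name)
    return database_cols, cloud_cols


# Hypothetical DDL in the spirit of the "person" table described above.
DDL = "CREATE TABLE person (id NUMBER(10,0), name VARCHAR2(100), photo BLOB CLOUD, cv CLOB CLOUD)"
database_cols, cloud_cols = segregate_columns(DDL)
```

The parenthesis-depth counter keeps type arguments such as `NUMBER(10,0)` intact, so their internal commas are not mistaken for column boundaries.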
Where the database system 130 is a distributed system across multiple servers, data object storage interface 140 may generate the reference key once only but instruct the data system interface 120 to pass the reference key (together with the non-content data) to each one of the multiple servers. Even where non-content data is replicated across multiple servers, however, a single version of the content data may be maintained in the cloud storage. All copies of the non-content data for a specific chunk of content data are associated with the same reference key.
Accessing Data
Retrieval of data from the cloud storage and modification of data in the cloud storage are initiated by a database request generated by a requester at requester device 110. Retrieval of content data is based on non-content data. Non-content data may be Name, Type, Content Type, etc. (as shown by way of example in
According to an embodiment of the invention, retrieval of data from the cloud storage uses data aggregation. With data aggregation, when data is retrieved, e.g. by use of a reference key, the relevant data from the cloud storage and from database system 130 are aggregated. When data object storage interface 140 receives the requested content data retrieved from cloud storage 150, it forwards the content data to data system interface 120. On receipt of the retrieved content data from data object storage interface 140, data system interface 120 aggregates the associated non-content data from database system 130 with the retrieved content data and forwards the aggregated information to requester device 110 in a response to the request. The system thus virtually extends the database by use of cloud storage, even though the DBMS cannot be installed on the cloud storage system.
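The aggregation step may be pictured as a simple merge of a database row with the retrieved content data. In this Python sketch the function name and field names are hypothetical, and it is assumed, for illustration only, that the internal reference key is not returned to the requester.

```python
def aggregate(non_content, content, content_field):
    """Combine a database row with content data retrieved from object storage.

    Assumes, for illustration, that the internal reference key is not
    returned to the requester.
    """
    response = {k: v for k, v in non_content.items() if k != "reference_key"}
    response[content_field] = content
    return response


row = {"id": 1, "name": "Ada", "reference_key": "photo/3f2a"}
response = aggregate(row, b"<jpeg bytes>", "photo")
```

The requester receives one record, `{"id": 1, "name": "Ada", "photo": b"<jpeg bytes>"}`, as though the photo had been held in the database table itself.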
Deleting Data
Advantageously, the invention is able to move a large quantity of data to low-cost data object storage, such as cloud storage, which results in a much smaller database. This can mean the database can be accommodated in fast-access RAM, whereas a conventional, larger database may be too large for RAM and need to be stored on slower disc storage. RAM takes nanoseconds to read from or write to, while hard drive access times are measured in milliseconds. Hence the invention can significantly improve the performance of a database.
The invention effectively maintains a single, virtual database across two different storage systems: a main block storage (typically local and fast but expensive) and a cloud storage (typically remote and inexpensive but slow). The block storage stores data as blocks. With block storage, files are split into evenly sized blocks of data, each with its own address but with no additional information (non-content data) to provide more context for what that block of data is. The cloud storage stores data as data objects. Data object storage, by contrast, does not split files up into blocks of data. Instead, entire clumps of data are stored as a single data object that contains the data, non-content data and a unique identifier. Data object storage does not store information on relationships between data objects; that information is held in the database. Data object storage places few restrictions on the type or amount of data that can be stored, which makes it powerful and customizable.
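The contrast between the two storage models can be illustrated with a toy Python sketch. The block size and metadata fields are arbitrary assumptions, and real block devices usually pad the final block to full size, which this sketch does not do.

```python
import uuid


def to_blocks(data, block_size):
    """Block storage: evenly sized blocks, each addressed only by its position.

    The last block may be shorter; real block storage would typically pad it.
    """
    return {address: data[offset : offset + block_size]
            for address, offset in enumerate(range(0, len(data), block_size))}


def to_object(data, metadata):
    """Object storage: the whole payload as one object with metadata and a unique identifier."""
    return {"id": str(uuid.uuid4()), "data": data, "metadata": dict(metadata)}


blocks = to_blocks(b"abcdefghij", 4)
obj = to_object(b"abcdefghij", {"content_type": "text/plain"})
```

Each block carries nothing but its address, whereas the single object carries its own description; neither model records how one object relates to another, which is the role left to the database.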
Consistency
Even after integrating with data object storage, the invention achieves atomicity, consistency, isolation and durability (ACID), which are the desired properties of a relational database system. As indicated above, cloud storage does not support database properties; in particular, cloud storage is not strongly consistent. According to an embodiment of the invention, consistency is achieved as follows.
The database system 130 sends messages with information (e.g. version and time non-content data and the reference key) to the data object storage interface 140. The database system may interact with the data object storage interface when DML commands are executed in the database system, for example when data needs to be fetched from, stored in or deleted from data object storage. During execution of DDL statements, such as a “create table” operation, the data system interface 120 behaves as a plugin to the database to store data about the data objects (such as CLOBs, BLOBs and user-defined data) that are to be stored in data object storage. The data system interface segregates data that is to be stored in data object storage from data that is to be stored in the database system. The term “data object storage” includes storage systems in which the data may be replicated across multiple servers; in particular, it includes cloud storage.
A problem with certain forms of data object storage, for example conventional cloud storage, is a lack of absolute consistency: cloud storage only supports eventual consistency, whereby retrieving a data object may return an older version of the data object rather than the latest one. That is, subsequent attempts to read a data object from cloud storage may or may not yield the latest version of the data object. Cloud storage may store a data object across a variety of data centres located at different geographical locations. This provides durability and also resilience against failure of a single data centre. When the data object is stored in one of the data centres, a success message is returned to the requesting client, together with version and time non-content data. When a data object is read from cloud storage, the cloud storage may try to retrieve the data object from a data centre close to where the request was made, but this may not be the data centre in which the data object was most recently stored, resulting in the return of a data object with old version and time non-content data.
According to the invention, the version and time non-content data relating to a data object stored in data object storage are also recorded in the database. Consistency is maintained by the database system 130, which keeps track of the version and time stamp of each data object stored in data object storage. Whenever a requester provides a data object together with a request for storage, the data system interface 120 directs the data object to the data object storage interface 140, from where it is passed to the data object storage 150. When the data object is stored in the cloud storage, a copy of the latest relevant version and time non-content data is stored in the database system. Unlike data object storage, the database system is inherently strongly consistent. According to an embodiment of the invention, this inherent consistency is exploited to ensure consistent behaviour on the part of the data object storage. According to this embodiment, whenever the data system interface issues a data-retrieval request to the data object storage interface, the request is accompanied by version and time non-content data relating to the requested data object, retrieved from the database. When the data object storage provides a data object in response to a data-retrieval request, it is accompanied by version and time non-content data relating to the provided data object, retrieved by the data object storage. When the data object storage interface receives a data object in response to a data-retrieval request, it compares the version and time non-content data retrieved from the database for the requested data object with the version and time non-content data for the data object received from the data object storage.
If the non-content data do not match, the data object storage interface applies a lock on the record associated with the data object in the database system 130 until matching non-content data are received from the data object storage. In the presence of a mismatch, the data object storage interface may apply a configurable time limit for retries and a configurable limit on the number of retries. The data object storage interface may retry retrieval of the data object until it receives from the data object storage version and time non-content data that match the version and time non-content data held in the database. When matching version and time non-content data are received, the data object storage interface returns the data object to the data system interface 120 for delivery to the requester and the lock is released.
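The version check, lock and bounded retries described above may be sketched in Python as follows. The reader, lock and unlock functions are supplied by the caller and are hypothetical stand-ins for the object storage and database system; the default limits are arbitrary. The sketch also assumes that the lock is released when the retry limits are exhausted, so that a record is not left locked indefinitely; the description does not specify this case.

```python
import time


class ConsistencyError(Exception):
    """Raised when no matching version was read within the configured limits."""


def fetch_consistent(record_id, expected, read_object, lock, unlock,
                     max_retries=5, time_limit_s=2.0, backoff_s=0.1):
    """Re-read an object until its version/time match the database's copy."""
    deadline = time.monotonic() + time_limit_s
    locked = False
    try:
        for attempt in range(max_retries + 1):
            content, observed = read_object(expected["reference_key"])
            if (observed["version"], observed["time"]) == (expected["version"], expected["time"]):
                return content
            if not locked:
                lock(record_id)  # mismatch: lock the database record while retrying
                locked = True
            if attempt == max_retries or time.monotonic() >= deadline:
                break
            time.sleep(backoff_s)
        raise ConsistencyError(f"stale object for record {record_id}")
    finally:
        if locked:
            unlock(record_id)  # assumed: release on success and on giving up


# Simulated eventually consistent store: the first read returns a stale version.
replies = iter([(b"old", {"version": 1, "time": 100}), (b"new", {"version": 2, "time": 200})])
events = []
content = fetch_consistent(
    9, {"reference_key": "cv/9", "version": 2, "time": 200},
    read_object=lambda key: next(replies),
    lock=lambda rid: events.append(("lock", rid)),
    unlock=lambda rid: events.append(("unlock", rid)),
    backoff_s=0.0,
)
```

In this run the first read is stale, so the record is locked, the second read matches, `b"new"` is returned and the lock is released. The database acts as the strongly consistent source of truth against which each eventually consistent read is judged.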
Insofar as embodiments of the invention described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.
Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilises the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present invention.
It will be understood by those skilled in the art that, although the present invention has been described in relation to the above described example embodiments, the invention is not limited thereto and that there are many possible variations and modifications which fall within the scope of the invention. The invention is not restricted to large data objects and has application to data objects comprising any amount of content data.
According to an embodiment, the data object storage interface uses REST or HTTP methods over the web to retrieve data object content from a cloud storage. According to embodiments, the data held in a cloud storage system may be stored in any availability zone (i.e. a data centre in a region of a remote cloud storage) at any time. According to embodiments, data in a cloud storage system can be moved between different data centres in a region of a cloud storage system without this affecting the ability of the data system interface 120, database system or data object storage interface 140 to locate the data by use of the reference key.
The invention is not limited to cloud storage and may be applied to any data storage capable of dealing efficiently with large data objects. However, the use of cloud storage provides the benefits of low cost, no up-front cost, a pay-as-you-go model, high scalability, automatic scaling and high availability. According to an embodiment of the invention, the data object storage or cloud storage is remote from the requester and the database system, by which we mean that it is not directly connected to the same network as the data system interface, database system or object storage interface. The data object storage interface 140 may have the capability to transform the retrieved content data from a cloud storage format to CLOB, BLOB or a custom data type, as appropriate, before forwarding it to data system interface 120. The data object storage interface 140 may also encrypt the content data using an encryption method such as AES (Advanced Encryption Standard) prior to storing it in the cloud storage 280. However, encryption of content data is optional. Data object storage interface 140 may have the capability to check whether the retrieved content data is in encrypted format and, where the content data is encrypted, to decrypt it using the appropriate key and decryption method.
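The encryption check and type conversion may be sketched in Python as below. The four-byte marker used to recognise encrypted payloads is purely hypothetical, and decryption is delegated to a caller-supplied function (for example an AES routine from a cryptographic library, not shown), since the Python standard library provides no AES implementation.

```python
ENCRYPTED_MAGIC = b"ENC1"  # hypothetical marker prefixed to encrypted payloads


def prepare_content(raw, column_type, decrypt=None, encoding="utf-8"):
    """Decrypt if needed, then convert object-storage bytes to the column's type."""
    if raw.startswith(ENCRYPTED_MAGIC):
        if decrypt is None:
            raise ValueError("content is encrypted but no decryption function was supplied")
        raw = decrypt(raw[len(ENCRYPTED_MAGIC):])
    if column_type == "CLOB":
        return raw.decode(encoding)  # character data
    if column_type == "BLOB":
        return raw                   # binary data passed through unchanged
    raise ValueError(f"unsupported column type {column_type!r}")


text = prepare_content(b"hello", "CLOB")
```

Keeping decryption pluggable lets the same conversion path serve both encrypted and unencrypted content, consistent with encryption being optional.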
The data object storage interface 140 may store credentials relating to a user account with a cloud service provider. As a part of setup, data object storage interface 140 accepts credentials such as user names, passwords, certificates and keys. Data object storage interface 140 may store this information in a properties file or credential store in an application server (i.e. where the application server is the environment for the data object storage interface 140). Using the credentials in combination with an appropriate protocol (such as REST, SOAP, etc.), data object storage interface 140 communicates with the data object storage 150.
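A minimal Python sketch of reading credentials from a properties file and preparing an authenticated REST request is given below. The section and key names, the object path and the use of HTTP Basic authentication are all assumptions made to keep the example short; cloud providers typically require their own signed-request or token schemes. The request is built but not sent.

```python
import base64
import configparser
from urllib.request import Request

# Hypothetical properties file; section and key names are assumptions.
PROPERTIES = """
[cloud]
endpoint = http://cms-backamaze.gb.storage.cloud.bt.com
username = storage-user
password = change-me
"""


def load_credentials(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["cloud"])


def build_get_request(creds, object_name):
    """Build (but do not send) an authenticated HTTP GET for one data object."""
    token = base64.b64encode(f"{creds['username']}:{creds['password']}".encode()).decode()
    return Request(f"{creds['endpoint']}/{object_name}", method="GET",
                   headers={"Authorization": f"Basic {token}"})


creds = load_credentials(PROPERTIES)
request = build_get_request(creds, "Locfilename")
```

Passing `request` to `urllib.request.urlopen` would issue the call; in practice the credentials would come from a protected credential store rather than a literal string.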
Data object storage interface 140 may also add appropriate headers to content data, for example creation date, expiry date, created user name, last modified date and last modified user name. These headers may be useful to identify various attributes of the content data for purposes of managing data stored in the cloud, including deletion of data objects. These headers are effectively additional non-content data that are stored in the cloud with the content data and need not be stored in the database system. These headers may be used to maintain consistency across both the database system and data object storage; part of the headers may also be stored in the database system to ensure that consistency across the systems is maintained.
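The management headers may be illustrated with the following Python sketch. The `x-meta-` prefix is hypothetical (real providers define their own prefixes for user metadata), as is the choice of which headers are mirrored into the database record.

```python
from datetime import datetime, timedelta, timezone

META_PREFIX = "x-meta-"  # hypothetical; real providers use their own metadata prefixes
DB_MIRRORED = ("creation-date", "last-modified")  # assumed subset also kept in the database


def build_object_headers(user, now, retention):
    """Management headers stored alongside the content data in the cloud."""
    return {
        f"{META_PREFIX}creation-date": now.isoformat(),
        f"{META_PREFIX}expiry-date": (now + retention).isoformat(),
        f"{META_PREFIX}created-by": user,
        f"{META_PREFIX}last-modified": now.isoformat(),
        f"{META_PREFIX}last-modified-by": user,
    }


def mirrored_subset(headers):
    """The part of the headers copied into the database record for cross-checking."""
    return {name: headers[META_PREFIX + name] for name in DB_MIRRORED}


def is_expired(headers, now):
    """True once the expiry date has passed, e.g. to select objects for deletion."""
    return datetime.fromisoformat(headers[f"{META_PREFIX}expiry-date"]) <= now


now = datetime(2017, 6, 15, tzinfo=timezone.utc)
headers = build_object_headers("alice", now, timedelta(days=30))
```

The expiry header supports deletion housekeeping, while the mirrored subset gives the database a copy against which the cloud-side headers can be checked.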
The scope of the present invention includes any novel features or combination of features disclosed herein. The applicant hereby gives notice that new claims may be formulated to such features or combination of features during prosecution of this application or of any such further applications derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.
Number | Date | Country | Kind |
---|---|---|---|
201611022486 | Jun 2016 | IN | national |
16186413.7 | Aug 2016 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2017/064706 | 6/15/2017 | WO | 00 |