This application claims priority from Singapore Patent Application No. 10201406349V filed on Oct. 3, 2014.
This invention is related to a storage system for a data center. More specifically, this invention is related to a distributed active hybrid storage system for a data center.
Current storage devices or volumes have little or no intelligence capabilities. They are dummy devices which can be instructed to perform simple read/write operations, and they rely on a stack of system software in a storage server to abstract the block-based storage device. With more data in data centers, more storage servers are required to manage the devices and provide storage abstraction. This increases not only hardware cost but also the cost of server maintenance.
With the advancement of Central Processing Unit (CPU) and Non-Volatile Memory (NVM) technologies, it is increasingly feasible to incorporate the functionalities of system and clustering software and other data management into a smaller controller board to optimize system efficiency and performance and to reduce Total Cost of Ownership (TCO). The NVM is a solid state memory and storage technology for storing data at a very high speed and/or a very low latency access time, and the NVM retains the data stored even with the removal of power. Examples of NVM technologies include but are not limited to STT-MRAM (Spin Torque Transfer MRAM), ReRAM (Resistive RAM) and Flash memory. It is also possible that the NVM may be provided by a hybrid or combination of the various different NVM technologies to achieve a balance between cost and performance.
Thus, what is needed is a system for utilizing CPU and NVM technology to provide intelligence for storage devices and reduce or eliminate their reliance on storage servers for such intelligence. Furthermore, other desirable features and characteristics will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and this background of the disclosure.
In accordance with one aspect of the present invention, an active storage system is disclosed. The active storage system includes a storage device, a non-volatile memory and an active drive controller. The active drive controller performs data management and/or cluster management within the active storage system, and the active drive controller also includes a data interface for receiving at least object and/or file data.
In accordance with another aspect of the present invention, another active storage system is disclosed. The active storage system includes a metadata server and one or more active hybrid nodes. Each active hybrid node includes a plurality of Hybrid Object Storage Devices (HOSDs) and a corresponding plurality of active drive controllers, each of the plurality of active drive controllers including a data interface for receiving at least object and/or file data for its corresponding HOSD. One of the plurality of active drive controllers also includes an active management node, the active management node interacting with the metadata server and each of the plurality of HOSDs for managing and monitoring the active hybrid node.
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views and which together with the detailed description below are incorporated in and form part of the specification, serve to illustrate various embodiments and to explain various principles and advantages in accordance with the present invention, by way of non-limiting example only.
Embodiments of the invention are described hereinafter with reference to the accompanying drawings.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been depicted to scale.
The following detailed description is merely exemplary in nature and is not intended to limit the invention or the application and uses of the invention. Furthermore, there is no intention to be bound by any theory presented in the preceding background of the invention or the following detailed description. It is the intent of this invention to present active storage systems which include active drive controllers coupled to hybrid storage devices within the systems for performing data management and cluster management, the cluster management including interaction with a metadata server and other active drive controllers to discover and join a cluster or to form and maintain a cluster. The active drive controllers in accordance with a present embodiment include a data interface for receiving object data, file data and key value data.
Key Value (KV) interfaces are built on top of the object store. A mapping layer is designed and implemented to map a KV entry 420 to an object 410. There are various mechanisms for mapping KV entries to objects. In one-to-one mapping as depicted in the mapping illustration 400, each KV entry 420 is mapped to a single object 410. The KV entry 420 includes a key 422, a value 424 and other information 426. The key 422 is mapped 432 to the object ID 412. The value 424 is mapped 434 to the object data 414. The other information 426, which can include a version, a checksum and the value size, is mapped 436 to the object metadata 416.
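As an illustration only, the following sketch models the one-to-one mapping described above in Python; the class and function names (KVEntry, StoredObject, kv_to_object, object_to_kv) are assumptions introduced here and do not appear in the embodiment.

```python
from dataclasses import dataclass, field

@dataclass
class KVEntry:                       # KV entry 420
    key: str                         # key 422
    value: bytes                     # value 424
    other: dict = field(default_factory=dict)     # other information 426

@dataclass
class StoredObject:                  # object 410
    object_id: str                   # object ID 412
    data: bytes                      # object data 414
    metadata: dict = field(default_factory=dict)  # object metadata 416

def kv_to_object(entry: KVEntry) -> StoredObject:
    """key -> object ID (432), value -> object data (434),
    other information -> object metadata (436)."""
    meta = dict(entry.other)
    meta.setdefault("value_size", len(entry.value))
    return StoredObject(object_id=entry.key, data=entry.value, metadata=meta)

def object_to_kv(obj: StoredObject) -> KVEntry:
    """Inverse mapping used when serving a KV read from the object store."""
    return KVEntry(key=obj.object_id, value=obj.data, other=dict(obj.metadata))
```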
The active hybrid node (AHN) 702 includes an object store 716, a local file storage 718 and hybrid storage 720, the hybrid storage 720 including HDDs 112 and NVMs 110. The local file storage 718 includes the object metadata 416 (or the object metadata 516, 614, 615) and the object data files 414 (or the object data files 514, 616). The object store 716 includes an object interface 722 for interfacing with the object client 712 and a key value interface 724 for interfacing with the KV client 714. The key value interface 724 is responsible for KV to object mapping such as the one-to-one mapping illustration 400 described above.
The software architecture and modules that form the operations and functions of the AHN 702 are described in more detail below. The software executables are stored in the non-volatile media for program code storage, and are recalled by the AHN processor into main memory during bootup for execution. The AHN 702 provides both object interfaces and key-value (KV) interfaces to applications in the object client server 712 and the KV client server 714. The object interfaces 722 are the native interfaces to the underlying object store 716. The object store 716 can alternatively be implemented as a file store (e.g., the file store 726) to store the objects as files.
There are three main layers of software: the node daemon 704, the object store 716 and the local file system 718. The node daemon layer 704 refers to various independent run-time programs or software daemons. The message handler daemon 710 handles the communication protocol based on TCP/IP with other AHNs, AMNs and client terminals for forming and maintaining the distributed cluster system and providing data transfer between client servers and the AHNs.
The reconstruction daemon 708 is responsible for executing the process of rebuilding lost data from failed drives in the system by decoding data from the associated surviving data and check code drives. The MapReduce daemon 706 provides the MapReduce and the Hadoop Distributed File System (HDFS) interfaces for the JobTracker in the MapReduce framework to assign data analytic tasks to AHNs for execution so that data needed for processing can be directly accessed locally in one or more storage devices in the AHN. The client installable program daemon 730 is configured to execute a program stored on any one or more storage devices attached to the AHN. As applications or client servers can post and install jobs into the AHN for execution, the client installable program daemon communicates with client terminals for uploading and installing executable programs into one or more storage devices attached to the AHN.
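The embodiment does not specify a particular erasure code, so the following is only a hedged sketch of what the reconstruction daemon's decoding step could look like under a simple single-parity (XOR) scheme, in which the chunk lost with a failed drive is the XOR of the surviving data and check-code chunks of the same stripe; the function names are illustrative assumptions.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-sized blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct_chunk(surviving_chunks: list[bytes]) -> bytes:
    """Recover the chunk of a failed drive in a single-parity stripe by
    XOR-ing the chunks read from the surviving data and check-code drives."""
    recovered = surviving_chunks[0]
    for chunk in surviving_chunks[1:]:
        recovered = xor_blocks(recovered, chunk)
    return recovered

# Round-trip check: parity = d0 XOR d1, so losing d1 leaves d1 = d0 XOR parity.
d0, d1 = b"\x01\x02\x03", b"\x0a\x0b\x0c"
parity = xor_blocks(d0, d1)
assert reconstruct_chunk([d0, parity]) == d1
```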
The principle of running data computing in the AHN 702 is to bring computation closer to storage, meaning that the daemon only needs to access data from the local AHN 702 for the majority of the time and send the results of the job back to the application or client server. In many situations, the results of the data computing are much smaller in size than the local data used for computation. In this way the amount of data that needs to be transmitted over the network 140 can be reduced, and big data processing or computation can be distributed along with the storage resources to vastly improve total system performance.
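As a toy illustration of this compute-near-storage principle (not taken from the specification; the function and parameter names are assumptions), a daemon might scan the objects stored on its own AHN and return only a compact summary to the client server:

```python
def run_local_job(local_objects, predicate, projector):
    """Scan objects held locally on this AHN and return only a small result,
    so the bulky object data never has to cross the network."""
    results = [projector(obj) for obj in local_objects if predicate(obj)]
    return {"count": len(results), "results": results}

# Hypothetical usage: return only the IDs of objects larger than 1 MiB.
# summary = run_local_job(object_store.scan(),
#                         predicate=lambda o: len(o.data) > 1 << 20,
#                         projector=lambda o: o.object_id)
```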
The object store 716 is a software layer that provides the object interface 722 and the KV interface 724 to the node daemon layer 704. The object store layer 716 also maps objects to files by way of the file store 726 so that objects can be stored and managed by a file system underneath. Data compression and hybrid data management are the other two main modules in the object store layer 716 (though shown as a single module 728 in the drawings).
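A minimal sketch of how the file store 726 could map an object to a file path and how the module 728 could apply compression and hybrid placement, keeping hot objects on the NVM tier and cold objects on the HDD tier; the mount points, hash-based directory layout and zlib compression are assumptions made for illustration, not details taken from the embodiment.

```python
import hashlib
import os
import zlib

NVM_ROOT = "/mnt/nvm"   # hypothetical mount point of the NVM tier
HDD_ROOT = "/mnt/hdd"   # hypothetical mount point of the HDD tier

def object_path(object_id: str, hot: bool) -> str:
    """Map an object ID to a file path; hashing spreads objects across
    directories, and hot objects are placed on the NVM tier."""
    digest = hashlib.sha1(object_id.encode()).hexdigest()
    root = NVM_ROOT if hot else HDD_ROOT
    return os.path.join(root, digest[:2], digest[2:4], digest)

def write_object(object_id: str, data: bytes, hot: bool = False) -> str:
    """Compress the object data and store it as a file in the local file system."""
    path = object_path(object_id, hot)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))
    return path
```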
The local file system layer 718 provides file system management of the data blocks of the underlying one or more storage devices for storing the object metadata 416 and the object data 414 by resolving each object into the corresponding sector blocks of the one or more storage devices. Data sector blocks of deleted objects are reclaimed by the local file system layer 718 in accordance with the present embodiment for future allocation of sector space to newly created objects.
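A toy model (with assumed names, not part of the specification) of the block management just described: each object resolves to a set of sector blocks, and the blocks of deleted objects are reclaimed for later allocations.

```python
class BlockAllocator:
    """Simplified sector-block bookkeeping for the local file system layer."""

    def __init__(self, total_blocks: int):
        self.free_blocks = set(range(total_blocks))
        self.owner = {}                     # object_id -> list of allocated blocks

    def allocate(self, object_id: str, nblocks: int) -> list[int]:
        """Resolve a newly created object into free sector blocks."""
        if nblocks > len(self.free_blocks):
            raise RuntimeError("no free sector blocks available")
        blocks = [self.free_blocks.pop() for _ in range(nblocks)]
        self.owner[object_id] = blocks
        return blocks

    def reclaim(self, object_id: str) -> None:
        """Return a deleted object's sector blocks to the free pool."""
        self.free_blocks.update(self.owner.pop(object_id, []))
```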
The active management node (AMN) 802 is a multiple-function node. Besides a cluster management and monitoring function 814, the AMN 802 sends instructions from a data migration and reconstruction daemon 816 to migrate data when new nodes are added, when AHNs fail or become inactive, or when data access to the AHNs is unbalanced. In addition, the AMN 802 can also advantageously reduce network traffic by sending instructions via a switch controller daemon 818 to the SCB switches 810 to forward data packets to destinations not specified by a sender.
The message handler daemon 812 implements the communication protocols with other AMNs, if there are any, as well as with the AHNs in the cluster, the application servers, and the programmable switches. The cluster management and monitoring daemon 814 provides the algorithms and functions to form and maintain the information about the cluster. The client server communicates with the cluster management and monitoring daemon 814 to extract the latest HOSD topology in the cluster for determining the corresponding HOSDs to store or retrieve data. Based on the monitoring status of the cluster, the AMN 802 sends instructions from the data migration and reconstruction daemon 816 to migrate data when a new node is added, when AHNs fail or become inactive, or when data access to the AHNs is unbalanced. In addition, the AMN 802 can also send instructions to the programmable switches via the switch controller daemon 818 to replicate and forward data packets to the destinations autonomously to reduce the load on client communication.
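As a hedged sketch only, the cluster management and monitoring daemon 814 might be modelled as below; the heartbeat mechanism, timeout and load threshold are assumptions introduced for illustration rather than details of the embodiment.

```python
import time

class ClusterMonitor:
    """Illustrative stand-in for the cluster management and monitoring daemon 814:
    it tracks heartbeats from AHNs, serves the current HOSD topology to client
    servers, and flags conditions that would trigger data migration."""

    HEARTBEAT_TIMEOUT = 10.0          # seconds; an assumed value

    def __init__(self):
        self.nodes = {}               # ahn_id -> {"hosds": [...], "last_seen": t, "load": x}

    def heartbeat(self, ahn_id, hosds, load):
        """Record the latest status report from an AHN."""
        self.nodes[ahn_id] = {"hosds": hosds, "last_seen": time.time(), "load": load}

    def is_active(self, ahn_id):
        return time.time() - self.nodes[ahn_id]["last_seen"] < self.HEARTBEAT_TIMEOUT

    def topology(self):
        """What a client server fetches to determine which HOSDs hold its data."""
        return {ahn: info["hosds"] for ahn, info in self.nodes.items() if self.is_active(ahn)}

    def migration_candidates(self, load_threshold=0.8):
        """AHNs that are inactive or overloaded; the data migration and
        reconstruction daemon 816 would be instructed to act on these."""
        return [ahn for ahn, info in self.nodes.items()
                if not self.is_active(ahn) or info["load"] > load_threshold]
```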
In the event an associated entry is not found 1106 in the flow table, the packet headers and associated payload parameters are sent to the AMN 1014 to obtain a new entry for this packet or flow, and the flow and parity node tables are updated 1108 in the programmable switch 1004 in accordance with the response received from the AMN 1014, which contains the new table entry information. When the entry is found 1106, the packet is forwarded 1110 to the AHN which contains the destination HOSD as indicated by the entry. Data write requests received from the application server 902 are duplicated 1112, 1114 by the programmable switch 1004 into separate write requests carrying the same data for forwarding to each of the parity nodes 1008 associated with the data node 1006, as listed in the corresponding entry in the parity node table 1012. Both the parity nodes 1008 and the data nodes 1006 are provided by HOSDs in the distributed storage cluster.
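The sketch below illustrates the flow just described; the table layouts, the packet fields and the amn.lookup and forward helpers are hypothetical stand-ins for the programmable switch's tables, the control-plane query to the AMN 1014 and the data-plane forwarding action.

```python
def handle_write_packet(packet, flow_table, parity_table, amn, forward):
    """Handle a data write packet arriving at the programmable switch 1004.

    flow_table maps a flow key to the AHN holding the destination HOSD (the
    data node 1006); parity_table lists the parity nodes 1008 associated with
    that data node.
    """
    key = (packet["src"], packet["dst"], packet["object_id"])

    if key not in flow_table:
        # Entry not found (1106): ask the AMN for a new entry and update the
        # flow and parity node tables (1108) from its response.
        entry = amn.lookup(packet)
        flow_table[key] = entry["data_node"]
        parity_table[entry["data_node"]] = entry["parity_nodes"]

    data_node = flow_table[key]
    forward(packet, data_node)                    # forward to the data node (1110)
    for parity_node in parity_table.get(data_node, []):
        forward(dict(packet), parity_node)        # duplicate the write (1112, 1114)
```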
For the case of multiple HOSD/HDD failures occurring across different AHNs 1216, each AHN is responsible for its own HOSD/HDD reconstruction 1218. For each AHN, the reconstruction procedure is as follows: the reconstruction daemon 816 looks 1220 for the data which is available in the attached NVM and copies it directly to the replacement HOSDs/HDDs, and the object map, which is also used as a reconstruction map, is updated 1222 either after each object is reconstructed or after multiple objects are reconstructed 1214.
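A minimal sketch of this per-AHN reconstruction step, with assumed interfaces (nvm_cache as a mapping of object IDs to data, replacement_drive with a write method and a drive_id attribute, and object_map recording which drive holds each object); none of these names come from the specification.

```python
def reconstruct_from_nvm(nvm_cache, replacement_drive, object_map,
                         failed_drive_id, batch_size=16):
    """Copy objects of the failed HOSD/HDD that are still held in the attached
    NVM directly to the replacement drive, updating the object map (used here
    as the reconstruction map) after each batch of reconstructed objects."""
    pending = []
    for object_id, location in object_map.items():
        if location["drive"] == failed_drive_id and object_id in nvm_cache:
            replacement_drive.write(object_id, nvm_cache[object_id])
            pending.append(object_id)
            if len(pending) >= batch_size:
                for oid in pending:                   # batched map update (1222)
                    object_map[oid]["drive"] = replacement_drive.drive_id
                pending.clear()
    for oid in pending:                               # flush the final partial batch
        object_map[oid]["drive"] = replacement_drive.drive_id
```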
Thus, it can be seen that the present embodiment provides a system for utilizing CPU and NVM technology to provide intelligence for storage devices and reduce or eliminate their reliance on storage servers for such intelligence. In addition, it provides advantageous methods for reducing network communication by bringing data computation closer to data storage and forwarding across the network only the results of the data computing, which are much smaller in size than the local data used for computation. In this way the amount of data that needs to be transmitted over the network can be reduced, and big data processing or computation can be distributed along with the storage resources to vastly improve total system performance. While exemplary embodiments have been presented in the foregoing detailed description of the invention, it should be appreciated that a vast number of variations exist.
It should further be appreciated that the exemplary embodiments are only examples, and are not intended to limit the scope, applicability, operation, or configuration of the invention in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing an exemplary embodiment of the invention, it being understood that various changes may be made in the function and arrangement of elements and method of operation described in an exemplary embodiment without departing from the scope of the invention as set forth in the appended claims.
Number | Date | Country | Kind
---|---|---|---
10201406349V | Oct. 2014 | SG | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/SG2015/050367 | 10/2/2015 | WO | 00