The present invention relates in general to storage networks, and more particularly to the management of a distributed storage network.
A storage network provides connectivity between servers and shared storage and helps enterprises to share, consolidate, and manage data and resources. Unlike direct attached storage (DAS), which is connected to a particular server, storage networks allow a storage device to be accessed by multiple servers, multiple operating systems, and/or multiple clients. The performance of a storage network thus depends very much on its interconnect technology, architecture, infrastructure, and management.
Fibre Channel has been a dominant infrastructure for storage area networks (SAN), especially in mid-range and enterprise end user environments. Fibre Channel SANs use a dedicated high-speed network and a Small Computer System Interface (SCSI) based protocol to connect various storage resources. The Fibre Channel protocol and interconnect technology provide high-performance transfers of block data within an enterprise or over distances of, for example, up to about 10 kilometers.
Network attached storage (NAS) connects directly to a local area network (LAN) or a wide area network (WAN). Unlike storage area networks, network attached storage transfers data in file format and can attach directly to an internet protocol (IP) network. Internet SCSI (iSCSI) is an Internet Engineering Task Force (IETF) standard developed to enable transmission of SCSI block commands over the existing IP network by using the TCP/IP protocol. An IP SAN is a network of computers and storage devices that are IP addressable and communicate using the iSCSI protocol. An IP SAN allows block-based storage to be delivered over an existing IP network without installing a separate Fibre Channel network.
To date, most storage networks utilize storage virtualization implemented on a host, in storage controllers, or elsewhere in the network. As storage networks grow in size, complexity, and geographic reach, a need arises to effectively manage physical and virtual entities in distributed storage networks.
Embodiments of the present invention provide systems and methods for managing geographically distributed storage. In one embodiment, the system includes a network of nodes and storage devices, and a management module for managing the network of nodes and storage devices. The storage devices may be heterogeneous in their access protocols, including, but not limited to, Fibre Channel, iSCSI (Internet SCSI), Network File System (NFS), and Common Internet File System (CIFS).
In one example, the management module includes a Site Manager, a Storage Resource Manager, a Node Manager, and a Data Service Manager. The Site Manager is the management entry point for site administration. It may run management user interfaces such as a Command Line Interface (CLI) or a Graphical User Interface (GUI); manage and persistently store site and user level information; and provide authentication, access control, and other site-level services such as alert and log management. The Storage Resource Manager provides storage virtualization so that storage devices can be effectively managed and configured for applications of possibly different types. The Storage Resource Manager may contain policy management functions for automating creation, modification, and deletion of virtualization objects, and for determining and maintaining a storage layout. The Node Manager forms a cluster of all the nodes in the site. The Node Manager can also perform load balancing, high availability, and node fault management functions. The Data Service Manager may implement data service objects, and may provide virtualized data access to hosts/clients coupled to the network of nodes and storage devices through data access protocols including, but not limited to, iSCSI, Fibre Channel, NFS, or CIFS.
In one example, the components of the storage management module register with a service discovery entity, and integrate with an enterprise network infrastructure for addressing, naming, authentication, and time synchronization purposes.
In another embodiment of the invention, a system for managing distributed storage comprises a plurality of sites, and a management module associated with each site. The sites are hierarchically organized in a tree form with an arbitrary number of levels, such that a site can include another site as a virtual node, creating a parent-child relationship between sites. Thus, a flexible, hierarchical administration system is provided through which administrators may manage multiple sites from a single site that is the parent or grandparent of the multiple sites. In one example, administrator name resolution is hierarchical, such that a system administrator account created on one site is referred to relative to that site's name in the hierarchy.
In one example, a service request directed to a site is served by storage resources that belong to the site. In one embodiment, a site administrator can choose to export some of its storage resources for use by a parent site, relinquishing the control and management of these resources to the parent site. The sites may also use resources from other sites that may be determined by access control lists as specified by the site system administrators.
In another embodiment of the invention, a method is provided for making the Site Manager component highly available by configuring one or more standby instances for each active Site Manager instance. In one example, the active and standby Site Manager instances run on dedicated computers. In another example, active and standby Site Manager instances run on the storage nodes.
In another embodiment of the invention, a flexible alert handling mechanism is provided as part of the Site Manager. In one example, the alert handling mechanism may include a module to set criticality levels for different alert types; a user notification module providing notification through management agents for alerts at or above a certain criticality; an email notification module providing alerts at or above a certain criticality; a call-home notification module providing alerts at or above a certain criticality; and a forwarding module providing alerts from a child Site Manager to its parent depending on the root cause and criticality.
Embodiments of the present invention provide systems and methods for managing geographically distributed storage devices. These storage devices can be heterogeneous in their access protocols and physical interfaces and may include one or more Fibre Channel storage area networks, one or more Internet Protocol storage area networks (IP SANs), and/or one or more network-attached storage (NAS) devices. Various embodiments of the present invention are described herein.
Referring now to the figures, an exemplary distributed storage network 100 includes storage devices 110, nodes 120, one or more management sites 130, storage service clients 140, and one or more management stations 150.
A storage device 110 may include raw or physical storage objects, such as disks, and/or virtualized storage objects, such as volumes and file systems. The storage objects (either virtual or physical) are sometimes referred to herein as storage resources. Each storage device 110 may offer one or more common storage networking protocols, such as iSCSI, Fibre Channel (FC), Network File System (NFS) protocol, or Common Internet File System (CIFS) protocol. Each storage device 110 may connect to the network 100 directly or through a node 120.
A node 120 may be a virtual node or a physical node. An example of a physical node is a controller node corresponding to a physical storage controller, which provides storage services through virtualized storage objects such as volumes and file systems. An example of a virtual node is a node representing multiple physical nodes, such as a site node corresponding to a management site 130, which represents a cluster of all the nodes in the management site, as discussed in more detail below. Depending on whether or not it serves any locally attached storage devices, a node 120 may be either a node without storage or a node with storage. A node 120 without storage has no locally attached storage devices, so its computing resources are used mainly to provide further virtualization services on top of storage objects associated with other nodes, or on top of other storage devices. A node 120 with storage has at least one local storage device, and its computing resources may be used both for virtualization of its own local storage resources and for other storage objects associated with other nodes. A node 120 with storage is sometimes referred to as a leaf node.
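By way of illustration only, the node taxonomy described above may be represented with a simple data model. The following Python sketch is not part of the claimed implementation; the class and field names are hypothetical, and a node with at least one locally attached storage device is treated here as a leaf node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageDevice:
    """A physical or virtualized storage resource (disk, volume, file system)."""
    name: str
    protocol: str  # e.g. "iSCSI", "FC", "NFS", "CIFS"


@dataclass
class Node:
    """A node in the distributed storage network."""
    name: str
    is_virtual: bool = False          # True for e.g. a site node representing a cluster
    local_devices: List[StorageDevice] = field(default_factory=list)

    @property
    def has_storage(self) -> bool:
        # A node "with storage" serves at least one locally attached device.
        return bool(self.local_devices)

    @property
    def is_leaf(self) -> bool:
        # In this sketch, a leaf node is a physical node with local storage.
        return self.has_storage and not self.is_virtual


# Example: a controller node with one locally attached iSCSI device.
controller = Node("node-1", local_devices=[StorageDevice("lun0", "iSCSI")])
assert controller.is_leaf
```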
In one example, storage service clients 140 are offered services through the nodes 120, and not directly through the storage devices 110. In that respect, nodes 120 can be viewed as an intermediary layer between storage clients 140 and storage devices 110.
A management site (“site”) 130 may include a collection of nodes 120 and storage devices 110, which are mutually reachable and have roughly similar geographical distance properties. A site 130 may also include one or more other sites as virtual nodes, as discussed in more detail below. The elements that comprise a site may be specified by system administrators, allowing for a large degree of flexibility. A site 130 may or may not own physical entities such as physical nodes and storage devices.
In one embodiment of the present invention, a storage management module 200 (also referred to herein as site software 200) for a site 130 includes a Site Manager 210, a Storage Resource Manager 220, a Node Manager 230, and a Data Service Manager 240.
The storage management module 200 is used by site administrators to manage a site 130 via management station(s) 150, which may run a management user interface, such as a command line interface (CLI) or a graphical user interface (GUI). In one embodiment, the Site Manager 210 is the management entry point for site administration, and the management station 150 communicates via the management user interface with the Site Manager 210 using a site management interface or protocol, such as the Simple Network Management Protocol (SNMP) or the Storage Management Initiative Specification (SMI-S). SNMP is a set of standards for managing devices connected to a TCP/IP network. SMI-S is a set of protocols for managing multiple storage appliances from different vendors in a storage area network, as defined by the Storage Networking Industry Association (SNIA). The Site Manager 210 manages and persistently stores site and user level information, such as site configuration, user names, permissions, membership information, etc. The Site Manager 210 may provide authentication to access a site, and access control rights for storage resources. It can also provide other site-level services such as alert and log management. In one example, at least one active instance of the Site Manager 210 is run for each site 130, as discussed in more detail below.
In one example, the Site Manager 210 is responsible for creating, modifying, and/or deleting user accounts, and handling user authentication requests. It also creates and deletes user groups, and associates users with groups. It is capable of either stand-alone operation, or integrated operation with one or more enterprise user management systems, such as Kerberos, Remote Authentication Dial-In User Service (RADIUS), Active Directory, and/or Network Information Service (NIS). Kerberos is an IETF-standardized network authentication protocol; RADIUS is an authentication, authorization, and accounting protocol for applications such as network access or IP mobility, intended for both local and roaming situations; Active Directory is Microsoft's trademarked directory service and an integral part of the Windows architecture; and NIS is a directory service that distributes configuration information, such as user and host names, throughout a network.
The user information may be stored in a persistent store 212 associated with the Site Manager where the user account is created. The persistent store could be local to the Site Manager, in which case it is directly maintained by the Site Manager, or external to the Site Manager, such as one associated with NIS, Active Directory, Kerberos, or RADIUS. A user created in one site can have privileges for other sites as well. For example, a site administrator for a parent site may have site administration privileges for all of its descendants.
In one example, there can be different user roles, such as site administrator, group administrator, and guest. Site administrators may be capable of performing all the operations in a site. Group administrators may be capable of managing only the resources assigned to their groups. For example, each department in an organization may be assigned a different group, and the storage devices belonging to a particular department may be considered to belong to the group for that department. Guests may generally have read-only management rights.
In addition to the capabilities defined by user roles, it may also be possible to limit the access permissions of each system administrator through access control lists on a per-object basis. In order to make this more manageable, it may also be possible to define groups of objects, and define access control lists for groups. Moreover, it may be possible to group administrator accounts together, and give them group-level permissions.
Alerts may be generated by different components including components 210, 220, 230, and 240 of the storage management module 200. Regardless of where they are generated, alerts are forwarded to the Site Manager 210 where they are persistently stored (until they are cleared by the system or by an administrator), in one example. The Site Manager 210 also notifies users and other management agents, such as SNMP or SMI-S, whenever a new alert at or above a certain criticality is generated. System administrators can set the notification criticality level, so that alerts at or above a certain criticality may be emailed to a set of administrator-defined email addresses. The users can also set other types of notifications and define other actions based on the alert type. Also, there may be a “call-home” feature whereby the Site Manager 210 notifies a storage vendor through an analog dial-up line if there are critical problems that require service.
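As a non-limiting illustration of the alert handling described above, the following Python sketch shows one way a Site Manager might persist alerts and dispatch notifications once an alert meets or exceeds an administrator-configured criticality level; the class, thresholds, and notification actions are hypothetical placeholders rather than the claimed implementation.

```python
from enum import IntEnum


class Criticality(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


class SiteManagerAlerts:
    def __init__(self, email_threshold=Criticality.WARNING,
                 call_home_threshold=Criticality.CRITICAL):
        self.email_threshold = email_threshold
        self.call_home_threshold = call_home_threshold
        self.store = []  # stands in for the persistent alert store

    def handle_alert(self, source, criticality, message):
        alert = {"source": source, "criticality": criticality, "message": message}
        self.store.append(alert)  # kept until cleared by the system or an administrator
        if criticality >= self.email_threshold:
            self.notify_email(alert)
        if criticality >= self.call_home_threshold:
            self.call_home(alert)
        return alert

    def notify_email(self, alert):
        print(f"email to admins: {alert['message']}")

    def call_home(self, alert):
        print(f"call-home to vendor: {alert['message']}")


mgr = SiteManagerAlerts()
mgr.handle_alert("storage-device-7", Criticality.CRITICAL, "disk failure")
```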
In one embodiment, there is only one alert created per root cause. However, the same alert may be referenced by multiple objects if it impacts the health of all those objects. For example, when a storage device hosts two storage objects, one from a particular site and the other from another site, the failure of the storage device impacts both of these storage objects from different sites, and the alerts from the storage objects are generated by the storage management modules for both sites.
The Storage Resource Manager 220 provides storage virtualization for the storage devices 110 owned by a site based on storage requirements for applications of potentially different types, so that the storage devices in the site can be effectively used and managed for these applications. An application of one type typically has different storage requirements from an application of another type. Storage requirements for an application can be described in terms of protection, performance, replication, and availability attributes. These attributes implicitly define how storage for these applications should be configured, in terms of disk layout and storage resource allocation for the virtualized storage objects that implement the storage solution for these requirements.
In one example, Storage Resource Manager 220 includes policy management functions and uses a storage virtualization model to create, modify, and delete virtualized storage objects for client applications. It also determines and maintains a storage layout of these virtualized storage objects. Examples of storage layouts include different Redundant Array of Independent (or Inexpensive) Disks (RAID) levels, such as RAID0 for performance, RAID1 for redundancy and data protection, RAID10 for both performance and redundancy, RAID5 for high storage utilization with some redundancy, at the expense of decreased performance, etc. In one example, each site runs an active instance of the Storage Resource Manager 220 in a host 140 or node 120.
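For purposes of illustration, the following Python sketch shows how a policy function might map coarse application storage requirements to a RAID layout; the rule set is a hypothetical example, not the claimed policy management function.

```python
def choose_raid_layout(requirements):
    """Map coarse application requirements to a RAID level (illustrative rules)."""
    needs_performance = requirements.get("performance", False)
    needs_protection = requirements.get("protection", False)
    needs_utilization = requirements.get("high_utilization", False)

    if needs_performance and needs_protection:
        return "RAID10"   # striping plus mirroring
    if needs_protection and needs_utilization:
        return "RAID5"    # parity: good utilization, some redundancy, slower writes
    if needs_protection:
        return "RAID1"    # mirroring
    return "RAID0"        # striping only, performance without redundancy


print(choose_raid_layout({"performance": True, "protection": True}))  # RAID10
```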
The Node Manager 230 is responsible for forming the site node for a site, which represents a cluster of all the nodes in the site. For that reason, the Node Manager 230 for a site 130 is sometimes referred to as the site node corresponding to the site 130. The Node Manager 230 may also handle storage network functions such as load balancing, high availability, and node fault management functions for the site. In one embodiment, the Node Manager 230 for a site 130 assigns node resources, such as CPU, memory, interfaces, and bandwidth, associated with the nodes 120 in the site 130, to the storage objects in the site 130, based on the Quality of Service (QoS) requirements of virtualized storage objects as specified by site administrators. In one example, nodes can have service profiles that may be configured to provide specific types of services such as block virtualization with iSCSI and file virtualization with NFS. Node service profiles are considered in assigning virtualized storage objects to nodes. An active instance of Node Manager 230 preferably runs on every physical node.
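As a simplified, hypothetical sketch of the assignment described above, the following Python fragment matches a virtualized storage object to a node based on service profiles and a bandwidth-style QoS requirement; the data layout and the greedy selection policy are illustrative assumptions only.

```python
def assign_storage_object(nodes, required_service, required_bandwidth):
    """Pick a node whose service profile and spare bandwidth satisfy the request.

    `nodes` is a list of dicts such as:
      {"name": "node-1", "profiles": {"iSCSI", "NFS"}, "bandwidth_free": 400}
    The fields and the simple greedy policy are illustrative assumptions.
    """
    candidates = [n for n in nodes
                  if required_service in n["profiles"]
                  and n["bandwidth_free"] >= required_bandwidth]
    if not candidates:
        raise RuntimeError("no node satisfies the QoS requirements")
    # Greedy choice: the least-loaded candidate, as a crude form of load balancing.
    chosen = max(candidates, key=lambda n: n["bandwidth_free"])
    chosen["bandwidth_free"] -= required_bandwidth
    return chosen["name"]


nodes = [{"name": "node-1", "profiles": {"iSCSI"}, "bandwidth_free": 400},
         {"name": "node-2", "profiles": {"iSCSI", "NFS"}, "bandwidth_free": 900}]
print(assign_storage_object(nodes, "iSCSI", 300))  # node-2
```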
From the perspective of the Storage Resource Manager 220 at a site, the site includes a single node (with or without storage) and zero or more storage devices, and all storage services associated with the site are provided via this node. Specifically, the Storage Resource Manager 220 interacts with the site node that represents a cluster of all nodes in the site. In one example, the Node Manager 230 provides this single node image to the Storage Resource Manager 220, and the members of the cluster are hidden from the Storage Resource Manager 220.
Furthermore, the Node Manager 230 running on a physical node configures and monitors the Data Service Manager 240 on that particular node. The Data Service Manager 240, in one example, implements data service objects, which are software components that implement data service functions such as caching, block mapping, RAID algorithms, data order preservation, and any other storage data path functionality. The Data Service Manager 240 also provides virtualized data access to hosts/clients 140 through one or more links 242 using one or more data interfaces, such as iSCSI, FC, NFS, or CIFS. It also configures and monitors storage devices 110 through at least one other link 244 using at least one management protocol and/or well-defined application programming interfaces (APIs) for managing storage devices locally attached to a particular node. Examples of management protocols for link 244 include, but are not limited to, SNMP, SMI-S, and/or any proprietary management protocols. An active instance of the Data Service Manager 240 runs on every physical node.
The components 210, 220, 230, and 240 of the site software 200 may register with and utilize a Network Service Infrastructure 250 for addressing, naming, authentication, and time synchronization purposes. In one embodiment, the Network Service Infrastructure 250 includes a Dynamic Host Configuration Protocol (DHCP) server (not shown), a Network Time Protocol (NTP) server (not shown), and/or a name server (not shown), such as a Domain Name System (DNS) server or an Internet Storage Name Service (iSNS) server.
In order to reduce manual configuration, by default the physical nodes are configured through the DHCP server, which allows a network administrator to supervise and distribute IP addresses from a central point, and automatically sends a new address when a computer is plugged into a different place in the network. From the DHCP server, the physical nodes are expected to obtain not only their IP addresses, but also the location of the name server for the network 100.
A host 140 accessing the iSCSI data services provided by a site 130 may use the iSNS server to discover the location of the iSCSI targets. In the case of a failover that requires the IP address of an iSCSI target to change, the iSNS server may be used to determine the new location. The iSNS server may also be used for locating storage devices and internal targets in a site.
DNS Service Discovery (DNS-SD), which is an extension of the DNS protocol for registering and locating network services, may be used for registering NFS and CIFS data services. As an alternative, the Service Location Protocol (SLP) may also be used as the service discovery protocol for NFS and CIFS data services. SLP is an IETF standards track protocol that provides a framework to allow networking applications to discover the existence, location and configuration of networked services in enterprise networks.
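By way of example only, the following sketch registers a hypothetical NFS data service for DNS-SD discovery using the third-party python-zeroconf package; the service name, address, and TXT properties are placeholders, and SLP could be used analogously as the discovery protocol.

```python
import socket
from zeroconf import Zeroconf, ServiceInfo  # third-party python-zeroconf package

# Advertise a (hypothetical) NFS data service so that clients can discover it
# with DNS-SD. The share name, address, and TXT properties are placeholders.
info = ServiceInfo(
    "_nfs._tcp.local.",
    "projects._nfs._tcp.local.",
    addresses=[socket.inet_aton("192.0.2.10")],
    port=2049,
    properties={"path": "/export/projects"},
)

zc = Zeroconf()
zc.register_service(info)      # make the service discoverable on the local network
# ... serve requests ...
zc.unregister_service(info)
zc.close()
```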
In one embodiment, each site 130 supports one or more commonly used authentication services, such as NIS, Active Directory, Kerberos, or RADIUS. The commonly used authentication services may be used to authenticate users and control their access to various network services.
In order to address time synchronization requirements, site entities may synchronize their real-time clocks by means of the NTP server, which is commonly used to synchronize time between computers on the Internet. Synchronized time supports executing scheduled tasks and time-stamping event logs, alerts, and metadata updates.
In one embodiment, network 100 may comprise one or more sub-networks (subnet). A subnet may be a physically independent portion of a network that shares a common address component. A site may span multiple subnets, or multiple sites may be included in the same subnet. In order to provide for subnet-independent access to management services, dynamic DNS may be used to determine the location of the Site Manager 210. Alternatively, all physical instances of a Site Manager 210 could be placed on a same subnet, and conventional IP takeover techniques could be used to deal with a Site Manager failover. However, this alternative is not a preferred solution, particularly in the case of a network having multiple sites.
In order to manage multiple sites under a same management entity, sites may be hierarchically organized in a tree form with an arbitrary number of levels. Further, a site can include another site as an element or constituent. That is, a site can be a collection of nodes, storage devices, and other sites. This creates a parent-child relationship between sites; for example, a parent site W may include two child sites U and V.
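The following Python sketch models, for illustration only, the hierarchical site organization described above, including the single-parent constraint and hierarchical naming; the class and method names are hypothetical.

```python
class Site:
    """One site in the management hierarchy (illustrative sketch)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []       # child sites joined as virtual nodes
        self.nodes = []          # physical/virtual nodes owned directly

    def add_child(self, child):
        if child.parent is not None:
            raise ValueError("a site can have only one parent")
        child.parent = self
        self.children.append(child)

    def path(self):
        # Hierarchical name, e.g. "W/U", resolved relative to ancestors.
        return self.name if self.parent is None else self.parent.path() + "/" + self.name

    def is_leaf(self):
        return not self.children


w, u, v = Site("W"), Site("U"), Site("V")
w.add_child(u)
w.add_child(v)
print(u.path())   # "W/U"
```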
In one exemplary application of the site hierarchy, the leaf sites correspond to the physical storage sites or sections of physical storage sites of an enterprise or organization, while the parent sites are non-leaf sites that correspond to a collection of their child sites. As an example, each physical storage site has a network of at least one storage controller and at least one storage device.
In one example, the hosts or clients 140 which connect to a parent site to access a storage service (e.g., an iSCSI volume, or an NFS file system) discover the parent site's contact address through the Network Services Infrastructure 250, and connect to that contact address. The contact address resides in a physical node in a leaf site, and it could be migrated to other nodes or other leaf sites as needed due to performance or availability reasons. The hosts or clients 140 do not need to be aware of which physical node is providing the site access point.
Note that each site in a site hierarchy is assumed to have a unique name. If two site hierarchies are to be merged, it should first be ensured that the two site hierarchies do not have any sites with the same name.
For the system administrators, the name resolution may be hierarchical. In other words, a system administrator account may be created on a specific site, and referred to relative to that site's name in the hierarchy. In one exemplary embodiment, the privileges of a system administrator on a parent site are applicable by default to all of its child sites, and so forth.
In one embodiment, a parent site can be created for one or more existing child sites. Creation of a parent site is optional and can be used if there are multiple sites to be managed under a single management entity and/or viewed as a single site. A site administrator may configure a site as a parent site by specifying one or more existing sites as child sites. Since, in one example, a site can have only one parent site, the sites to be specified as child sites must be orphans, meaning that they are not child sites of any other parent site. Additionally, a child and its parent have to authenticate each other to establish this parent-child relationship. This authentication may take place each time the communication between a parent and a child is reestablished. The site administrator of a child or parent site may be allowed to tear down an existing parent-child relationship. When a site becomes a child of a parent site, the site node for the child site joins the parent site as a virtual node.
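As a rough, hypothetical sketch of the orphan check and mutual authentication involved in establishing or tearing down a parent-child relationship, the following Python fragment uses a toy shared-secret check as a stand-in for the actual authentication exchange, which might instead rely on services such as Kerberos or RADIUS.

```python
class Site:
    """Minimal stand-in for a site's management identity (illustrative only)."""

    def __init__(self, name, secret):
        self.name, self.secret = name, secret
        self.parent, self.children = None, []

    def authenticate(self, credential):
        # Toy shared-secret check; a real deployment might use Kerberos or RADIUS.
        return credential == self.secret


def establish_parent_child(parent, child, parent_credential, child_credential):
    """Attach `child` to `parent`, enforcing the single-parent (orphan) rule."""
    if child.parent is not None:
        raise ValueError(f"{child.name} already has a parent site")
    # Both sides must authenticate each other before the relationship is formed.
    if not (parent.authenticate(child_credential) and child.authenticate(parent_credential)):
        raise PermissionError("mutual authentication failed")
    child.parent = parent
    parent.children.append(child)   # the child's site node joins the parent as a virtual node


def tear_down(parent, child):
    """Either site administrator may dissolve an existing parent-child relationship."""
    parent.children.remove(child)
    child.parent = None


w, u = Site("W", "shared-secret"), Site("U", "shared-secret")
establish_parent_child(w, u, parent_credential="shared-secret", child_credential="shared-secret")
```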
In one embodiment, the Site Manager 210 for each site in the site hierarchy is responsible for forming, joining, and maintaining the site hierarchy. When a system administrator issues a command to create a site in a site hierarchy, the site's identity and its place in the site hierarchy are stored in the persistent store of the Site Manager for that site. Therefore, each Site Manager knows the identity of its parent and child sites, if it has any. When a Site Manager 210 for a child site is first started up, if the site has a parent site, the Site Manager 210 discovers the physical location of its parent site using the Network Service Infrastructure 250, and establishes communication with the Site Manager of its parent using a management protocol such as SNMP or SMI-S. Similarly, the Site Manager 210 of a parent site determines the physical locations of its child sites using the Network Service Infrastructure 250 and establishes communication with them.
Each component 210, 220, 230, and 240 in the storage management module 200 has a different view of the site hierarchy, and some components in the site software program 200 do not even need to be aware of any such hierarchy. For example, the Data Service Manager 240 does not need to be aware of the site concept, and may be included only in leaf sites. From the perspective of a Node Manager 230 for a parent site, a child site is viewed as a virtual node with storage; and from the perspective of the Storage Resource Manager 220 for a parent site, a child site is viewed as a storage device of the parent site. Therefore, the storage virtualization model used by the Storage Resource Manager 220 for a parent site is the same as that for a leaf site, except that the Storage Resource Manager 220 for a parent site only deals with one type of storage device—one that corresponds to a child site. The Storage Resource Manager 220 of a site does not need to know or interact with the Storage Resource Manager 220 of another site, whether the other site is its parent site or its child site.
Since the parent sites do not have any physical entities, and instead rely on the physical entities of the leaf sites, the storage management module 200 for a leaf site can be structured differently from the storage management module 200 for a parent site.
A storage service request directed to a site is served by accessing the storage resources in the site. In one embodiment, a leaf site 130-L may export some of its storage resources to a parent site 130-P, relinquishing the control and management of the exported resources to the parent site 130-P.
The export operation is initiated by a site administrator who has privileges for the leaf site 130-L. The site administrator first requests the Storage Resource Manager component 220-L of the storage management module 200-L for the leaf site to release the ownership of the exported object, and then contacts the Site Manager 210-P of the parent site 130-P using the site management interface to inform the parent site 130-P about the exported object. The Storage Resource Manager 220-L of the leaf site 130-L notifies its site node 230-L about the ownership change for this particular object. In turn, the site node 230-L propagates this change to the associated leaf nodes so that it can be recorded on the persistent stores associated with the exported objects.
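The export sequence described above may be traced, purely for illustration, by the following Python sketch; the argument objects and their method names are hypothetical stand-ins for the actual component interfaces.

```python
def export_resource(leaf_srm, leaf_site_node, parent_site_manager, resource):
    """Export a storage resource from a leaf site to its parent site.

    The four steps mirror the sequence described above; the argument objects
    and their method names are illustrative assumptions.
    """
    # 1. The leaf site's Storage Resource Manager releases ownership of the object.
    leaf_srm.release_ownership(resource)

    # 2. The parent site's Site Manager is informed about the exported object
    #    over the site management interface (e.g. SNMP or SMI-S).
    parent_site_manager.register_exported_resource(resource)

    # 3. The leaf site's Storage Resource Manager tells its site node about the
    #    ownership change ...
    leaf_site_node.record_ownership_change(resource, new_owner="parent")

    # 4. ... and the site node propagates the change to the leaf nodes so it is
    #    recorded on the persistent stores associated with the exported object.
    for node in leaf_site_node.leaf_nodes():
        node.persist_ownership(resource, owner="parent")
```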
Alternatives to the export approach discussed above include the use of Access Control Lists to give administrators of the parent site permission to use some of the resources owned by its child sites.
A parent site's Site Manager may also connect to and manage its child sites through the Site Manager's external interfaces. This allows administrators to manage multiple child sites from a single parent by relaying commands entered at the parent site to a child site.
Unlike the storage management module for a leaf site, the storage management module 200-P for the parent site 130-P does not need to include its own Data Service Manager component, because the parent site does not have any physical resources. The Node Manager component 230-P of the parent site 130-P provides a virtual node representing a cluster of all of the site nodes corresponding to the child sites 130-C. The parent site's node manager 230-P also configures and communicates with the node manager(s) 230-C of the child site(s) 130-C by assigning storage resources in the parent site to the site nodes corresponding to the child sites. The node manager(s) 230-C of the child site(s) 130-C in turn configure and assign the storage resources to the nodes belonging to the child site(s) 130-C. This continues if the child site(s) 130-C happen to be the parent(s) of other site(s), until eventually the storage resources in the parent site 130-P are assigned to one or more of the leaf nodes in one or more leaf sites.
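This recursive delegation may be illustrated, under assumed data structures, by the following Python sketch, in which a resource assigned at a parent site is passed down the hierarchy until it reaches a physical leaf node; the dictionary layout and the first-fit selection are hypothetical simplifications.

```python
def assign_to_leaf(site, resource):
    """Recursively delegate `resource` down the site hierarchy to a leaf node.

    `site` is a dict: {"name": ..., "children": [...], "leaf_nodes": [...]}.
    The structure and the first-fit choice are illustrative assumptions.
    """
    if site["children"]:
        # A parent site's Node Manager assigns the resource to one of the site
        # nodes corresponding to its child sites, which repeat the process.
        child = site["children"][0]          # placeholder selection policy
        return assign_to_leaf(child, resource)
    # A leaf site finally assigns the resource to one of its physical leaf nodes.
    leaf_node = site["leaf_nodes"][0]
    return f"{resource} assigned to {leaf_node} in site {site['name']}"


hierarchy = {
    "name": "P",
    "children": [{"name": "C1", "children": [], "leaf_nodes": ["node-1", "node-2"]}],
    "leaf_nodes": [],
}
print(assign_to_leaf(hierarchy, "volume-42"))   # volume-42 assigned to node-1 in site C1
```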
The Site Manager 210 in each site management agent 200 is the component primarily responsible for the management of a geographically distributed site. In one embodiment, the Site Manager 210 for each site 130 is run with high availability. The high availability of the Site Manager 210 is achieved by running an active instance of the Site Manager 210 for each site and configuring one or more standby instances for each active instance of the Site Manager 210. In one embodiment, a site 130 is considered not available for management if neither an active Site Manager instance nor a standby Site Manager instance is available. However, services provided by the Data Service Manager 240, Node Manager 230, and Storage Resource Manager 220 for the site may continue to be available even when the site is not available for management. In other words, the data and control paths associated with storage resources in a site will not be affected or degraded because of Site Manager failures.
In one embodiment of the present invention, the persistent store of the active instance of the Site Manager 210 is replicated by the standby instance of the Site Manager using known mirroring techniques. The standby instance of the Site Manager uses keep-alive messages to detect any failure of the active instance, and when a failure is detected, the standby instance of the Site Manager switches to an active mode and retrieves from its copy of the persistent store the state of the failed active instance of the Site Manager.
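The keep-alive monitoring a standby instance might perform can be sketched, for illustration only, as follows; the probe and promotion callables and the timing parameters are hypothetical, and the simulated run merely demonstrates the control flow.

```python
import time


def standby_loop(active_is_alive, promote, interval=5.0, misses_allowed=3):
    """Minimal keep-alive monitor run by a standby Site Manager instance.

    `active_is_alive()` would send a keep-alive probe to the active instance,
    and `promote()` would switch this instance to active mode and reload state
    from its mirrored copy of the persistent store. Both callables, and the
    timing parameters, are illustrative assumptions.
    """
    misses = 0
    while True:
        if active_is_alive():
            misses = 0
        else:
            misses += 1
            if misses >= misses_allowed:
                promote()          # take over using the replicated persistent store
                return
        time.sleep(interval)


# Simulated usage: the active instance stops responding after two probes.
responses = iter([True, True, False, False, False])
standby_loop(lambda: next(responses, False),
             lambda: print("standby promoted to active"),
             interval=0.0)
```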
The instances of the Site Manager 210 for a site 130 can run on dedicated hosts 140 located anywhere in the network 100, or on nodes 120 in the site 130.
For a leaf site, the physical locations of the dedicated hosts 140 where the Site Manager instances run are independent of the physical location of the leaf site, meaning that the dedicated hosts 140 may or may not be at the same physical location as the leaf site. Similarly, for a parent site, such as site W, the physical locations of the dedicated hosts 140 where the Site Manager instances run are independent of the physical locations of the child sites, such as site U and site V, meaning that the dedicated hosts 140 may or may not be at the same physical locations as the child sites. Alternatively, as noted above, active and standby Site Manager instances may run on nodes 120 within the leaf sites rather than on dedicated hosts.
Similarly, assuming a parent site, such as site F, is to be created for two other parent sites, such as site C and site E, the Site Manager of a leaf site that is a descendant of site C, such as site A, requests its site node SNA to create a Site Manager instance for site F on one of its leaf nodes, which may or may not be the same leaf node on which the Site Manager instance SMA for site A is running. With the active Site Manager instance for site F created on site A, the site node for site F is also created on site A. To add site E as the second child of site F, another Site Manager instance for site F is created in a leaf site that is a descendant of site E, such as site D, by the site node of site D. This other instance becomes a standby instance of the Site Manager for site F.
Note that it is permissible to mix the two types of deployment of Site Manager instances discussed above, namely running the instances on dedicated hosts and running them on storage nodes.
While the methods disclosed herein have been described and shown with reference to particular operations performed in a particular order, it will be understood that these operations may be combined, sub-divided, or re-ordered to form equivalent methods without departing from the teachings of the present invention. Accordingly, unless specifically indicated herein, the order and grouping of the operations is not a limitation of the present invention.
While the invention has been particularly shown and described with reference to embodiments thereof, it will be understood by those skilled in the art that various other changes in the form and details may be made without departing from the spirit and scope of the invention.
The present application claims priority to U.S. Provisional Application Ser. No. 60/586,516 entitled “Geographically Distributed Storage Management,” filed on Jul. 9, 2004, which is incorporated herein by reference in its entirety.