The present disclosure relates to storage systems and, more particularly, to scalable, zoned namespace, solid-state storage for a networked storage system.
Various forms of storage systems are used today. These forms include direct-attached storage (DAS), network-attached storage (NAS) systems, storage area networks (SANs), and others. Network storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data, backing up data, and others.
A storage system typically includes at least one computing system executing a storage operating system for storing and retrieving data on behalf of one or more client computing systems (“clients”). The storage operating system stores and manages shared data containers in a set of mass storage devices operating as a group within a storage sub-system. The storage devices (which may also be referred to as “disks”) within a storage system are typically organized as one or more groups (or arrays), wherein each group is operated as a RAID (Redundant Array of Inexpensive Disks).
Applications that store and access data continue to evolve. For example, media, entertainment, and other types of applications need to efficiently store and retrieve data, e.g., for content/video streaming. Data can be stored as files and objects rather than blocks. Most stored data are immutable and, based on the data lifecycle, may be stored for a long duration. The data lifecycle may begin as “hot,” which means that initially data access and read frequency is high. Then, as time progresses, the data becomes “warm,” with a lower access frequency than hot data. Eventually, the data may become “cold” data that is rarely accessed or changed.
Conventional all-flash arrays (i.e., storage arrays with all solid-state drives (“SSDs”)) are expensive. Traditional hard-drive systems are not able to meet the performance requirements of these media applications because data cannot be stored or retrieved quickly enough. Continuous efforts are being made to develop technology for providing scalable storage solutions with a reasonable cost of ownership and an optimum mix of processing, memory, and storage ability to store and access data efficiently for evolving application needs.
The various features of the present disclosure will now be described with reference to the drawings of the various aspects disclosed herein. In the drawings, the same components may have the same reference numerals. The illustrated aspects are intended to illustrate, but not to limit, the present disclosure. The drawings include the following Figures:
As a preliminary note, the terms “component”, “module”, “system,” and the like as used herein are intended to refer to a computer-related entity, either a software-executing general-purpose processor, hardware, firmware, or a combination thereof. For example, a component may be, but is not limited to being, a process running on a hardware processor, a hardware processor, an object, an executable, a thread of execution, a program, and/or a computer.
By way of illustration, both an application running on a server and the server can be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. Also, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate via local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems via the signal).
Computer executable components can be stored, for example, at non-transitory, computer readable media including, but not limited to, an ASIC (application specific integrated circuit), CD (compact disc), DVD (digital video disk), ROM (read only memory), floppy disk, hard disk, storage class memory, solid state drive, EEPROM (electrically erasable programmable read only memory), memory stick or any other storage device type, in accordance with the claimed subject matter.
In one aspect, innovative technology is provided for high capacity (e.g., in peta-bytes (“PB”)) storage devices that can be scaled up or down based on storage needs, independent of compute/memory that may be used for executing a storage operating system.
As an example, the storage devices 14 include zoned namespace solid-state drives (“ZNS SSDs”). In one aspect, ZNS SSDs comply with the NVMe (Non-Volatile Memory Express) zoned namespace (ZNS) specification defined by the NVM Express® (NVMe®) standards organization. A “zone,” as defined by the NVMe ZNS standard, is a sequence of blocks that are written in a sequential fashion and are overwritten by performing a “Zone Erase” or “Zone Reset” operation per the NVMe specification. Storage space at each ZNS SSD is exposed as zones, e.g., physical zones (“PZones”) and RAID zones (“RZones”), each RAID zone having a plurality of PZones. The RZones are presented to software layers that interface with a file system to process read and write requests.
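For illustration only, the following Python sketch (not part of the disclosure; the class names PZone and RZone and the lowest-write-pointer placement are assumptions) models the sequential-write and zone-reset behavior described above, with several PZones grouped under one RZone:

```python
from dataclasses import dataclass, field

@dataclass
class PZone:
    """A physical zone: blocks are written only at the write pointer and freed by reset."""
    capacity: int                 # number of writable blocks in the zone
    write_pointer: int = 0        # next block offset to be written
    blocks: list = field(default_factory=list)

    def append(self, block: bytes) -> int:
        # Sequential-write rule: a new block always lands at the write pointer.
        if self.write_pointer >= self.capacity:
            raise IOError("zone full")
        self.blocks.append(block)
        lba = self.write_pointer
        self.write_pointer += 1
        return lba

    def read(self, offset: int) -> bytes:
        # Reads are random within the written range of the zone.
        return self.blocks[offset]

    def reset(self) -> None:
        # "Zone Reset" discards the zone contents and rewinds the write pointer.
        self.blocks.clear()
        self.write_pointer = 0

@dataclass
class RZone:
    """A RAID zone: a group of PZones presented as one logical zone (parity omitted)."""
    pzones: list

    def append(self, block: bytes) -> tuple:
        # Place the block on the member PZone with the lowest write pointer.
        idx = min(range(len(self.pzones)), key=lambda i: self.pzones[i].write_pointer)
        return idx, self.pzones[idx].append(block)

rz = RZone([PZone(capacity=4), PZone(capacity=4)])
print(rz.append(b"data-0"))   # (0, 0)
print(rz.append(b"data-1"))   # (1, 0)
```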
Conventional SSD systems face various challenges when it comes to shared SSD storage. For example, in a cluster-based storage system with multiple cluster storage nodes that provide access to storage, managing shared free space across clusters or shared file system metadata can be difficult, especially for a single multi-core system. It is also difficult to implement distributed RAID on shared SSDs because it can be difficult to coordinate background RAID processing between multiple cluster nodes and to determine which node will respond to errors. In one aspect, as described below in detail, the technology disclosed herein solves various technical challenges faced by conventional storage operating systems.
In one aspect, the storage space at multiple PB SSDs 14A-14N can be presented as a PB-scale single namespace 15. In NVMe® technology, a namespace is a collection of logical block addresses (LBA) accessible to a software layer, e.g., a storage operating system instance. A namespace identifier (“NSID” or “NS”) is an identifier used by an NVMe controller (e.g., 16) to provide access to a namespace. A namespace is typically not a physical isolation of blocks, but rather an isolation of addressable logical blocks. The innovative technology disclosed herein uses a conventional namespace (referred to as “CNS” in the specification and some of the Figures) to provide exclusive access to one storage operating system instance, and ZNS 19 (e.g., having zone 1-zone 20,000) to provide shared access to multiple storage operating system instances, as described below in detail. CNS, as used herein, refers to a contiguous range of blocks that are randomly readable/writable, whereas ZNS is a collection of zones, where a zone is a range of blocks that can be randomly read but must be written sequentially per the NVMe ZNS standard.
Storage space at various media types can be accessed via multiple namespaces shown as NSID1-NSID7. NSIDs 1-6 are configured to access the NVRAM 26 and HFE 27 type storage. NSIDs 1-6 provide exclusive access to NVRAM 26 and HFE 27 to various storage operating system instances, as described below in detail. NSID7 provides shared access to LFE, i.e., PB-scale storage 29, also described below in detail.
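For illustration only, a minimal Python sketch of the namespace layout described above; the instance names and the NSID-to-owner pairing are invented for the example, and the shared namespace's per-zone ownership check is deferred to the zone-state table discussed later:

```python
# Illustrative mapping of NSIDs to media, namespace kind, and access mode.
NAMESPACES = {
    # nsid: (media, kind, access)
    1: ("NVRAM/HFE", "CNS", "exclusive"),
    2: ("NVRAM/HFE", "CNS", "exclusive"),
    3: ("NVRAM/HFE", "CNS", "exclusive"),
    4: ("NVRAM/HFE", "CNS", "exclusive"),
    5: ("NVRAM/HFE", "CNS", "exclusive"),
    6: ("NVRAM/HFE", "CNS", "exclusive"),
    7: ("LFE (PB-scale ZNS)", "ZNS", "shared"),
}

# Hypothetical assignment of exclusive namespaces to storage OS instances.
OWNERS = {1: "instance-A", 2: "instance-A", 3: "instance-B",
          4: "instance-B", 5: "instance-C", 6: "instance-C"}

def can_write(nsid: int, instance: str) -> bool:
    """Exclusive namespaces are writable only by their owner; for the shared
    ZNS namespace, per-zone ownership (checked elsewhere) decides writes."""
    media, kind, access = NAMESPACES[nsid]
    if access == "exclusive":
        return OWNERS.get(nsid) == instance
    return True   # shared namespace: zone ownership, not the NSID, gates writes

print(can_write(1, "instance-A"))   # True
print(can_write(1, "instance-B"))   # False
print(can_write(7, "instance-B"))   # True (subject to zone ownership)
```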
Multiple NVMeoF controllers 16A-16B can read and write data via an interconnect 22 for requests received via network connections 18A/18B. As an example, interconnect 22 is an abstraction that represents any one or more separate physical buses and/or point-to-point connections, connected by appropriate bridges, adapters and/or controllers. Interconnect 22, therefore, may include, for example, a system bus, a Peripheral Component Interconnect (PCI) Express (PCIe) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus (sometimes referred to as “Firewire”) or any other interconnect type.
As an example, data is stored redundantly across failure domains such that a single failure (e.g., 32) will not cause loss of data access, because spare storage capacity, shown as 34, can be used to store data from the failed domain. If a network link (e.g., 18A) fails, then another network link (e.g., 18B) can be used to access storage. If one of the NVMeoF controllers (e.g., 16A) fails, then the other controller (e.g., 16B) can be used to access the underlying storage using the assigned namespaces.
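For illustration only, a small Python sketch of the failover behavior described above; the dictionary-based health inputs are assumptions, and an actual implementation would rely on NVMeoF path state rather than this helper:

```python
def select_access_path(links, controllers):
    """Return a healthy (link, controller) pair for reaching the namespaces.
    Sketch of the failover described above: if link 18A fails, 18B is used;
    if controller 16A fails, 16B is used (reference numerals from the text)."""
    healthy_links = [name for name, up in links.items() if up]
    healthy_ctrls = [name for name, up in controllers.items() if up]
    if not healthy_links or not healthy_ctrls:
        raise RuntimeError("no healthy path to the namespaces")
    return healthy_links[0], healthy_ctrls[0]

# Example: link 18A is down, so traffic moves to 18B; both controllers are up.
print(select_access_path({"18A": False, "18B": True},
                         {"16A": True, "16B": True}))   # ('18B', '16A')
```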
The file system 42 uses logical storage objects (e.g., a storage volume, a logical unit number (LUN) or any other logical object) to store and retrieve information. The storage space at the storage devices (e.g., HFE 27 and LFE 29) is represented by one or more “aggregates,” and within each aggregate one or more storage volumes/LUNs are created. Each storage system instance has access to one or more aggregates to store and retrieve information, i.e., the storage system instance owns the “storage.” To store and retrieve information, a computing device typically issues write and/or read requests. Based on the request type (i.e., write or read request), the storage operating system instance 36 stores information at the storage space within one or more aggregates or retrieves information from it.
The file system 42 logically organizes stored information as a hierarchical structure of stored files/directories/objects. Each “on-disk” file may be implemented as a set of data blocks configured to store information, such as text, whereas a directory may be implemented as a specially formatted file in which other files and directories are stored. The data blocks are organized within a volume block number (VBN) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (FBN). The file system typically assigns sequences of FBNs on a per-file basis, whereas VBNs are assigned over a larger volume address space. The file system organizes the data blocks within the VBN space as a logical volume. The file system typically consists of a contiguous range of VBNs from zero to n, for a file system of size n+1 blocks.
As an example, the file system uses an inode, a data structure, to store information, such as metadata, about a file, whereas the data blocks are structures used to store the actual data for the file. The information in an inode may include, e.g., ownership of the file, file modification time, access permission for the file, size of the file, file type and references to locations of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks (e.g., L1 blocks).
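For illustration only, a Python sketch of the inode metadata and the FBN-to-VBN mapping described above; the class and function names are assumptions, and indirect (L1) blocks are omitted for brevity:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Inode:
    """Per-file metadata: ownership, modification time, permissions, size, type,
    and pointers mapping each file block number (FBN) to a volume block number (VBN)."""
    owner: str
    mtime: float
    mode: int
    size: int
    ftype: str
    block_pointers: List[int] = field(default_factory=list)   # index = FBN, value = VBN

@dataclass
class Volume:
    """A logical volume: a VBN space holding data blocks."""
    blocks: Dict[int, bytes] = field(default_factory=dict)
    next_vbn: int = 0

    def allocate(self, data: bytes) -> int:
        vbn = self.next_vbn
        self.blocks[vbn] = data
        self.next_vbn += 1
        return vbn

def write_file_block(vol: Volume, inode: Inode, fbn: int, data: bytes) -> None:
    # FBNs are assigned per file; each one maps to a VBN in the volume's address space.
    vbn = vol.allocate(data)
    while len(inode.block_pointers) <= fbn:
        inode.block_pointers.append(-1)        # -1 marks an unallocated FBN (hole)
    inode.block_pointers[fbn] = vbn

def read_file_block(vol: Volume, inode: Inode, fbn: int) -> bytes:
    return vol.blocks[inode.block_pointers[fbn]]

vol, ino = Volume(), Inode("root", 0.0, 0o644, 0, "regular")
write_file_block(vol, ino, fbn=0, data=b"hello")
print(read_file_block(vol, ino, 0))   # b'hello'
```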
Each storage operating system instance 36 may also include a protocol layer and an associated network access layer to enable communication over a network with other systems. The protocol layer may implement one or more of various higher-level network protocols, such as NFS (Network File System) (44A-44N), CIFS (Common Internet File System) (46A-46N), S3 (48A-48N), Hypertext Transfer Protocol (HTTP), TCP/IP and others. The S3 protocol uses an HTTP REST (Representational State Transfer) API (Application Programming Interface) that utilizes HTTP requests, e.g., “get,” “put,” “post,” and “delete,” for reading, storing and deleting data. The S3 interface 48 is used to store and retrieve storage objects stored at cloud storage, as described below.
The network access layer may also include one or more drivers, which implement one or more lower-level protocols to communicate over the network, such as Ethernet.
Each operating system instance 36 may also include a storage access layer and an associated storage driver layer to communicate with the storage devices. The storage access layer may implement a higher-level disk storage protocol, such as a RAID layer, and a zone translation layer (ZTL), while the storage driver layer may implement a lower-level storage device access protocol, such as the NVMe protocol.
Each operating system instance 36 executes an exclusive interface (which may also be referred to as exclusive RAID CNS) 38A-38N and a shared interface (which may also be referred to as shared RAID ZNS) 40A-40N. The exclusive interface 38 provides access to exclusive, private HFE 27 for hot data and metadata using an exclusive namespace, while the shared interface 40 provides access to the globally shared LFE 29 using a shared namespace. The globally shared LFE 29 may also be used to store hot read-only data 56 that is accessible to any of the storage operating system instances 36. This allows a system to promote read data that becomes hot but is still stored at the capacity tier (i.e., LFE 29). This configuration provides the globally shared LFE 29 with “read anywhere” capability.
The capacity tier storage (i.e., LFE 29) may be managed by the storage operating system instance 37 with a storage module 70 that interacts with the LFE capacity tier storage 29. Data at the capacity tier 29 is accessed directly through shared interface 40 via read path 67A, while exclusive interface 38 accesses data at HFE 27. When data at HFE 27 becomes immutable, it is tiered down as immutable data 67B to LFE 29. Cold data 67C can also be tiered out to cloud storage 69 via interface 68.
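For illustration only, the following Python sketch captures the hot/warm/cold placement decision described above; the time thresholds are assumptions, as the disclosure does not prescribe specific values:

```python
import time

# Illustrative thresholds; the disclosure does not prescribe specific values.
WARM_AFTER_S = 7 * 24 * 3600        # no longer "hot" after a week without access
COLD_AFTER_S = 90 * 24 * 3600       # candidate for cloud tiering after ~90 days idle

def placement(last_access: float, immutable: bool, now=None) -> str:
    """Return where data should live: 'HFE' (hot), 'LFE' (immutable/warm, path 67B),
    or 'cloud' (cold, paths 67C/68), following the lifecycle described above."""
    now = time.time() if now is None else now
    idle = now - last_access
    if idle >= COLD_AFTER_S:
        return "cloud"              # cold data tiered out to cloud storage 69
    if immutable or idle >= WARM_AFTER_S:
        return "LFE"                # tiered down to the shared capacity tier
    return "HFE"                    # hot, mutable data stays on the performance tier

print(placement(last_access=time.time(), immutable=False))                    # HFE
print(placement(last_access=time.time(), immutable=True))                     # LFE
print(placement(last_access=time.time() - 100 * 24 * 3600, immutable=True))   # cloud
```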
In one aspect, using a dedicated capacity storage operating system instance 37 to manage LFE 29 is advantageous because the objects written to LFE 29 can be efficiently checked for duplicate blocks by the storage operating system instance 37, thus providing global deduplication across objects from multiple instances.
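For illustration only, a Python sketch of how a single capacity-tier instance could deduplicate blocks globally by content fingerprint; the class name and the SHA-256 fingerprint choice are assumptions:

```python
import hashlib

class CapacityTierDedupe:
    """Because one capacity-tier instance ingests objects from every storage
    operating system instance, it can detect duplicate blocks globally by
    content hash before writing them to LFE."""
    def __init__(self):
        self.block_by_hash = {}     # fingerprint -> LFE block address
        self.next_addr = 0

    def store_block(self, data: bytes) -> int:
        fp = hashlib.sha256(data).hexdigest()
        if fp in self.block_by_hash:
            return self.block_by_hash[fp]      # duplicate: reuse the existing block
        addr = self.next_addr                  # unique: allocate a new LFE block
        self.next_addr += 1
        self.block_by_hash[fp] = addr
        return addr

tier = CapacityTierDedupe()
first = tier.store_block(b"same data")    # written once
second = tier.store_block(b"same data")   # deduplicated, same address
assert first == second
```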
In one aspect, the various namespaces (e.g., NSID1-NSID12,
The configuration process starts the storage operating system instances 36 to discover the various namespaces. Once the namespaces are visible to each storage operating system instance 36, the ownership of each namespace is assigned. The ownership information regarding each namespace is maintained at specific block offsets at a storage location. The configuration process next configures RAID or other redundancy schemes over the namespaces. The specific redundancy scheme depends on whether a single appliance with multiple storage devices is being configured or a collection of appliances is being used. An example configuration for a single appliance could be RAID1 across failure domains. After RAID or other redundancy schemes have been configured, the storage system instances 36 create aggregates and volumes on the namespaces owned by each. The ZNS may be assigned ownership, i.e., full read/write access, by special storage system instances 36 that serve as shared cold-data repositories for the other storage system instances 36, while non-owner instances are granted read-only access to the ZNS. Ownership and shared access may be asserted using NVMe protocol reservations on the namespaces during system operation.
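For illustration only, the configuration sequence above can be summarized in a Python sketch; the function signature, argument names, and returned shapes are assumptions, not an actual management API:

```python
def configure_appliance(instances, exclusive_ns, shared_zns, owner_of):
    """Sketch of the configuration sequence described above."""
    # 1. Start the instances; each one discovers every namespace.
    discovered = {inst: exclusive_ns + shared_zns for inst in instances}

    # 2. Record ownership (the real system persists this at specific block offsets).
    ownership = {ns: owner_of[ns] for ns in exclusive_ns + shared_zns}

    # 3. Layer a redundancy scheme over the namespaces, e.g. RAID1 across
    #    failure domains when configuring a single appliance.
    raid = {"scheme": "RAID1", "members": list(exclusive_ns)}

    # 4. Each instance creates aggregates/volumes on the namespaces it owns.
    aggregates = {inst: [ns for ns in exclusive_ns if ownership[ns] == inst]
                  for inst in instances}

    # 5. The shared ZNS is read/write for its owning (cold-repository) instance and
    #    read-only for everyone else; asserted with NVMe reservations at run time.
    access = {ns: {inst: ("rw" if ownership[ns] == inst else "ro")
                   for inst in instances}
              for ns in shared_zns}
    return discovered, ownership, raid, aggregates, access

# Example use with invented instance and namespace names:
print(configure_appliance(
    instances=["inst-A", "inst-B", "capacity-inst"],
    exclusive_ns=["NS1", "NS2"], shared_zns=["NS4"],
    owner_of={"NS1": "inst-A", "NS2": "inst-B", "NS4": "capacity-inst"}))
```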
In capacity tier (e.g., LFE 29), aggregate 72B includes one or more capacity volumes 74B to store immutable data or readable hot data. The immutable data may be compressed and de-duplicated.
In the example of
In one aspect, to implement the configuration of
Process Flows:
The configuration process then starts the storage operating system instances to discover the various namespaces. Once the namespaces are visible to each instance, the ownership of each namespace is assigned. The ownership information regarding each namespace is maintained at specific block offsets. The configuration process next configures RAID or other redundancy schemes over the namespaces. The specific redundancy scheme depends on whether a single appliance is being configured or a collection of appliances is being used. An example configuration for a single appliance could be RAID1 across failure domains. After RAID or other redundancy schemes have been configured, the storage system instances 36 create aggregates and volumes on the namespaces owned by each. The ZNS 19 may be assigned ownership, i.e., full read/write access, by special storage system instances that serve as shared cold-data repositories for the other storage system instances, while non-owner instances are granted read-only access to the ZNS. Ownership and shared access may be asserted using NVMe protocol reservations on the namespaces during system operation.
In block B211, each storage operating system instance 36A-36N is initialized and discovers the assigned exclusive namespace (e.g., NS1 and NS2,
In one aspect,
In block B204, exclusive namespace (e.g., NS1 and NS2,
In block B206, a shared namespace (e.g., NS4 and NS5,
In block B210, the storage operating system instances 36A-36N directly access data from portion 56 using the shared namespace, while continuing to use HFE 27 for read and write access.
In block B220, a shared namespace (e.g., NS4 and NS5,
In block B228, an exclusive namespace (e.g., NS1 and NS2,
In one aspect, in block B230, the first storage system instance 36A identifies data that may have become cold or immutable (e.g., file F2,
In block B232, the S3 BIN interface 66A of the first storage operating system instance 36A requests (e.g., S3 PUT,
In block B234, the capacity tier instance 37 transfers the file F2 as object X 76A and stores the object X 76A at the LFE 29. It is noteworthy that the object X 76A may also be stored at a cloud-based storage 69, as shown in
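For illustration only, a Python sketch of the tier-down flow in blocks B230-B234: a cold or immutable file is handed to the capacity-tier instance as an object over an S3-style PUT, and the file's metadata is updated to point at the resulting object. The s3_put callable and the key derivation are hypothetical stand-ins for the S3 BIN interface 66A:

```python
import hashlib

def tier_down_file(filename: str, data: bytes, hot_metadata: dict, s3_put) -> None:
    """Move a cold/immutable file (e.g., F2) to the capacity tier as an object
    (e.g., object X) via an S3-style PUT, then record a pointer in the file's
    metadata so future reads are redirected to LFE."""
    object_key = hashlib.sha1(filename.encode()).hexdigest()    # illustrative key
    s3_put(bucket="capacity-tier", key=object_key, body=data)

    # Metadata on the exclusive (HFE) namespace now points to the object in LFE,
    # so the blocks previously used on the performance tier can be freed.
    hot_metadata[filename] = {"location": "LFE", "object_key": object_key,
                              "size": len(data)}

# Minimal in-memory stand-in for the capacity-tier instance's object store:
capacity_objects = {}
def fake_s3_put(bucket, key, body):
    capacity_objects[(bucket, key)] = body

metadata = {}
tier_down_file("F2", b"immutable payload", metadata, fake_s3_put)
print(metadata["F2"]["location"])   # LFE
```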
In one aspect, a method for using the HFE 27 and LFE 29 is provided. The method includes assigning (e.g., B228,
The method further includes utilizing (e.g., B236,
In one aspect, updating the metadata of the data object at the second portion includes storing a pointer (e.g., 82,
In one aspect, the first portion includes a first type of solid-state drive (e.g., QLC) and the second portion includes a second type (e.g., TLC) of solid-state drive, where the first type is a capacity tier with storage performance lower than the second type. Furthermore, the first namespace is a zoned namespace (e.g., ZNS 19) for providing shared read access to the first and second instance and write access to the second instance.
In block B244, an exclusive namespace (e.g., NS1, NS2 and NS3,
In block B246, a shared namespace (e.g., NS4) is assigned to the multiple storage operating system instances 36A-36C to enable read access at LFE 29. The various zones in LFE 29 are configured such that some portions are writable by the storage operating system instances 36A-36C. For example, zone 54B is writable by the storage operating system instance 36A using namespace NS1, zone 54C is writable by the storage operating system instance 36B using namespace NS2 and zone 54D is writable by the storage operating system instance 36C using namespace NS3. Zones 54E, 54F, 54G, 54H and 54I are readable by any storage operating system instance 36A-36C using the shared namespace, NS4. HFE 27A-27C and NVRAM 26A-26C are used for storing metadata and buffered data.
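For illustration only, the zone-to-writer assignment in block B246 can be expressed as a small Python table with access checks; the helper function names are assumptions, while the zone, namespace, and instance numerals come from the text:

```python
# Zone-to-(namespace, writer) mapping from the example above.
ZONE_WRITER = {
    "54B": ("NS1", "36A"),
    "54C": ("NS2", "36B"),
    "54D": ("NS3", "36C"),
}
SHARED_READ_ZONES = {"54E", "54F", "54G", "54H", "54I"}   # readable via NS4 by any instance

def may_write(zone: str, instance: str) -> bool:
    entry = ZONE_WRITER.get(zone)
    return entry is not None and entry[1] == instance

def may_read(zone: str, instance: str) -> bool:
    # Shared zones are readable by every instance; writable zones by their owner.
    return zone in SHARED_READ_ZONES or may_write(zone, instance)

print(may_write("54B", "36A"), may_write("54B", "36B"))   # True False
print(may_read("54G", "36C"))                              # True
```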
In block B248, the read-only and writable zones of LFE 29 are used by the storage operating system instances 36A-36C. The metadata can be used by each storage operating system instance 36A-36C to access data from the shared zones of LFE 29 using the shared namespace NS4. The metadata at HFE 27 is maintained using the exclusive namespaces NS1-NS3 by the storage operating system instances 36A-36C, respectively.
In one aspect, process 240 can be implemented by a shared data structure (not shown) that stores zone information in LFE 29. This data structure can be replicated via multiple CNS to HFE 27 (and/or NVRAM 26). Each zone may have one of the following states: “Free,” “Full,” “Readable by any,” or “Writable-by-owner.” Whenever a storage operating system instance 36 wants to modify the shared data structure to change the state of any zone, it atomically obtains a lock on the page storing the zone state. After obtaining the lock, the state change is written to all replicas. The update is successful if a write-quorum number of replicas were successfully updated; if not, the update is rolled back, and the lock is released. Other data structures for tracking shared zone information, for example, reference counts on data blocks in zones, can be managed in a similar way. The reference counts are updated whenever a file deletion or overwrite releases blocks within a zone.
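For illustration only, a Python sketch of the lock-then-quorum update described above for the shared zone-state table; the class, the in-memory replica representation, and the write_replica callable are assumptions standing in for CNS/NVRAM replica writes:

```python
import threading

ZONE_STATES = {"Free", "Full", "Readable by any", "Writable-by-owner"}

class SharedZoneTable:
    """Zone-state table replicated to several CNS locations; a state change
    commits only when a write quorum of replicas accepts it."""
    def __init__(self, num_replicas: int = 3):
        self.replicas = [dict() for _ in range(num_replicas)]   # zone -> state
        self.locks = {}                                         # zone -> page lock
        self.quorum = num_replicas // 2 + 1

    def _lock_for(self, zone: str) -> threading.Lock:
        return self.locks.setdefault(zone, threading.Lock())

    def set_state(self, zone: str, new_state: str, write_replica) -> bool:
        assert new_state in ZONE_STATES
        with self._lock_for(zone):                  # atomic lock on the zone's page
            previous = [r.get(zone) for r in self.replicas]
            acks = sum(1 for r in self.replicas if write_replica(r, zone, new_state))
            if acks >= self.quorum:
                return True                         # quorum reached: the update commits
            for r, prev in zip(self.replicas, previous):    # otherwise roll back
                if prev is None:
                    r.pop(zone, None)
                else:
                    r[zone] = prev
            return False                            # lock released on exiting the block

def write_replica(replica, zone, state):
    replica[zone] = state        # a real system would issue a CNS/NVRAM write here
    return True

table = SharedZoneTable()
ok = table.set_state("zone-17", "Writable-by-owner", write_replica)
print(ok, table.replicas[0]["zone-17"])   # True Writable-by-owner
```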
In one aspect, methods and systems are provided for using the configuration of
The method further includes utilizing (e.g., B248,
System 100:
In one aspect, the storage system 120 uses the storage operating system 124 to store and retrieve data from the storage sub-system 116 by accessing the storage devices 114 via storage device controllers 103A-103N (similar to the NVMeoF controllers 16 (
In one aspect, system 100 also includes a cloud layer 136 having a cloud storage manager (may also be referred to as “cloud manager”) 122, and a cloud storage operating system (may also be referred to as “Cloud Storage OS”) 140 (similar to storage operating system instances 36,
As an example, a cloud provider 104 provides access to the cloud layer 136 and its components via a communication interface 112. A non-limiting example of the cloud layer 136 is a cloud platform, e.g., Amazon Web Services (“AWS”) provided by Amazon Inc., Azure provided by Microsoft Corporation, Google Cloud Platform provided by Alphabet Inc. (without derogation of any trademark rights of Amazon Inc., Microsoft Corporation or Alphabet Inc.), or any other cloud platform. In one aspect, communication interface 112 includes hardware, circuitry, logic and firmware to receive and transmit information using one or more protocols. As an example, the cloud layer 136 can be configured as a virtual private cloud (VPC), a logically isolated section of a cloud infrastructure that simulates an on-premises data center with the on-premises storage system 120.
In one aspect, the cloud manager 122 is provided as a software application running on a computing device or within a VM for configuring, protecting and managing storage objects. In one aspect, the cloud manager 122 enables access to a storage service (e.g., backup, restore, cloning or any other storage related service) from a “micro-service” made available from the cloud layer 136. In one aspect, the cloud manager 122 stores user information including a user identifier, a network domain for a user device, a user account identifier, or any other information to enable access to storage from the cloud layer 136.
Software applications for cloud-based systems are typically built using “containers,” which may also be referred to as micro-services. Kubernetes is an open-source software platform for deploying, managing and scaling containers, including the cloud storage OS 140 and the cloud manager 122. Azure is a cloud computing platform provided by Microsoft Corporation (without derogation of any third-party trademark rights) for building, testing, deploying, and managing applications and services, including the cloud storage OS 140 and the cloud manager 122. Azure Kubernetes Service enables deployment of a production-ready Kubernetes cluster in the Azure cloud for executing the cloud storage OS 140 and the cloud manager 122. It is noteworthy that the adaptive aspects of the present disclosure are not limited to any specific cloud platform.
The term micro-service as used herein denotes computing technology for providing a specific functionality in system 100 via the cloud layer 136. As an example, the cloud storage OS 140 and the cloud manager 122 are micro-services, deployed as containers (e.g., “Docker” containers), are stateless in nature, may be exposed as a REST (representational state transfer) application programming interface (API) and are discoverable by other services. Docker is a software framework for building and running micro-services using the Linux operating system kernel (without derogation of any third-party trademark rights). As an example, when implemented as Docker containers, the Docker micro-service code for the cloud storage OS 140 and the cloud manager 122 is packaged as a “Docker image file”. A Docker container for the cloud storage OS 140 or the cloud manager 122 is initialized using an associated image file. A Docker container is an active or running instantiation of a Docker image. Each Docker container provides isolation and resembles a lightweight virtual machine. It is noteworthy that many Docker containers can run simultaneously in the same Linux-based computing system. It is noteworthy that although a single block is shown for the cloud manager 122 and the cloud storage OS 140, multiple instances of each micro-service (i.e., the cloud manager 122 and the cloud storage OS 140) can be executed at any given time to accommodate multiple user systems 108.
In one aspect, the cloud manager 122 and the cloud storage OS 140 can be deployed from an elastic container registry (ECR). As an example, ECR is provided by AWS (without derogation of any third-party trademark rights) and is a managed container registry that stores, manages, and deploys container images. The various aspects described herein are not limited to the Linux kernel or using the Docker container framework.
An example of the cloud storage OS 140 includes “CLOUD VOLUMES ONTAP” provided by NetApp Inc., the assignee of this application (without derogation of any trademark rights). The cloud storage OS 140 is a software-defined version of the storage operating system 124 executed within the cloud layer 136 or accessible to the cloud layer 136 to provide storage and storage management options that are available via the storage system 120. The cloud storage OS 140 has access to cloud storage 128, which may include block-based, persistent storage that is local to the cloud storage OS 140 and object-based storage that may be remote to the cloud storage OS 140.
In another aspect, in addition to cloud storage OS 140, a cloud-based storage service is made available from the cloud layer 136 to present storage volumes (shown as cloud volume 142). An example of the cloud-based storage service is the “Cloud Volume Service,” provided by NetApp Inc. (without derogation of any trademark rights). The term volume or cloud volume (used interchangeably throughout this specification) means a logical object, also referred to as a storage object, configured to store data files (or data containers or data objects), scripts, word processing documents, executable programs, and any other type of structured or unstructured data. From the perspective of a user system 108, each cloud volume can appear to be a single storage drive. However, each cloud volume can represent the storage space in one storage device, an aggregate of some or all the storage space in multiple storage devices, a RAID group, or any other suitable set of storage space. The various aspects of the present disclosure may include both the Cloud storage OS 140 and the cloud volume service or either one of them.
As an example, user systems 108 are computing devices that can access storage space at the storage system 120 via the connection system 118 or from the cloud layer 136 presented by the cloud provider 104 or any other entity. The user systems 108 can also access computing resources, such as a virtual machine (“VM”) (e.g., compute VM 110), via the cloud layer 136. A user may be the entire system of a company, a department, a project unit or any other entity. Each user system is uniquely identified and, optionally, may be a part of a logical structure called a storage tenant (not shown). The storage tenant represents a set of users (which may also be referred to as storage consumers) for the cloud provider 104 that provides access to cloud-based storage and/or compute resources (e.g., 110) via the cloud layer 136 and/or storage managed by the storage system 120.
In one aspect, host systems 102 are configured to execute a plurality of processor-executable applications 126A-126N (may also be referred to as “application 126” or “applications 126”), for example, a database application, an email server, and others. These applications may be executed in different operating environments, for example, a virtual machine environment, Windows, Solaris, Unix (without derogation of any third-party rights) and others. The applications 126 use storage system 120 or cloud storage 128 to store information at storage devices. Although hosts 102 are shown as stand-alone computing devices, they may be made available from the cloud layer 136 as compute nodes executing applications 126 within VMs (shown as compute VM 110).
Each host system 102 interfaces with a management module 134 of a management system 132 for managing backups, restore, cloning and other operations for the storage system 120. The management module 134 is used for managing and configuring various elements of system 100. Management system 132 may include one or more computing systems for managing and configuring the various elements. Although the management system 132 with the management module 134 is shown as a stand-alone module, it may be implemented with other applications, for example, within a virtual machine environment. Furthermore, the management system 132 and the management module 134 may also be referred to interchangeably throughout this specification.
In one aspect, the storage system 120 provides a set of storage volumes directly to host systems 102 via the connection system 118. In another aspect, the storage volumes are presented by the cloud storage OS 140, and in that context a storage volume is referred to as a cloud volume (e.g., 142). The storage operating system 124/cloud storage OS 140 present or export data stored at storage devices 114/cloud storage 128 as a volume (or a logical unit number (LUN) for storage area network (“SAN”) based storage).
The storage operating system 124/cloud storage OS 140 are used to store and manage information at storage devices 114/cloud storage 128 based on a request generated by application 126, user 108 or any other entity. The request may be based on file-based access protocols, for example, the Common Internet File System (CIFS) protocol or Network File System (NFS) protocol, over the Transmission Control Protocol/Internet Protocol (TCP/IP). Alternatively, the request may use block-based access protocols for SAN storage, for example, the Small Computer Systems Interface (SCSI) protocol encapsulated over TCP (iSCSI) and SCSI encapsulated over Fibre Channel (FC), object-based protocol or any other protocol.
In a typical mode of operation, one or more input/output (I/O) requests are sent over connection system 118 to the storage system 120 or the cloud storage OS 140, based on the request. Storage system 120/cloud storage OS 140 receives the I/O requests, issues one or more I/O commands to storage devices 114/cloud storage 128 to read or write data on behalf of the host system 102 and issues a response containing the requested data over the network 118 to the respective host system 102.
Although storage system 120 is shown as a stand-alone system, i.e., a non-cluster-based system, in another aspect, storage system 120 may have a distributed architecture; for example, a cluster-based system that may include a separate network module and storage module. Briefly, the network module is used to communicate with host systems 102, while the storage module is used to communicate with the storage devices 114.
Alternatively, storage system 120 may have an integrated architecture, where the network and data components are included within a single chassis. The storage system 120 further may be coupled through a switching fabric to other similar storage systems (not shown) which have their own local storage subsystems. In this way, all the storage subsystems can form a single storage pool, to which any client of any of the storage servers has access.
In one aspect, the storage system 120 (or the cloud storage OS 140) can be organized into any suitable number of virtual servers (may also be referred to as “VServers” or virtual storage machines), in which each VServer represents a single storage system namespace with separate network access. Each VServer has a specific client domain and a security domain that are separate from the client and security domains of other VServers. Moreover, each VServer can span one or more physical nodes, each of which can hold storage associated with one or more VServers. User systems 108/host 102 can access the data on a VServer from any node of the clustered system, through the virtual interface associated with that VServer. It is noteworthy that the aspects described herein are not limited to the use of VServers.
As an example, one or more of the host systems (for example, 102A-102N) or a compute resource (not shown) of the cloud layer 136 may execute a VM environment where a physical resource is time-shared among a plurality of independently operating, processor-executable VMs (including compute VM 110). Each VM may function as a self-contained platform, running its own operating system (OS) and computer-executable application software. The computer-executable instructions running in a VM may also be collectively referred to herein as “guest software.” In addition, resources available within the VM may also be referred to herein as “guest resources.”
The guest software expects to operate as if it were running on a dedicated computer rather than in a VM. That is, the guest software expects to control various events and have access to hardware resources on a physical computing system (which may also be referred to as a host system); such resources may also be referred to herein as “host hardware resources.” The host hardware resources may include one or more processors, resources resident on the processors (e.g., control registers, caches, and others), memory (instructions residing in memory, e.g., descriptor tables), and other resources (e.g., input/output devices, host attached storage, network attached storage or other like storage) that reside in a physical machine or are coupled to the host system.
Storage Operating System:
As an example, operating system 124/36 may include several modules, or “layers”. These layers include a file system 301 (similar to 42) that keeps track of a directory structure (hierarchy) of the data stored in storage devices and manages read/write operations, i.e., executes read/write operations on storage devices in response to host system 102 requests.
The storage operating system 124/36 may also include a protocol layer 303 and an associated network access layer 305, to allow storage system 120 to communicate over a network with other systems, such as host system 102, and management system 132. Protocol layer 303 may implement one or more of various higher-level network protocols, such as NFS (e.g., 44,
Network access layer 305 may include one or more drivers, which implement one or more lower-level protocols to communicate over the network, such as Ethernet. Interactions between host systems 102 and the storage sub-system 116 are illustrated schematically as a path, which illustrates the flow of data through storage operating system 124.
The storage operating system 124 may also include a storage access layer 307 and an associated storage driver layer 309 to communicate with the storage devices 14. The storage access layer 307 may implement a higher-level disk storage protocol, such as a RAID layer, while the storage driver layer 309 may implement a lower-level storage device access protocol, such as the NVMe protocol.
It should be noted that the software “path” through the operating system layers described above needed to perform data storage access for a client request may alternatively be implemented in hardware. That is, in an alternate aspect of the disclosure, the storage access request data path may be implemented as logic circuitry embodied within a field programmable gate array (FPGA) or an ASIC. This type of hardware implementation increases the performance of the file service provided by storage system 120.
In addition, it will be understood to those skilled in the art that the invention described herein may apply to any type of special-purpose (e.g., file server, filer or storage serving appliance) or general-purpose computer, including a standalone computer or portion thereof, embodied as or including a storage system. Moreover, the teachings of this disclosure can be adapted to a variety of storage system architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The term “storage system” should therefore be taken broadly to include such arrangements in addition to any subsystems configured to perform a storage function and associated with other equipment or systems.
Processing System:
The processing system 400 includes one or more processors 402 and memory 404, coupled to a bus system 405. The bus system 405 shown in
The processors 402 are the central processing units (CPUs) of the processing system 400 and, thus, control its overall operation. In certain aspects, the processors 402 accomplish this by executing programmable instructions stored in memory 404. A processor 402 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.
Memory 404 represents any form of random-access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. Memory 404 includes the main memory of the processing system 400. Instructions 406, which implement the techniques introduced above, may reside in and may be executed (by processors 402) from memory 404. For example, instructions 406 may include code for executing the process blocks of
Also connected to the processors 402 through the bus system 405 are one or more internal mass storage devices 410, and a network adapter 412. Internal mass storage devices 410 may be or may include any conventional medium for storing large volumes of data in a non-volatile manner, such as one or more magnetic or optical based disks. The network adapter 412 provides the processing system 400 with the ability to communicate with remote devices (e.g., storage servers) over a network and may be, for example, an Ethernet adapter, a FC adapter, or the like. The processing system 400 also includes one or more input/output (I/O) devices 408 coupled to the bus system 405. The I/O devices 408 may include, for example, a display device, a keyboard, a mouse, etc.
Cloud Computing: The system and techniques described above are applicable and especially useful in the cloud computing environment where storage is presented and shared across different platforms. Cloud computing means computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that may be rapidly provisioned and released with minimal management effort or service provider interaction. The term “cloud” is intended to refer to a network, for example, the Internet and cloud computing allows shared resources, for example, software and information to be available, on-demand, like a public utility.
Typical cloud computing providers deliver common business applications online which are accessed from another web service or software like a web browser, while the software and data are stored remotely on servers. The cloud computing architecture uses a layered approach for providing application services. A first layer is an application layer that is executed at client computers. In this example, the application allows a client to access storage via a cloud.
After the application layer is a cloud platform and cloud infrastructure, followed by a “server” layer that includes hardware and computer software designed for cloud specific services. The storage systems described above may be a part of the server layer for providing storage services. Details regarding these layers are not germane to the inventive aspects.
Thus, methods and apparatus for scalable storage appliance have been described. Note that references throughout this specification to “one aspect” or “an aspect” mean that a particular feature, structure or characteristic described in connection with the aspect is included in at least one aspect of the present invention. Therefore, it is emphasized and should be appreciated that two or more references to “an aspect” or “one aspect” or “an alternative aspect” in various portions of this specification are not necessarily all referring to the same aspect. Furthermore, the particular features, structures or characteristics being referred to may be combined as suitable in one or more aspects of the present disclosure, as will be recognized by those of ordinary skill in the art.
While the present disclosure is described above with respect to what is currently considered its preferred aspects, it is to be understood that the disclosure is not limited to that described above. To the contrary, the disclosure is intended to cover various modifications and equivalent arrangements within the spirit and scope of the appended claims.
This patent application claims priority under 35 USC § 119(e) to US Provisional Patent Application, entitled “SCALABLE SOLID-STATE STORAGE SYSTEM AND METHODS THEREOF”, Ser. No. 63/290,549 filed on Dec. 16, 2021, the disclosure of which is incorporated herein by reference in its entirety.