This disclosure relates generally to the field of data storage and, in particular, to shareable virtual data storage for a server.
Network-attached storage is a system that provides data access between servers and storage devices through a computer network. A network-attached shared storage topology may include a processor, a server machine, a storage array, and communication links (such as PCI Express). The storage interfaces in such a topology are used as cache storage rather than primary storage. Nonetheless, the performance of the interfaces in a network-attached storage topology is limited by the latency of the system.
For example, the number of I/Os that a processor can handle may be limited by the computing power of the processor, and the processor may then become a bottleneck that prevents full exploitation of the advantages of a higher-performance storage device, such as a solid-state memory. Second, the cache storage is not hot-pluggable, since it is installed inside a server, which reduces the serviceability of the system. Moreover, the system lacks data storage security and reliability because the topology is non-redundant (e.g., if the storage associated with a server crashes, it is difficult to extract data from the failed array to recover the data that the server has stored). In addition, building a network-attached storage system is prohibitively expensive for an organization, because it requires implementing cache storage in each server (e.g., a 1 TB PCIe card typically costs approximately $30,000).
In one aspect, a method includes processing a data request received by a shared device in a storage array through a communication link from a server that is at a remote location from the storage array. The method may include routing the data request between the server and the shared device present in the storage array.
Another illustrative aspect may include generating a virtual storage device in the server to enable the server to share a shared storage device in the storage array with other servers by means of a switching fabric between the server and the storage array.
The method may include routing storage data between the shared storage device and a set of data buffers in the server through the communication link when the storage data is accessed from the storage array using the virtual storage device.
The method may further include storing a set of mapping information for the virtual storage device as a mapping table in a management processor that is at a remote location from the server, or deriving the locations algorithmically on access to the virtual storage device.
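To make the mapping concrete, the following is a minimal sketch of one way such a mapping-table entry might be laid out, together with the algorithmic alternative. All structure names, field widths, and the derivation formula are illustrative assumptions, not the disclosure's exact format.

```c
#include <stdint.h>

#define BLOCKS_PER_VDEV 4  /* assumed number of blocks backing one virtual device */

/* One hypothetical mapping-table entry kept by the management processor:
 * it ties a server's virtual storage device to a controller, an IO queue,
 * and the physical storage blocks that back it. */
struct vdev_map_entry {
    uint16_t server_id;                   /* owning server               */
    uint16_t controller_id;               /* shared-device controller    */
    uint16_t io_queue_id;                 /* IO queue on that controller */
    uint32_t block_ids[BLOCKS_PER_VDEV];  /* backing storage blocks      */
};

/* Algorithmic alternative: derive the backing block from the server ID
 * at access time instead of consulting the table. */
static uint32_t derive_block_id(uint16_t server_id, uint64_t lba,
                                uint32_t blocks_per_server)
{
    return (uint32_t)(server_id * blocks_per_server +
                      (lba % blocks_per_server));
}
```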
In yet another aspect, a method may include enumerating the shared device associated with the shared storage device in the storage array into a storage block to form an independent logical storage block, and partitioning the shared storage device. The method may include assigning, by the management processor, at least one of the storage blocks in the storage array to the server.
The method may include distributing the storage data across at least one of the shared storage devices through the shared device. The method may further include requesting data from at least one of the shared storage devices.
The methods and systems disclosed herein may be implemented by any means for achieving various aspects, and may be executed in a form of a machine-readable medium embodying a set of instructions that, when executed by a machine, cause the machine to perform any of the operations disclosed herein. Other features will be apparent from the accompanying drawings and from the detailed description that follows.
Example embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
Other features of the present embodiments will be apparent from the accompanying drawings and from the disclosure of the various embodiments.
Several methods and a system for a shareable virtual non-volatile storage device for a server are disclosed. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various embodiments. Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments.
According to one embodiment disclosed herein, a set of shared drivers associated with a plurality of shared storage devices is centralized in a storage array and coupled with a number of servers that do not have any local physical drives. In such a manner, each server may share the shared drivers with other servers through a virtualized storage device located in the server. A processor (e.g., a CPU in the storage array, a management CPU, etc.) may assign one or more of the shared storage devices to at least one of the servers. The processor may expose the necessary information to both the shared storage device and the server so that the server can communicate with the shared storage device directly without going through a local physical drive, as sketched below.
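The assignment step could be as simple as the following sketch, in which the management processor records which server owns each shared storage device. The table layout and the first-fit policy are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical ownership record kept by the management processor. */
struct assignment {
    int device_id;   /* shared storage device in the array      */
    int server_id;   /* owning server, or -1 while unassigned   */
};

/* First-fit assignment: claim the first unowned device for a server
 * and return its ID so the mapping can be exposed to that server. */
static int assign_device(struct assignment *table, size_t n_devices,
                         int server_id)
{
    for (size_t i = 0; i < n_devices; i++) {
        if (table[i].server_id < 0) {
            table[i].server_id = server_id;
            return table[i].device_id;
        }
    }
    return -1;  /* no free device available */
}
```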
In one embodiment, the Servers 104A-N may be data processing devices. In one embodiment, the data processing device may be a hardware device that includes a processor and a memory (not shown in the figures).
The Servers 104A-N may communicate with the Shared Drivers 100 through the Communication Links 105A-N, according to one or more embodiments. In another embodiment, the Servers 104A-N may bypass a processor of the Server 104 when they access the Shared Drivers 100 in the Storage Array 102 and may route the Data Request 124 between the Server 104 and the Shared Drivers 100 in the Storage Array 102. In one embodiment, the Shared Drivers 100 in the Storage Array 102 may receive the Data Request 124 through a Communication Link (e.g., Communication Link 105A) from a Server (e.g., Server 104A) that is at a remote location from the Storage Array 102. In another embodiment, a Server (e.g., Server 104A) may receive Storage Data 426 from the Shared Driver 100 in the Storage Array 102, and the Server (e.g., Server 104A) may send Storage Data to the Shared Driver 100 in the Storage Array 102.
In one embodiment, the Switching Fabric 114 may be one or more network switches. In one embodiment, the network switch may be a PCIe switch and/or a Fibre Channel switch, etc. In one embodiment, the Switching Fabric 114 may include the NTBs 122A-N to isolate each of the Servers 104A-N from the Switching Fabric 114 and the Management Processor 120. Although the NTBs 122A-N are not necessary, they provide clock isolation and also isolate the Servers 104A-N so that the Servers 104A-N can keep functioning if any of the Communication Links 106A-N become disconnected. Some or all of the NTBs 122A-N may be replaced with a transparent bridge, source-route bridging, etc. In one embodiment, the Communication Links 116A-M and the Communication Link 118 may be in accordance with the PCIe, PCI/PCI-X, and/or AGP bus standards. In one embodiment, the Communication Links 116A-M and the Communication Link 118 may use NVMe or SCSIe communication protocols, etc. In another embodiment, the Communication Links 116A-M and the Communication Link 118 may include cables (e.g., PCIe cables), network switches, I/O ports, and/or network bridges.
In one embodiment, the Shared Driver 100A for the Controller 202A associated with the Shared Storage Device 112A may be enumerated into the Storage Block 110A to form an independent logical storage block. In one embodiment, the Controller 202A may be registered to the logical mapping of the Storage Block 110A by the Management Processor 120. In one embodiment, the Controller 202A may perform a read/write operation transferring the Storage Data 426 between the Storage Block 110A and the Servers 104A-N. In one embodiment, the Controller 202A may receive a read/write descriptor setup from the Server 104. In one embodiment, the Shared Storage Device 112A may be read by transferring the Storage Data 426 to one or more of the Servers 104A-N based on the mapping information of the Storage Device 112A and the read/write descriptor from the one or more of the Servers 104A-N received by the Controller 202A. In one embodiment, the Shared Storage Device 112A may be written by transferring the Storage Data 426 through the Controller 202A from one or more of the Servers 104A-N based on the mapping information of the Storage Device 112A and the read/write descriptor from the one or more of the Servers 104A-N. An illustrative descriptor layout is sketched below.
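The following is a minimal sketch of what such a read/write descriptor might contain, modeled loosely on an NVMe-style command; the field names and widths are assumptions rather than the disclosure's exact format.

```c
#include <stdint.h>

enum rw_op { OP_READ = 0, OP_WRITE = 1 };

/* Hypothetical read/write descriptor a server places in a controller
 * IO queue; the controller uses it, together with the mapping
 * information, to move data between the storage block and the
 * server's data buffers. */
struct rw_descriptor {
    uint8_t  opcode;       /* OP_READ or OP_WRITE                       */
    uint16_t block_id;     /* logical storage block being addressed     */
    uint64_t start_lba;    /* starting logical block address            */
    uint32_t num_blocks;   /* transfer length in blocks                 */
    uint64_t buffer_addr;  /* bus address of the server's data buffer   */
};
```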
In operation 1006, the Virtual Storage Device 126 may be generated in the Servers 104A-N to enable the Servers 104A-N to share the Storage Device 112 in the Storage Array 102 with other servers through the Switching Fabric 114 between the Servers 104A-N and the Storage Array 102. In operation 1008, the Storage Data 426 may be routed between the Storage Device 112 in the Storage Array 102 and a set of Data Buffers 304 in each of the Servers 104A-N through the Communication Links 105A-N when the Storage Data 426 is accessed from the Storage Array 102 using the Virtual Storage Devices 126A-N.
Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments. For example, the various devices, modules, analyzers, generators, etc. described herein may be enabled and operated using hardware circuitry (e.g., CMOS-based logic circuitry), firmware, software, and/or any combination of hardware, firmware, and/or software (e.g., embodied in a machine-readable medium).
For example, each of the Servers 104A-N may be a PC, the Shared Storage Device 112 may be an SSD, the Management Processor 120 may be a management CPU, the Communication Links 106A-N, 116A-M, and 118 may be PCIe buses, and the Switching Fabric 114 may be a PCIe switching fabric. The system may consist of multiple PCs, multiple SSDs, and a management CPU interconnected via a PCIe switching fabric. The management CPU and the SSDs may be connected to the PCIe switching fabric via PCIe transparent ports, while the PCs may be connected to it via PCIe non-transparent ports. The resources in the System domain, such as the Controller registers and the Server Management Queue, are mapped across the NTB into the Server domain. The management CPU may enumerate the SSD controllers and partition the SSDs into blocks. Blocks from multiple SSDs may be aggregated to form a logical SSD block device. The mapping of the logical block devices to a set of Controller IDs, IO Queue IDs, and storage blocks may be stored in persistent storage by the management CPU. During initialization, the PCs may register with the management driver in the management CPU. The communication between the PCs and the management CPU may be done using the Server Management Queue implemented in the management CPU memory. The management driver may send the logical SSD block device information to the server management driver in the PCs. The logical block device information includes the Controller ID, the IO submission/completion queue ID on the Controller, and the storage block information for each of the SSDs that are included in the logical block device. Alternatively, this mapping of a logical drive to a set of Controller IDs, IO Queue IDs, and storage blocks can be algorithmically derived by the server management driver based on the ID of the PC; in this scheme, each PC in the system is assigned a unique ID, as in the sketch below.
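A minimal sketch of that algorithmic alternative follows, assuming a fixed fabric of NUM_SSDS controllers, a dedicated IO queue per PC on each controller, and a contiguous block range per PC on each SSD; the constants and the striping policy are illustrative assumptions, not values from the disclosure.

```c
#include <stdint.h>

#define NUM_SSDS        8   /* assumed SSD controllers behind the fabric */
#define QUEUES_PER_SSD  16  /* assumed IO queue pairs per controller     */
#define BLOCKS_PER_PC   4   /* assumed blocks reserved per PC per SSD    */

/* Per-PC logical mapping derived from the PC's unique ID instead of
 * being fetched from the management CPU's persistent table. */
struct logical_mapping {
    uint16_t controller_id[NUM_SSDS];
    uint16_t io_queue_id[NUM_SSDS];
    uint32_t first_block[NUM_SSDS];
};

static void derive_mapping(uint16_t pc_id, struct logical_mapping *m)
{
    for (int ssd = 0; ssd < NUM_SSDS; ssd++) {
        m->controller_id[ssd] = (uint16_t)ssd;
        /* one dedicated IO queue per PC on each controller */
        m->io_queue_id[ssd] = pc_id % QUEUES_PER_SSD;
        /* contiguous block range reserved for this PC on each SSD */
        m->first_block[ssd] = (uint32_t)pc_id * BLOCKS_PER_PC;
    }
}
```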
The logical drive information may be communicated to the server NVMe driver in the PC by the server management driver in the PC. The server NVMe driver may maintain the logical mapping and register a logical disk for each of the SSD controllers with the multi disk driver in the PC. The multi disk driver can be a separate module or a functionally separated module within the server NVMe driver. At this stage, the multi disk driver may register the logical disk with the PC operating system, making it available for data input and output to the SSDs.
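Assuming the PC runs Linux, registration with the operating system might look like the following kernel-module sketch; the "vssd" device name and dynamic major-number allocation are illustrative, and the gendisk/request-queue setup a real block driver would need is omitted.

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/blkdev.h>

static int vssd_major;

static int __init multidisk_init(void)
{
    /* Passing 0 asks the kernel to allocate an unused major number. */
    vssd_major = register_blkdev(0, "vssd");
    if (vssd_major < 0)
        return vssd_major;
    /* A real driver would now allocate a gendisk and request queue for
     * the logical disk and call add_disk() to expose it to the OS. */
    return 0;
}

static void __exit multidisk_exit(void)
{
    unregister_blkdev(vssd_major, "vssd");
}

module_init(multidisk_init);
module_exit(multidisk_exit);
MODULE_LICENSE("GPL");
```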
The multi disk driver may issue a read/write IO request to the server NVMe driver. The server NVMe driver may map the IO operation to one or more NVMe IO queues based on the mapping of the logical drive, and set up the read/write descriptor at the next available location in the mapped NVMe IO queues. The server NVMe driver may then write to the controller register to indicate the new read/write descriptors set up in the NVMe IO queue. The controllers may perform the read/write operation, transferring data between the SSD and the data buffers in the PC. For a read operation, data may be transferred from the SSD to the data buffers in the PCs; for a write operation, data may be transferred from the data buffers to the SSD. This submission step is sketched below.
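The following is a minimal sketch of that submission step in the style of an NVMe submission queue with a tail doorbell; the queue layout, descriptor fields, and register semantics are simplified assumptions rather than the exact NVMe register format.

```c
#include <stdint.h>

/* Simplified descriptor; see the earlier sketch for a fuller layout. */
struct rw_desc {
    uint8_t  opcode;       /* read or write                        */
    uint64_t start_lba;    /* starting logical block address       */
    uint32_t num_blocks;   /* transfer length in blocks            */
    uint64_t buffer_addr;  /* PC data buffer (bus address)         */
};

/* Hypothetical IO submission queue shared with the controller. */
struct io_queue {
    struct rw_desc *entries;      /* queue memory                     */
    uint32_t depth;               /* number of slots                  */
    uint32_t tail;                /* next available location          */
    volatile uint32_t *doorbell;  /* controller tail-doorbell (MMIO)  */
};

/* Place a descriptor at the next available slot, then write the new
 * tail to the controller register to signal the pending descriptor. */
static void submit_io(struct io_queue *q, const struct rw_desc *d)
{
    q->entries[q->tail] = *d;
    q->tail = (q->tail + 1) % q->depth;
    *q->doorbell = q->tail;
}
```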
In addition, it will be appreciated that the various operations, processes, and methods disclosed herein may be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and may be performed in any order (e.g., including using means for achieving the various operations). Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
This application is a non-provisional application claiming priority to U.S. Provisional Patent Application Ser. No. 61/593,237 titled: “SHAREABLE VIRTUAL NON-VOLATILE STORAGE DEVICE FOR A SERVER,” filed on Jan. 31, 2012.