At least one embodiment of the disclosure pertains to data storage systems, and more particularly, to network and storage interfaces of a storage system.
In a typical storage system, there are a number of bottlenecks. These bottlenecks may exist in processing, in data transport, or in permanent or temporary data storage. Storage servers have host processors and host memory modules therein. The host processors used by the storage servers typically are connected to separate Peripheral Component Interconnect Express (PCIe) daughter cards for interfacing to a network (e.g., Ethernet) and storage devices (e.g., serial attached SCSI (SAS)), respectively.
A client machine may send requests and exchange data with the storage server using a network interface of the network daughter card. The storage server may respond to these requests by reading data from and writing data to storage devices using a storage interface of the storage daughter card. Inside the storage server, data travels from one daughter card, through the host processor and host memory module(s), to the other daughter card. The exchange and processing of data between the daughter cards can lead to bottlenecks in either or both of the host processor and the host memory module(s). For example, the host processor may load a large amount of data structures related to a file system onto the host memory module(s) while the network daughter card is also transferring a large amount of payload received from the network to the host memory module(s). This creates a memory bottleneck in the storage server and can slow down the entire storage system.
The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
The disclosed technology is directed to a storage system where a storage interface and a network interface have a channel to communicate datasets without having to first load the datasets to a host memory of a host server or involve a host processor of the host server during movement of payload data of I/O requests from one interface to another. The host server may be responsible for hosting a file system or a structured data storage system. The host server includes a host processor system including one or more host processors. The host memory includes one or more host memory modules.
In one embodiment, the host server includes both a network interface and a storage interface on a single dual interface daughter card coupled to the host processor system, e.g., coupled through a PCIe interface. The dual interface daughter card establishes the channel to communicate datasets between the storage interface and the network interface without loading the datasets to the host memory. Data can be exchanged between the network interface and the storage interface using a local memory of the dual interface card, such that the host memory is not involved in the bulk of the transfer. Optionally, the dual interface card may offload some of the data processing from the host processor system as well. The local memory may be a single shared memory space. Alternatively, the local memory may include a portion allocated for incoming data through the storage interface and a portion allocated for incoming data through the network interface.
In another embodiment, the host server is coupled to an external appliance. The external appliance can include both a network interface and a storage interface. Similar to the dual interface daughter card, the external appliance manages both interfaces for the host storage server. When responding to read/write requests to the network interface, the external appliance can maintain a large portion of incoming and outgoing data through the storage devices and the network in the external appliance without having to transfer the data over to the host memory.
In some embodiments, the channel to communicate datasets without having to first load the datasets to the host memory is accomplished without placing the network interface and the storage interface in a single device. For example, a protocol for direct communication between a storage daughter card and a network daughter card can be established. When responding to a read/write request, portions of incoming and outgoing data from the storage devices and the network can remain in the daughter cards without being first loaded onto the host memory. As another example, a protocol for direct communication between an external storage appliance and an external network appliance can be established. When responding to a read/write request, portions of incoming and outgoing data from the storage devices and the network can remain in the external appliances without being first loaded onto the host memory.
The embodiments and implementations described in this disclosure enable a channel between the storage interface and the network interface to reduce the memory bottleneck that can occur in the host memory. Further, because of a shared memory space for both the storage interface and the network interface, a dual interface processing system can further reduce computational bottlenecks that may occur on the host processor system. Compared to the conventional storage server setup, the disclosed technology increases throughput for data exchange through both network and storage.
The storage daughter card 108 includes a storage controller 112, a storage card DRAM 114, and a storage interface 116. The storage interface 116 is connected to one or more storage devices, e.g., hard disk drives, solid state drives, flash drives, tape drives or other types of persistent storage. The storage controller 112 is configured to process messages to and from the storage devices through the storage interface 116. The storage card DRAM 114 is for storing incoming or outgoing data through the storage interface 116. Whenever the storage controller 112 executes a command to transfer data out through a network connected to the network daughter card 106, the data is first sent to the host CPU 102 and stored in the host DRAM 104 before forwarding the command to the network daughter card 106.
The network daughter card 106 includes a network controller 122, a network card DRAM 124, and a network interface 126. The network interface 126 is connected to a network, e.g., wired or a wireless network. The network controller 122 is configured to process messages to and from the network through the network interface 126. The network card DRAM 124 is for storing incoming or outgoing data through the network interface 126. Whenever a message (e.g., a write request) includes a command to access a storage device connected to the storage daughter card 108, incoming payload data is first sent to the host CPU 102 and stored in the host DRAM 104 before relaying the message to the storage daughter card 108.
For example, when a write request arrives at the network interface 126, payload data and control information of the write request are stored in the network card DRAM 124. Then both the payload data and the control information are transferred to the host CPU 102 and stored in the host DRAM 104. The host CPU 102 then processes the control information to determine specific instructions for the storage devices, and sends the payload data to the storage daughter card 108 for storage. The payload data is then again stored in the storage card DRAM 114. Under this conventional system architecture, the payload data is redundantly stored in at least three separate memory devices.
For another example, when a read request arrives at the network interface 126, control information for the read request is passed from the network controller 122 to the host CPU 102 and then to the storage controller 112. The storage controller 112 retrieves the requested data through the storage interface 116 connected to the storage devices. The requested data is stored first in the storage card DRAM 114, then transferred to the host CPU 102 and stored in the host DRAM 104. The host CPU 102 then forwards the requested data to the network daughter card 106. The network controller 122 stores the requested data temporarily in the network card DRAM 124 before transmitting the requested data to a requesting client through the network interface 126. Again, under this conventional system architecture, the requested data is redundantly stored in at least three separate memory devices.
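The redundancy described in the two examples above can be illustrated with a minimal sketch. The function name and the dictionary model of the memory devices are assumptions introduced only for illustration and are not part of the described hardware.

```python
# Hypothetical model: each memory device on the conventional path holds
# its own copy of the payload data. Names are illustrative only.
CONVENTIONAL_WRITE_PATH = [
    "network card DRAM 124",
    "host DRAM 104",
    "storage card DRAM 114",
]
# A read traverses the same memory devices in the opposite direction.
CONVENTIONAL_READ_PATH = list(reversed(CONVENTIONAL_WRITE_PATH))


def buffer_payload(path, payload: bytes) -> dict:
    """Return a mapping from each memory device on the path to its copy."""
    return {memory: bytes(payload) for memory in path}


copies = buffer_payload(CONVENTIONAL_WRITE_PATH, b"example payload")
print(len(copies), "copies of the payload are held along the write path")
```

In this sketch the same payload is held once per memory device on the path, which corresponds to the at least three separate memory devices noted above.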
The dual interface card 204 includes a control device 212, a card memory 214, a network interface 218, and a storage interface 220. The control device 212 may be one or more of a processor, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or other types of controller. The card memory 214 may be volatile memory, e.g., DRAM module(s), or non-volatile memory, e.g., solid state memory module(s). The network interface 218 is connected to one or more external networks, either via a wired connection or a wireless connection. The storage interface 220 is connected to one or more storage devices, including a hard disk drive, a flash drive, a solid-state drive, a tape drive, other persistent storage device, or any combination thereof. The network interface 218 and the storage interface 220 are connected (directly or indirectly) to the control device 212. The network interface 218 and the storage interface 220 may also be connected (directly or indirectly) to the card memory 214 (not shown).
The dual interface card 204 may be a replacement of the network daughter card 106 and the storage daughter card 108 of
For another example, when a read request arrives at the network interface 218, control information is passed from the control device 212 to the host processor system 202 to determine specific instruction(s) to retrieve requested data from the one or more storage devices and to send the requested data to a particular client over the network. The host processor system 202 responds by sending the specific instruction(s) to the control device 212. In response to receiving the specific instruction(s), the control device 212 can retrieve the requested data from the one or more storage devices. Once retrieved, the requested data can be stored in the card memory 214. According to the specific instruction(s) from the host processor system 202, the control device 212 then sends the requested data through the network interface 218 to the particular client over the network. Under the disclosed system architecture with the dual interface card 204, the requested data is no longer redundantly stored, and is only stored in the card memory 214 (i.e., not stored in the host memory 208) throughout the read request process.
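The read flow described above may be sketched as follows. This is a simplified software model under stated assumptions: the function `serve_read`, the dictionary standing in for the storage devices, and the fields `location` and `client` of the instruction are hypothetical and are used only to show that the requested data is buffered in card memory while only control information reaches the host.

```python
def serve_read(control, host_process, storage_devices, card_memory, transmit):
    """Serve one read request on a hypothetical dual interface card."""
    # Only control information crosses over to the host processor system.
    instruction = host_process(control)
    # Retrieve the requested data through the storage interface.
    data = storage_devices[instruction["location"]]
    # Buffer the requested data in card memory, not in host memory.
    card_memory["read_buffer"] = data
    # Send the requested data to the client through the network interface.
    transmit(instruction["client"], card_memory["read_buffer"])


host_seen = []  # everything the host receives, for inspection


def host_process(control):
    # Hypothetical host logic: map the requested object to a location.
    host_seen.append(control)
    return {"location": control["object"], "client": control["client"]}


sent = []
serve_read(
    {"client": "client-1", "object": "obj-7"},
    host_process,
    storage_devices={"obj-7": b"requested bytes"},
    card_memory={},
    transmit=lambda client, data: sent.append((client, data)),
)
```

In this model, `host_seen` contains only the control information, while the requested data passes from the storage devices to the client through `card_memory` alone.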
The host storage server 302 includes a host processor system 314 and a host memory space 316. The host processor system 314 is a system of one or more processors. The host memory space 316 is a memory space implemented by one or more memory modules, e.g., DRAM or other volatile memory. The host storage server 302 includes an appliance interface 318 for coupling with the control bus 306. The appliance interface 318 can relay messages from the interface appliance 304 to the host processor system 314 and relay messages from the host processor system 314 to the interface appliance 304. Optionally, the appliance interface 318 can have a connection with the host memory space 316. In some embodiments, the appliance interface 318 can share a connection to the host processor system 314 as the host memory space 316.
The interface appliance 304 includes a host interface 322 receiving the control bus 306 connecting the host storage server 302 and the interface appliance 304. The host interface 322 enables a control device 324 of the interface appliance 304 to communicate with the host storage server 302, particularly the host processor system 314. The interface appliance 304 further includes a local memory space 326 for storing incoming or outgoing data from the network interface 310 or the storage interface 312. The local memory space 326 may be volatile memory, e.g., DRAM module(s), or non-volatile memory, e.g., solid state memory module(s). The host interface 322, the local memory space 326, the network interface 310, and the storage interface 312 can individually have a connection with the control device 324. Alternatively, two or more of the host interface 322, the local memory space 326, the network interface 310, and the storage interface 312 can share a connection with the control device 324.
The interface appliance 304 may be a replacement in functionalities to the network daughter card 106 and the storage daughter card 108 of
Payload data and/or control information of a write request are stored in the local memory space 326. The control information (e.g., only the control information) is transferred to the host processor system 314 to be processed. A link to the payload data stored in the local memory space 326 may also be sent to the host processor system 314. In various embodiments, either a portion (i.e., not the whole) of the payload data or none of the payload data is sent to the host processor system 314. For example, the control information of the write request may specify which portion of the payload data to forward to the host processor system 314. The host processor system 314 then processes the control information to determine specific instruction(s) for the one or more storage devices connected through the storage interface 312. The specific instruction(s) are sent to the control device 324. In response to the specific instruction(s), the control device 324 sends the payload data through the storage interface 312 to the one or more storage devices according to the specific instruction(s). Under the disclosed system architecture of the interface appliance 304, the payload data is no longer redundantly stored, and is only stored in the local memory space 326 (i.e., not stored in the host memory space 316) throughout the write request process.
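The write flow described above may be sketched as a minimal software model. The class names, the integer link, and the fixed device and block values returned by the host are assumptions made for illustration; an actual host processor system would derive its instructions from storage system data structures.

```python
from dataclasses import dataclass, field


@dataclass
class WriteRequest:
    control: dict   # e.g., sender and target data object
    payload: bytes  # data to be written


@dataclass
class Host:
    memory: list = field(default_factory=list)  # stands in for host memory space 316

    def process(self, control: dict, payload_link: int) -> dict:
        # The host receives control information and a link, never the payload.
        self.memory.append(control)
        # Hypothetical placement decision, hard-coded for the sketch.
        return {"link": payload_link, "device": "disk0", "block": 42}


@dataclass
class InterfaceAppliance:
    host: Host
    local_memory: dict = field(default_factory=dict)  # stands in for local memory space 326
    storage: dict = field(default_factory=dict)       # stands in for the storage devices
    next_link: int = 0

    def handle_write(self, request: WriteRequest) -> None:
        link = self.next_link
        self.next_link += 1
        # Payload data stays in the local memory space.
        self.local_memory[link] = request.payload
        # Only control information and the link go to the host.
        instruction = self.host.process(request.control, link)
        # Write the payload through the storage interface as instructed.
        payload = self.local_memory.pop(instruction["link"])
        self.storage[(instruction["device"], instruction["block"])] = payload


host = Host()
appliance = InterfaceAppliance(host)
appliance.handle_write(WriteRequest({"client": "c1", "object": "obj-7"}, b"data"))
```

After the call, `appliance.storage` holds the payload while `host.memory` holds only the control information, mirroring the separation between the local memory space and the host memory space.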
For another example, when responding to a read request arriving at the network interface 310, control information is passed from the control device 324 to the host processor system 314 to determine specific instruction(s) to retrieve requested data from the one or more storage devices and to send the requested data to a particular client over the network. The host processor system 314 responds by sending the specific instruction(s) to the control device 324. In response to receiving the specific instruction(s), the control device 324 can retrieve the requested data from the one or more storage devices. Once retrieved, the requested data can be stored in the local memory space 326. According to the specific instruction(s) from the host processor system 314, the control device 324 then sends the requested data through the network interface 310 to the particular client over the network. Under the disclosed system architecture of the interface appliance 304, the requested data is no longer redundantly stored, and is only stored in the local memory space 326 (i.e., not stored in the host memory space 316) throughout the read request process.
Blocks, components, and/or modules associated with the storage server 200 and the storage system 300 may be implemented as hardware modules or a combination of hardware and software modules. Controlling modules may be operable as a processor or other computing device, e.g., a single board chip, an application specific integrated circuit, or a field programmable gate array.
Each of the modules may operate individually and independently of other modules. Some or all of the modules may be executed on the same host device or on separate devices. The separate devices may be coupled via a communication module to coordinate their operations via a wired interconnect or wirelessly. Some or all of the modules may be combined as one module.
A single module may also be divided into sub-modules, each sub-module performing a separate method step or method steps of the single module. In some embodiments, the modules can share access to a memory space. One module may access data accessed by or transformed by another module. The modules may be considered “coupled” or capable of communicating with one another if they share a physical connection or a virtual connection, directly or indirectly, allowing data accessed or modified from one module to be accessed in another module. The storage server 200 and/or the storage system 300 may include additional, fewer, or different modules for various applications.
Then in step 404, a controller (e.g., a processor or other control device) of the dual interface device can parse the write request into payload data and control data. The controller may be the control device 212 of
In step 410, the host processor processes the write request referencing a storage system data structure(s) (e.g., file object namespace, storage object metadata, or data block metadata) available to the host processor (e.g., stored in the host memory or on a persistent storage directly available to the host processor). The storage system data structure may be data and/or metadata related to data objects and data blocks of the storage system. Then in step 412, the dual interface device can receive a response instruction from the host processor. The host processor can generate and send the response instruction, in response to processing the write request with the control data and/or the storage system data structure (e.g., as in step 410). The response instruction may indicate where and how to store the payload data into one or more storage devices accessible to a storage interface of the dual interface device. For example, the storage interface may be the storage interface 220 of
Then in step 504, a controller (e.g., a processor or other control device) of the dual interface device can send at least a portion of the read request to a host processor. For example, the at least a portion of the read request can include control data of the read request or constitute the entirety of the read request. The controller may be the control device 212 of
In step 506, the host processor processes the read request with a storage system data structure(s) (e.g., file object namespace, storage object metadata, or data block metadata) available to the host processor (e.g., stored on a host memory of the host processor or on a persistent storage directly available to the host processor). The storage system data structure may be data and/or metadata related to data objects and data blocks of the storage system. Then in step 508, the dual interface device can receive a response instruction from the host processor. The host processor can generate and send the response instruction, in response to processing the read request with the control data and/or the storage system data structure (e.g., as in step 506). The response instruction may indicate where to retrieve the requested data as indicated in the read request from one or more storage devices accessible to a storage interface of the dual interface device. The response instruction may also indicate how to respond back to the client device with the requested data indicated in the read request. For example, the storage interface may be the storage interface 220 of
In response to the response instruction, the controller of the dual interface device retrieves the requested data through the storage interface to store on a local memory of the dual interface device in step 510. The local memory may be the card memory 214 of
While processes or blocks are presented in a given order in
Control data may be stored in the network headers 604. For example, the control data may include who is sending the write request, what data object(s) or data container(s) the write request is related to, security information of the write request, scheduling and other timing information related to the write request, or any combination thereof. In some embodiments, the control data may also be stored in the network trailers 608. The payload data pieces 606 include digital bits representing the data to be written to one or more storage devices in the storage system.
After the write request 602 is processed by a controller (e.g., the control device 212 of
After a response instruction is received at the controller of the dual interface device (e.g., as in step 412 of
Each of the command packets includes a storage header (e.g., a first storage header 624A or a second storage header 624B, collectively as “storage headers 624”). Each of the command packets also includes a portion of the payload data 616 (e.g., a first payload data piece 626A or a second payload data piece 626B, collectively as the “payload data pieces 626”). In some embodiments, the payload data pieces 626 may correspond to the payload data pieces 606. In other embodiments, the payload data pieces 626 do not correspond to the payload data pieces 606. Each of the command packets further includes a storage trailer (e.g., a first storage trailer 628A or a second storage trailer 628B, collectively as “storage trailers 628”). Either or both of the storage headers 624 or the storage trailers 628 may include information indicating where and how the payload data pieces 626 are to be written to the one or more storage devices.
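The case in which the command packets carry payload pieces partitioned differently from the network payload pieces can be illustrated with a short sketch. The function `build_command_packets`, the piece size, and the header and trailer fields are hypothetical; the description above does not specify the contents of the storage headers or storage trailers beyond indicating where and how the payload is to be written.

```python
def build_command_packets(payload: bytes, piece_size: int, target: str) -> list:
    """Split payload data into command packets of at most piece_size bytes."""
    packets = []
    for offset in range(0, len(payload), piece_size):
        piece = payload[offset:offset + piece_size]
        packets.append({
            # Hypothetical storage header: where the piece is to be written.
            "header": {"target": target, "offset": offset, "length": len(piece)},
            "payload": piece,
            # Hypothetical storage trailer: marks the end of the packet.
            "trailer": {"end_of_packet": True},
        })
    return packets


# Payload pieces as they arrived in network packets (sizes 4, 4, and 2).
network_pieces = [b"abcd", b"efgh", b"ij"]
payload = b"".join(network_pieces)

# Repartition into pieces of up to 6 bytes for the storage interface.
packets = build_command_packets(payload, piece_size=6, target="disk0")
```

Here three network payload pieces become two command packet pieces, so the storage-side pieces do not correspond one-to-one with the network-side pieces, yet together they carry the same payload data.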
Control data may be stored in the network header 704. For example, the control data may include who is sending the read request, what data object(s) or data container(s) the read request is related to, security information of the read request, scheduling and other timing information related to the read request, or any combination thereof. In some embodiments, the control data may also be stored in the network trailer 708 or the payload data portion 706.
After a response instruction is received at the controller of the dual interface device (e.g., as in step 508 of
The read command 712 includes a storage header 714, a payload data portion 716, and a storage trailer 718. Control data of the read command 712 may be stored in the storage header 714. Alternatively, the control data may be stored in the payload data portion 716 of the read command 712 or the storage trailer 718 of the read command 712. The control data may include information indicating where and how the requested data may be retrieved from the one or more storage devices.
In response to executing the read command 712 through the storage interface, the storage interface may return a read response 722 to the controller of the dual interface device. The read response 722 includes one or more storage packets (e.g., a first storage packet 722A and a second storage packet 722B, collectively as the read response 722). Each of the storage packets of the read response 722 includes a storage header (e.g., a first storage header 724A or a second storage header 724B, collectively as “storage headers 724”). Each of the storage packets of the read response 722 also includes a portion of the requested data (e.g., a first data piece 726A or a second data piece 726B, collectively as the “requested data pieces 726”). Each of the storage packets of the read response 722 further includes a storage trailer (e.g., a first storage trailer 728A or a second storage trailer 728B, collectively as “storage trailers 728”).
The storage headers 724 may include control information originating from the storage devices. The requested data pieces 726 in combination represent the requested data as indicated in the read request 702 for transmitting out to a destination client device. The storage trailers 728 may indicate the end of each storage packet. The storage headers 724 may differ from the storage header 714 of the read command 712. The storage trailers 728 may also differ from the storage trailer 718 of the read command 712.
In response to receiving the read response 722, the controller of the dual interface device can temporarily store data collected from the read response 722 in a local read storage 732. Requested data 734, consisting of the requested data pieces 726, may be stored in the local read storage 732. Control data 736 may also be stored in the local read storage 732. The control data 736 includes storage system metadata of the requested data, I/O related information, source and destination information, or any combination thereof. The control data 736 may reference the requested data 734, the requested data 734 may reference the control data 736, or both can reference each other. The control data 736 may be extracted from the network header 704 of the read request 702 and/or the storage headers 724 of the read response 722.
In some embodiments, the controller generates a client data transmission 742 in response to receiving the read response 722 through the storage interface, without first storing the requested data 734 in the local read storage 732. In other embodiments, the client data transmission 742 may be generated asynchronously with respect to receipt of the read response 722. For example, the requested data 734 is first stored in the local read storage 732 before being used to generate the client data transmission 742.
The client data transmission 742 comprises network transmission packets (e.g., a first network packet 742A and a second network packet 742B, collectively as the “client data transmission 742”) for the network interface. The client data transmission 742 enables the network interface to deliver the requested data 734 to the destination client device across a network connected to the network interface.
Each of the network transmission packets includes a network header (e.g., a first network header 744A or a second network header 744B, collectively as “network headers 744”). Each of the network packets also includes a portion of the requested data 734 (e.g., a first payload data piece 746A or a second payload data piece 746B, collectively as the “payload data pieces 746”). In some embodiments, the payload data pieces 746 may correspond to the requested data pieces 726. In other embodiments, the payload data pieces 746 do not correspond to the requested data pieces 726 and are partitioned differently from the requested data 734. Each of the network packets further includes a network trailer (e.g., a first network trailer 748A or a second network trailer 748B, collectively as “network trailers 748”). Either or both of the network headers 744 or the network trailers 748 can include information indicating where and how the payload data pieces 746 are to be delivered to the destination client device across a network connected to the network interface.
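The conversion from a read response into a client data transmission can be sketched in the same spirit. The function `build_client_transmission`, the `max_piece` limit, and the header and trailer fields are assumptions for illustration only; the description above states only that the network headers or trailers can indicate where and how the payload data pieces are to be delivered.

```python
def build_client_transmission(storage_packets: list, client: str, max_piece: int) -> list:
    """Reassemble requested data from storage packets and repartition it
    into network transmission packets of at most max_piece bytes."""
    requested_data = b"".join(p["data"] for p in storage_packets)
    total = len(requested_data)
    return [
        {
            # Hypothetical network header: destination and position of the piece.
            "header": {"destination": client, "offset": offset, "total": total},
            "payload": requested_data[offset:offset + max_piece],
            # Hypothetical network trailer: flags the final packet.
            "trailer": {"last": offset + max_piece >= total},
        }
        for offset in range(0, total, max_piece)
    ]


# Requested data pieces as returned in two storage packets.
read_response = [{"data": b"hello "}, {"data": b"world"}]
transmission = build_client_transmission(read_response, "client-1", max_piece=4)
```

In this sketch two requested data pieces from the storage interface are repartitioned into three network payload pieces, an instance of the embodiments in which the payload data pieces do not correspond to the requested data pieces.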