The present disclosure generally relates to data processing techniques and, more specifically, to systems and methods for storing and retrieving data.
Additionally, each computing device in system 100 includes an I/O Blender, which randomly mixes data produced by multiple applications running on the multiple virtual machines in the computing device. This random mixing of the data from different applications prevents storage controller 102 from optimizing the handling of this data, which can reduce the performance of applications running on all of the computing devices.
Non-limiting and non-exhaustive embodiments of the present disclosure are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various figures unless otherwise specified.
In the following description, reference is made to the accompanying drawings that form a part thereof, and in which is shown by way of illustration specific exemplary embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the concepts disclosed herein, and it is to be understood that modifications to the various disclosed embodiments may be made, and other embodiments may be utilized, without departing from the scope of the present disclosure. The following detailed description is, therefore, not to be taken in a limiting sense.
Reference throughout this specification to “one embodiment,” “an embodiment,” “one example,” or “an example” means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” “one example,” or “an example” in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures, databases, or characteristics may be combined in any suitable combinations and/or sub-combinations in one or more embodiments or examples. In addition, it should be appreciated that the figures provided herewith are for explanation purposes to persons ordinarily skilled in the art and that the drawings are not necessarily drawn to scale.
Embodiments in accordance with the present disclosure may be embodied as an apparatus, method, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware-comprised embodiment, an entirely software-comprised embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
Any combination of one or more computer-usable or computer-readable media may be utilized. For example, a computer-readable medium may include one or more of a portable computer diskette, a hard disk, a random access memory (RAM) device, a read-only memory (ROM) device, an erasable programmable read-only memory (EPROM or Flash memory) device, a portable compact disc read-only memory (CDROM), an optical storage device, and a magnetic storage device. Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages. Such code may be compiled from source code to computer-readable assembly language or machine code suitable for the device or computer on which the code will be executed.
Embodiments may also be implemented in cloud computing environments. In this description and the following claims, “cloud computing” may be defined as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned via virtualization and released with minimal management effort or service provider interaction and then scaled accordingly. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service), service models (e.g., Software as a Service (“SaaS”), Platform as a Service (“PaaS”), and Infrastructure as a Service (“IaaS”)), and deployment models (e.g., private cloud, community cloud, public cloud, and hybrid cloud).
The flow diagrams and block diagrams in the attached figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flow diagrams or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flow diagrams, and combinations of blocks in the block diagrams and/or flow diagrams, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flow diagram and/or block diagram block or blocks.
The systems and methods described herein relate to virtual controllers and associated data storage components and systems. In some embodiments, these virtual controllers are located within a computing device, such as a client device or a server device. Accordingly, the described systems and methods may refer to “server-side virtual controllers” or “client-side virtual controllers,” which include any virtual controller located in any type of computing device, such as the computing devices discussed herein.
In particular embodiments, one or more processors implement multiple virtual machines such that each of the multiple virtual machines executes one or more applications. A virtual controller manages the storage of data received from the multiple virtual machines. Multiple I/O (Input/Output) channels are configured to communicate data from the multiple virtual machines to one or more storage devices based on data storage instructions received from the virtual controller. In some embodiments, the I/O on a given channel comes from a single virtual machine (VM) rather than being mixed with I/O from other virtual machines. Additionally, in some embodiments, each virtual machine's I/O is isolated and communicated over a separate channel to the storage device. In particular embodiments, each I/O channel is given a priority based on a class assigned to the corresponding virtual machine, so that I/O from a particular virtual machine may receive priority processing over I/O from other virtual machines. Multiple different priorities can be given to the I/O channels. In some embodiments, an I/O channel is also created for the hypervisor and used for its metadata operations; this channel is typically given the highest priority.
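For illustration only, the following sketch shows one way the per-virtual-machine channels and priority classes described above might be represented. The class names, the PriorityClass values, and the create_channels helper are hypothetical and are not part of the disclosed system; they simply illustrate one isolated channel per VM plus a dedicated, highest-priority channel for hypervisor metadata.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List


class PriorityClass(IntEnum):
    """Hypothetical priority classes; a higher value means higher scheduling priority."""
    SILVER = 1
    GOLD = 2
    PLATINUM = 3
    HYPERVISOR = 4  # reserved for hypervisor metadata operations


@dataclass
class IOChannel:
    """One isolated I/O channel carrying traffic for a single VM (or the hypervisor)."""
    owner: str                     # VM name, or "hypervisor"
    priority: PriorityClass
    queue: List[bytes] = field(default_factory=list)

    def submit(self, payload: bytes) -> None:
        self.queue.append(payload)


def create_channels(vm_classes: Dict[str, PriorityClass]) -> Dict[str, IOChannel]:
    """Create one channel per VM plus a dedicated hypervisor metadata channel."""
    channels = {vm: IOChannel(vm, cls) for vm, cls in vm_classes.items()}
    channels["hypervisor"] = IOChannel("hypervisor", PriorityClass.HYPERVISOR)
    return channels


if __name__ == "__main__":
    channels = create_channels({"vm-a": PriorityClass.PLATINUM, "vm-b": PriorityClass.SILVER})
    channels["vm-a"].submit(b"write block 42")
    # I/O from vm-a is never mixed with I/O from vm-b: each VM has its own channel.
    for name, ch in sorted(channels.items(), key=lambda kv: kv[1].priority, reverse=True):
        print(name, ch.priority.name, len(ch.queue))
```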
In some embodiments, a virtual controller receives data from an application executed by a virtual machine and determines optimal data storage parameters for the received data based on the type of data. The virtual controller also determines a quality of service (QoS) associated with the application and communicates the received data to at least one storage device based on the optimal data storage parameters and the quality of service associated with the application.
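For illustration, a minimal sketch of this selection step is shown below, assuming hypothetical parameter names (block size, cache policy, target tier) and a hypothetical mapping from applications to QoS levels; the disclosure does not prescribe these specific parameters.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StorageParameters:
    """Hypothetical per-write storage parameters chosen by the virtual controller."""
    block_size_kb: int
    use_write_cache: bool
    target_tier: str  # e.g., "flash" or "disk"


# Illustrative mappings only; a real controller would derive these from application profiles.
DATA_TYPE_PARAMETERS = {
    "database": StorageParameters(block_size_kb=8, use_write_cache=True, target_tier="flash"),
    "video": StorageParameters(block_size_kb=1024, use_write_cache=False, target_tier="disk"),
    "log": StorageParameters(block_size_kb=64, use_write_cache=True, target_tier="disk"),
}
APPLICATION_QOS = {"billing-db": "platinum", "media-encoder": "silver"}


def plan_write(application: str, data_type: str) -> Tuple[StorageParameters, str]:
    """Pick storage parameters from the data type and a QoS level from the application."""
    params = DATA_TYPE_PARAMETERS.get(data_type, StorageParameters(64, True, "disk"))
    qos = APPLICATION_QOS.get(application, "silver")
    return params, qos


if __name__ == "__main__":
    print(plan_write("billing-db", "database"))
```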
In particular embodiments, I/O channels are prioritized at both ends of the channel (i.e., the channels extend from the application executed by the virtual machine into each of the storage devices or storage nodes that store data associated with the application). By controlling the I/O at both ends of the channel, intelligent I/O control can be applied to ensure that one virtual machine does not overburden the inbound network of a storage node (i.e., fills its queues past the point where the storage node can prioritize the inbound traffic).
In some embodiments, the storage side of the system controls the flow of I/O from multiple hypervisors, each of which is hosting multiple virtual machines. Since each virtual machine can have a different priority level, and virtual machines can move between hosts (i.e., computing devices), the storage side must be in a position to prioritize I/O from many virtual machines running across multiple hosts. This is achieved by having control at each end of the channel, along with a process that runs above all channels to determine appropriate I/O scheduling for each virtual machine. The I/O channels continue across the network and into each storage node, and the storage node side of each I/O channel has the same priority classes applied to it as the host side.
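A minimal sketch of such a channel-spanning scheduling process is shown below, assuming a strict-priority discipline and hypothetical VM and host names; the actual scheduling algorithm used by the storage side may differ.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical per-VM priority classes (higher = more urgent), gathered across all hosts.
VM_PRIORITY: Dict[str, int] = {"vm-a@host1": 3, "vm-b@host1": 1, "vm-c@host2": 2}


class StorageSideScheduler:
    """Strict-priority scheduler over I/O channels arriving from multiple hypervisors."""

    def __init__(self, priorities: Dict[str, int]) -> None:
        self._priorities = priorities
        self._seq = itertools.count()          # FIFO tie-breaker within a priority class
        self._heap: List[Tuple[int, int, str, bytes]] = []

    def enqueue(self, vm: str, payload: bytes) -> None:
        # Negate the priority so the highest class is popped first.
        heapq.heappush(self._heap, (-self._priorities.get(vm, 1), next(self._seq), vm, payload))

    def dispatch(self) -> Tuple[str, bytes]:
        _, _, vm, payload = heapq.heappop(self._heap)
        return vm, payload


if __name__ == "__main__":
    sched = StorageSideScheduler(VM_PRIORITY)
    sched.enqueue("vm-b@host1", b"low-priority write")
    sched.enqueue("vm-a@host1", b"high-priority write")
    print(sched.dispatch())  # the higher-priority VM's I/O is serviced first
```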
Each computing device 202, 204, and 206 includes a hypervisor 218, 220, and 222, respectively. Each hypervisor 218-222 creates, executes, and manages the operation of one or more virtual machines on the associated computing device. Each computing device 202, 204, and 206 also includes a virtual controller 224, 226, and 228, respectively. As discussed herein, virtual controllers 224-228 manage data read and data write operations associated with the virtual machines 212-216. In particular, virtual controllers 224-228 can handle input/output data (I/O) for each application running on a virtual machine. Since virtual controllers 224-228 understand the type of data (and the data needs) associated with each application, the virtual controllers can accelerate and optimize the I/O for each application. Additionally, since each computing device 202-206 has its own virtual controller 224-228, the number of supported computing devices can be scaled without significant loss of performance.
As shown in the figure, hypervisor 304 includes an NTFS/CSV module 320, which is an integral part of the operating system and provides structured file system metadata and file input/output operations on a virtual store. In addition, CSV provides parallel access from a plurality of host computers (e.g., computing devices) to a single shared virtual store, allowing virtual machine VHDX and other files to be visible and accessible from multiple hosts and virtual controllers simultaneously.
Virtual controller 306 includes a GS-SCSI DRV module 322, which is a kernel device driver that provides access to the virtual store as a standard random-access block device visible to the operating system. The block device is formatted with the NTFS/CSV file system. SCSI protocol semantics are used to provide additional capabilities required to share a block device among a plurality of hosts in a CSV deployment, such as SCSI Persistent Reservations.
Virtual controller 306 also includes an I/O isolator 324, which isolates (or separates) the I/O associated with the different virtual machines 308-318. For example, I/O isolator 324 directs I/O associated with a specific virtual machine to an appropriate virtual channel, as shown in the figure.
The platinum quality of service is the highest quality of service, ensuring the highest I/O scheduling priority, expedited I/O processing, increased bandwidth, the fastest transmission rates, the lowest latency, and the like. The silver quality of service is the lowest quality of service and may receive, for example, lower scheduling priority, lower bandwidth, and lower transmission rates. The gold quality of service is in between the platinum and silver qualities of service.
Virtual channel 330, which receives the gold quality of service, has an assigned service level identified by the vertical height of virtual channel 330 in the figure.
By offering different quality of service levels, more services and/or capacity are guaranteed for more important applications (i.e., the applications associated with the platinum quality of service level).
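For illustration, the relative service levels can be pictured as weights used when dividing available bandwidth among active channels. The weights below are assumptions chosen for the example, not values taken from this disclosure.

```python
from typing import Dict

# Illustrative QoS weights: platinum > gold > silver. The numeric values are assumptions.
QOS_WEIGHTS = {"platinum": 4, "gold": 2, "silver": 1}


def bandwidth_shares(channel_qos: Dict[str, str], total_mbps: float) -> Dict[str, float]:
    """Split available bandwidth among active channels in proportion to their QoS weight."""
    weights = {ch: QOS_WEIGHTS[qos] for ch, qos in channel_qos.items()}
    total_weight = sum(weights.values())
    return {ch: total_mbps * w / total_weight for ch, w in weights.items()}


if __name__ == "__main__":
    shares = bandwidth_shares(
        {"vm-a": "platinum", "vm-b": "gold", "vm-c": "silver"}, total_mbps=700.0
    )
    print(shares)  # {'vm-a': 400.0, 'vm-b': 200.0, 'vm-c': 100.0}
```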
As shown in the figure, storage node controller 702 includes a dispatcher 708, which binds a TCP/IP socket pair (connection) to a matching virtual channel and dispatches the network packets according to an associated service level (e.g., quality of service level).
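A minimal sketch of such a dispatcher is shown below. The socket-pair identifiers and the tables mapping connections to virtual channels and channels to service levels are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from a (source IP, source port) socket pair to a virtual channel.
SOCKET_TO_CHANNEL: Dict[Tuple[str, int], str] = {
    ("10.0.0.11", 50500): "channel-platinum-1",
    ("10.0.0.12", 50501): "channel-silver-1",
}

# Hypothetical service level of each virtual channel on the storage node side.
CHANNEL_SERVICE_LEVEL: Dict[str, int] = {"channel-platinum-1": 3, "channel-silver-1": 1}


class Dispatcher:
    """Binds an incoming connection to its virtual channel and queues packets by service level."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[bytes]] = defaultdict(list)

    def dispatch(self, socket_pair: Tuple[str, int], packet: bytes) -> str:
        channel = SOCKET_TO_CHANNEL[socket_pair]
        self._queues[channel].append(packet)
        return channel

    def next_channel(self) -> str:
        """Serve the non-empty channel with the highest service level first."""
        busy = [ch for ch, q in self._queues.items() if q]
        return max(busy, key=lambda ch: CHANNEL_SERVICE_LEVEL[ch])


if __name__ == "__main__":
    d = Dispatcher()
    d.dispatch(("10.0.0.12", 50501), b"silver packet")
    d.dispatch(("10.0.0.11", 50500), b"platinum packet")
    print(d.next_channel())  # channel-platinum-1
```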
Storage node controller 702 also includes six virtual channels 710, 712, 714, 716, 718, and 720. These six virtual channels correspond to similar virtual channels discussed herein, for example, the virtual channels associated with virtual controller 306.
In some embodiments, a GridCache is implemented as part of one or more virtual controllers. The GridCache includes a local read cache that resides locally to the hypervisor on a local flash storage device (SSD or PCIe). When data writes occur, a copy of the updated I/O is left in the local read cache, and a copy is written to storage nodes that contain a persistent writeback flash device for caching writes before they are moved to slower spinning-disk media. Advantages of the Distributed Writeback Cache include:
Provides the fastest possible I/O for both data read and data write operations.
Eliminates the need to replicate between flash devices within the host. This offloads the duplicate or triplicate write-intensive I/O operations from the hosts (e.g., computing devices).
Frees up local Flash storage capacity that would otherwise have been used for replica data from another host. Typically 1-2 replicas are maintained in the cluster.
This freed capacity can be used for a larger read cache, which increases the probability of cache hits (by 100%). This in turn increases performance by eliminating network I/O for requests that can be served from the local cache.
This increases performance of the overall system by eliminating the replica I/O from host to host.
This also increases system performance by eliminating the need to replicate I/O to another host and then write it to primary storage (which doubles the amount of I/O that must be issued from the original host). Eliminating this additional I/O frees bandwidth and processing capacity for the original I/O.
Write I/O is accelerated by a persistent write-back cache in each storage node. I/O is evenly distributed across a plurality of storage nodes. Before leaving the host, write I/O is protected against up to a predefined number of node failures using a forward error correction scheme. I/O that arrives at a storage node is immediately placed into a fast persistent storage device, and an acknowledgement is sent back to the host that the I/O has completed. The I/O can be destaged at any time from fast storage to slower media such as a hard disk drive.
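The write path just described can be sketched as follows. A single XOR parity shard stands in for whichever forward error correction scheme is actually used (it tolerates only one lost shard), and the storage node and cache objects are simplified stand-ins; the sketch only illustrates the ordering of operations: keep a local read-cache copy, protect the write, persist it on fast storage, acknowledge, and destage later.

```python
from typing import Dict, List


def xor_parity_shards(data: bytes, shard_count: int) -> List[bytes]:
    """Split data into shards and append one XOR parity shard (tolerates one lost shard)."""
    size = -(-len(data) // shard_count)  # ceiling division
    shards = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(shard_count)]
    parity = bytearray(size)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]


class StorageNode:
    """Simplified storage node with a fast persistent write-back area and slower disk."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.writeback: Dict[str, bytes] = {}   # stands in for persistent flash
        self.disk: Dict[str, bytes] = {}

    def persist_fast(self, key: str, shard: bytes) -> bool:
        self.writeback[key] = shard
        return True                              # acknowledgement back to the host

    def destage(self) -> None:
        """Later move shards from fast storage to slower media."""
        self.disk.update(self.writeback)
        self.writeback.clear()


def write(key: str, data: bytes, nodes: List[StorageNode],
          local_read_cache: Dict[str, bytes]) -> bool:
    local_read_cache[key] = data                 # keep a copy in the local read cache
    shards = xor_parity_shards(data, shard_count=len(nodes) - 1)
    acks = [node.persist_fast(key, shard) for node, shard in zip(nodes, shards)]
    return all(acks)                             # host treats the write as complete


if __name__ == "__main__":
    nodes = [StorageNode(f"node-{i}") for i in range(4)]
    cache: Dict[str, bytes] = {}
    print(write("block-7", b"hello distributed writeback", nodes, cache))
    nodes[0].destage()
```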
Due to the virtual controller's I/O isolation and channeling of I/O, I/O priorities can also be used to govern cache utilization. Some I/O may never be cached if it is low priority, and high-priority I/O will be retained in cache longer than lower-priority I/O (using a class-based eviction policy). Policies can be applied at both ends of the I/O channel.
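One plausible reading of such a class-based eviction policy is sketched below: the lowest class is never cached, lower classes are evicted before higher ones, and recency breaks ties within a class. This is an assumption made for illustration, not the policy defined by this disclosure.

```python
from collections import OrderedDict
from typing import Optional, Tuple

PRIORITY = {"silver": 1, "gold": 2, "platinum": 3}


class ClassBasedCache:
    """Cache that never stores the lowest class and evicts lower classes first (LRU within a class)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, Tuple[str, bytes]]" = OrderedDict()  # key -> (class, data)

    def put(self, key: str, qos_class: str, data: bytes) -> None:
        if qos_class == "silver":
            return                                   # low-priority I/O is never cached
        if len(self._entries) >= self.capacity:
            # Evict the lowest-priority class; ties resolve to the oldest entry
            # because OrderedDict preserves insertion/recency order.
            victim = min(self._entries, key=lambda k: PRIORITY[self._entries[k][0]])
            del self._entries[victim]
        self._entries[key] = (qos_class, data)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is not None:
            self._entries.move_to_end(key)           # refresh recency within its class
        return entry[1] if entry else None


if __name__ == "__main__":
    cache = ClassBasedCache(capacity=2)
    cache.put("a", "gold", b"1")
    cache.put("b", "platinum", b"2")
    cache.put("c", "platinum", b"3")   # evicts the gold entry "a" first
    print(cache.get("a"), cache.get("c"))
```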
Computing device 900 includes one or more processor(s) 902, one or more memory device(s) 904, one or more interface(s) 906, one or more mass storage device(s) 908, and one or more Input/Output (I/O) device(s) 910, all of which are coupled to a bus 912. Processor(s) 902 include one or more processors or controllers that execute instructions stored in memory device(s) 904 and/or mass storage device(s) 908. Processor(s) 902 may also include various types of computer-readable media, such as cache memory.
Memory device(s) 904 include various computer-readable media, such as volatile memory (e.g., random access memory (RAM)) and/or nonvolatile memory (e.g., read-only memory (ROM)). Memory device(s) 904 may also include rewritable ROM, such as Flash memory.
Mass storage device(s) 908 include various computer readable media, such as magnetic tapes, magnetic disks, optical disks, solid state memory (e.g., Flash memory), and so forth. Various drives may also be included in mass storage device(s) 908 to enable reading from and/or writing to the various computer readable media. Mass storage device(s) 908 include removable media and/or non-removable media.
I/O device(s) 910 include various devices that allow data and/or other information to be input to or retrieved from computing device 900. Example I/O device(s) 910 include cursor control devices, keyboards, keypads, microphones, monitors or other display devices, speakers, printers, network interface cards, modems, lenses, CCDs or other image capture devices, and the like.
Interface(s) 906 include various interfaces that allow computing device 900 to interact with other systems, devices, or computing environments. Example interface(s) 906 include any number of different network interfaces, such as interfaces to local area networks (LANs), wide area networks (WANs), wireless networks, and the Internet.
Bus 912 allows processor(s) 902, memory device(s) 904, interface(s) 906, mass storage device(s) 908, and I/O device(s) 910 to communicate with one another, as well as other devices or components coupled to bus 912. Bus 912 represents one or more of several types of bus structures, such as a system bus, PCI bus, IEEE 1394 bus, USB bus, and so forth.
For purposes of illustration, programs and other executable program components are shown herein as discrete blocks, although it is understood that such programs and components may reside at various times in different storage components of computing device 900, and are executed by processor(s) 902. Alternatively, the systems and procedures described herein can be implemented in hardware, or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein.
Although the present disclosure is described in terms of certain preferred embodiments, other embodiments will be apparent to those of ordinary skill in the art, given the benefit of this disclosure, including embodiments that do not provide all of the benefits and features set forth herein, which are also within the scope of this disclosure. It is to be understood that other embodiments may be utilized, without departing from the scope of the present disclosure.
This application claims the benefit of U.S. Provisional Application Ser. No. 61/940,247, entitled “Server-Side Virtual Controller,” filed Feb. 14, 2014, the disclosure of which is incorporated herein by reference in its entirety.