Management software in virtualized environments is used to monitor hardware resources, such as host systems, storage arrays, and virtual machines (VMs) running in the host systems. The management software also enables resource management operations such as placement of VMs and load balancing across the host systems. One example of such management software is VMware vSphere® by VMware, Inc. of Palo Alto, Calif.
Existing resource management solutions are optimized to execute efficiently in a virtualized computing system that includes a small number of hardware resources. When the number of hardware resources included in the virtualized computing system becomes very large, such solutions do not scale well and the management thereof becomes quite inefficient. For example, a cloud-based computing system includes thousands of hardware resources that provide the physical infrastructure for a large number of different computing operations. In such cloud-based computing systems, proper initial placement and load balancing across the hardware resources are critical to avoid computing bottlenecks that can result in serious problems, including reduced speed of VMs executing on an overloaded host system, potential data loss when no more free space is available in a storage array, and the like. Unfortunately, the complexity and inefficiency of load balancing grow with the number of hardware resources involved.
Accordingly, what is needed in the art is an efficient technique for managing a large number of hardware resources.
One or more embodiments of the present invention provide a method for performing initial placement and load balancing of data objects in a distributed system. The distributed system includes hardware resources, e.g., host systems and storage arrays, which are configured to execute and/or store data objects, e.g., VMs and their associated virtual machine disk format (VMDK) files. A data object is initially placed into the distributed system by a method that includes the steps of creating a virtual cluster of hardware resources, i.e., a set of hardware resources, that are compatible to execute and/or host the data object, selecting from the virtual cluster a hardware resource that is optimal for executing and/or hosting the data object, and then placing the data object into the selected hardware resource. A load balancing operation can be performed across the virtual cluster. Upon completion of the load balancing operation, the virtual cluster is released, and the distributed system is returned to its original state with the data object included therein.
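The create, use, and release lifecycle of a virtual cluster described above can be illustrated with a minimal Python sketch. It assumes a toy model in which the distributed system is a set of resource names and a virtual cluster is merely a named grouping of some of them; `DistributedSystem` and `virtual_cluster` are hypothetical names introduced for illustration only.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class DistributedSystem:
    """Toy model (hypothetical): hardware resources plus any virtual
    clusters currently grouped out of them."""
    resources: set
    virtual_clusters: dict = field(default_factory=dict)


@contextmanager
def virtual_cluster(system, name, members):
    """Group `members` into a virtual cluster for the duration of a block."""
    unknown = set(members) - system.resources
    if unknown:
        raise ValueError(f"not part of the system: {sorted(unknown)}")
    system.virtual_clusters[name] = set(members)
    try:
        yield system.virtual_clusters[name]
    finally:
        # Release: only the grouping is removed; the resources themselves
        # are untouched, so the system returns to its original state.
        del system.virtual_clusters[name]


system = DistributedSystem(resources={"host-1", "host-2", "host-3"})
with virtual_cluster(system, "vc-placement", {"host-1", "host-3"}) as vc:
    print(sorted(vc))            # ['host-1', 'host-3']
print(system.virtual_clusters)   # {}
```

Using a context manager ties the release step to the end of the block, so the virtual cluster is dropped even if the placement or balancing work inside it raises an exception.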
A method for performing initial placement of a data object in a distributed system that includes a plurality of hardware resources, according to an embodiment of the present invention, includes the steps of determining a list of hardware resources that satisfy one or more criteria of the data object, creating a virtual cluster that includes a subset of the hardware resources included in the list of hardware resources, selecting a hardware resource from the virtual cluster into which the data object is to be placed, and placing the data object into the hardware resource.
A method of performing a load balancing operation across a plurality of hardware resources, according to an embodiment of the present invention, includes the steps of receiving, from each of a plurality of agents, a signal that indicates a loading level of the hardware resource on which the agent is executing, generating a list of hardware resources that are overloaded and a list of hardware resources that are underloaded, selecting, from the list of hardware resources that are overloaded, a first subset of hardware resources, selecting, from the list of hardware resources that are underloaded, a second subset of hardware resources, creating a virtual cluster that includes the first subset of hardware resources and the second subset of hardware resources, and performing a load balancing operation that causes data objects to be transferred between the hardware resources included in the virtual cluster.
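The generation of the overloaded and underloaded lists from the agents' signals might look like the following Python sketch. It assumes, for illustration only, that each signal carries a utilization fraction between 0 and 1 and that fixed thresholds (0.8 and 0.3 here) separate the categories; the names `LoadSignal` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSignal:
    """Loading level reported by the agent on one hardware resource.
    `load` is assumed to be a utilization fraction in [0, 1]."""
    resource: str
    load: float


def classify(signals, high=0.8, low=0.3):
    """Split resources into overloaded and underloaded lists.

    `high` and `low` are hypothetical thresholds; resources in between are
    treated as balanced and appear in neither list.
    """
    overloaded = sorted((s for s in signals if s.load > high),
                        key=lambda s: s.load, reverse=True)
    underloaded = sorted((s for s in signals if s.load < low),
                         key=lambda s: s.load)
    return [s.resource for s in overloaded], [s.resource for s in underloaded]


signals = [LoadSignal("host-1", 0.92), LoadSignal("host-2", 0.15),
           LoadSignal("host-3", 0.55), LoadSignal("host-4", 0.85),
           LoadSignal("host-5", 0.05)]
over, under = classify(signals)
print(over)   # ['host-1', 'host-4']
print(under)  # ['host-5', 'host-2']
```

Ordering each list so that the most extreme resources come first makes it straightforward to pick the subsets that most need rebalancing.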
A system, according to an embodiment of the present invention, configured to perform an initial placement of a data object, comprises a plurality of hardware resources and a server machine. The server machine is configured to determine a list of hardware resources that satisfy one or more criteria of the data object, create a virtual cluster that includes a subset of the hardware resources included in the list of hardware resources, select a hardware resource from the virtual cluster into which the data object is to be placed, and place the data object into the hardware resource.
Further embodiments of the present invention provide a non-transitory computer-readable storage medium that includes instructions for causing a computer system to carry out one or more of the methods set forth above.
In some embodiments, VMs 125-127 run on top of a hypervisor (not shown), which is a software interface layer of the host system that enables sharing of the hardware resources of the host system. The hypervisor may run on top of an operating system executing on the host system or directly on hardware components of the host system. Each VM includes a guest operating system and one or more guest applications. The guest operating system is a master control program of the VM and forms a software platform on top of which the guest applications run. As also shown, an agent 132 is included in each of host systems 122-124. Information associated with the virtualization settings and configuration of host systems 122-124, and VMs 125-127 included therein, is transmitted to VM manager 102 via agent 132. In one embodiment, VM manager 102 interacts with agent 132 on each host system to exchange information using application programming interface (API) calls.
VM manager 102 communicates with storage arrays 106 via storage network 104 and is configured to interact with agent 108 to coordinate storage of VM data files, such as small VM configuration files and large virtual disks, within storage devices 112 included in each of storage arrays 106. VM manager 102 may also obtain information associated with storage arrays 106 by communicating with any agent 132 executing in host systems 122-124, where the agent 132 communicates with one or more storage arrays 106 and maintains information associated therewith. For example, agent 132 may be configured to communicate with agent 108 to manage a table of information associated with any of storage arrays 106 such that VM manager 102 is not required to be in direct communication with storage arrays 106. The communication between agents may be performed periodically or on demand depending on the configuration of virtualized computer system 100.
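The periodic or on-demand exchange of information can be sketched as a small cache with a freshness limit, as in the Python example below. `ResourceInfoTable`, its `fetch` callback (standing in for a query to an agent), and the 30-second `max_age` default are all hypothetical; the injectable `clock` exists only so the behavior can be demonstrated deterministically.

```python
import time
from typing import Callable


class ResourceInfoTable:
    """Cache of per-resource information kept by an agent (hypothetical).

    Entries older than `max_age` seconds are refreshed on demand by calling
    `fetch`, which stands in for a query to the agent owning the resource.
    """

    def __init__(self, fetch: Callable[[str], dict], max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._entries = {}  # resource -> (timestamp, info)

    def refresh(self, resource: str) -> dict:
        """Query the resource now and store the result (periodic or forced)."""
        info = self._fetch(resource)
        self._entries[resource] = (self._clock(), info)
        return info

    def get(self, resource: str) -> dict:
        """Return cached info, querying only if it is missing or stale."""
        entry = self._entries.get(resource)
        if entry is None or self._clock() - entry[0] > self._max_age:
            return self.refresh(resource)
        return entry[1]


calls = []
now = [0.0]
table = ResourceInfoTable(lambda name: calls.append(name) or {"free_gb": 500},
                          max_age=30.0, clock=lambda: now[0])
table.get("array-1")
table.get("array-1")
print(len(calls))  # 1: the second read is served from the cache
now[0] = 31.0
table.get("array-1")
print(len(calls))  # 2: the stale entry is fetched again
```

A periodic refresher would simply call `refresh` for every resource on a timer, while `get` covers the on-demand case.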
In one embodiment, agent 108 is a computer program executing on one or more processors. Each storage array 106 may also include a plurality of storage processors. Both storage network 104 and network 120 may be a wide area network, a local area network, or a network hosting a protocol especially suited for storage arrays, such as Fibre Channel, iSCSI, HyperSCSI, etc. For example, storage network 104 may comprise one or more Fibre Channel switches. Each of storage arrays 106 may be any type of storage array, such as a network-attached storage (NAS) filer. While storage arrays are typically made up of a plurality of disks, it should be recognized that as prices for solid-state non-volatile storage devices continue to decrease, non-volatile storage is increasingly taking the place of rotating disk storage media. The use of the term “disk” herein should therefore not be construed as limited to rotating disk storage media; it also encompasses what have become known as solid-state disks, or “SSDs.”
As described in greater detail herein, embodiments of the invention provide a technique for initial placement of VMs within host systems 122 and, further, initial placement of VM data files within storage arrays 106. Embodiments of the invention further provide a technique for performing load balancing across host systems 122 and/or storage arrays 106. Though
For example, in
After the subset is established, data object 201 is associated with one of the hardware resources in the subset. In the example illustrated in
VM manager 102 initiates the placement of VM data file 302 by determining which storage array of storage arrays 306-314 is compatible with and/or optimized for storing VM data file 302. For example, VM data file 302 may need to be stored on a storage array that offers read/write speeds that match or exceed a particular rate. In another example, VM data file 302 may need to be stored on a storage array that provides high reliability, e.g., a storage array configured as RAID-5 or RAID-6. To make this determination, VM manager 102 directs a query to agent 108, where the query includes the requirements of VM data file 302. In response, agent 108 analyzes storage arrays 306-314 according to the requirements of VM data file 302 and replies to VM manager 102 with a collection of valid storage arrays that are capable of storing VM data file 302, e.g., storage arrays 306-313 (storage array 314 is invalid). Alternatively, VM manager 102 directs the query to agent(s) 132 to obtain the collection of valid storage arrays, as described above in conjunction with
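The filtering performed by agent 108 could be sketched in Python as follows. The throughput figures, the `min_throughput_mbps` field, and the RAID levels accepted as high reliability are assumptions chosen to mirror the example above, in which storage arrays 306-313 are valid and storage array 314 is not; `valid_arrays` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageArray:
    name: str
    throughput_mbps: int   # sustained read/write rate (hypothetical metric)
    raid_level: int


@dataclass(frozen=True)
class DataFileRequirements:
    min_throughput_mbps: int
    allowed_raid_levels: frozenset


def valid_arrays(arrays, req):
    """Return the storage arrays able to hold a VM data file with needs `req`."""
    return [a for a in arrays
            if a.throughput_mbps >= req.min_throughput_mbps
            and a.raid_level in req.allowed_raid_levels]


arrays = [StorageArray(f"array-{n}", 400, 5) for n in range(306, 314)]
arrays.append(StorageArray("array-314", 150, 0))   # slow and not redundant
req = DataFileRequirements(min_throughput_mbps=200,
                           allowed_raid_levels=frozenset({5, 6}))
valid = valid_arrays(arrays, req)
print(len(valid), valid[-1].name)   # 8 array-313
```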
As depicted in
Turning now to
Finally, as depicted in
As described above in conjunction with
VM manager 102 queries agent 132 executing within each of host systems 504-513 to determine which of host systems 504-513 are compatible for hosting new VM 502. Again, such querying may be performed on demand, or periodically with the results maintained in, e.g., a table of information, as described above. For example, VM 502 may require that the host system include a compact disc (CD) reader, a quad-core processor, and random access memory (RAM) that runs at or above a particular frequency, e.g., 500 MHz. Each instance of agent 132 receives the query and issues a reply that indicates whether the corresponding host system satisfies the requirements of the query. In the example illustrated in
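A minimal Python sketch of this query-and-reply exchange follows. `HostQuery`, `HostAgent`, `compatible_hosts`, and the per-host hardware values are hypothetical, and the "quad-core processor" requirement is interpreted here, as an assumption, as at least four cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostQuery:
    """Requirements of the new VM (values follow the example in the text)."""
    needs_cd_reader: bool = True
    min_cores: int = 4
    min_ram_mhz: int = 500


@dataclass(frozen=True)
class HostAgent:
    """Stand-in for agent 132, which knows its own host's hardware."""
    host: str
    has_cd_reader: bool
    cores: int
    ram_mhz: int

    def reply(self, q: HostQuery) -> bool:
        """Answer whether this host satisfies every requirement in `q`."""
        return ((self.has_cd_reader or not q.needs_cd_reader)
                and self.cores >= q.min_cores
                and self.ram_mhz >= q.min_ram_mhz)


def compatible_hosts(agents, q):
    """Broadcast the query and keep the hosts whose agent answered yes."""
    return [a.host for a in agents if a.reply(q)]


agents = [HostAgent("host-504", True, 4, 667),
          HostAgent("host-505", False, 8, 800),   # no CD reader
          HostAgent("host-506", True, 2, 533),    # only dual-core
          HostAgent("host-507", True, 4, 400)]    # RAM too slow
print(compatible_hosts(agents, HostQuery()))  # ['host-504']
```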
Similar to the technique described above in conjunction with
Turning now to
Finally, as depicted in
At step 602, VM manager 102 broadcasts a query to a plurality of agents. The query includes one or more criteria for a new virtual machine. At step 604, VM manager 102 receives, from the plurality of agents, a list of hardware resources (e.g., host systems) that are compatible for hosting the new virtual machine. Alternatively, VM manager 102 may reference statistical information associated with the hardware resources, such as cached data maintained by VM manager 102, that was obtained via recent queries made to the plurality of agents. At step 606, VM manager 102 selects a subset of the hardware resources from the list of hardware resources. At step 608, VM manager 102 creates a virtual cluster that includes the subset of the hardware resources. At step 610, VM manager 102 selects a hardware resource in the virtual cluster for hosting the new virtual machine. In one embodiment, the hardware resource is selected based on a greedy criterion, e.g., by locating an optimal hardware resource. In another embodiment, the hardware resource is selected at random. At step 612, VM manager 102 places the new virtual machine in the hardware resource. At step 614, VM manager 102 optionally performs load balancing across the virtual cluster, as indicated by the dotted lines around step 614. Performing load balancing is described in greater detail in
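Steps 602-612 can be approximated by the Python sketch below. It is not the claimed method. It assumes each host has a single scalar capacity, draws the subset for the virtual cluster at random, and treats "optimal" in the greedy case as the lowest resulting utilization. `Host`, `place_vm`, `is_compatible`, and `subset_size` are hypothetical names.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float                           # abstract units (assumption)
    vms: dict = field(default_factory=dict)   # VM name -> demand


def place_vm(hosts, vm, demand, is_compatible, subset_size=4,
             greedy=True, rng=random):
    """Place one new VM following the shape of steps 602-612."""
    # Steps 602-604: keep only hosts reported as compatible.
    candidates = [h for h in hosts if is_compatible(h)]
    if not candidates:
        raise RuntimeError("no compatible host for " + vm)
    # Steps 606-608: the virtual cluster is a bounded subset of candidates.
    cluster = rng.sample(candidates, min(subset_size, len(candidates)))
    # Step 610: greedy picks the lowest utilization after placement;
    # otherwise a member of the virtual cluster is chosen at random.
    if greedy:
        target = min(cluster,
                     key=lambda h: (sum(h.vms.values()) + demand) / h.capacity)
    else:
        target = rng.choice(cluster)
    # Step 612: place the VM. Optional step 614 (load balancing) is omitted.
    target.vms[vm] = demand
    # The virtual cluster is only a local list, so it is released on return.
    return target.name


hosts = [Host("host-1", 100, {"vm-a": 60}),
         Host("host-2", 100, {"vm-b": 20}),
         Host("host-3", 50, {"vm-c": 10})]
print(place_vm(hosts, "vm-new", 10, lambda h: h.capacity >= 50,
               rng=random.Random(0)))   # host-2
```

Bounding the virtual cluster with `subset_size` is what keeps the selection cost independent of the total number of hosts, which is the scaling concern raised earlier.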
As described above in conjunction with
At step 706, VM manager 102 selects a subset of the overloaded hardware resources from the list of overloaded hardware resources. At step 708, VM manager 102 selects a subset of the underloaded hardware resources from the list of underloaded hardware resources. In one embodiment, the subset of the overloaded hardware resources is substantially similar in size to the subset of underloaded hardware resources.
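Selecting equally sized subsets, one way of making them substantially similar in size, can be sketched in Python as shown. The `max_size` cap and the assumption that both input lists are ordered most extreme first are illustrative choices, and `select_subsets` is a hypothetical name.

```python
def select_subsets(overloaded, underloaded, max_size=8):
    """Pick equally sized subsets from the two lists (steps 706-708).

    Both lists are assumed to be ordered most extreme first, so the subsets
    hold the most overloaded and the most underloaded resources. `max_size`
    is a hypothetical cap that keeps the resulting virtual cluster small.
    """
    k = min(len(overloaded), len(underloaded), max_size)
    return overloaded[:k], underloaded[:k]


over = ["host-1", "host-4", "host-9"]
under = ["host-5", "host-2", "host-7", "host-8"]
print(select_subsets(over, under, max_size=2))
# (['host-1', 'host-4'], ['host-5', 'host-2'])
```

When either list is empty, both subsets are empty and there is nothing to balance.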
At step 710, VM manager 102 creates a virtual cluster that includes the subset of the overloaded hardware resources and the subset of the underloaded hardware resources. At step 712, VM manager 102 performs load balancing across the virtual cluster. At step 714, VM manager 102 releases the virtual cluster.
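Steps 710-714 might be sketched in Python as below. The sketch assumes that all hosts in the virtual cluster have equal capacity, so load is simply the summed demand of their VMs, and it substitutes a simple greedy heuristic (move the smallest VM from the busiest to the idlest host while that narrows the gap) for whatever balancing algorithm an actual embodiment uses; `balance` is a hypothetical name.

```python
def balance(cluster, max_moves=100):
    """Greedy load balancing within one virtual cluster (step 712).

    `cluster` maps host -> {vm: demand}. Each move shifts the smallest VM
    from the busiest host to the idlest host, but only while doing so
    narrows the gap between them. Returns the list of migrations.
    """
    moves = []
    for _ in range(max_moves):
        load = {h: sum(vms.values()) for h, vms in cluster.items()}
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        if not cluster[src]:
            break
        vm = min(cluster[src], key=cluster[src].get)
        demand = cluster[src][vm]
        # Moving `vm` helps only if the new gap is smaller than the old one.
        if abs((load[src] - demand) - (load[dst] + demand)) >= load[src] - load[dst]:
            break
        cluster[dst][vm] = cluster[src].pop(vm)
        moves.append((vm, src, dst))
    return moves


overloaded = {"host-1": {"vm-a": 30, "vm-b": 40, "vm-c": 20}}
underloaded = {"host-5": {"vm-d": 10}}
cluster = {**overloaded, **underloaded}   # step 710: create the virtual cluster
print(balance(cluster))                   # step 712: balance within it
del cluster                               # step 714: release the virtual cluster
```

Because the per-host VM dictionaries are shared with `overloaded` and `underloaded`, the migrations remain in effect after the virtual cluster itself is released. Each accepted move strictly lowers the sum of squared host loads, so the loop terminates even without the `max_moves` cap.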
The above steps described in method 700 may also be applied to perform power management operations within virtualized computer system 100. Various power management techniques may be implemented, such as those provided by VMware's DRS software. VM manager 102 may periodically query hardware resources to determine which hardware resources are underloaded and compatible with one another, e.g., hardware resources that execute and/or host few data objects, where the data objects are substantially similar to one another. VM manager 102 then creates a virtual cluster of these underloaded hardware resources and attempts to power off one or more of the hardware resources by first transferring the data objects executing on and/or hosted by them to a different hardware resource included in the virtual cluster. Prior to performing the transfer, VM manager 102 verifies that the receiving hardware resources will not become overloaded by the transferred data objects.
Conversely, VM manager 102 may also power on hardware resources when virtualized computer system 100 is overloaded. In one embodiment, VM manager 102 queries hardware resources to determine which are overloaded, and also identifies powered-off host systems that are similar to, and therefore compatible with, the overloaded hardware resources. VM manager 102 then powers on the compatible hardware resources and creates a virtual cluster that includes the compatible hardware resources and the overloaded hardware resources. VM manager 102 subsequently performs a load balancing operation across the virtual cluster such that data objects executing on and/or hosted by the overloaded hardware resources are transferred to the powered-on compatible hardware resources.
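The consolidation check can be sketched in Python as follows. It only plans the evacuation of the least-loaded host in the virtual cluster and does not power anything off; the shared `capacity`, the 0.8 overload `threshold`, and the name `plan_power_off` are assumptions for illustration.

```python
def plan_power_off(cluster, capacity, threshold=0.8):
    """Try to evacuate one host of an underloaded virtual cluster.

    `cluster` maps host -> {vm: demand}; all hosts share `capacity`.
    The least-loaded host is the power-off candidate. Each of its VMs goes
    to the currently least-loaded receiver, and the plan is abandoned if any
    receiver would exceed `threshold` (a hypothetical overload limit).
    Returns (host_to_power_off, moves) or None.
    """
    if len(cluster) < 2:
        return None
    load = {h: sum(vms.values()) for h, vms in cluster.items()}
    victim = min(load, key=load.get)
    receivers = {h: l for h, l in load.items() if h != victim}
    moves = []
    # Place the largest VMs first, while receivers still have most headroom.
    for vm, demand in sorted(cluster[victim].items(), key=lambda kv: -kv[1]):
        dst = min(receivers, key=receivers.get)
        if (receivers[dst] + demand) / capacity > threshold:
            return None   # a receiver would become overloaded: keep host on
        receivers[dst] += demand
        moves.append((vm, victim, dst))
    return victim, moves


cluster = {"host-1": {"vm-a": 10},
           "host-2": {"vm-b": 15, "vm-c": 5},
           "host-3": {"vm-d": 25}}
print(plan_power_off(cluster, capacity=100))
# ('host-1', [('vm-a', 'host-1', 'host-2')])
```

Returning a plan rather than mutating the cluster mirrors the order in the text: the overload check happens before any transfer is performed.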
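The power-on path can likewise be sketched in Python. Compatibility is modeled, as an assumption, by a `family` tag (for example a CPU generation) that must match; hosts share one `capacity`; and VMs are moved largest first onto the least-loaded newly powered-on host until the source falls back under the hypothetical 0.8 `threshold`. `power_on_and_rebalance` is a hypothetical name, and "powering on" here just moves a host from `standby` into `active`.

```python
def power_on_and_rebalance(active, standby, capacity, threshold=0.8):
    """Power on compatible standby hosts and offload overloaded hosts.

    active:  host -> {"family": str, "vms": {vm: demand}} (powered on)
    standby: host -> family (powered off)
    Both dicts are updated in place. Returns (powered_on, moves).
    """
    limit = threshold * capacity

    def load(host):
        return sum(active[host]["vms"].values())

    overloaded = [h for h in active if load(h) > limit]
    families = {active[h]["family"] for h in overloaded}
    # Power on every standby host compatible with some overloaded host.
    powered_on = [h for h, fam in standby.items() if fam in families]
    for h in powered_on:
        active[h] = {"family": standby.pop(h), "vms": {}}

    # Virtual cluster per family: overloaded hosts plus powered-on hosts.
    moves = []
    for src in overloaded:
        targets = [h for h in powered_on
                   if active[h]["family"] == active[src]["family"]]
        if not targets:
            continue
        # Largest VMs first, to relieve the overload in few migrations.
        for vm, demand in sorted(active[src]["vms"].items(),
                                 key=lambda kv: -kv[1]):
            if load(src) <= limit:
                break
            dst = min(targets, key=load)
            if load(dst) + demand <= limit:   # never overload a receiver
                active[dst]["vms"][vm] = active[src]["vms"].pop(vm)
                moves.append((vm, src, dst))
    return powered_on, moves


active = {"host-1": {"family": "gen9", "vms": {"vm-a": 50, "vm-b": 40}},
          "host-2": {"family": "gen8", "vms": {"vm-c": 30}}}
standby = {"host-7": "gen9", "host-8": "gen8"}
print(power_on_and_rebalance(active, standby, capacity=100))
# (['host-7'], [('vm-a', 'host-1', 'host-7')])
```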
The various embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities. Usually, though not necessarily, these quantities may take the form of electrical or magnetic signals, where they or representations of them are capable of being stored, transferred, combined, compared, or otherwise manipulated. Further, such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments of the invention may be useful machine operations. In addition, one or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for specific required purposes, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein, but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
Virtualization systems in accordance with the various embodiments may be implemented as hosted embodiments, as non-hosted embodiments, or as embodiments that tend to blur distinctions between the two; all are envisioned. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.
Many variations, modifications, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest operating system that performs virtualization functions. Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).
Number | Date | Country |
---|---|---|
20120324071 A1 | Dec 2012 | US |