The following description is provided to assist the understanding of the reader. None of the information provided or references cited is admitted to be prior art.
Virtual computing systems are widely used in a variety of applications. Virtual computing systems include one or more host machines running one or more virtual machines concurrently. The one or more virtual machines utilize the hardware resources of the underlying one or more host machines. Each virtual machine may be configured to run an instance of an operating system. Modern virtual computing systems allow several operating systems and several software applications to be safely run at the same time on the virtual machines of a single host machine, thereby increasing resource utilization and performance efficiency. However, present day virtual computing systems still have limitations due to their configuration and the way they operate.
In accordance with at least some aspects of the present disclosure, a method is disclosed. The method includes obtaining, by a computing system, mapping information mapping a plurality of virtual processors and a plurality of virtual memories included in the computing system to a plurality of physical processors and a plurality of physical memories in a physical resource, the plurality of physical processors having non-uniform memory access times to the plurality of physical memories. The method further includes obtaining, by the computing system, a first set of latency values associated with the non-uniform memory access times between the plurality of physical processors and the plurality of physical memories. The method also includes generating, by the computing system, a second set of latency values associated with access times between the plurality of virtual processors and the plurality of virtual memories based on the mapping information and the first set of latency values.
In accordance with some other aspects of the present disclosure, a system is disclosed. The system includes a physical processing resource including a plurality of physical processors and a plurality of physical memories, the plurality of physical processors having non-uniform memory access times to the plurality of physical memories. The system also includes a virtual machine including a plurality of virtual processors and a plurality of virtual memories. The virtual machine is configured to obtain mapping information mapping the plurality of virtual processors and the plurality of virtual memories to the plurality of physical processors and the plurality of physical memories. The virtual machine is further configured to obtain a first set of latency values associated with the non-uniform memory access times between the plurality of physical processors and the plurality of physical memories. The virtual machine is also configured to generate a second set of latency values associated with access times between the plurality of virtual processors and the plurality of virtual memories based on the mapping information and the first set of latency values.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the following drawings and the detailed description.
The foregoing and other features of the present disclosure will become apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings.
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and make part of this disclosure.
The present disclosure is generally directed to operating one or more virtual machines in a computing system including non-uniform memory access (NUMA) physical resource architecture. The NUMA architecture includes one or more physical processing nodes, where each physical processing node includes one or more physical processing cores and one or more physical memory banks. The physical processing cores can have non-uniform access times to the one or more memory banks. The virtual machines include at least one virtual processing node, where each virtual processing node includes virtual processing cores and virtual memory banks.
One technical problem encountered in such computing systems is the lack of latency information associated with virtual processing cores and virtual memory banks available at the virtual machine. The NUMA architecture at the physical level can have non-uniform latencies or access times between the physical processing cores and the physical memory banks. A hypervisor can map virtual processing cores and virtual memory banks to physical processing cores and physical memory banks. Thus, there can be non-uniform latencies between virtual processing cores and virtual memory banks. Some virtual machines may not have the ability to provide latency information or may provide uniform latencies to the operating system. As a result, process or thread scheduling by the operating system on the virtual processing cores and the virtual memory banks may become inefficient or unpredictable, and may affect the performance of the computing system.
The discussion below provides at least one technical solution to the technical problems mentioned above. For example, in the computing system discussed below, the virtual processing cores also can be configured to have non-uniform access times to the virtual memory banks. That is, the virtual machine is configured to include a virtual NUMA architecture. The virtual machine is configured to obtain latency information associated with the physical processing nodes, and to generate latency information associated with the virtual processing nodes based on the latency information associated with the physical processing nodes and mapping information between the virtual processing nodes and the physical processing nodes. The virtual machine can provide the generated latency information to the operating system. In turn, the operating system can assign processes and threads to virtual processing cores and virtual memory banks based, in part, on the latency information. Thus, the operating system can schedule processes or threads to appropriate virtual processing cores and memory banks to reduce execution time or improve throughput.
The virtual machine can obtain the latency times associated with the physical processing nodes from a hypervisor. The virtual machine can be configured to update the latency times associated with the virtual processing nodes if changes in the mapping information or changes in the latency times associated with the physical processing nodes are detected.
Providing and updating the latency information for the virtual processing nodes allows the virtual machine or the operating system running on the virtual machine to schedule, assign, or allocate virtual processing cores and virtual memory banks based on the current latency information. As such, the operating system can assign and reassign processes and threads to improve execution times and throughput of the computing system.
Referring now to
The virtual computing system 100 may also include a storage pool 140. The storage pool 140 may include network-attached storage 145 and direct-attached storage 150. The network-attached storage 145 may be accessible via the network 135 and, in some embodiments, may include cloud storage 155, as well as local storage area network 160. In contrast to the network-attached storage 145, which is accessible via the network 135, the direct-attached storage 150 may include storage components that are provided within each of the first node 105, the second node 110, and the third node 115, such that each of the first, second, and third nodes may access its respective direct-attached storage without having to access the network 135.
It is to be understood that only certain components of the virtual computing system 100 are shown in
Although three of the plurality of nodes (e.g., the first node 105, the second node 110, and the third node 115) are shown in the virtual computing system 100, in other embodiments, greater or fewer than three nodes may be used. Likewise, although only two of the user VMs 120 are shown on each of the first node 105, the second node 110, and the third node 115, in other embodiments, the number of the user VMs on the first, second, and third nodes may vary to include either a single user VM or more than two user VMs. Further, the first node 105, the second node 110, and the third node 115 need not always have the same number of the user VMs 120. Additionally, more than a single instance of the hypervisor 125 and/or the controller/service VM 130 may be provided on the first node 105, the second node 110, and/or the third node 115.
Further, in some embodiments, each of the first node 105, the second node 110, and the third node 115 may be a hardware device, such as a server. For example, in some embodiments, one or more of the first node 105, the second node 110, and the third node 115 may be an NX-1000 server, NX-3000 server, NX-6000 server, NX-8000 server, etc. provided by Nutanix, Inc. or server computers from Dell, Inc., Lenovo Group Ltd. or Lenovo PC International, Cisco Systems, Inc., etc. In other embodiments, one or more of the first node 105, the second node 110, or the third node 115 may be another type of hardware device, such as a personal computer, an input/output or peripheral unit such as a printer, or any type of device that is suitable for use as a node within the virtual computing system 100.
Each of the first node 105, the second node 110, and the third node 115 may also be configured to communicate and share resources with each other via the network 135. For example, in some embodiments, the first node 105, the second node 110, and the third node 115 may communicate and share resources with each other via the controller/service VM 130 and/or the hypervisor 125. One or more of the first node 105, the second node 110, and the third node 115 may also be organized in a variety of network topologies, and may be termed as a “host” or “host machine.”
Also, although not shown, one or more of the first node 105, the second node 110, and the third node 115 may include one or more processing units configured to execute instructions. The instructions may be carried out by a special purpose computer, logic circuits, or hardware circuits of the first node 105, the second node 110, and the third node 115. The processing units may be implemented in hardware, firmware, software, or any combination thereof. The term “execution” is, for example, the process of running an application or the carrying out of the operation called for by an instruction. The instructions may be written using one or more programming languages, scripting languages, assembly languages, etc. The processing units, thus, execute an instruction, meaning that they perform the operations called for by that instruction.
The processing units may be operably coupled to the storage pool 140, as well as with other elements of the respective first node 105, the second node 110, and the third node 115 to receive, send, and process information, and to control the operations of the underlying first, second, or third node. The processing units may retrieve a set of instructions from the storage pool 140, such as from a permanent memory device like a read-only memory (ROM) device, and copy the instructions in an executable form to a temporary memory device that is generally some form of random access memory (RAM). The ROM and RAM may both be part of the storage pool 140, or in some embodiments, may be separately provisioned from the storage pool. Further, the processing units may include a single stand-alone processing unit, or a plurality of processing units that use the same or different processing technology.
With respect to the storage pool 140 and particularly with respect to the direct-attached storage 150, it may include a variety of types of memory devices. For example, in some embodiments, the direct-attached storage 150 may include, but is not limited to, any type of RAM, ROM, flash memory, magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips, etc.), optical disks (e.g., compact disk (CD), digital versatile disk (DVD), etc.), smart cards, solid state devices, etc. Likewise, the network-attached storage 145 may include any of a variety of network accessible storage (e.g., the cloud storage 155, the local storage area network 160, etc.) that is suitable for use within the virtual computing system 100 and accessible via the network 135. The storage pool 140 including the network-attached storage 145 and the direct-attached storage 150 may together form a distributed storage system configured to be accessed by each of the first node 105, the second node 110, and the third node 115 via the network 135 and the controller/service VM 130, and/or the hypervisor 125. In some embodiments, the various storage components in the storage pool 140 may be configured as virtual disks for access by the user VMs 120.
Each of the user VMs 120 is a software-based implementation of a computing machine in the virtual computing system 100. The user VMs 120 emulate the functionality of a physical computer. Specifically, the hardware resources, such as processing unit, memory, storage, etc., of the underlying computer (e.g., the first node 105, the second node 110, and the third node 115) are virtualized or transformed by the hypervisor 125 into the underlying support for each of the plurality of user VMs 120 that may run its own operating system and applications on the underlying physical resources just like a real computer. By encapsulating an entire machine, including CPU, memory, operating system, storage devices, and network devices, the user VMs 120 are compatible with most standard operating systems (e.g., Windows, Linux, etc.), applications, and device drivers. Thus, the hypervisor 125 is a virtual machine monitor that allows a single physical server computer (e.g., the first node 105, the second node 110, or the third node 115) to run multiple instances of the user VMs 120, with each user VM sharing the resources of that one physical server computer, potentially across multiple environments. By running the plurality of user VMs 120 on each of the first node 105, the second node 110, and the third node 115, multiple workloads and multiple operating systems may be run on a single piece of underlying hardware computer (e.g., the first node, the second node, and the third node) to increase resource utilization and manage workflow.
The user VMs 120 are controlled and managed by the controller/service VM 130. The controller/service VMs 130 of the first node 105, the second node 110, and the third node 115 are configured to communicate with each other via the network 135 to form a distributed system 165. The hypervisor 125 of each of the first node 105, the second node 110, and the third node 115 may be configured to run virtualization software, such as ESXi from VMware, AHV from Nutanix, Inc., XenServer from Citrix Systems, Inc., etc., for running the user VMs 120 and for managing the interactions between the user VMs and the underlying hardware of the first node 105, the second node 110, and the third node 115. The controller/service VM 130 and the hypervisor 125 may be configured as suitable for use within the virtual computing system 100.
The network 135 may include any of a variety of wired or wireless network channels that may be suitable for use within the virtual computing system 100. For example, in some embodiments, the network 135 may include wired connections, such as an Ethernet connection, one or more twisted pair wires, coaxial cables, fiber optic cables, etc. In other embodiments, the network 135 may include wireless connections, such as microwaves, infrared waves, radio waves, spread spectrum technologies, satellites, etc. The network 135 may also be configured to communicate with another device using cellular networks, local area networks, wide area networks, the Internet, etc. In some embodiments, the network 135 may include a combination of wired and wireless communications.
Referring still to
The first user VM 202 also can include one or more virtual processors (vCPUs) and virtual memories (vRAMs) provided by the hypervisor 206. For example, the first user VM 202 can include a first virtual CPU (vCPU1) 218, a second virtual CPU (vCPU2) 220, a first virtual RAM (vRAM1) 222, and a second virtual RAM (vRAM2) 224. Similarly, the second user VM 204 can include a third virtual CPU (vCPU3) 226, a fourth virtual CPU (vCPU4) 228, a third virtual RAM (vRAM3) 230, and a fourth virtual RAM (vRAM4) 232. The first guest OS 210 can run the first set of software applications 212 on one or more of the virtual CPUs and virtual RAMs included in the first user VM 202. Similarly, the second guest OS 214 can run the second set of software applications 216 on one or more of the virtual CPUs and the virtual RAMs included in the second user VM 204. In particular, the guest OSs can schedule threads associated with the respective software applications to the one or more respective virtual CPUs and assign memory space to the threads on the one or more respective virtual RAMs.
The hypervisor 206 can implement processor and memory virtualization by abstracting the hardware resources 208 including processors, memory, and I/O devices, and present the abstraction to the first and the second user VMs 202 and 204 as the virtual CPUs and RAMs. For example, the hypervisor 206 can implement processor virtualization by scheduling time slots on one or more physical processors of the hardware resources 208 such that from the guest OS's perspective, the time slots are scheduled on the virtual CPUs. The hypervisor 206 can implement memory virtualization by maintaining a translation table that translates virtual memory addresses assigned by the guest OSs to physical memory addresses in the physical memories of the hardware resources 208.
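The address translation described above can be illustrated with a minimal Python sketch. It assumes page-granular translation with a hypothetical 4096-byte page size and models the translation table as a plain dictionary from guest page numbers to physical page numbers; the class and method names are illustrative only, and production hypervisors commonly rely on hardware-assisted structures (e.g., nested page tables) rather than a software lookup like this.

```python
from typing import Dict

# Hypothetical page size used only for this sketch.
PAGE_SIZE = 4096


class TranslationTable:
    """Maps guest page numbers to physical page numbers for one VM."""

    def __init__(self) -> None:
        self._guest_to_host: Dict[int, int] = {}

    def map_page(self, guest_page: int, host_page: int) -> None:
        # Record that this guest page is backed by the given physical page.
        self._guest_to_host[guest_page] = host_page

    def translate(self, guest_addr: int) -> int:
        # Split the guest address into a page number and an offset within the page.
        guest_page, offset = divmod(guest_addr, PAGE_SIZE)
        if guest_page not in self._guest_to_host:
            raise KeyError(f"guest page {guest_page} is not mapped")
        # Keep the offset; substitute the physical page number.
        return self._guest_to_host[guest_page] * PAGE_SIZE + offset


table = TranslationTable()
table.map_page(0, 5)
table.map_page(1, 2)
print(table.translate(100))       # 20580 (physical page 5, offset 100)
print(table.translate(4096 + 8))  # 8200 (physical page 2, offset 8)
```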
The hardware resources 208 can include several processors and memories. While not shown in
The hypervisor 206 can advantageously use the non-uniform latencies between various pCPU cores and pRAM banks in the hardware resources 208 in processor and memory virtualization. In particular, the hypervisor 206 may map vCPUs to pCPUs and vRAMs to pRAMs based on the known latencies between the pCPUs and the pRAMs. For example, the hypervisor 206 may run critical applications on the pCPU and pRAM pairs having the lowest latencies.
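As a rough illustration of selecting a low-latency pCPU and pRAM pair, the sketch below scans a latency matrix indexed by [processor node][memory node] and returns the cheapest pair among those still free. The matrix values and the `lowest_latency_pair` helper are hypothetical; the values follow the convention, used by the ACPI SLIT, of reporting a node's distance to its own memory as 10. An actual hypervisor placement policy would presumably also weigh load, capacity, and other factors.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical 4-node physical latency matrix, indexed [processor node][memory node].
PSLIT: List[List[int]] = [
    [10, 16, 21, 21],
    [16, 10, 21, 21],
    [21, 21, 10, 16],
    [21, 21, 16, 10],
]


def lowest_latency_pair(
    latency: List[List[int]],
    free_cpu_nodes: Optional[Iterable[int]] = None,
    free_mem_nodes: Optional[Iterable[int]] = None,
) -> Tuple[int, int, int]:
    """Return (cpu_node, mem_node, latency) for the lowest-latency free pair."""
    cpus = list(range(len(latency)) if free_cpu_nodes is None else free_cpu_nodes)
    mems = list(range(len(latency[0])) if free_mem_nodes is None else free_mem_nodes)
    candidates = [(c, m, latency[c][m]) for c in cpus for m in mems]
    if not candidates:
        raise ValueError("no free processor/memory pair")
    # min() returns the first candidate with the smallest latency.
    return min(candidates, key=lambda candidate: candidate[2])


print(lowest_latency_pair(PSLIT))                  # (0, 0, 10)
# If node 1's local memory is unavailable, the nearest remote memory is chosen.
print(lowest_latency_pair(PSLIT, [1], [0, 2, 3]))  # (1, 0, 16)
```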
Referring again to the first user VM 202 and the second user VM 204, the virtual processors and the virtual memories within these virtual machines also can be structured in a non-uniform memory access architecture. For example, the hypervisor 206 can present to the virtual machines a virtual NUMA (or vNUMA) architecture, in which the various vCPUs and vRAMs are presented to the respective guest OS as being part of vNUMA nodes. For example, the hypervisor 206 can present to the first guest OS 210 two vNUMA nodes, where the first vNUMA node 270 includes the vCPU1 218 and the vRAM1 222, while the second vNUMA node 272 includes the vCPU2 220 and the vRAM2 224. Similarly, the hypervisor 206 can present to the second guest OS 214 third and fourth vNUMA nodes, where the third vNUMA node 274 includes the vCPU3 226 and the vRAM3 230, while the fourth vNUMA node 276 includes the vCPU4 228 and the vRAM4 232. Given the vNUMA architecture, the respective guest OSs can then schedule and map their respective applications based on the latencies between the various vCPUs and vRAMs provided by the hypervisor 206. The hypervisor 206 can maintain a physical latency table (also referred to as a physical system locality information table (pSLIT)) that specifies the latency between any pCPU core and any pRAM bank.
A similar latency table (also referred to as a virtual system locality information table (vSLIT)) can be maintained by the virtual machines specifying the access latencies between pairs of virtual CPUs and virtual RAM banks. For example, as discussed above, the hypervisor 206 can present to the first and second user VMs 202 and 204 a virtual NUMA architecture. The first and second VMs can maintain a virtual latency table specifying the access latencies between pairs of virtual CPUs and RAM banks.
The first and the second user VMs 202 and 204 can populate the latency values in their respective virtual latency tables based on latency values included in the physical latency table 300. In particular, the hypervisor 206 can provide the first and second user VMs 202 and 204 with the latency values included in the physical latency table 300. For example, the operating systems running on the first and second user VMs 202 and 204 can use the Advanced Configuration and Power Interface (ACPI) to request the physical latency information from the hypervisor 206 or the hardware resources 208. The first and second user VMs 202 and 204 can then use the current mapping of vCPUs to pCPUs and vRAMs to pRAMs to determine the appropriate latency values for their respective virtual latency tables. For example, referring to
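The lookup described above amounts to indexing the physical latency table with the physical nodes that back each virtual CPU and virtual RAM bank. The following sketch assumes the mapping is available as two dictionaries (vCPU name to pNUMA node index, vRAM name to pNUMA node index) and that the physical latency table is a square matrix indexed [processor node][memory node]; the matrix values and the `build_virtual_latency_table` name are hypothetical and are not taken from the physical latency table 300.

```python
from typing import Dict, List, Tuple

# Hypothetical physical latency table (pSLIT), indexed [pCPU node][pRAM node].
PSLIT: List[List[int]] = [
    [10, 16, 21, 21],
    [16, 10, 21, 21],
    [21, 21, 10, 16],
    [21, 21, 16, 10],
]


def build_virtual_latency_table(
    pslit: List[List[int]],
    vcpu_to_pnode: Dict[str, int],
    vram_to_pnode: Dict[str, int],
) -> Dict[Tuple[str, str], int]:
    """Populate a vSLIT by looking up the physical latency for each vCPU/vRAM pair."""
    return {
        (vcpu, vram): pslit[cpu_node][mem_node]
        for vcpu, cpu_node in vcpu_to_pnode.items()
        for vram, mem_node in vram_to_pnode.items()
    }


# Hypothetical mapping: the two vNUMA nodes are backed by physical nodes 0 and 2.
vslit = build_virtual_latency_table(
    PSLIT,
    vcpu_to_pnode={"vCPU1": 0, "vCPU2": 2},
    vram_to_pnode={"vRAM1": 0, "vRAM2": 2},
)
for (vcpu, vram), latency in vslit.items():
    print(vcpu, vram, latency)
```

With this mapping, each vCPU paired with the vRAM on its own physical node inherits the local value of 10, while the cross pairs (for example vCPU1 with vRAM2) inherit the remote value of 21.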
It should be noted that the vNUMA and pNUMA architecture shown in
In some implementations, the first user VM 202 may populate the first virtual latency table 400 with latency values that are a function of the latency values selected from the physical latency table 300. The function can include one or more of a factor, a multiplier, an offset, or any other mathematical function. For example, the first user VM 202 may multiply the latency value selected from the physical latency table 300 by a multiplication factor and populate the first virtual latency table 400 with the result. In another example, the first user VM 202 may offset the latency value selected from the physical latency table 300 and use the resulting value to populate the first virtual latency table 400. The second user VM 204 can populate the second virtual latency table 500 shown in
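A latency transform of the kind described above can be expressed as a function applied to each value copied from the physical table. In this sketch, `transform_latencies` and `scale_and_offset` are hypothetical helpers, results are rounded to integers, and the factor and offset values in the usage lines are arbitrary.

```python
from typing import Callable, Dict, Hashable

LatencyTable = Dict[Hashable, int]


def scale_and_offset(factor: float = 1.0, offset: float = 0.0) -> Callable[[int], float]:
    """Build a transform of the form latency * factor + offset."""
    return lambda latency: latency * factor + offset


def transform_latencies(table: LatencyTable, fn: Callable[[int], float]) -> LatencyTable:
    """Apply fn to every latency copied from the physical table, rounding to int."""
    return {pair: int(round(fn(latency))) for pair, latency in table.items()}


copied = {("vCPU1", "vRAM1"): 10, ("vCPU1", "vRAM2"): 21}
print(transform_latencies(copied, scale_and_offset(factor=2)))  # doubles each value
print(transform_latencies(copied, scale_and_offset(offset=5)))  # adds 5 to each value
```

If the virtual table is ultimately exposed to the guest OS through an ACPI SLIT, which normalizes each locality's distance to itself to 10, a transform would presumably need to preserve that convention.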
In one or more embodiments, the first and the second user VMs 202 and 204 may update the latency values in their respective virtual latency tables. For example, the virtual machine may update the latency values in response to changes in the mapping of the virtual CPUs and virtual RAM banks to the physical CPUs and physical RAM banks. In another example, the virtual machine may update the latency values in the virtual latency table in response to changes in the latency values in the physical latency table 300. In one or more embodiments, the first and the second user VMs 202 and 204 may repeatedly communicate with the hypervisor 206 to obtain the current mappings and the current latency values in the physical latency table, and determine and, if necessary, update the latency values in their respective virtual latency tables. In one or more embodiments, the first and second user VMs 202 and 204 can receive an indication from the hypervisor 206 if there is any change in the mappings or any change in one or more latency values in the physical latency table 300. The indication may also include the updated mappings and the updated latency values. In response, the first and the second user VMs 202 and 204 can update, if necessary, the latency values in their respective virtual latency tables.
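One way to realize the "update if necessary" behavior is to remember the inputs used for the last build and rebuild only when the mapping or the physical latency values differ. The sketch assumes the mapping is given as dictionaries from vCPU and vRAM names to physical node indices and the physical latencies as a square matrix; the same `refresh` call could serve both the polling approach and the hypervisor-notification approach. The `VirtualLatencyTable` class and its `rebuilds` counter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class VirtualLatencyTable:
    """Guest-side vSLIT that is rebuilt only when its inputs change."""

    def __init__(self) -> None:
        self._last_inputs: Optional[Tuple[tuple, tuple, tuple]] = None
        self.table: Dict[Tuple[str, str], int] = {}
        self.rebuilds = 0

    def refresh(
        self,
        pslit: List[List[int]],
        vcpu_to_pnode: Dict[str, int],
        vram_to_pnode: Dict[str, int],
    ) -> bool:
        """Rebuild if the mapping or physical latencies changed; return True if rebuilt."""
        # Freeze the inputs into comparable tuples so changes can be detected.
        inputs = (
            tuple(tuple(row) for row in pslit),
            tuple(sorted(vcpu_to_pnode.items())),
            tuple(sorted(vram_to_pnode.items())),
        )
        if inputs == self._last_inputs:
            return False  # nothing changed; keep the current table
        self._last_inputs = inputs
        self.table = {
            (vcpu, vram): pslit[cpu_node][mem_node]
            for vcpu, cpu_node in vcpu_to_pnode.items()
            for vram, mem_node in vram_to_pnode.items()
        }
        self.rebuilds += 1
        return True


vt = VirtualLatencyTable()
pslit = [[10, 21], [21, 10]]
vt.refresh(pslit, {"vCPU1": 0}, {"vRAM1": 0})  # initial build
vt.refresh(pslit, {"vCPU1": 0}, {"vRAM1": 0})  # unchanged, no rebuild
vt.refresh(pslit, {"vCPU1": 0}, {"vRAM1": 1})  # vRAM1 remapped, rebuilt
print(vt.table, vt.rebuilds)
```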
As discussed above, the hypervisor 206 can maintain the mappings of the vCPUs and the vRAMs to the pCPUs and the pRAMs in a mapping data structure. The hypervisor 206 can communicate the mapping data structure to the first and the second user VMs 202 and 204 so that the mapping information can be used to update the virtual latency tables. In one or more embodiments, the mapping data structure can include a table that lists identifiers (such as a name or a unique ID) associated with the vNUMA nodes, and the identifiers of the pNUMA nodes to which each of the vNUMA nodes is mapped. For example, referring to
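A mapping table of vNUMA-node identifiers to pNUMA-node identifiers can be expanded into per-resource mappings once the guest knows which vCPUs and vRAMs belong to each vNUMA node. The identifiers below (for example "vNUMA-270" and "pNUMA-A") and the `NodeMapping` and `expand_mapping` names are hypothetical; the disclosure does not specify a format for the mapping data structure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class NodeMapping:
    """One row of a hypothetical mapping table: a vNUMA node and its backing pNUMA node."""

    vnuma_id: str
    pnuma_id: str


def expand_mapping(
    rows: Iterable[NodeMapping], vnuma_members: Dict[str, List[str]]
) -> Dict[str, str]:
    """Map every vCPU and vRAM to the pNUMA node that backs its vNUMA node."""
    resource_to_pnuma: Dict[str, str] = {}
    for row in rows:
        # Raises KeyError if the table names a vNUMA node the guest does not know.
        for resource in vnuma_members[row.vnuma_id]:
            resource_to_pnuma[resource] = row.pnuma_id
    return resource_to_pnuma


rows = [NodeMapping("vNUMA-270", "pNUMA-A"), NodeMapping("vNUMA-272", "pNUMA-C")]
members = {"vNUMA-270": ["vCPU1", "vRAM1"], "vNUMA-272": ["vCPU2", "vRAM2"]}
print(expand_mapping(rows, members))
```

The per-resource result is the form a guest would need in order to look up physical latencies for each vCPU and vRAM pair.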
The first and the second user VMs 202 and 204, by maintaining the first and second virtual latency tables 400 and 500, can leverage the vNUMA architecture and the associated non-uniform latency values to assign the first and second sets of software applications 212 and 216 to the appropriate virtual CPUs and virtual RAM banks. For example, if App1 were a critical application or an application requiring a high quality of service, the first user VM 202 may assign App1 (or the associated program threads) to the virtual CPU and virtual RAM bank pair having the lowest latency value. As another example, the first user VM 202 may assign program threads associated with a database application that analyzes data sets, which may involve repeated memory accesses, to a virtual CPU and virtual RAM bank pair having a low latency value.
For example, referring to the first virtual latency table 400, the first user VM 202 may assign App1 to either the vCPU1 218 and the vRAM1 222 or the vCPU2 220 and the vRAM2 224. In some other embodiments, the first user VM 202 may move an application or a program thread from a low-latency pair of vCPU and vRAM to a high-latency pair of vCPU and vRAM if, for example, it is determined that the thread accesses the vRAM infrequently enough to tolerate the higher latency. The first user VM 202 may then assign another application or thread to the low-latency pair of vCPU and vRAM. This can be of particular benefit in multithreading and multitasking scenarios, and can improve the performance of the virtual computing system 100.
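The placement policy described in the two preceding paragraphs can be approximated by a simple greedy heuristic: rank vCPU/vRAM pairs by latency, rank threads by how often they access memory, and pair them off in order, so memory-intensive threads land on low-latency pairs and infrequent accessors are pushed to higher-latency ones. This is an assumed illustration, not the disclosure's scheduler; the `place_threads` name, the thread names, the latency values, and the access rates are all made up, and the heuristic ignores vCPU load and the fact that two pairs can share a vCPU.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def place_threads(
    vslit: Dict[Pair, int], threads: List[Tuple[str, float]]
) -> Dict[str, Pair]:
    """Greedy placement: the most memory-intensive thread gets the lowest-latency pair.

    threads is a list of (thread_name, memory_accesses_per_second).
    """
    if len(threads) > len(vslit):
        raise ValueError("more threads than vCPU/vRAM pairs")
    pairs_by_latency = sorted(vslit, key=vslit.get)  # ascending latency (stable for ties)
    threads_by_intensity = sorted(threads, key=lambda t: t[1], reverse=True)
    return {
        name: pair
        for (name, _), pair in zip(threads_by_intensity, pairs_by_latency)
    }


# Hypothetical virtual latency table for a VM with two vNUMA nodes.
vslit = {
    ("vCPU1", "vRAM1"): 10,
    ("vCPU1", "vRAM2"): 21,
    ("vCPU2", "vRAM1"): 21,
    ("vCPU2", "vRAM2"): 10,
}
threads = [("App1-db", 900.0), ("logger", 5.0), ("App2", 300.0)]
print(place_threads(vslit, threads))
```

Re-running the placement with updated access rates would move a thread whose memory traffic has dropped onto a higher-latency pair, freeing the low-latency pair for another thread.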
The process 600 also includes obtaining physical latency values associated with the physical processors and the physical memories (604). At least one example of this operation has been discussed above in relation to
The process 600 further includes generating latency values associated with the virtual processors and the virtual memories based on the mapping information and the physical latency values (606). At least one example of this operation has been discussed above in relation to
It should be noted that the vNUMA and pNUMA architecture shown in
It is to be understood that in some embodiments, any of the operations described herein may be implemented at least in part as computer-readable instructions stored on a computer-readable memory. Upon execution of the computer-readable instructions by a processor, the computer-readable instructions may cause a node to perform the operations.
The herein described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable,” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.” Further, unless otherwise noted, the use of the words “approximate,” “about,” “around,” “substantially,” etc., means plus or minus ten percent.
The foregoing description of illustrative embodiments has been presented for purposes of illustration and of description. It is not intended to be exhaustive or limiting with respect to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosed embodiments. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.