The field of invention generally relates to administrative tools for computer systems, and more particularly relates to administrative tools and associated methods, architectures, software/firmware and systems for enabling concurrent administration of a virtual machine host and virtual machines running on the host.
It can be appreciated that the use of virtual machines running on servers has become more prevalent in recent years. A virtual machine is an operating environment that runs on a computer to allow multiple operating systems, such as a MICROSOFT™ WINDOWS™ Server (e.g., WINDOWS Server 2003) or a LINUX™ Server (e.g., RED HAT™ LINUX Enterprise Edition 3), to run independently within one or more virtual machines. In most cases, virtual machines simulate complete hardware environments, such that the operating system running within a virtual machine interfaces with what appears to be a computer, when in fact it is a simulated computer running on a computer (i.e., a virtual machine). The advantage of such an arrangement is that multiple virtual machines can run on a single computer, allowing for increased operating efficiency and/or higher reliability due to virtual machine isolation.
However, this configuration presents an interesting problem. One common configuration is that of a server running the LINUX operating system hosting multiple virtual machines running the WINDOWS Server operating system and the LINUX operating system. If a problem occurs in one of the virtual machines, the problem could be due to interference from another virtual machine or a problem with the underlying operating system.
For a computer systems administrator, that is, someone who manages operating system installations, this is a very challenging situation. The problem is that WINDOWS Server administrators are only trained to manage and troubleshoot problems with WINDOWS Server installations, while LINUX administrators are only familiar with how to operate and troubleshoot LINUX installations. Such administrators need access to administrative data such as CPU usage, memory usage, disk usage and error log information such as failed system logon attempts, failed processes, and the like. As a result, a WINDOWS Server administrator encountering a malfunctioning virtual machine in the above configuration (LINUX host running WINDOWS and LINUX virtual machines) is unable to troubleshoot problems in any of the LINUX virtual machines or the LINUX host. Even experienced system administrators who are familiar with both the WINDOWS and LINUX operating systems find such a situation challenging to troubleshoot due to the need to go back and forth between the error logs and performance interfaces located on the host and in the various virtual machines. An additional challenge is that, between different operating systems, administrative data is stored and exposed in different ways, such that performance and event data is not easily transferable between two operating systems running in separate virtual machines. Clearly then, a virtual machine environment running diverse operating systems can present a difficult and challenging configuration to troubleshoot even for the most experienced systems administrators. It would be beneficial, therefore, to devise a technique to expose such administrative data across various virtual machines to make that data easier for systems administrators to access via the operating environment in which they are most comfortable. Moreover, it would be advantageous to enable systems administrators to be able to reconfigure various platform and VM resources via an interface they are familiar with.
In accordance with aspects of the invention, methods, architectures and software/firmware components are disclosed for monitoring and reporting administrative data among a virtual machine host and virtual machines running on the host, and for re-allocating platform resources. The disclosed techniques enable administrative data from one or more virtual machines and/or the virtual machine host to be viewed via a “unified” user interface running on the host or any other given virtual machine, regardless of what operating systems the host or virtual machines are running.
In accordance with one aspect of the invention, software and/or firmware components called "agents" are installed and run on a host operating system and respective operating systems running on multiple virtual machines. Optionally, agents may run directly on virtual machines. Generally, such agents may be distributed as modules that are included in the operating system distribution (that is, the files that make up the operating system), or they may be installed by a computer systems administrator or the like via a storage medium or network download. The agents obtain administrative data, such as performance and log data, user information, process information, memory and CPU usage and the like, from their respective host operating systems by monitoring, for example, key log files for changes and by querying processes running on hosts for information. In another implementation, an agent interfaces with its host operating system via an application programming interface that can be configured to notify the agent of changes to administrative data corresponding to its host operating system.
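By way of a non-limiting illustration, on a LINUX host an agent might watch a key log file for changes using the inotify facility provided by the operating system. The following minimal sketch, written in C, assumes the log path /var/log/messages and a placeholder publish step; both are illustrative rather than part of any particular embodiment.

    /* Minimal sketch: watch a system log for changes on a LINUX host
     * using the inotify API.  The log path and publish step are
     * illustrative assumptions, not part of any specific embodiment. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/inotify.h>

    #define EVENT_BUF_LEN (sizeof(struct inotify_event) + 256)

    static void publish_log_change(const char *path)
    {
        /* Placeholder: a real agent would read the appended records and
         * publish them to the other agents as administrative data. */
        printf("log changed: %s\n", path);
    }

    int main(void)
    {
        int fd = inotify_init();
        if (fd < 0) { perror("inotify_init"); return EXIT_FAILURE; }

        /* Watch a key log file for appended or modified entries. */
        int wd = inotify_add_watch(fd, "/var/log/messages", IN_MODIFY);
        if (wd < 0) { perror("inotify_add_watch"); return EXIT_FAILURE; }

        char buf[EVENT_BUF_LEN];
        for (;;) {
            ssize_t len = read(fd, buf, sizeof(buf)); /* blocks until a change */
            if (len <= 0) break;
            publish_log_change("/var/log/messages");
        }
        close(fd);
        return EXIT_SUCCESS;
    }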
In accordance with another aspect, an agent running on the host operating system publishes administrative data to operating systems running in virtual machines or to agents running on such operating systems. An agent may publish the administrative data through a variety of mechanisms. These include a virtual machine application programming interface, direct writing to the operating systems running within the virtual machines, connection-oriented or connectionless communication mechanisms for publishing such data to software agents running on the operating systems in the virtual machines, and exposing administrative data through polling interfaces.
In accordance with yet another aspect, communication between agents may be implemented via a peer-to-peer, network protocol-based, bus-based or control-based transfer of administrative data among the host and virtual machines. As a result, any given agent, whether running on the host or one of the operating systems hosted by a virtual machine can act as the central point of collection and viewing of administrative data, thereby reducing the dependency on any given virtual machine or the host.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified:
a is a schematic drawing illustrating an implementation of a software-based architecture configured to run on an exemplary platform hardware configuration including a central processing unit coupled to a memory interface and an input/output interface.
b is a schematic drawing illustrating an implementation of a software-based agent architecture configured to run on platform hardware including a multi-core processor.
c is a schematic drawing illustrating an implementation of a firmware-based virtual machine architecture running on platform hardware including a multi-core processor, wherein the architecture includes a firmware-based virtual machine host and associated firmware agents.
d is a schematic drawing illustrating a variation of the architecture of
a is a drawing illustrating one embodiment of a unified user interface for displaying the relationships among the virtual machines and the host operating system, and performance data for the host system and each virtual machine.
b is an illustration of one embodiment of a user interface for viewing virtual machine memory and disk configuration and processor allocation for multiple virtual machines.
a is an illustration of one embodiment of a user interface for configuring and re-configuring virtual machine memory allocation.
b is an illustration of one embodiment of a user interface for configuring virtual machine memory allocation.
Embodiments of methods, architectures, software/firmware and systems for enabling concurrent administration of host operating systems and virtual machine-hosted operating systems are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
A host OS agent 140 is employed to extract various administrative data from host operating system 120 and to expose that information through a plurality of mechanisms to one or more operating systems 1601-n running in respective virtual machines 1701-n hosted by VM platform 110. For example, the various administrative data may include performance and event data such as available and utilized disk space, memory usage, CPU usage, failed logon attempts, the starting and stopping of processes, and the consumption of resources by individual processes.
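Purely for illustration, one hypothetical in-memory layout for a single item of such administrative data is sketched below in C; the type and field names are assumptions and do not correspond to any specific operating system interface.

    /* Illustrative (hypothetical) layout for one administrative data record
     * collected by a host OS agent; field names are assumptions only. */
    #include <stdint.h>
    #include <time.h>

    typedef enum {
        ADMIN_DISK_USAGE,
        ADMIN_MEMORY_USAGE,
        ADMIN_CPU_USAGE,
        ADMIN_FAILED_LOGON,
        ADMIN_PROCESS_START,
        ADMIN_PROCESS_STOP
    } admin_event_type;

    typedef struct {
        admin_event_type type;        /* kind of performance or event data       */
        time_t           timestamp;   /* when the sample or event was observed   */
        uint32_t         source_vm;   /* 0 = host, 1..N = virtual machine number */
        uint64_t         value;       /* e.g., bytes used, percent busy, PID     */
        char             detail[128]; /* free-form text, e.g., a log excerpt     */
    } admin_record;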
In one embodiment, host OS agent 140 communicates with another software agent 1701-n installed in the virtual machine-hosted operating system 1601 (running on virtual machine 1701) to expose administrative data from host operating system 120 via the administrative data interfaces of operating system 1601. In one embodiment, operating system 1601 comprises WINDOWS Server 2003. In an alternative embodiment, host OS agent 140 communicates directly with operating system 1602 to expose administrative data from host operating system 120 to the administrative data interfaces of operating system 1602.
As further depicted in
Generally, the various administrative data will be stored via some type of storage mechanism, such as a data structure in memory and/or one or more file system files. The storage mechanism is illustrated herein by various data stores, such as data stores 180 and 181 in
If the virtual machine platform does not support an application programming interface, then the logic proceeds to a decision block 210, wherein the host OS agent determines whether it is configured to publish administrative data only to clients that register with it, or to any recipient. If configured only to support publishing of administrative data to registered virtual machine clients, then the host OS agent waits for a registration request, as depicted by a decision block 212, in a low-priority threaded process that runs and waits for registration requests; when a registration request is received, the client virtual machine is added to client table 190, in block 208. Alternatively, if the software agent is configured to publish its information to any virtual machine client (either in encrypted or unencrypted form) without requiring registration, the host OS agent continues the process at a decision block 214.
Continuing at this decision block, the host OS agent determines its actions based on whether it is configured for real-time publishing or not. If it is, the logic proceeds to blocks 216, 218, and 220, wherein it respectively installs a performance and log monitor, which waits to be notified of changes to system performance, with the changes then being published to the operating systems running in virtual machines 1701-n, as applicable. If real-time publishing is not supported, then the host OS agent polls for changes in a block 222. In one embodiment, polling may be performed at a user-specified interval by running the df and ps system commands on RED HAT Enterprise LINUX 3, and processing the output of those commands. The host OS agent also checks the recorded timestamp and file size of system logs to determine if changes to those logs have occurred, as depicted by a decision block 224; in response to a detected change, e.g., if a change is detected in memory, CPU or disk usage, or if an important system event has occurred as indicated by a change to a system log file, that information is then published to the operating systems running in the virtual machines 1701-n, in block 220. The process repeats itself as more virtual machines register with the host OS agent, decision block 212, and as the host OS agent polls, block 222, or is alerted to changes, block 218, which changes are then published to the virtual machine-hosted operating systems in block 220.
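The polling path of block 222 and decision block 224 might be sketched as follows in C, using popen( ) to run the df and ps commands and stat( ) to detect log changes; the polling interval, log path, and output parsing are assumptions left as placeholders.

    /* Sketch of the polling path: run df/ps at an interval and check a log
     * file's timestamp and size with stat().  Output parsing is omitted;
     * the interval and log path are assumptions for illustration. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/stat.h>

    static time_t last_mtime;
    static off_t  last_size;

    static void poll_once(void)
    {
        char line[512];

        FILE *df = popen("df -k", "r");                       /* disk usage    */
        if (df) {
            while (fgets(line, sizeof(line), df))
                ;                                             /* parse here    */
            pclose(df);
        }

        FILE *ps = popen("ps -eo pid,pcpu,pmem,comm", "r");   /* process usage */
        if (ps) {
            while (fgets(line, sizeof(line), ps))
                ;                                             /* parse here    */
            pclose(ps);
        }

        struct stat st;
        if (stat("/var/log/messages", &st) == 0 &&
            (st.st_mtime != last_mtime || st.st_size != last_size)) {
            last_mtime = st.st_mtime;
            last_size  = st.st_size;
            /* a change was detected; publish it to the virtual machines */
        }
    }

    int main(void)
    {
        for (;;) {
            poll_once();
            sleep(60);   /* user-specified polling interval (assumed: 60 s) */
        }
    }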
If publication via a VM platform API is not supported, the logic proceeds to a decision block 306, wherein the host OS agent determines if it can write directly to the virtual machine using an interface provided by the operating system running in the virtual machine. If it can, then the administrative data is either published via an OS API or written to a file in a block 308. For example, if the operating system in the virtual machine supports publication via an API, such as WINDOWS Server 2003, the virtual machine publishes the administrative data about the host via that API in block 308. As an alternative, if the operating system running in the virtual machine employs file logs for logging administrative data, such as various LINUX implementations (e.g., RED HAT Enterprise LINUX 3), then the host OS agent writes a file directly to the operating system containing performance and event information about the host operating system in block 308.
An additional way to publish performance and event information about the host to the virtual machine clients involves publication of performance and event data to software agents running on the operating systems of the virtual machines. Accordingly, a determination is made at a decision block 310 as to whether this approach is available. Under this approach, the host OS agent first creates a performance event, as shown in a block 312. In one embodiment, the performance or event data contained in the event is formatted using the Extensible Markup Language (XML), a text-based format used for descriptions of data. This enables the host OS agent to communicate one or more pieces of performance or event data by describing each piece of data in the XML. In one implementation, the text-based XML description is then compressed into a binary format for more efficient transfer.
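A hypothetical helper for formatting one metric as XML text is sketched below in C; the element and attribute names are illustrative only, and a real implementation could compress the resulting text into a binary form as noted above.

    /* Hypothetical sketch of formatting one piece of performance data as XML
     * before transfer; the element and attribute names are assumptions. */
    #include <stdio.h>

    static int format_event_xml(char *buf, size_t buflen,
                                const char *metric, const char *source,
                                unsigned long value, unsigned long timestamp)
    {
        return snprintf(buf, buflen,
            "<event source=\"%s\" time=\"%lu\">"
              "<metric name=\"%s\" value=\"%lu\"/>"
            "</event>",
            source, timestamp, metric, value);
    }

    /* Example use:
     *   char buf[256];
     *   format_event_xml(buf, sizeof(buf), "cpu_usage", "host", 37, 1134000000);
     * The resulting text could then be compressed for more efficient transfer. */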
In a decision block 314, the host OS agent determines whether it should connect to one of the agents running in the virtual machines (which has previously registered with the host OS agent), to communicate new performance or event data to that agent. If no connection is demanded, then the agent broadcasts performance information and changes, as shown in a block 316. This may be accomplished, for example, using the Internet Protocol broadcast or multicast connectionless communication mechanisms. If a connection is demanded, then in a block 318, the software agent on the host connects to an agent in the virtual machine, transmits the event information, block 320, and closes the connection, block 322. It then determines in a decision block 324 whether performance information needs to be communicated to additional client agents (running in respective virtual machines); if so, the process is repeated, as shown by a loop back to decision block 314.
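For the connection-oriented case, blocks 318 through 322 might correspond to an ordinary TCP exchange such as the following C sketch; the port number is an assumption, and the loop over additional client agents (decision blocks 314 and 324) is omitted.

    /* Sketch of the connect/transmit/close sequence for one registered
     * client agent, using ordinary TCP sockets.  The port number and
     * payload format are assumptions for illustration. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    static int send_event_to_agent(const char *vm_addr, const char *xml_event)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0) return -1;

        struct sockaddr_in dst = { 0 };
        dst.sin_family = AF_INET;
        dst.sin_port   = htons(5600);               /* assumed agent port        */
        inet_pton(AF_INET, vm_addr, &dst.sin_addr);

        if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) { /* block 318 */
            close(s);
            return -1;
        }
        send(s, xml_event, strlen(xml_event), 0);   /* block 320: transmit event */
        close(s);                                   /* block 322: close it       */
        return 0;
    }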
The host OS agent may support an additional communication mechanism, whereby it exposes performance and event data via a polling interface, as depicted by blocks 326 and 328. In one implementation, the host OS agent has a process that runs continuously, awaiting poll requests from agents running in the virtual machines; when poll requests are received, they are accepted in block 328, and the logic flows to decision block 314, with the host OS agent communicating performance data as previously described.
Flowchart 400 in
Under architecture 500 of
In
In further detail, communication between agents may be facilitated in one of many ways known to those skilled in the software and communications art. For instance, host OS agent 540 may communicate with a virtual machine via respective APIs 515 and 516 provided by the host OS agent and the VM. In turn, an agent 550 may then communicate with the virtual machine API 516 via an API 517, as illustrated in
As further shown in
In accordance with further aspects of inter-agent communication, various techniques may be employed to support peer-to-peer, software- and hardware-bus-based, and control-based transfer of administrative data among the host and virtual machines. In one embodiment of a peer-to-peer implementation, the agents communicate with each other in an ad hoc manner, in which all of the agents are equal to each other. Agents talk directly to each other to send or receive performance data. Under one embodiment of a bus architecture, a common subsystem is implemented that transfers data between the agents. Each agent can join the bus to communicate data to other agents. In one software implementation, the bus comprises a set of 'C' functions that allow agents to perform a variety of operations. An agent can execute a remote function on another virtual machine via an agent on another virtual machine or the host. For example, an agent can execute a remote command to get CPU usage or free disk space data, or to set memory-to-disk swap allocation, regardless of the remote operating system. The bus supports synchronous and asynchronous operations; when used in asynchronous mode, callbacks are supported to return status to the calling agent once an operation is complete. The bus also supports common updates, such that a data value changed in one agent can be distributed by the bus to all other agents (whereas, in a peer-to-peer implementation, one agent would need to communicate with several other agents, which might then communicate with other agents, and so on). In another instantiation, a hardware bus is implemented, which supports similar functions, but in hardware. In the control-based implementation, one agent acts as the master (e.g., a host OS agent), acting as the central control for retrieval and distribution of data.
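By way of illustration, a hypothetical 'C' interface for such a bus is sketched below; the function names and semantics are assumptions intended only to suggest the kinds of operations just described (joining the bus, executing remote commands synchronously or asynchronously with callbacks, and distributing common updates).

    /* Hypothetical 'C' interface for the software bus described above;
     * names, types, and semantics are illustrative assumptions only. */
    #include <stddef.h>

    typedef struct bus_handle bus_handle;                 /* opaque bus connection  */
    typedef void (*bus_callback)(int status, void *ctx);  /* async completion hook  */

    bus_handle *bus_join(const char *agent_name);          /* attach agent to bus    */
    void        bus_leave(bus_handle *h);

    /* Execute a remote command (e.g., "get cpu_usage", "set swap_alloc=512M")
     * on another agent, identified by name, regardless of its operating system. */
    int bus_exec_sync (bus_handle *h, const char *target, const char *cmd,
                       char *reply, size_t replylen);
    int bus_exec_async(bus_handle *h, const char *target, const char *cmd,
                       bus_callback done, void *ctx);

    /* Common updates: publish a changed value once; the bus distributes it
     * to every other agent that has joined. */
    int bus_publish  (bus_handle *h, const char *key, const void *data, size_t len);
    int bus_subscribe(bus_handle *h, const char *key, bus_callback changed, void *ctx);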
If an agent is not configured to run in P2P mode, the agent determines whether it is configured to run in control mode in a decision block 614. If yes, then the agent serves as the central collection and publishing point for the performance data for the host and the virtual machines. In block 616, the agent exposes event data from its own system and from any client agents that have connected to it and provided it with performance data. If the agent is not running in control mode (meaning, it is only publishing data about its own system, not acting as a central collection point), then it publishes data to the control agent via communication mechanisms previously described, as shown in a block 618. Alternatively, if the software agent is configured in polling mode, as determined at a decision block 620, the agent configured as the control polls the other agents for performance information, as shown in a block 622.
In addition to using various forms of software agents for enabling communication of performance and event data between operating systems, all or a portion of similar functionality may be supported via firmware-based components. For example,
In general, components in
As before, the various agents 9501-N are enabled to communicate in a peer-to-peer manner. Additionally, each of agents 9501-N is enabled to communicate with firmware agent 940 through a firmware API discussed below in further detail. Host OS 920 may also communicate with firmware agent 940 via the firmware API.
Further details of various software and firmware implementations are shown in
As discussed above, the techniques disclosed herein enable various data and events corresponding to the operation of the various operating systems to be made available in a manner that only requires familiarity with a single operating system. For example, the agents can publish or otherwise make available performance and event data in a manner consistent with a first type of operating system that is familiar to an IT professional, while at the same time the various operating systems that are deployed may include operating systems that are unfamiliar to the IT professional. For simplicity and clarity, an exemplary set of such data is shown as published data 1085, which includes performance and event data for each of operating systems 10601-N, as well as host OS 1020,
The configuration, performance and event data may be accessed via various mechanisms, depending on the particular implementation. For example, an API 1090 may be provided to serve as a programmatic interface to published data 1085. In one embodiment, API 1090 supports native calls associated with a management console 1095 or the like, such as a WINDOWS management console or a LINUX management console. Thus, the management console may use native (to the associated operating system) calls to obtain published data 1085. API 1090 also supports native calls to reconfigure various platform components, such as memory and disk allocations. In another embodiment, API 1090 provides an XML-based interface, supporting access to published data 1085 via XML-formatted requests and posts.
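Purely as a sketch of the programmatic interface, API 1090 might expose native calls of the following general shape; the names and signatures are hypothetical and are not part of any specific operating system or management console.

    /* Hypothetical prototypes suggesting the shape of API 1090: calls to
     * read published data 1085 and calls to reconfigure platform resources.
     * Names and signatures are illustrative assumptions. */
    #include <stddef.h>
    #include <stdint.h>

    /* Retrieve published performance/event data for the host (vm_id == 0)
     * or a given virtual machine, either as native records or as XML text. */
    int pub_get_records(uint32_t vm_id, void *buf, size_t *buflen);
    int pub_get_xml    (uint32_t vm_id, char *xml, size_t *xmllen);

    /* Reconfigure platform resources allocated to a virtual machine. */
    int pub_set_memory_allocation(uint32_t vm_id, uint64_t bytes);
    int pub_set_disk_allocation  (uint32_t vm_id, uint64_t bytes);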
a further shows an exemplary platform hardware configuration 1034 including a central processing unit (CPU) 1002 coupled to a memory interface 1004 and an input/output (I/O) interface 1006 via main buses 1008. For example, under the well-known Intel Northbridge/Southbridge architecture, memory interface 1004 may be implemented in a Northbridge chipset component (e.g., memory controller hub (MCH)), while I/O interface 1006 may be implemented in a Southbridge component (e.g., I/O controller hub (ICH)). System memory 1012 may be accessed via memory interface 1004.
In general, CPU 1002 may comprise a single core processor or a multi-core processor. Under architecture 1000A, operations for a multi-core processor are not segregated to a respective virtual machine. Rather, the virtual machines are run as threads running as tasks on the multiple processor cores.
I/O interface 1006 is employed to access several components, including a disk controller 1014, which in turn is used to access one or more disk drives 1016. I/O interface 1006 is also used to access a firmware store 1018 (e.g., a ROM or non-volatile memory) and a network interface controller (NIC) 1022. In addition to the hardware components shown, various other hardware 1024 may be accessed via the appropriate hardware interfaces, including I/O interface 1006.
In general, the various software components discussed herein, including operating systems and software agents collectively illustrated as software components 1026, will typically be stored on a disk drive 1016 and loaded into system memory 1012 during system boot operations. Optionally, all or a portion of software components 1026 may be loaded from a server 1027 via a network 1028. Meanwhile, the various firmware components discussed herein, including platform firmware 1032 and firmware-based agents (as applicable) generally depicted as firmware components 1029, will generally be loaded from firmware store 1018 during system boot and loaded into a firmware space in system memory 1012.
Under a typical system boot for architecture 1000A, platform firmware 1032 will be loaded and configured in system memory 1012, followed by booting the host OS 1020. Subsequently, VM platform 1010, which may generally comprise an application running on host OS 1020, will be launched. VM platform 1010 can generally be configured to launch one or more virtual machines VM1-N, each of which will be configured to use various portions (i.e., address spaces) of system memory 1012. In turn, each virtual machine VM1-N may be employed to host a respective operating system 10601-N.
During run-time operations, the host OS agent 1040 and agents 10501-N are employed to publish various configuration, performance, and event data and enable reconfiguration of various system resources, such as system memory 1012 and disk drive(s) 1016. Generally, the virtual machines provide abstractions (in combination with VM platform 1010) between their hosted operating system and the underlying platform hardware 1034. From the viewpoint of each hosted operating system, that operating system "owns" the entire platform, and is unaware of the existence of other operating systems running on virtual machines. In reality, each operating system merely has access to only those resource spaces allocated to it.
Architecture 1000B of
Architecture 1000C of
In general, a “host” firmware agent may be deployed as part of virtual machine manager 1011, or as a component of a virtual machine VM1-N, as respectively depicted by firmware agents 1041 and 1042. Meanwhile, architecture 1000C may employ software agents 10501-N in a manner similar to architecture 1000A, or may employ firmware agents in the host virtual machines VM1-N in lieu of the software agents.
Platform hardware 1034C includes a multi-core processor 1001 having 2 or more (i.e., M) main processing cores 1003. In the illustrated embodiment, VMM 1011 is configured to allocate processing resources for each of cores 1003 to a respective virtual machine VM1-N. Depending on the particular multi-core architecture, system memory 1012 may have fixed mapping to the processor cores, or a shared interface may be employed in which address spaces in system memory 1012 are reconfigurable to enable different-size portions of the system memory to be allocated to the main processing cores.
In general, VMM 1011 will run on one of main processing cores 1003. For example, in one embodiment VMM 1011 is run on the first main core. Typically, VMM 1011, as well as the various virtual machine instances, will be loaded into system memory during system initialization. Various details of an exemplary technique for building a firmware framework are discussed below.
Another architecture 1000D for implementing a firmware-based scheme is shown in
Under one embodiment, one or more of virtual machine manager 1011, firmware agent 1041, and firmware agent 1042 is hosted by (i.e., run on) management core 1027. By providing a separate core for hosting these firmware components, the components may be run during operating system run-time without affecting the operation of the various operating systems. Moreover, the firmware components may be run in a manner that is transparent to the operating systems. In general, these firmware components may be run on a continuous or intermittent basis. In one embodiment, a periodic timer is implemented to periodically activate selected firmware components to facilitate various agent operations discussed herein.
In order to support a firmware-based implementation, there needs to be some mechanism to enable communication between software (e.g., operating systems and software agents, if employed) and firmware components (e.g., the VMM layer and firmware agents). Fortunately, today's firmware architectures include provisions for extending BIOS functionality beyond that provided by the BIOS code stored in a platform's BIOS device (e.g., flash memory). More particularly, the Extensible Firmware Interface (EFI) (specifications and examples of which may be found at http://developer.intel.com/technology/efi) is a public industry specification that describes an abstract programmatic interface between platform firmware and shrink-wrap operating systems or other custom application environments. EFI enables firmware, in the form of firmware modules and drivers, to be loaded from a variety of different resources, including primary and secondary flash devices, option ROMs, various persistent storage devices (e.g., hard disks, CD ROMs, etc.), and even over computer networks.
Among many features, EFI provides an abstraction for storing persistent values in the platform firmware known as “variables.” Variables are defined as key/value pairs that consist of identifying information plus attributes (the key) and arbitrary data (the value). Variables are intended for use as a means to store data that is passed between the EFI environment implemented in the platform and EFI OS loaders and other applications that run in the EFI environment. Moreover, the firmware variables may be accessed during run-time operations using appropriate API's.
In accordance with one embodiment, a software-to-firmware communication framework is implemented via facilities provided by EFI.
The PEI phase provides a standardized method of loading and invoking specific initial configuration routines for the processor (CPU), chipset, and motherboard. The PEI phase is responsible for initializing enough of the system to provide a stable base for the follow-on phases. Initialization of the platform's core components, including the CPU, chipset and main board (i.e., motherboard), is performed during the PEI phase. This phase is also referred to as the "early initialization" phase. Typical operations performed during this phase include the POST (power-on self test) operations, and discovery of platform resources. In particular, the PEI phase discovers memory and prepares a resource map that is handed off to the DXE phase. The state of the system at the end of the PEI phase is passed to the DXE phase through a list of position independent data structures called Hand Off Blocks (HOBs).
The DXE phase is the phase during which most of the system initialization is performed. The DXE phase is facilitated by several components, including the DXE core 1100, the DXE dispatcher 1102, and a set of DXE drivers 1104. The DXE core 1100 produces a set of Boot Services 1106, Runtime Services 1108, and DXE Services 1110. The DXE dispatcher 1102 is responsible for discovering and executing DXE drivers 1104 in the correct order. The DXE drivers 1104 are responsible for initializing the processor, chipset, and platform components as well as providing software abstractions for console and boot devices. These components work together to initialize the platform and provide the services required to boot an operating system. The DXE and the Boot Device Selection phases work together to establish consoles and attempt the booting of operating systems. The DXE phase is terminated when an operating system successfully begins its boot process (i.e., the BDS phase starts). Only the runtime services and selected DXE services provided by the DXE core and selected services provided by runtime DXE drivers are allowed to persist into the OS runtime environment. The result of DXE is the presentation of a fully formed EFI interface.
The DXE core is designed to be completely portable with no CPU, chipset, or platform dependencies. This is accomplished by designing in several features. First, the DXE core only depends upon the HOB list for its initial state. This means that the DXE core does not depend on any services from a previous phase, so all the prior phases can be unloaded once the HOB list is passed to the DXE core. Second, the DXE core does not contain any hard-coded addresses. This means that the DXE core can be loaded anywhere in physical memory, and it can function correctly no matter where physical memory or firmware segments are located in the processor's physical address space. Third, the DXE core does not contain any CPU-specific, chipset-specific, or platform-specific information. Instead, the DXE core is abstracted from the system hardware through a set of architectural protocol interfaces. These architectural protocol interfaces are produced by DXE drivers 1104, which are invoked by DXE Dispatcher 1102.
The DXE core produces an EFI System Table 1200 and its associated set of Boot Services 1106 and Runtime Services 1108, as shown in
The Boot Services comprise a set of services that are used during the DXE and BDS phases. Among others, these services include Memory Services, Protocol Handler Services, and Driver Support Services. Memory Services provide services to allocate and free memory pages and to allocate and free the memory pool on byte boundaries, as well as a service to retrieve a map of all current physical memory usage in the platform. Protocol Handler Services provide services to add and remove handles from the handle database and to add and remove protocols from the handles in the handle database; additional services are available that allow any component to look up handles in the handle database and to open and close protocols in the handle database. Driver Support Services provide services to connect and disconnect drivers to devices in the platform; these services are used by the BDS phase either to connect all drivers to all devices, or to connect only the minimum number of drivers to devices required to establish the consoles and boot an operating system (i.e., to support a fast boot mechanism).
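For reference, the Memory Services noted above are reached through function pointers in the Boot Services table; the prototypes below follow the form defined in the EFI specification and are shown here only as an aid to the description.

    /* Memory Services entries of the Boot Services table, per the EFI
     * specification; parameter comments added for illustration. */
    typedef
    EFI_STATUS
    (EFIAPI *EFI_ALLOCATE_PAGES) (
      IN EFI_ALLOCATE_TYPE         Type,        /* allocation strategy         */
      IN EFI_MEMORY_TYPE           MemoryType,  /* intended use of the memory  */
      IN UINTN                     Pages,       /* number of 4 KB pages        */
      IN OUT EFI_PHYSICAL_ADDRESS  *Memory      /* requested/returned address  */
      );

    typedef
    EFI_STATUS
    (EFIAPI *EFI_FREE_PAGES) (
      IN EFI_PHYSICAL_ADDRESS      Memory,
      IN UINTN                     Pages
      );

    typedef
    EFI_STATUS
    (EFIAPI *EFI_GET_MEMORY_MAP) (
      IN OUT UINTN                 *MemoryMapSize,
      IN OUT EFI_MEMORY_DESCRIPTOR *MemoryMap,   /* map of current memory usage */
      OUT UINTN                    *MapKey,
      OUT UINTN                    *DescriptorSize,
      OUT UINT32                   *DescriptorVersion
      );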
In contrast to Boot Services, Runtime Services are available both during pre-boot and OS runtime operations. One of the Runtime Services that is leveraged by embodiments disclosed herein is the Variable Services. As described in further detail below, the Variable Services provide services to lookup, add, and remove environmental variables from both volatile and non-volatile storage. As used herein, the Variable Services is termed “generic” since it is independent of any system component for which firmware is updated by embodiments of the invention.
As shown in
The services offered by each of Boot Services 1106, Runtime Services 1108, and DXE services 1110 are accessed via respective sets of API's 1112, 1114, and 1116. The API's provide an abstracted interface that enables subsequently loaded components to leverage selected services provided by the DXE Core.
After DXE Core 1100 is initialized, control is handed to DXE Dispatcher 1102. The DXE Dispatcher is responsible for loading and invoking DXE drivers found in firmware volumes, which correspond to the logical storage units from which firmware is loaded under the EFI framework. The DXE dispatcher searches for drivers in the firmware volumes described by the HOB List. As execution continues, other firmware volumes might be located. When they are, the dispatcher searches them for drivers as well.
There are two subclasses of DXE drivers. The first subclass includes DXE drivers that execute very early in the DXE phase. The execution order of these DXE drivers depends on the presence and contents of an a priori file and the evaluation of dependency expressions. These early DXE drivers will typically contain processor, chipset, and platform initialization code. These early drivers will also typically produce the architectural protocols that are required for the DXE core to produce its full complement of Boot Services and Runtime Services.
The second subclass of DXE drivers includes those that comply with the EFI 1.10 Driver Model. These drivers do not perform any hardware initialization when they are executed by the DXE dispatcher. Instead, they register a Driver Binding Protocol interface in the handle database. The set of Driver Binding Protocols are used by the BDS phase to connect the drivers to the devices required to establish consoles and provide access to boot devices. The DXE Drivers that comply with the EFI 1.10 Driver Model ultimately provide software abstractions for console devices and boot devices when they are explicitly asked to do so.
Any DXE driver may consume the Boot Services and Runtime Services to perform its functions. However, the early DXE drivers need to be aware that not all of these services may be available when they execute because all of the architectural protocols might not have been registered yet. DXE drivers must use dependency expressions to guarantee that the services and protocol interfaces they require are available before they are executed.
The DXE drivers that comply with the EFI 1.10 Driver Model do not need to be concerned with this possibility. These drivers simply register the Driver Binding Protocol in the handle database when they are executed. This operation can be performed without the use of any architectural protocols. In connection with registration of the Driver Binding Protocols, a DXE driver may "publish" an API by using the InstallConfigurationTable function. These published APIs are depicted as API's 1118. Under EFI, publication of an API exposes the API for access by other firmware components. The API's provide interfaces for the Device, Bus, or Service to which the DXE driver corresponds during their respective lifetimes.
The BDS architectural protocol executes during the BDS phase. The BDS architectural protocol locates and loads various applications that execute in the pre-boot services environment. Such applications might represent a traditional OS boot loader, or extended services that might run instead of, or prior to loading the final OS. Such extended pre-boot services might include setup configuration, extended diagnostics, flash update support, OEM value-adds, or the OS boot code. A Boot Dispatcher 1120 is used during the BDS phase to enable selection of a Boot target, e.g., an OS to be booted by the system.
During the TSL phase, a final OS Boot loader 1122 is run to load the selected OS. Once the OS has been loaded, there is no further need for the Boot Services 1106, and for many of the services provided in connection with DXE drivers 1104 via API's 1118, as well as DXE Services 1206A. Accordingly, these reduced sets of API's that may be accessed during OS runtime are depicted as API's 1116A and 1118A in
As shown in
Accordingly, a portion of the BFD's (or an auxiliary firmware storage device's) memory space may be reserved for storing persistent data, including variable data. In the case of flash devices and the like, this portion of memory is referred to as “NVRAM.” NVRAM behaves in a manner similar to conventional random access memory, except that under flash storage schemes individual bits may only be toggled in one direction. As a result, the only way to reset a toggled bit is to “erase” groups of bits on a block-wise basis. In general, all or a portion of NVRAM may be used for storing variable data; this portion is referred to as the variable repository.
As discussed above, under EFI, variables are defined as key/value pairs that consist of identifying information plus attributes (the key) and arbitrary data (the value). These key/value pairs may be stored in and accessed from NVRAM via the Variable Services. There are three variable service functions: GetVariable, GetNextVariableName, and SetVariable. GetVariable returns the value of a variable. GetNextVariableName enumerates the current variable names. SetVariable sets the value of a variable. Each of the GetVariable and SetVariable functions employs five parameters: VariableName, VendorGuid (a unique identifier for the vendor), Attributes (via an attribute mask), DataSize, and Data. The Data parameter identifies a buffer (via a memory address) to write or read the data contents of the variable from. The VariableName and VendorGuid parameters enable variables corresponding to a particular system component (e.g., add-in card) to be easily identified, and enable multiple variables to be attached to the same component.
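For reference, the three Variable Services functions take the following form under the EFI specification (they are reached through the Runtime Services table); the prototypes are shown here only as an aid to the preceding description.

    /* Variable Services entries of the Runtime Services table, per the
     * EFI specification. */
    typedef
    EFI_STATUS
    (EFIAPI *EFI_GET_VARIABLE) (
      IN CHAR16        *VariableName,
      IN EFI_GUID      *VendorGuid,
      OUT UINT32       *Attributes OPTIONAL,
      IN OUT UINTN     *DataSize,
      OUT VOID         *Data
      );

    typedef
    EFI_STATUS
    (EFIAPI *EFI_GET_NEXT_VARIABLE_NAME) (
      IN OUT UINTN     *VariableNameSize,
      IN OUT CHAR16    *VariableName,
      IN OUT EFI_GUID  *VendorGuid
      );

    typedef
    EFI_STATUS
    (EFIAPI *EFI_SET_VARIABLE) (
      IN CHAR16        *VariableName,
      IN EFI_GUID      *VendorGuid,
      IN UINT32        Attributes,
      IN UINTN         DataSize,
      IN VOID          *Data
      );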
Under a database context, the variable data are stored as 2-tuples <Mi, Bi>, wherein the data bytes (B) are often associated with some attribute information/metadata (M) prior to programming the flash device. Metadata M is implementation specific. It may include information such as “deleted”, etc., in order to allow for garbage collection of the store at various times during the life of the variable repository. Metadata M is not exposed through the Variable Services API but is just used internally to manage the store.
In accordance with aspects of some embodiments of the invention, the foregoing variable data storage and access scheme is augmented in a manner that supports access to and storage of configuration, performance, and event data. In general, the associated variable data may be stored in non-volatile memory or may be stored in system memory. Moreover, in some embodiments, these data may be written to a pre-allocated partition on a disk drive that is configured to be hidden to the operating systems hosted by the virtual machines running on the platform. As a result, these data may be accessed in the event of an operating system crash without need for operating system file system support.
Under firmware-based embodiments that do not employ separate processing facilities for handling run-time firmware separate from run-time software (e.g., that do not employ a management core or the like), there is a need for a mechanism to "switch" the processing mode to jump from processing software to processing firmware, and back to processing the software. One mechanism for performing these functions in some Intel processors employs the System Management Mode (SMM) (for Intel 32-bit microprocessors, i.e., IA-32 processors), or the native mode of an Itanium-based processor with a Processor Management Interrupt (PMI) signal activation. In general, the state of execution of code in IA32 SMM is initiated by a System Management Interrupt (SMI) signal and that in Itanium™ processors is initiated by a PMI signal activation; for simplicity, these are generally referred to as SMM herein.
Details for one mechanism for implementing an extensible SMM framework that may be employed by embodiments of the invention are disclosed in U.S. Pat. No. 6,978,018 (SMM Loader and Execution Mechanism for Component Software for Multiple Architectures), which is incorporated by reference herein in its entirety. The mechanism allows for multiple drivers, possibly written by different parties, to be installed for SMM operation. An agent that registers the drivers runs in the EFI (Extensible Firmware Interface) boot-services mode (i.e., the mode prior to operating system launch) and is composed of a CPU-specific component that binds the drivers and a platform component that abstracts chipset control of the xMI (PMI or SMI) signals. The API's (application program interfaces) providing these sets of functionality are referred to as the SMM Base and SMM Access Protocol, respectively. These API's enable run-time software to call SMM facilities during run-time operations, causing one or more appropriate event handlers to be loaded and executed in a manner transparent to the run-time software. Such handlers can generally be employed for supporting firmware-based agent operations in accordance with some of the embodiments disclosed herein.
In accordance with further aspects of some embodiments, user interfaces are provided to enable administrators and the like to manage the operation of the various operating systems running on the platform, including both host operating systems and the operating systems running on software- and/or firmware-based virtual machine platforms. In some embodiments a unified user interface (e.g., unified console) is provided to enable management of the operating systems from a single viewpoint, providing operating system information such as resource allocation and consumption, performance measures, event data, etc. Moreover, the unified user interface, in alternative embodiments, may be accessed from a host operating system or one of the virtual machines. In this manner, such information may be provided using a console or the like that is familiar to administrators who typically work with a given type of operating system, but may not be familiar with other types of operating systems. The user interface also enables administrators to reconfigure platform and virtual machine resources.
By way of example,
The various data that may be displayed via the user interface are typically accessed via mechanisms particular to each type of operating system. For example, WINDOWS-based operating systems provide API's and the like for accessing system operating data, such as exemplified in the figures shown herein. Likewise, LINUX-based operating systems also provide API's and the like for accessing similar data. Such API's are also similarly provided by other types of operating systems not specifically shown herein, such as UNIX-based operating systems. Notably, the API's for the different types of operating systems are different. In view of this difference, the data associated with each operating system instance is gathered by the associated agent, and then the data may be aggregated in a single viewpoint by passing information between applicable agents. In this manner, the information shown in the user interface of
As shown in
a shows an illustration of an interface for configuring and re-configuring virtual machine memory allocation. The arrows are user-controllable elements of the interface; the user clicks on either arrow and moves the arrow to the left or right via a user input device such as a mouse or touchpad to adjust the virtual machine memory allocation. The user, given sufficient administrative privileges, can also modify the total amount of memory allocated from the host to all virtual machines; uniquely, this action can be performed via the user interface shown in
b shows an illustration of an alternative interface for configuring virtual machine memory allocation. In this interface, the user enters the amount of memory to be allocated to each virtual machine into the edit boxes. The user can enter an exact number in megabytes of memory, or a percentage, which the software agent then converts into a memory allocation via an appropriate API with its host operating system.
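A trivial sketch of that conversion, written in C, might look like the following; it assumes the agent already knows the total memory available for allocation, and the function name is illustrative only.

    /* Translate a user entry (an exact number of megabytes, or a percentage
     * of the host total) into a megabyte allocation before calling the host
     * OS API.  The total allocatable memory is assumed to be known. */
    #include <stdlib.h>
    #include <string.h>

    static unsigned long to_megabytes(const char *entry, unsigned long total_mb)
    {
        unsigned long value = strtoul(entry, NULL, 10);
        if (strchr(entry, '%'))
            return (total_mb * value) / 100;   /* percentage of host memory  */
        return value;                          /* already an exact MB value  */
    }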
The machine instructions comprising the software components for enabling various agent operations discussed herein will likely be distributed on floppy disks or CD-ROMs (or other memory media) and stored in the hard drive until loaded into random access memory (RAM) for execution by the CPU. In some instances, all or a portion of the machine instructions may be pre-loaded on a computing platform (e.g., server or the like). Optionally, all or a portion of the machine instructions may be loaded via a computer network.
The firmware instructions comprising the firmware-based components will generally be stored on corresponding non-volatile rewritable memory devices, such as flash devices, EEPROMs, and the like. Firmware instructions embodied as a carrier wave may also be downloaded over a network and copied to a firmware device (e.g., “flashed” to a flash device), or may be originally stored on a disk media and copied to the firmware device.
Thus, embodiments of this invention may be used as or to support firmware and software instructions executed upon some form of processing core (such as the CPU of a computer) or otherwise implemented or realized upon or within a machine-readable medium. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium can include storage means such as a read only memory (ROM); a magnetic disk storage media; an optical storage media; and a flash memory device, etc. In addition, a machine-readable medium can include propagated signals such as electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.