The present disclosure generally relates to the field of computing. More particularly, an embodiment generally relates to optimizing boot-time peak power consumption for server and/or rack systems.
When designing the power budget for a rack system's power supply, designers account for the maximum possible power consumption, which usually occurs at server boot time. The worst case is when all mounted servers in a rack are powered up or rebooted at the same time. A server's peak power consumption happens only at specific moments during the boot process and may last tens of seconds, generally no longer than a matter of minutes. As such, a rack's power supply has to be capable of serving this peak power demand even though such usage is infrequent and of relatively short duration. This raises the power supply cost and makes the rarely used headroom capacity a waste of resources.
The detailed description is provided with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of various embodiments. However, various embodiments may be practiced without the specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments. Further, various aspects of embodiments may be performed using various means, such as integrated semiconductor circuits (“hardware”), computer-readable instructions organized into one or more programs (“software”), or some combination of hardware and software. For the purposes of this disclosure reference to “logic” shall mean either hardware, software, firmware (FM), or some combination thereof.
Some embodiments provide techniques for optimizing boot-time peak power consumption for server and/or rack systems. Moreover, techniques discussed herein with reference to a “rack” system may be also applied to other types of server configurations. Also, as discussed above, when designing the power budget for a rack system's power supply (also referred to as a PSU or Power Supply Unit), designers account for the maximum possible power consumption. This in turn raises the electricity bill a server owner has to pay and the rack PSU cost, and makes the rarely used headroom capacity a waste of resources. To this end, an embodiment provides a way to lower a rack's peak power consumption without compromising each server's boot performance. This will in turn allow for the use of a lower capacity and cheaper rack PSU. Furthermore, costs may be reduced for the PSU, through power consumption reduction, and/or for rack space (especially when we consider how much can be saved in modern data centers where tens of thousands of racks are deployed).
In some embodiments, information regarding which BIOS (Basic Input Output System) module causes how much power consumption on each server, and when, is identified and logged/stored. Based on this information, it is determined how to coordinate among all target servers to adjust the module execution sequence on each server and, as a result, lower the overall peak power consumption for all target servers during their respective boot processes. For example, boot data may be automatically collected and the information used to compute and provide results to optimize the boot sequence on target servers without human intervention. Such an approach would be highly productive and may be applied to any number of servers with any hardware configuration, without reducing boot performance.
Moreover, certain initialization ordering during the boot process may have to be maintained, e.g., to maintain operational correctness. For example, the memory controller may need to be initialized before the memory to allow for access to the memory.
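The ordering requirement above can be illustrated with a minimal sketch that checks whether a candidate module execution sequence keeps every module after its prerequisites. The module names, the `prerequisites` mapping, and the function `respects_ordering` are hypothetical and are introduced only for illustration; they are not part of any described embodiment.

```python
from typing import Dict, List, Set


def respects_ordering(sequence: List[str], prerequisites: Dict[str, Set[str]]) -> bool:
    """Return True if no module runs before one of its prerequisites."""
    position = {name: index for index, name in enumerate(sequence)}
    for module, required in prerequisites.items():
        for prereq in required:
            # Only compare modules that are both present in the sequence.
            if module in position and prereq in position:
                if position[prereq] > position[module]:
                    return False
    return True


# Hypothetical constraint: memory initialization needs the memory controller first.
constraints = {"memory_init": {"memory_controller_init"}}

ok = respects_ordering(
    ["memory_controller_init", "memory_init", "usb_init"], constraints
)
bad = respects_ordering(
    ["memory_init", "memory_controller_init", "usb_init"], constraints
)
```

A reordering step could use such a check to discard any candidate sequence that violates a required initialization order before its peak power is evaluated.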
As discussed herein, a BIOS module refers to a component (such as software components/logic discussed herein with reference to various computing systems, including those of
Furthermore, while some embodiments are discussed with server/rack systems, embodiments are not limited to such high volume architectures and may be applied to smaller systems, e.g., with multiple processors or other components that use significantly more power during boot time than during runtime.
To describe details of various embodiments, assume a simplified rack system with two servers mounted (Server 1 and Server 2 shown in
Referring to
The start/end time and power consumption of each module on each server can all be determined from a boot log. Rack power consumption is then the sum of the power consumption of server 1 and server 2. So, when both servers are powering up, the rack peak power consumption occurs during the interval [18, 25], with a peak value of 18+17=35. This is when module C on server 1 and module Z on server 2 are executing.
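The summation described above can be sketched as follows. Only the values for module C (18), module Z (17), and the [18, 25] peak window come from the text; every other module name, time, and power value below is a hypothetical placeholder, and the function names are illustrative rather than part of any embodiment. Intervals are treated as half-open, i.e., a module draws power from its start time up to, but not including, its end time.

```python
from typing import List, Tuple

Module = Tuple[str, int, int, int]  # (name, start, end, power)


def rack_power_at(servers: List[List[Module]], t: int) -> int:
    """Sum the power of every module, on every server, that is running at time t."""
    return sum(
        power
        for modules in servers
        for (_name, start, end, power) in modules
        if start <= t < end
    )


def rack_peak(servers: List[List[Module]]) -> Tuple[int, Tuple[int, int]]:
    """Return the peak rack power and the first window in which it occurs."""
    # Rack power can only change where some module starts or ends.
    edges = sorted({e for modules in servers for (_n, s, f, _p) in modules for e in (s, f)})
    best_power, best_window = 0, (0, 0)
    for left, right in zip(edges, edges[1:]):
        power = rack_power_at(servers, left)
        if power > best_power:
            best_power, best_window = power, (left, right)
    return best_power, best_window


# Hypothetical boot logs, except module C = 18 and module Z = 17 from the text.
server_1 = [("A", 0, 8, 6), ("B", 8, 18, 12), ("C", 18, 30, 18)]
server_2 = [("X", 0, 10, 8), ("Y", 10, 18, 10), ("Z", 18, 25, 17)]

peak, window = rack_peak([server_1, server_2])
```

With these assumed logs, the peak is 35 in the window starting at time 18 and ending at time 25, matching the 18+17 sum in the text.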
To this end, an embodiment optimizes the module execution sequence on each server. For example, in the case of
As can be seen in
The examples of
Referring to
At an operation 308, a new module dispatch sequence is determined for each involved server (e.g., based on the computations/determinations of operation 306). At an operation 310, each dispatch sequence of operation 308 is sent back to the corresponding server, and the dispatch sequence information is stored in a storage unit that is either local to the corresponding server or otherwise accessible by the corresponding server during its boot process (such as flash or another type of non-volatile memory). At an operation 312, the next time any of the servers of operation 310 boots or reboots, the new module dispatch sequence of operation 308 is applied.
Referring to
At an operation 322, two servers, A and B, are picked from all the servers, where server A has #J modules and server B has #K modules. At an operation 324, an optimized execution sequence is computed for A and B that generates a lower peak power consumption for A and B. The generated new timeline Q has the illustrated time quanta. At an operation 326, it is determined whether all involved servers are done.
As long as all servers are not done at operation 326, at an operation 328, a server R from the rest of the servers (all servers other than A and B) is picked. Then, this new R server is treated as server A in the former operation 324 as shown in
Once all servers are done, as determined at operation 326, the optimized module dispatch sequences for all servers have been found at an operation 334 and are sent to each server at operation 336.
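The loop over operations 322 to 336 can be sketched, under simplifying assumptions, as folding one server at a time into a combined timeline Q. In this sketch each server is reduced to a list of per-quantum power values, the accumulated timeline is held fixed while the newly picked server is reordered, and `toy_optimizer` is a deliberately trivial stand-in for operation 324 that only tries the original and reversed order. All names and values are hypothetical, and the exact roles of the servers in the figures may differ.

```python
from typing import Callable, Dict, List

Profile = List[int]  # power drawn in each time quantum


def add_profiles(a: Profile, b: Profile) -> Profile:
    """Add two power profiles quantum by quantum, padding the shorter with zeros."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]


def optimize_all(
    servers: Dict[str, Profile],
    optimize_against: Callable[[Profile, Profile], Profile],
) -> Dict[str, Profile]:
    """Pick a first server, then fold each remaining server into timeline Q."""
    names = list(servers)
    result = {names[0]: servers[names[0]]}
    timeline_q = servers[names[0]]
    for name in names[1:]:
        best = optimize_against(timeline_q, servers[name])  # stand-in for operation 324
        result[name] = best
        timeline_q = add_profiles(timeline_q, best)  # new combined timeline Q
    return result


def toy_optimizer(fixed: Profile, candidate: Profile) -> Profile:
    """Placeholder: keep whichever of the two orders yields the lower combined peak."""
    options = [candidate, candidate[::-1]]
    return min(options, key=lambda p: max(add_profiles(fixed, p)))


# Hypothetical per-quantum power values for three servers.
servers = {"A": [5, 9, 2], "B": [8, 2, 1], "R": [7, 3, 1]}
result = optimize_all(servers, toy_optimizer)

total: Profile = []
for chosen in result.values():
    total = add_profiles(total, chosen)
final_peak = max(total)
```

With these assumed values, the unoptimized rack peak would be 20 (all three servers peaking in the first quantum), while the folded result peaks at 14. A real implementation would replace `toy_optimizer` with the speculative start point search and would also have to honor any required module ordering.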
Referring to
If the current peak power is lower than any previously determined peak powers for module H (e.g., as determined at operation 360), then server B's current execution sequence is recorded as the optimal sequence for server B at an operation 362; otherwise, it is determined at an operation 364 whether all speculative start points for module H have been considered. If other speculative start points remain for module H, method 324 resumes at operation 354. Otherwise, at an operation 366, server B's best module execution sequence is used as its new execution sequence (at this point server A and server B have an optimized module execution sequence). At an operation 368, the generated new timeline for servers A and B is recorded, as shown in box 368 of
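The speculative start point search can be sketched as follows, under stated assumptions: server A's power profile is held fixed, each module of server B is described by a hypothetical (name, duration, power) tuple, and a "speculative start point" is modeled as a position in server B's sequence. This is a greedy illustration, not necessarily the exact procedure of the figures; it does not guarantee a global optimum and it ignores required initialization ordering.

```python
from typing import List, Tuple

Module = Tuple[str, int, int]  # (name, duration in quanta, power)


def profile(sequence: List[Module]) -> List[int]:
    """Expand a module sequence into per-quantum power values."""
    out: List[int] = []
    for _name, duration, power in sequence:
        out.extend([power] * duration)
    return out


def combined_peak(fixed_a: List[int], sequence_b: List[Module]) -> int:
    """Peak of server A's fixed profile plus server B's profile."""
    moving = profile(sequence_b)
    n = max(len(fixed_a), len(moving))
    return max(
        ((fixed_a[i] if i < len(fixed_a) else 0) + (moving[i] if i < len(moving) else 0)
         for i in range(n)),
        default=0,
    )


def best_sequence_for_b(fixed_a: List[int], sequence_b: List[Module]) -> Tuple[List[Module], int]:
    """For each module H of server B, try every start position and keep the lowest peak."""
    best = list(sequence_b)
    best_peak = combined_peak(fixed_a, best)
    for module_h in sequence_b:
        without_h = [m for m in best if m != module_h]
        for start in range(len(without_h) + 1):  # speculative start points for H
            candidate = without_h[:start] + [module_h] + without_h[start:]
            peak = combined_peak(fixed_a, candidate)
            if peak < best_peak:  # record only a strictly better sequence
                best, best_peak = candidate, peak
    return best, best_peak


# Hypothetical data: server A peaks at 18 in quanta 2-3; B's module Z draws 17.
profile_a = [5, 5, 18, 18, 4, 4]
server_b = [("X", 2, 6), ("Z", 2, 17), ("Y", 2, 8)]

initial_peak = combined_peak(profile_a, server_b)
best_b, new_peak = best_sequence_for_b(profile_a, server_b)
best_names = [name for name, _d, _p in best_b]
```

In this assumed example the initial combined peak is 35, because module Z overlaps server A's peak; moving Z to the front of server B's sequence lowers the combined peak to 24.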
Moreover, the processors 402 may have a single or multiple core design. The processors 402 with a multiple core design may integrate different types of processor cores on the same integrated circuit (IC) die. Also, the processors 402 with a multiple core design may be implemented as symmetrical or asymmetrical multiprocessors. Additionally, the operations discussed with reference to
For example, memory 412 may store the information discussed with reference to
A chipset 406 may also communicate with the interconnection network 404. The chipset 406 may include a Graphics and Memory Control Hub (GMCH) 408. The GMCH 408 may include a memory controller 410 that communicates with a memory 412. The memory 412 may store data, including sequences of instructions, that may be executed by the CPU 402, or any other device included in the computing system 400. In one embodiment, the memory 412 may include one or more volatile storage (or memory) devices such as random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), static RAM (SRAM), or other types of storage devices. Nonvolatile memory may also be utilized such as a hard disk. Additional devices may communicate via the interconnection network 404, such as multiple CPUs and/or multiple system memories.
The GMCH 408 may also include a graphics interface 414 that communicates with a display device 416. In one embodiment, the graphics interface 414 may communicate with the display device 416 via an accelerated graphics port (AGP) or a Peripheral Component Interconnect (PCI) (or PCI express (PCIe)) interface. In an embodiment, the display 416 (such as a flat panel display) may communicate with the graphics interface 414 through, for example, a signal converter that translates a digital representation of an image stored in a storage device such as video memory or system memory into display signals that are interpreted and displayed by the display 416. The display signals produced by the display device may pass through various control devices before being interpreted by and subsequently displayed on the display 416.
A hub interface 418 may allow the GMCH 408 and an input/output control hub (ICH) 420 to communicate. The ICH 420 may provide an interface to I/O device(s) that communicate with the computing system 400. The ICH 420 may communicate with a bus 422 through a peripheral bridge (or controller) 424, such as a peripheral component interconnect (PCI) bridge, a universal serial bus (USB) controller, or other types of peripheral bridges or controllers. The bridge 424 may provide a data path between the CPU 402 and peripheral devices. Other types of topologies may be utilized. Also, multiple buses may communicate with the ICH 420, e.g., through multiple bridges or controllers. Moreover, other peripherals in communication with the ICH 420 may include, in various embodiments, integrated drive electronics (IDE) or small computer system interface (SCSI) hard drive(s), USB port(s), a keyboard, a mouse, parallel port(s), serial port(s), floppy disk drive(s), digital output support (e.g., digital video interface (DVI)), or other devices.
The bus 422 may communicate with an audio device 426, one or more disk drive(s) 428, and a network interface device 430 (which is in communication with the computer network 403). Other devices may communicate via the bus 422. Also, various components (such as the network interface device 430) may communicate with the GMCH 408 in some embodiments. In addition, the processor 402 and the GMCH 408 may be combined to form a single chip and/or a portion or the whole of the GMCH 408 may be included in the processors 402 (instead of inclusion of GMCH 408 in the chipset 406, for example). Furthermore, the graphics accelerator 416 may be included within the GMCH 408 in other embodiments.
Furthermore, the computing system 400 may include volatile and/or nonvolatile memory (or storage). For example, nonvolatile memory may include one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a disk drive (e.g., item 428), a floppy disk, a compact disk ROM (CD-ROM), a digital versatile disk (DVD), flash memory, a magneto-optical disk, or other types of nonvolatile machine-readable media that are capable of storing electronic data (e.g., including instructions).
In an embodiment, components of the system 400 may be arranged in a point-to-point (PtP) configuration such as discussed with reference to
More specifically,
As illustrated in
In an embodiment, the processors 502 and 504 may be one of the processors 402 discussed with reference to
At least one embodiment may be provided within the processors 502 and 504. Also, the operations discussed with reference to
Other embodiments, however, may exist in other circuits, logic units, or devices within the system 500 of
The chipset 520 may communicate with a bus 540 using a PtP interface circuit 541. The bus 540 may communicate with one or more devices, such as a bus bridge 542 and I/O devices 543. Via a bus 544, the bus bridge 542 may communicate with other devices such as a keyboard/mouse 545, communication devices 546 (such as modems, network interface devices, or other communication devices that may communicate with the computer network 403), audio I/O device 547, and/or a data storage device 548. The data storage device 548 may store code 549 that may be executed by the processors 502 and/or 504.
In some embodiments, one or more of the components discussed herein can be embodied as a System On Chip (SOC) device.
As illustrated in
The I/O interface 640 may be coupled to one or more I/O devices 670, e.g., via an interconnect and/or bus such as discussed herein with reference to other figures. I/O device(s) 670 may include one or more of a keyboard, a mouse, a touchpad, a display (e.g., display 416), an image/video capture device (such as a camera or camcorder/video recorder), a touch screen, a speaker, or the like.
The following examples pertain to further embodiments. Example 1 includes an apparatus comprising: logic to determine a module execution sequence for a computing device to indicate a sequence of module execution during a boot process of the computing device, wherein logic to determine the module execution sequence is to determine the module execution sequence based at least partially on power consumption data and timeline data for each module of the computing device during the boot process of the computing device. Example 2 includes the apparatus of example 1, wherein logic to determine the module execution sequence for the computing device is to determine a plurality of module execution sequences for a plurality of computing devices based on power consumption data and timeline data for each module of each of the plurality of the computing devices during boot process of the plurality of computing devices. Example 3 includes the apparatus of example 2, wherein the plurality of computing devices are to be coupled via a rack system. Example 4 includes the apparatus of example 1, wherein the module is capable of having its execution sequence modified during the boot process. Example 5 includes the apparatus of example 1, wherein logic to determine the module execution sequence for the computing device is to determine the module execution sequence based on one or more speculative start points for each module of the computing device. Example 6 includes the apparatus of example 1, further comprising one or more sensors to detect the power consumption data and timeline data during the boot process. Example 7 includes the apparatus of example 1, wherein the module is capable of having its execution sequence modified during the boot process via a Basic Input Output System (BIOS). Example 8 includes the apparatus of example 1, wherein the module is capable of having its execution sequence modified during the boot process via a Unified Extensible Firmware Interface. 
Example 9 includes the apparatus of any of examples 1 to 8, wherein the logic, memory, and one or more processor cores are on a single integrated circuit device.
Example 10 includes a method comprising: determining a module execution sequence for a computing device to indicate a sequence of module execution during a boot process of the computing device, wherein determining the module execution sequence determines the module execution sequence based at least partially on power consumption data and timeline data for each module of the computing device during the boot process of the computing device. Example 11 includes the method of example 10, further comprising determining a plurality of module execution sequences for a plurality of computing devices based on power consumption data and timeline data for each module of each of the plurality of the computing devices during boot process of the plurality of computing devices. Example 12 includes the method of example 11, wherein the plurality of computing devices are coupled via a rack system. Example 13 includes the method of example 10, wherein the module is capable of having its execution sequence modified during the boot process. Example 14 includes the method of example 10, further comprising determining the module execution sequence based on one or more speculative start points for each module of the computing device. Example 15 includes the method of example 10, further comprising one or more sensors detecting the power consumption data and timeline data during the boot process. Example 16 includes the method of example 10, further comprising the module having its execution sequence modified during the boot process via a Basic Input Output System (BIOS). Example 17 includes the method of example 10, further comprising the module having its execution sequence modified during the boot process via a Unified Extensible Firmware Interface.
Example 18 includes a computing system comprising: one or more Central Processing Unit (CPU) cores; one or more Graphics Processor Unit (GPU) cores, wherein the one or more CPU or GPU cores are to be supplied power from a power supply unit; logic to determine a module execution sequence for a computing device to indicate a sequence of module execution during a boot process of the computing device, wherein the power supply unit is to provide power to each module of the computing device during the boot process of the computing device, wherein logic to determine the module execution sequence is to determine the module execution sequence based at least partially on power consumption data and timeline data for each module of the computing device during the boot process of the computing device. Example 19 includes the system of example 18, wherein logic to determine the module execution sequence for the computing device is to determine a plurality of module execution sequences for a plurality of computing devices based on power consumption data and timeline data for each module of each of the plurality of the computing devices during boot process of the plurality of computing devices. Example 20 includes the system of example 18, wherein the module is capable of having its execution sequence modified during the boot process. Example 21 includes the system of example 18, wherein logic to determine the module execution sequence for the computing device is to determine the module execution sequence based on one or more speculative start points for each module of the computing device. Example 22 includes the system of example 18, further comprising one or more sensors to detect the power consumption data and timeline data during the boot process. Example 23 includes the system of example 18, wherein the module is capable of having its execution sequence modified during the boot process via a Basic Input Output System (BIOS).
Example 24 includes an apparatus comprising means for performing a method as provided in any of examples 10 to 17.
Example 25 includes a machine-readable storage including machine-readable instructions, when executed, to implement a method or realize an apparatus as provided in any of examples 10 to 17.
Example 26 includes a computer-readable medium comprising one or more instructions that when executed on a processor configure the processor to perform one or more operations to: determine a module execution sequence for a computing device to indicate a sequence of module execution during a boot process of the computing device, wherein determining the module execution sequence determines the module execution sequence based at least partially on power consumption data and timeline data for each module of the computing device during the boot process of the computing device. Example 27 includes the computer-readable medium of example 26, further comprising one or more instructions that when executed on the processor configure the processor to perform one or more operations to cause determining a plurality of module execution sequences for a plurality of computing devices based on power consumption data and timeline data for each module of each of the plurality of the computing devices during boot process of the plurality of computing devices. Example 28 includes the computer-readable medium of example 26, wherein the module is capable of having its execution sequence modified during the boot process. Example 29 includes the computer-readable medium of example 26, further comprising one or more instructions that when executed on the processor configure the processor to perform one or more operations to cause determining the module execution sequence based on one or more speculative start points for each module of the computing device. Example 30 includes the computer-readable medium of example 26, further comprising one or more instructions that when executed on the processor configure the processor to perform one or more operations to cause one or more sensors detecting the power consumption data and timeline data during the boot process. 
Example 31 includes the computer-readable medium of example 26, further comprising one or more instructions that when executed on the processor configure the processor to perform one or more operations to cause the module having its execution sequence modified during the boot process via a Basic Input Output System (BIOS). Example 32 includes the computer-readable medium of example 26, further comprising one or more instructions that when executed on the processor configure the processor to perform one or more operations to cause the module having its execution sequence modified during the boot process via a Unified Extensible Firmware Interface. Example 33 includes the apparatus of any of examples 1 to 6 or 8, wherein the module is capable of having its execution sequence modified during the boot process via a Basic Input Output System (BIOS).
In various embodiments, the operations discussed herein, e.g., with reference to
Additionally, such computer-readable media may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals provided in a carrier wave or other propagation medium via a communication link (e.g., a bus, a modem, or a network connection).
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, and/or characteristic described in connection with the embodiment may be included in at least an implementation. The appearances of the phrase “in one embodiment” in various places in the specification may or may not be all referring to the same embodiment.
Also, in the description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. In some embodiments, “connected” may be used to indicate that two or more elements are in direct physical or electrical contact with each other. “Coupled” may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other.
Thus, although embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that claimed subject matter may not be limited to the specific features or acts described. Rather, the specific features and acts are disclosed as sample forms of implementing the claimed subject matter.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2013/084441 | 9/27/2013 | WO | 00 |