Computer system components, such as central processing units (CPUs), chipsets, graphics cards, and hard drives, produce large amounts of heat during operation. This heat must be dissipated in order to keep these components within safe operating temperatures. Overheated components generally exhibit a shorter life span and may also cause malfunction of the computer system.
The risk of overheating increases with increasing density of computer system components. In a typical blade server, a large number of heat generating blades may be closely placed in a single system enclosure. Limited open space in the system enclosure results in reduced air circulation and correspondingly reduced heat dissipation.
Because of these heat loads, many blade server enclosures include a thermal management system that uses both active (i.e., convection) and passive (e.g., heat sinks) cooling. Convection cooling generally relies on one or more fans that operate at either fixed or variable speeds. A variable speed fan generally is best for matching air flow to heat load. However, setting the variable fan speed presents a design problem. Ideally, the cooling fans would operate at a speed that maintains the blades at the optimum operating temperature without wasting energy. The blades may be cooled simply by operating the fans at a constant high speed, but this approach wastes energy when the blades are not operating at their maximum capacity. Another approach is to use temperature-sensing devices in the fans, where the temperature-sensing devices directly measure, in the exhaust air stream, how much heat the server generates. When a fan detects that the server exhaust temperatures are increasing, the fan's microcontroller can increase fan speed. However, this approach has its limitations because servers can heat up very quickly, and the server's ROM could trip a thermal shutdown before the fans could create enough additional cooling.
A computer-implemented method optimizes air mover performance to minimize temperature variations in a computer system enclosure. The computer system includes one or more modules and at least one air mover. The method includes collecting thermal data from the modules; using the collected thermal data, determining a maximum value of the thermal data; comparing the determined maximum value of the thermal data to a current maximum value of the thermal data; using the determined and the current maximum values, determining a desired operating characteristic of the air mover; and adjusting the air mover to the desired operating characteristic.
The detailed description will refer to the following drawings, in which like numerals refer to like elements, and in which:
To remove heat from a computer system enclosure, a cooling system, and method of operation thereof, are disclosed. The computer system includes one or more modules, installed in an enclosure, that generate heat as a result of operation. The cooling system and method rely on the use of one or more air movers installed within, or adjacent to, the computer system enclosure. In an embodiment, the computer system is a blade server, the modules are blades, and the air movers are fans.
In an embodiment, the cooling fans 125 are pulse-width modulation (PWM) fans. PWM fans are well known to those skilled in the art. The speed of a PWM fan is controlled by a PWM control signal. The fan speed response to the PWM control signal is a continuous and monotonic function of the duty cycle of the signal, from 100 percent to the minimum specified revolutions per minute (RPM).
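The relationship between duty cycle and fan speed can be illustrated with a minimal sketch. The description above only requires the response to be continuous and monotonic; the linear shape, the function name `pwm_to_rpm`, and the parameters `min_rpm`, `max_rpm`, and `min_duty` are assumptions made for illustration and are not taken from any fan data sheet.

```python
def pwm_to_rpm(duty_percent: float, min_rpm: float, max_rpm: float,
               min_duty: float = 0.0) -> float:
    """Map a PWM duty cycle (0-100 percent) to a fan speed in RPM.

    Assumption: a linear response between min_duty and 100 percent.
    Below min_duty the fan holds its minimum specified RPM, so the
    mapping stays continuous and monotonic (non-decreasing).
    """
    if not 0.0 <= duty_percent <= 100.0:
        raise ValueError("duty cycle must be between 0 and 100 percent")
    if not 0.0 <= min_duty < 100.0:
        raise ValueError("min_duty must be in the range [0, 100)")
    if duty_percent <= min_duty:
        return min_rpm
    # Fraction of the usable duty-cycle range that is being requested.
    fraction = (duty_percent - min_duty) / (100.0 - min_duty)
    return min_rpm + fraction * (max_rpm - min_rpm)
```

A real fan's curve would come from its specification; the sketch only shows that a monotonic mapping lets a controller treat a duty cycle and an RPM target as interchangeable.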
As used hereinafter, the term “PWM fan” or “fan” refers not only to fans attached to a computer chassis, but may also be intended to signify any other computer fans, such as CPU fans, graphics processing unit (GPU) fans, chipset fans, power supply unit (PSU) fans, hard disk drive (HDD) fans, or peripheral component interconnect (PCI) slot fans. PWM fans can be of various sizes and power. Common computer fans range from 40 mm to 120 mm in diameter.
Although the fans 125 are shown as actual, physical fans, the cooling system used in conjunction with the server 100 may invoke the concept of virtual PWM fans, which is described later.
Although
The fans 125 shown in
Since the enclosure 120 may contain, for example, 10 fans and 16 server blades, determination of the specific fan speed needed to cool the server blades, and then setting each fan to that speed is complicated. More specifically, each type of blade 110 may have its own unique cooling requirements. These cooling requirements depend on various factors including the number of processors on the blade, the amount of installed memory, the number of installed hard drives, and blade utilization.
To take into account all the variants, the on-board administrator module 130 incorporates a thermal management program that is used to monitor all aspects of the enclosure 120. Thermal control of the enclosure 120 is accomplished by the module 130 polling, using an intelligent platform management interface (IPMI) (not shown), the inserted blades 110 for a “virtual fan” reading. A virtual fan reading is simply the fan reading that a particular server blade 110 would need in order to ensure that the server blade is adequately cooled, given its specific operating condition. That is, a “virtual” fan reading is a calculated fan speed that is based on some measurable factor associated with the blade. If the blade actually had a fan, the real fan would be able to cool the blade under its current load by running at the “virtual” fan speed. These fan readings may be determined by the blade's management module (not shown) and may be based on a temperature sensor reading on the specific blade 110 or on some other means of assessing blade operation, such as percent of total processor utilization on the blade 110, for example. These virtual fan readings may be provided by the blade's management module as a “virtual” PWM fan reading. The virtual PWM fan readings may be contained in memory provided with each blade's management module, and such readings may be chosen to reflect the unique characteristics (e.g., number of processors) of that particular blade or blade type. The on-board administrator module 130 uses the collection of these virtual PWM readings to select a specific RPM value for the fan speed. If the fans 125 are not currently operating at the determined RPM, the on-board administrator module 130 writes the necessary command to each fan to establish the new RPM fan speed.
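As an illustration only, a blade's management module might derive a virtual PWM fan reading along the following lines. The description does not specify the calculation, so everything here is an assumption: the function name `virtual_pwm_reading`, the temperature thresholds, the linear ramp, and the choice to take the larger of the temperature-based and utilization-based demands.

```python
def _clamp01(value: float) -> float:
    """Clamp a value to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def virtual_pwm_reading(temp_c: float, cpu_util_percent: float,
                        temp_low_c: float = 30.0, temp_high_c: float = 70.0,
                        min_pwm: int = 20, max_pwm: int = 100) -> int:
    """Return a hypothetical virtual PWM reading (percent) for one blade.

    Assumptions (not from the description): cooling demand ramps
    linearly from temp_low_c to temp_high_c, processor utilization is
    used as a second demand signal, and the larger demand wins.
    """
    temp_demand = _clamp01((temp_c - temp_low_c) / (temp_high_c - temp_low_c))
    util_demand = _clamp01(cpu_util_percent / 100.0)
    demand = max(temp_demand, util_demand)
    return round(min_pwm + demand * (max_pwm - min_pwm))
```

Per-blade defaults such as `temp_high_c` could be chosen to reflect the blade type, which matches the idea that the readings reflect each blade's unique characteristics.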
Coupled to the monitor module 210 is comparison module 230, which uses comparison routines 240 to determine if fan speed should be adjusted. The module 230 also accesses database 250 to retrieve data from the PWM/RPM table 150 in order to determine the correct “new” RPM for the fans 125, should the executed comparison routine 240 indicate a new (higher or lower) fan speed is required. Finally, coupled to the comparison module 230 is action module 260, which is used to write commands to the fans' microcontrollers to adjust fan speed, as needed.
In determining the correct “new” RPM for the fans 125, the thermal management program 200 receives and reads the virtual PWM fan readings from all the blades 110 in the enclosure 120. The received virtual PWM fan readings are those that the individual blade's management module has calculated as the ideal setting that a fan should run at in order to cool the blade at the time the reading was requested from the polling module 220. The comparison module 230 reads the virtual PWM fan readings from all the blades and selects the maximum reading. The comparison module 230 then compares the just-read maximum virtual PWM fan reading to the previous maximum reading. If the new maximum reading is greater than the previous maximum reading, the comparison module 230 uses the table 150 to look up an RPM value that maps to the new maximum PWM value. If the looked-up RPM value is different from the current fan RPM setting, the action module 260 writes a command to each fan to establish the new RPM value. If the new maximum PWM reading is less than the previous maximum PWM value, the new, lower maximum PWM also is mapped to an RPM value. However, when a lower PWM value is mapped to an RPM value, a hysteresis value is applied to the PWM/RPM lookup table 150. The hysteresis value prevents small increasing or decreasing PWM changes from causing constant fan RPM changes.
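The maximum selection and table lookup described above can be sketched as follows. The table contents, the names `PWM_RPM_TABLE`, `max_virtual_pwm`, and `lookup_rpm`, and the rule of rounding up to the next table row are all hypothetical; the description does not give the actual contents or layout of the table 150.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical PWM/RPM table: (PWM percent, fan RPM), sorted by PWM.
PWM_RPM_TABLE: List[Tuple[int, int]] = [
    (20, 4000), (40, 7000), (60, 10000), (80, 14000), (100, 18000),
]


def max_virtual_pwm(readings: Dict[str, int]) -> int:
    """Select the largest virtual PWM reading among all polled blades."""
    if not readings:
        raise ValueError("no blade readings available")
    return max(readings.values())


def lookup_rpm(pwm: int, table: List[Tuple[int, int]] = PWM_RPM_TABLE) -> int:
    """Map a PWM value to an RPM using the first row whose PWM >= pwm.

    Assumption: rounding up to the next row errs toward more cooling.
    Values above the last row are clamped to the highest RPM.
    """
    thresholds = [row_pwm for row_pwm, _ in table]
    index = min(bisect_left(thresholds, pwm), len(table) - 1)
    return table[index][1]
```

For example, with readings of 35, 55, and 42 from three blades, the maximum is 55, which this hypothetical table maps to the 60 percent row.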
Although the on-board administrator module 130 sets the RPM values of the fans 125, the module 130 does not verify that the fans 125 actually go to the requested RPM. Each of the fans 125 includes a PIC microcontroller. The PIC microcontroller sends an interrupt signal to the module 130 if its associated fan cannot reach the requested RPM within a few seconds. The PIC microcontroller also sends an interrupt signal whenever the PIC microcontroller detects any type of internal fan hardware problem.
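The fan-side check can be modeled, very roughly, as a function over tachometer samples. This is a sketch of the decision only, not microcontroller firmware; the name `fan_needs_interrupt`, the 5-second timeout, and the RPM tolerance are assumptions, since the description says only "within a few seconds".

```python
from typing import Iterable, Tuple


def fan_needs_interrupt(requested_rpm: int,
                        samples: Iterable[Tuple[float, int]],
                        timeout_s: float = 5.0,
                        tolerance_rpm: int = 100) -> bool:
    """Decide whether a fan should raise an interrupt to the administrator.

    samples holds (seconds since the command, measured RPM) pairs in
    time order. The fan is considered healthy if any sample taken
    within timeout_s is within tolerance_rpm of the requested RPM.
    Both limits are hypothetical values chosen for illustration.
    """
    for elapsed_s, measured_rpm in samples:
        if elapsed_s > timeout_s:
            break  # Too late: the fan missed its deadline.
        if abs(measured_rpm - requested_rpm) <= tolerance_rpm:
            return False
    return True
```

Placing this check in the fan keeps the administrator module free of per-fan verification, which is consistent with the module not verifying the RPM itself.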
In an embodiment, the on-board administrator module 130 sends the same speed control signal to all fans 125 installed in the enclosure 120. In another embodiment, the server blades 110 and fans 125 may be grouped into zones, and fan speed may be determined based on virtual PWM fan readings on a zone-by-zone basis. That is, fans 125 in one zone may operate at an RPM different from that of fans 125 in another zone. In yet another embodiment, temperature data from other modules (i.e., from components other than blades) may be used to establish fan speed.
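For the zoned embodiment, the maximum virtual PWM reading can be taken per zone instead of per enclosure. The sketch below assumes a hypothetical mapping from blade identifiers to zone names; the description does not say how zones are assigned.

```python
from typing import Dict


def zone_max_pwm(readings: Dict[str, int],
                 blade_zone: Dict[str, str]) -> Dict[str, int]:
    """Return the largest virtual PWM reading observed in each zone.

    readings maps a blade identifier to its virtual PWM reading and
    blade_zone maps the same identifier to a zone name. Both mappings
    are illustrative assumptions.
    """
    zone_max: Dict[str, int] = {}
    for blade, pwm in readings.items():
        zone = blade_zone[blade]
        zone_max[zone] = max(pwm, zone_max.get(zone, pwm))
    return zone_max
```

Each zone's maximum would then be mapped to an RPM and written only to the fans in that zone, so a single hot blade does not force every fan in the enclosure to speed up.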
In block 325, the comparison module 230 compares the newly determined maximum virtual PWM fan reading to the current maximum. If the new maximum is greater than the current maximum, the method 300 moves to block 330 and the comparison module 230, using the PWM/RPM table 150, looks up the RPM corresponding to the new maximum virtual PWM reading. In block 335, the comparison module 230 determines if the looked-up RPM differs from the current fan RPM (theoretically, any such difference would be such that the looked-up RPM is greater than the current RPM). In block 335, if the RPMs do not differ, the method 300 moves to block 370 and ends. If, in block 335, the RPMs differ, the method 300 moves to block 360 and the action module 260 writes a command to each fan 125 to achieve the looked-up RPM. The method 300 then ends, block 370.
Returning to block 325, if the new maximum virtual PWM fan reading is not greater than the current maximum, the method 300 moves to block 340 and the comparison module 230 looks up the RPM corresponding to the new maximum virtual PWM fan reading (which is less than or equal to the current maximum). The method 300 then moves to block 345, where the looked-up RPM is compared to the fans' current RPM. If the RPMs differ, the method moves to block 350 and the comparison module 230 determines if the RPM difference is within the range of the hysteresis value. If the RPM difference is greater than the hysteresis value, the method moves to block 360, and the action module 260 writes a command to each fan 125 indicating the desired new RPM (which should be less than the current RPM). The method 300 then ends, block 370. If, in block 350, the RPM difference is within the hysteresis range, the method moves to block 370 and ends.
Returning to block 345, if the RPMs do not differ, the method 300 moves to block 370 and ends.
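The decision flow of blocks 325 through 370 can be summarized in a short sketch. The function name `decide_fan_rpm`, its parameters, and the convention of returning `None` when no command is written are assumptions for illustration; the lookup is passed in as a callable because the contents of the table 150 are not given here.

```python
from typing import Callable, Optional


def decide_fan_rpm(new_max_pwm: int, current_max_pwm: int, current_rpm: int,
                   lookup_rpm: Callable[[int], int],
                   hysteresis_rpm: int) -> Optional[int]:
    """Return the RPM to command to the fans, or None to leave them alone."""
    # Blocks 330 and 340: both branches look up the RPM for the new maximum.
    target_rpm = lookup_rpm(new_max_pwm)

    if new_max_pwm > current_max_pwm:
        # Block 335: on a rising demand, any RPM difference is acted upon.
        return target_rpm if target_rpm != current_rpm else None

    # Block 345: demand is equal or falling; nothing to do if RPM matches.
    if target_rpm == current_rpm:
        return None

    # Block 350: ignore differences that fall within the hysteresis band.
    if abs(current_rpm - target_rpm) > hysteresis_rpm:
        return target_rpm  # Block 360: write the new, lower RPM.
    return None            # Block 370: end without changing fan speed.
```

The asymmetry is deliberate in this sketch: increases take effect immediately so the blades are never under-cooled, while decreases must exceed the hysteresis value so that small fluctuations do not cause constant RPM changes.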
As noted above, the thermal control program 200 of
In the various embodiments in accordance with the present invention, embodiments are implemented as a method, system, and/or apparatus. As one example, exemplary embodiments are implemented as one or more computer software programs to implement the methods described herein. The software is implemented as one or more modules (also referred to as code subroutines, or “objects” in object-oriented programming). The location of the software will differ for the various alternative embodiments. The software programming code, for example, is accessed by a processor or processors of the computer or server from long-term storage media of some type, such as a CD-ROM drive or hard drive. The software programming code is embodied or stored on any of a variety of known media for use with a data processing system or in any memory device such as semiconductor, magnetic and optical devices, including a disk, hard drive, CD-ROM, ROM, etc. The code is distributed on such media, or is distributed to users from the memory or storage of one computer system over a network of some type to other computer systems for use by users of such other systems. Alternatively, the programming code is embodied in the memory (such as memory of a handheld portable electronic device) and accessed by the processor using the bus. The techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein.
The terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the invention as defined in the following claims, and their equivalents, in which all terms are to be understood in their broadest possible sense unless otherwise indicated.
This application claims priority from U.S. Provisional Application 60/943,192 filed Jun. 11, 2007 entitled “METHOD OF OPTIMIZING AIR MOVER PERFORMANCE CHARACTERISTICS TO MINIMIZE TEMPERATURE VARIATIONS IN A COMPUTING SYSTEM ENCLOSURE” the content of which is incorporated herein in its entirety to the extent that it is consistent with this invention and application.
Number | Date | Country | |
---|---|---|---|
20080306635 A1 | Dec 2008 | US |
Number | Date | Country | |
---|---|---|---|
60943192 | Jun 2007 | US |