1. Field of the Invention
This invention relates to the cooling of data processing devices, and, more particularly, to the operation of redundant data processing devices in a manner optimizing the cooling of these devices.
2. Summary of the Background Art
For a number of computer components, increases in processing speeds and in component capabilities are increasing the rates at which heat is generated within the components, while advances in component design and in manufacturing processes are decreasing the size of the components. Both of these trends make it increasingly difficult to adequately remove heat from the components as they are operated. For example, increases in the operating speed of microprocessors are pushing the requirements on the speeds at which data is written to and read from computer memory to new levels, while advances in storage technology are increasing the capacity that can occupy a particular space within a computer system.
Conventional methods for increasing the effectiveness with which computer components are cooled include the use of larger cooling devices, such as fans and heatsinks, and the utilization of new cooling technologies, such as heat pipes. Problems with such methods include the additional cost and system space required for their implementation, an increased noise level associated with the use of larger fans or additional fans, and a decrease in reliability resulting from a need to rely on increasingly complex structures of cooling devices.
Due to increasing levels of reliance on data generated and stored within computer systems, a number of methods have been adopted to prevent data loss through redundancy. A common method for providing redundancy relies on mirroring, with the same data being stored in two locations, so that information can be read from one of these locations even if the data in the other location is lost. Multiple locations for data storage may include separate hard disk units, separate locations in system memory, or even separate storage devices connected by a network. Redundancy is also used to provide increased levels of performance through an ability to use multiple instances of redundant resources, such as processors, as well as data storage. What is needed is a method for using redundant resources so that the cooling of components is optimized, preferably without a need to rely on the use of larger or additional conventional cooling devices.
U.S. Pat. No. 5,732,215 describes an array of direct access storage devices (DASDs), such as disk drive devices, that includes a temperature sensor for sensing the operating temperature of each DASD. The operating temperatures of the DASDs of the array are equalized by allocating frequently accessed data to drives with relatively low operating temperatures and allocating infrequently accessed data to drives with relatively high operating temperatures. Operating temperature information is used to identify a DASD that has a high probability of failure, so that the DASD can be shut down for replacement prior to actual failure by moving all data from the DASD to one or more other DASDs of the array. What is needed is a method for determining how to store data for optimized cooling without a need to predetermine how frequently the data will be accessed, since such information may be unavailable when the data is stored. Additionally, since many forms of data are read many more times than they are written, and since such data is often redundantly recorded in different locations, what is needed is a way to control the reading of redundantly recorded data in a manner optimizing the cooling of DASD units.
U.S. Pat. Nos. 5,900,007 and 5,787,462 each describe a system including a large array of small disk files, in which individual disk files are assigned to be used in a manner minimizing the overheating of the disk files during normal operation of the system. In accordance with U.S. Pat. No. 5,900,007, before the disk array is used, it is configured into subsets, called clusters, by the configuration management system in a mapping process calculated to disperse the disk files in each cluster so that all files in the cluster can be active simultaneously without creating a localized thermal overload situation, or “hot spot,” within the disk array. Then, during operation, the power manager also maintains the disk array in conformance with thermal and power constraints to avoid excessive power consumption or thermal overload while keeping active the optimal subset of the disk array based on the storage requests pending at any point in time. In accordance with U.S. Pat. No. 5,787,462, a configuration management subsystem assigns heat producing devices, such as the disk files, to clusters so that the number of devices activated will not create overheating, regardless of which limited set of clusters is activated. The subsystem receives the dimensions of a critical box that defines an arrangement of the devices into cells such that, if a device is assigned to each cell of the critical box and all devices are operated simultaneously, the thermal operating restrictions of the devices will be exceeded.
U.S. Pat. No. 6,470,238 describes a method for controlling device temperature. The method involves determining the access rate to a component, comparing the access rate with a predetermined threshold modified by a weighted value, and controlling the temperature of the component through corrective action.
U.S. Pat. App. Pub. No. 2003/0125900 A1 describes a system that includes a microprocessor and a thermal control subsystem. The microprocessor includes execution resources to support processing of instructions and consumes power. The microprocessor also includes at least one throttling mechanism to reduce the amount of heat generated by the microprocessor. The thermal control subsystem is configured to estimate an amount of power used by the microprocessor and to control the throttling mechanism based on the estimated amount of current power usage to ensure that junction temperature will not exceed the maximum allowed temperature.
In accordance with a first aspect of the invention, data processing apparatus is provided, including a data source, a number of data processing devices, a number of thermal sensors, and a control device. The data source provides data for processing within the data processing apparatus. The data processing devices are connected to the data source to receive data for processing. Each of the thermal sensors measures an operating temperature of a device in the number of data processing devices and produces a temperature signal indicating the operating temperature measured. The control device operates in response to the temperature signal from each of the thermal sensors to select a data processing device within the number of data processing devices having a temperature cooler than at least one other data processing device within the plurality of data processing devices and to cause data to be processed within the selected data processing device.
Preferably, the control device includes a microprocessor programmed to execute instructions of a temperature tracking routine generating a temperature tracking data structure including data indicating an operating temperature measurement within each of the data processing devices, and to execute instructions of a device allocation routine determining which data processing device processes data from the data source. The data processing apparatus then additionally includes storage holding the temperature tracking data structure.
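The selection made by the control device can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the claimed implementation: the names `TemperatureTracker` and `allocate_device`, the string device identifiers, and the use of a dictionary as the temperature tracking data structure are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TemperatureTracker:
    """Hypothetical stand-in for the temperature tracking data structure."""
    readings: dict = field(default_factory=dict)  # device id -> latest temperature

    def record(self, device_id: str, temperature: float) -> None:
        # Overwrite the previous reading with the newest sensor measurement.
        self.readings[device_id] = temperature


def allocate_device(tracker: TemperatureTracker, available: list) -> str:
    """Hypothetical device allocation routine: pick the coolest available device."""
    candidates = [d for d in available if d in tracker.readings]
    if not candidates:
        raise LookupError("no available device has a temperature reading")
    # The chosen device is no hotter than any other candidate.
    return min(candidates, key=lambda d: tracker.readings[d])


tracker = TemperatureTracker()
for device_id, temperature in [("A", 41.0), ("B", 37.5), ("C", 45.2)]:
    tracker.record(device_id, temperature)
print(allocate_device(tracker, ["A", "B", "C"]))  # B
```

Passing only the devices that are currently available reflects the point that the control device may also check availability before choosing among redundant devices.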
The data processing devices are understood to include devices, such as computer systems, that perform various functions, including comparisons and arithmetic manipulations on data, devices, such as adapter circuits, that manipulate data signals so that the data can be moved from one hardware interface to another, and data storage devices, such as hard disk drive devices, that store data for later retrieval. In each case, the data processing devices are understood to be redundant, in that two or more devices are available to perform the desired function. While the control device is provided to choose among the redundant devices on the basis of measured operating temperatures, the control device may also determine if the redundant devices are available and suitable for the intended purpose.
In accordance with another aspect of the invention, a method for allocating redundant data processing devices is provided, with the method including generating a temperature tracking data structure and allocating one of the data processing devices. The temperature tracking data structure includes a data record indicating an operating temperature of each of the devices. The device allocated for processing data has an operating temperature, as recorded within the temperature tracking data structure, lower than that of at least one other of the data processing devices. The allocation of a data processing device within a number of data processing devices is understood to mean the enablement of the particular device to perform the data processing function. Such enablement may be provided by conditioning the device to perform the processing function or by directing a data signal to the particular device.
In accordance with yet another aspect of the invention, a method is provided for storing and retrieving data within redundant data storage devices. In this context, data storage devices are considered to be a form of data processing devices. The method includes writing data to be stored in two data storage devices within the redundant data storage devices, determining which of the two redundant data storage devices from which data is to be retrieved is cooler, and reading the data to be retrieved from the cooler of the two redundant data storage devices. This method may also include determining which data storage devices are the two coolest data storage devices from data stored within the temperature tracking data structure and writing data to these two coolest data storage devices.
Alternately, the redundant data storage devices may be arranged in pairs, with the temperature tracking data structure including a data record associated with each pair of data storage devices. The data records may then be stored in order of the measured operating temperature of the hotter device within each pair, so that data is written to the pair of data storage devices in which the hotter storage device is the coolest among the hotter devices of all of the pairs.
In accordance with a first version of this embodiment, data is written to the coolest two storage devices 6 that are determined to be available to receive the data. Then, the data is read from the cooler of the two devices in which it has been recorded.
In accordance with a second version of this embodiment, the storage devices 6 are arranged in pairs, with the same data being written to both of the storage devices 6 in a pair. The data is written to the pair of storage devices 6 determined to be available to receive the data, for which the hotter of the two storage devices 6 in the pair is determined to be the coolest of the hotter devices within each of the pairs. In this way, the temperature of the hotter of the two devices in each pair is used to determine which pair of devices is to be written to, so that data will not be written to the hottest of the devices in the entire group of devices. For such a determination to be made, there must be at least four storage devices 6, forming two pairs. If there is only one pair, data will always be written to both of its devices. Then, when the data is to be read, it is read from the cooler of the two devices.
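The pair-selection rule of this second version can be sketched as follows. This is a minimal Python illustration, not the claimed method: the function name `choose_pair`, the tuple representation of a pair, and the set of available pair indices are hypothetical.

```python
def choose_pair(pairs, temperatures, available):
    """Return the index of the available pair whose hotter member is coolest.

    pairs:        list of (device_a, device_b) tuples (hypothetical identifiers)
    temperatures: dict mapping device id -> latest measured temperature
    available:    set of pair indices able to accept the write
    """
    candidates = [i for i in range(len(pairs)) if i in available]
    if not candidates:
        raise LookupError("no pair is available for writing")
    # Rank each pair by its hotter member, then take the smallest such value.
    return min(candidates, key=lambda i: max(temperatures[d] for d in pairs[i]))


pairs = [("d0", "d1"), ("d2", "d3")]
temperatures = {"d0": 30.0, "d1": 50.0, "d2": 40.0, "d3": 42.0}
print(choose_pair(pairs, temperatures, {0, 1}))  # 1
```

Pair 0 contains the single coolest device, `d0`, but it also contains the hottest device, `d1`; ranking pairs by their hotter member therefore sends the write to pair 1 and keeps the hottest device out of it.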
In accordance with a third version of this embodiment, the storage devices 6 are again arranged in pairs, with the same data being written to both of the storage devices 6 in a pair. Data is written to the storage devices in a conventional manner, without making the recording of data dependent on the determination of which devices are cooler. Then, when data is to be read, it is read from the cooler of the two devices in which it has been written.
Data and instructions for programs executing within the microprocessor 12 are read from nonvolatile storage within the data storage devices to be loaded into RAM 16. The portable computer system 10 additionally includes a drive 36 for reading information from a removable medium 38, such as an optical disk, with the drive 36 being connected to the I/O bus 22 through a drive adapter 40. For example, a LAN (local area network) 42 is additionally connected to the I/O bus 22 through a network interface circuit 44.
Program instructions to be executed within the portable computer system 10 may be loaded from the removable medium 38, which forms an example of a computer readable medium, through the drive 36, to be stored in nonvolatile storage within the data storage devices 28 or to be stored for execution within the RAM 16, with magnetic storage within the devices 28 and the RAM 16 additionally forming examples of computer readable media. Alternately, such instructions can be received in the form of a computer data signal embodied on a carrier wave from the LAN 42 through the network interface circuit 44. User inputs to the portable computer system 10 are provided through a keyboard 46 and through a pointing device 48, such as a mouse or touch pad, with the keyboard 46 and pointing device 48 both being attached to the I/O bus 22 through an adapter 50.
In accordance with a preferred version of the invention, at least some of the data stored within non-volatile storage 27 is stored redundantly, within two of the data storage devices 28, so that the data will not be lost if one of the drives 28 fails. During normal operation of the computer system 10, such data can be read from either of the two data storage devices 28 in which it is stored. This capability is used to provide a way to prevent overheating of the data storage devices 28 by reading data from the cooler of the two drives on which it is stored whenever possible. This method is particularly effective when it is used in an application in which data, once written, is read many times, so that most of the use of the data storage devices 28 is encountered during read operations. An example of such an application is an Internet web server, in which the data describing a web page is read for transmission many times after it is written. The temperatures measured by the sensors 32 may additionally be used to determine the data storage devices 28 to which data is written, providing an additional means for balancing the operating temperatures at which the data storage devices 28 operate. For the data storage devices 28 to be allocated for storing data according to the temperature of the devices 28, there must be at least three such devices 28.
In accordance with the invention, the information stored within nonvolatile storage 27 further includes a temperature tracking routine 60 and a storage allocation routine 62. The temperature tracking routine 60 receives data on a periodic basis from each of the sensors 30 to track the temperature of the data storage device 28 associated with the sensor 30 and writes data indicating the measured temperature to a temperature data structure 64, additionally stored within non-volatile storage 27. The storage allocation routine 62 is called whenever there is a request to read data from the data storage devices 28 and whenever there is a request to write data to one of these data storage devices 28. Such data is written redundantly, in two different drives 28. Therefore, when the storage allocation routine 62 is called to read data, it examines an allocation data structure 66 stored within non-volatile storage 27 to determine the locations of the two data storage devices 28 where the data to be read can be found, and then examines the temperature data structure 64 to determine which of these two data storage devices 28 is cooler. The data is then read from the cooler one of these two devices.
Preferably, two temperature data structures 64 are stored within nonvolatile storage 27, with data periodically being written to the two data structures 64 in an alternating fashion so that, while data is being written to one of the data structures 64 by the temperature tracking routine 60, the other data structure 64 can be used by the storage allocation routine 62 to determine which of the drives available for reading or writing data is the coolest. Either or both of the routines 60, 62 may be subroutines called by another program or routine and returning to the calling routine upon completion.
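The alternating use of two temperature data structures is a form of double buffering. The following Python sketch is one possible way to express it, assuming in-memory lists rather than structures in nonvolatile storage; the class and method names are hypothetical, and the lock is an added assumption to make the swap safe when the tracking and allocation routines run on different threads.

```python
import threading


class DoubleBufferedTemperatures:
    """Two temperature lists: the tracker fills one while readers use the other."""

    def __init__(self):
        self._buffers = [[], []]
        self._active = 0              # index of the structure readers should use
        self._lock = threading.Lock()

    def begin_update(self):
        """Clear and return the inactive structure for the tracking routine to fill."""
        inactive = self._buffers[1 - self._active]
        inactive.clear()
        return inactive

    def publish(self):
        """Swap the active and inactive structures, like switching a pointer."""
        with self._lock:
            self._active = 1 - self._active

    def active_snapshot(self):
        """Return a copy of the most recently completed set of measurements."""
        with self._lock:
            return list(self._buffers[self._active])


temps = DoubleBufferedTemperatures()
pending = temps.begin_update()
pending.append(("d0", 35.0))
print(temps.active_snapshot())  # [] because the update is not yet published
temps.publish()
print(temps.active_snapshot())  # [('d0', 35.0)]
```

Readers never see a half-filled list: a new set of readings becomes visible only when `publish` flips the active index.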
After starting in step 86, the temperature tracking routine 60 proceeds to step 88 to wait for a pulse indicating that it is time to determine the operating temperature of each of the data storage devices 28. When it is determined that this pulse is occurring, the routine 60 proceeds to step 90, in which a device counter is reset to start the process of recording temperatures with the first drive 28 to be examined. Then, in step 92, data within the inactive temperature data structure 64 is erased so that this data structure 64 can be used to store the new temperature data. Then, in step 94, the temperature of the data storage device 28 indicated by the device counter is measured. Next, in step 96, it is determined whether the temperature measured in step 94 indicates that an “overtemp” condition has been reached, in which the particular data storage device 28 can no longer continue to operate reliably and without damage. If such an overtemp condition has been reached, the drive device 28 is turned off in step 98. Then, in step 100, the device counter is incremented so that the temperature of the next drive device 28 is measured.
On the other hand, if it is determined in step 96 that an overtemp condition has not been reached, the routine 60 proceeds to step 101, in which data identifying the data storage device 28 for which a temperature has last been measured in step 94 and describing the temperature value that was measured is written to the inactive temperature data structure 64. After the data for this device 28 has been written in step 101, the temperature tracking routine 60 proceeds to step 102, in which it is determined whether the device 28 that has been most recently measured in step 94 is actually the last device for which such a measurement is to be made. If it is not, the routine 60 returns to step 100 to increment the device counter, so that the temperature of the next device 28 is measured in step 94, with this process continuing until it is determined in step 102 that the temperature of the last device 28 has been measured. Then, the active and inactive data structures are switched in step 103. This may be accomplished by switching a pointer that points to the active data structure. Then, the routine 60 returns to step 88 to wait for the next timing pulse, with the most recently measured temperatures being available in the active temperature data structure 64.
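One measurement pass of the temperature tracking routine 60 (steps 90 through 102) might be sketched in Python as shown here. This is an illustration only: `OVERTEMP_LIMIT` is a hypothetical value because no threshold is given, `read_sensor` and `power_off` are hypothetical callbacks standing in for the sensor adapter and drive power control, and the periodic wait of step 88 and the swap of step 103 are left to the caller.

```python
OVERTEMP_LIMIT = 60.0  # hypothetical threshold; the text gives no value


def measurement_pass(device_ids, read_sensor, power_off, inactive):
    """Measure each device once and record readings below the overtemp limit."""
    inactive.clear()                                  # step 92: erase old data
    for device_id in device_ids:                      # steps 90, 100, 102: device counter
        temperature = read_sensor(device_id)          # step 94: measure
        if temperature >= OVERTEMP_LIMIT:             # step 96: overtemp check
            power_off(device_id)                      # step 98: turn the drive off
            continue                                  # the drive is not recorded
        inactive.append((device_id, temperature))     # step 101: record reading


readings = {"d0": 40.0, "d1": 72.0, "d2": 35.0}
switched_off = []
inactive = []
measurement_pass(readings, readings.__getitem__, switched_off.append, inactive)
print(switched_off)  # ['d1']
print(inactive)      # [('d0', 40.0), ('d2', 35.0)]
```

Because an overheated drive is simply left out of the new structure, later lookups in this sketch will not select it.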
If it is determined in step 104 that the data storage device 28 for which an operating temperature has been most recently measured in step 94 is not the first device for which such a measurement is made, the temperature tracking routine 60 proceeds to step 106, in which the data from the first data record 70 in the inactive temperature data structure 64 is read. Then, in step 108, the device temperature most recently measured in step 94 is compared with the temperature most recently read from the inactive data structure 64. If the device temperature is greater than the temperature read from the data structure, the routine 60 proceeds from step 110 to read the temperature from the next data record 70 within the inactive data structure 64. This process continues until it is determined in step 114 that data has been read from the last data record 70 within the data structure 64; each time data is read from a new data record 70, the routine 60 returns to step 108 to compare the device temperature most recently measured in step 94 with the temperature most recently read from a data record 70.
Whenever it is determined in step 108 that the device temperature is not higher than the temperature read from the data record, the routine 60 proceeds to step 116, in which the data records 70 within the inactive temperature data structure 64 below the data record 70 that has just been read are shifted downward to provide a space for writing data associated with the device for which a temperature has been most recently measured in step 94. (This process of shifting data may be achieved by actually moving data or by adjusting a pointer pointing to the location of stored data.) Then, in step 118, the data associated with the device for which a temperature has most recently been measured in step 94 is written in the space provided in the data structure 64 in step 116. If it is determined in step 114 that the data for the last record within the inactive temperature data structure 64 has been read, the data for the device for which a temperature has most recently been measured in step 94 is written in step 120 to form a new record 70 at the end of this data structure 64.
Thus, the data associated with the data storage device 28 that has most recently been measured in step 94 is inserted within or appended to the data within the inactive temperature data structure 64 in an order of ascending device temperature, with data for the coolest device 28 being stored within the first data record 70, and with data for the hottest device 28 being stored within the last data record 70.
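The ordered insertion described in steps 104 through 120 is essentially one step of an insertion sort. A minimal Python sketch, assuming each data record 70 is represented as a hypothetical `(device_id, temperature)` tuple in a list, is:

```python
def insert_ascending(records, device_id, temperature):
    """Insert a reading so that records stay in ascending temperature order."""
    # Walk the records from the first one (roughly steps 106 to 114).
    for index, (_, stored_temperature) in enumerate(records):
        if temperature <= stored_temperature:                # step 108 comparison
            records.insert(index, (device_id, temperature))  # steps 116 and 118
            return
    records.append((device_id, temperature))                 # step 120: append at end


records = []
for device_id, temperature in [("d0", 40.0), ("d1", 35.0), ("d2", 50.0), ("d3", 40.0)]:
    insert_ascending(records, device_id, temperature)
print(records)  # [('d1', 35.0), ('d3', 40.0), ('d0', 40.0), ('d2', 50.0)]
```

As in step 108, a new reading is placed ahead of the first record whose temperature is not lower than its own, so ties go in front. On Python 3.10 or later, `bisect.insort_left(records, (device_id, temperature), key=lambda r: r[1])` yields the same ordering with a binary search, although the list insertion itself remains linear.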
Since two of the devices 28 must be selected for recording data, the storage allocation routine 62 proceeds to step 140 to read data from the next data record 70 within the active temperature data structure 64 to determine the device 28 that was determined to be the next coolest of the devices 28 during the last temperature measurements taken by the temperature tracking routine 60. Then, in step 142, a further determination is made of whether this device is available to have the data written to it. If it is ready, this next device 28 is selected in step 144. In either case, the routine 62 proceeds to step 146, in which a determination is made of whether two data storage devices 28 have been selected. If they have not, the routine 62 returns to step 140 to read data from the next data record 70. If two devices 28 have been selected, in step 148, the data to be written is written to the selected devices 28, and data identifying the selected devices 28 is written to a new data record 76 within the allocation data structure 68, so that the appropriate devices 28 can be accessed later when the data is to be read. After this data is stored in step 148, the storage allocation routine 62 ends, returning to the calling routine in step 150.
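The write path of the storage allocation routine 62 (steps 140 through 148) can be sketched as a scan of the ascending temperature records that stops after two available devices are found. In this Python illustration, `allocate_for_write`, the `is_available` predicate, and the use of a dictionary for the allocation data structure are hypothetical, and the actual data transfer is omitted.

```python
def allocate_for_write(active_records, is_available, allocation_table, data_key):
    """Pick the two coolest available devices and remember where the data went.

    active_records:   (device_id, temperature) tuples in ascending temperature order
    is_available:     hypothetical predicate standing in for the step-142 check
    allocation_table: dict standing in for the allocation data structure
    """
    selected = []
    for device_id, _temperature in active_records:   # step 140: next coolest device
        if is_available(device_id):                  # step 142: available?
            selected.append(device_id)               # step 144: select it
        if len(selected) == 2:                       # step 146: two selected?
            break
    if len(selected) < 2:
        raise LookupError("fewer than two devices are available")
    # Step 148: the writes themselves are omitted; only the allocation is recorded.
    allocation_table[data_key] = tuple(selected)
    return tuple(selected)


records = [("d1", 35.0), ("d3", 40.0), ("d0", 41.0), ("d2", 50.0)]
table = {}
print(allocate_for_write(records, lambda d: d != "d3", table, "index.html"))  # ('d1', 'd0')
```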
On the other hand, if it is determined in step 132 that a request has been made to read data instead of to write data, the storage allocation routine 62 proceeds to step 152, in which data is read from the data record 76 within the allocation data structure 68 corresponding to the data to be read to determine the identities of the two data storage devices 28 in which the data to be read has been stored. Then, in step 154, the first of the devices identified in step 152 is found within a data record 70, with the examination of the data records 70 beginning with the first data record 70 within the active temperature data structure 64 and continuing with successive data records 70, so that the first of the devices 28 found is the cooler of the two such devices 28. This is achieved due to the order in which data records 70 are stored in the temperature data structure 64, which has been explained in detail in reference to
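Because the active records are kept in ascending temperature order, the read path of steps 152 and 154 needs no explicit comparison: the first holder found is the cooler one. A minimal Python sketch under the same hypothetical representations as the write sketch (a list of `(device_id, temperature)` tuples and a dictionary for the allocation data structure) is:

```python
def device_for_read(active_records, allocation_table, data_key):
    """Return the cooler of the two devices holding data_key.

    active_records is in ascending temperature order, so the first holder
    found while scanning from the top is the cooler one (step 154).
    """
    holders = set(allocation_table[data_key])         # step 152: look up both copies
    for device_id, _temperature in active_records:
        if device_id in holders:
            return device_id
    raise LookupError(f"no temperature record for the devices holding {data_key!r}")


records = [("d1", 35.0), ("d3", 40.0), ("d0", 41.0), ("d2", 50.0)]
table = {"index.html": ("d0", "d1")}
print(device_for_read(records, table, "index.html"))  # d1
```

In this sketch, if one holder has been switched off for an overtemp condition and therefore has no record, the scan falls through to the surviving copy.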
In accordance with a second version of the first embodiment of the invention, the data storage devices 28 within nonvolatile storage 27 are arranged in pairs, with identical data being written to both of the devices 28 in a pair.
The processes occurring during the execution of the temperature tracking routine 60 in accordance with the second version of the first embodiment of the invention are generally as described above in reference to
On the other hand, if it is determined in step 184 that a request has been made to read data instead of to write data, the storage allocation routine 180 proceeds to step 194, in which data is read from the data record 170 within the allocation data structure 68 corresponding to the data to be read to determine the identities of the two data storage devices 28 in which the data to be read has been stored. Then, in step 196, the data record 160, associated with both of the drive devices 28 storing the data to be read, is found within the temperature data structure 64, with the cooler of these two devices being identified by the location bit stored in the third data field 166 of this data record 160. Next, the requested data is read from the cooler of these two devices in step 198, before the routine 180 ends in step 192.
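A pair record carrying a location bit can be sketched as a small record type. Only the role of the location bit in the third data field 166 comes from the description above; the other field names, the `hotter_temperature` sort key, and the helper functions in this Python illustration are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairRecord:
    """Hypothetical layout for a pair record such as data record 160."""
    first_device: str
    second_device: str
    second_is_cooler: bool      # plays the role of the location bit in field 166
    hotter_temperature: float   # key for ordering records by the hotter member


def make_pair_record(first, second, temperatures):
    """Build a record from the latest readings of both devices in the pair."""
    t_first, t_second = temperatures[first], temperatures[second]
    return PairRecord(first, second, t_second < t_first, max(t_first, t_second))


def cooler_of_pair(record):
    """Step 196: use the location bit to pick the device to read from."""
    return record.second_device if record.second_is_cooler else record.first_device


record = make_pair_record("d0", "d1", {"d0": 50.0, "d1": 30.0})
print(cooler_of_pair(record))  # d1
```

Storing a single bit per pair lets the read path pick the cooler device without looking up either temperature again.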
In accordance with a third version of the first embodiment of the invention, data is redundantly stored within pairs of the drive devices 28 in a conventional manner, without being stored in locations determined according to the temperatures of the various devices. Then, in response to a request to read the previously recorded data, the data is read from the cooler of the two drive devices on which it has been recorded. While some of the effectiveness of the first and second versions of the first embodiment, described above, in directing usage to the coolest devices is thus lost, a significant simplification in operation may be achieved. This third version has a particular advantage in an application in which data is written once and read many times, such as in the storage of data for an Internet web site, in which data written once is read many times in response to client browsers requesting the data.
For example, data is written to paired devices 28, as in the second version of the first embodiment, but without regard to first finding the pair of devices 28 in which the hotter of the two paired devices 28 is the coolest such device within non-volatile data storage 27, to subsequently be read from the cooler of the two devices 28 within the pair. The temperature tracking routine operates generally as described above in reference to
A remote network server including multiple communication adapter circuits connected to a network such as the public switched telephone network and to one another through an SC bus to direct calls among the communication adapter circuits is described in U.S. Pat. No. 6,195,359, the disclosure of which is incorporated herein by reference.
For operation in accordance with the present invention, each of the communication adapter circuits 228 includes a thermal sensor 238, located to sense the operating temperature of a temperature-sensitive region within the circuit 228. The thermal sensors 238 are connected to the I/O bus 22 through a sensor adapter circuit 240.
The server 250 periodically receives data indicating the operating temperature of each of the computer systems 256, maintaining a temperature tracking data structure similar to the temperature tracking data structure 64, described above in reference to
While the invention has been described in its preferred forms or embodiments with some degree of particularity, it is understood that this description has been given only by way of example, and that many variations can be achieved without departing from the spirit and scope of the invention, as defined within the appended claims.