Densification in data centers is becoming so extreme that the power density of the systems in the center is growing at a rate unmatched by technological developments in data center heating, ventilation, and air-conditioning (HVAC) designs. Current servers and disk storage systems, for example, generate thousands of watts per square meter of footprint. Telecommunication equipment generates two to three times the heat of the servers and disk storage systems.
Computer designers are continuing to invent methods that extend the air-cooling limits of individual racks of computers (or other electronic heat-generating devices). High heat capacity racks, however, require extraordinary amounts of air to remove the heat they dissipate and need large, expensive air handling equipment.
Some electrical devices, such as liquid-cooled mainframe computers, do use liquid cooling. In some situations, liquid cooling provides significant improvements over air-cooled systems. For instance, liquid cooling can more effectively remove large amounts of heat from data centers or even single servers.
Prior liquid cooling systems, however, are not fault tolerant. In other words, a server or data center can unexpectedly shut down if a failure occurs in the cooling system. For example, if the cooling line in a building fails, then a single server or an entire data center will not be sufficiently cooled. As such, the server or entire data center can overheat and shut down. As another example, if a non-fault tolerant cooling system needs to be serviced, then all servers or data centers on this cooling system would be temporarily shut down while the system is repaired.
The data center 104 is situated in a building or room that has a floor 110 above a floor slab 112. A network of pipes 114 extends between the floor 110 and floor slab 112. The pipes carry a cooling fluid to and from the liquid cooling unit 103 and the electronic device 102. By way of example, the cooling fluids include, but are not limited to, water, refrigerant, single phase fluids, two phase fluids, etc. Further, although the pipes are shown within the floor, they can be located in various places, such as, but not limited to, the ceiling, the walls, on top of the floor, underground, etc.
As shown, fluid initially enters the liquid cooling unit 103 along one or more supply lines 116 and exits the liquid cooling unit along one or more return lines 118. Specifically, the fluid passes into a heat exchanger 120, such as a liquid-to-liquid heat exchanger. This heat exchanger 120 is connected to a liquid cooling loop 122 that extends between the liquid cooling unit 103 and the electronic device 102. A pump 123 pumps the fluid along one or more supply lines 124 from the heat exchanger 120 to heat generating components or electronics 128. After cooling the electronics 128, the fluid is pumped along one or more return lines 130 back to the heat exchanger 120.
The electronics 128 generate heat that is removed by the fluid and transferred away through the return line 130. In turn, the heat exchanger 120 removes, dissipates, and/or exchanges this heat so cooled fluid pumped along the supply line 124 can remove heat from the electronics 128. Embodiments in accordance with the present invention are not limited to any particular type of heat exchanger 120. Various types of heat exchangers, now known or developed in the future, are applicable with embodiments of the invention. By way of example, the heat exchanger 120 can use one or more of thermal dissipation devices, heat pipes, heat spreaders, refrigerants, heat sinks, liquid cold plates or thermal-stiffener plates, evaporators, refrigerators, thermal pads, air flows, and/or other devices adapted to remove or dissipate heat.
In one exemplary embodiment, the cooling system 100 is fault tolerant since two or more independent or alternative fluid paths are provided to cool the electronic device 102. For example, if one or more of the supply lines 116/124 or return lines 118/130 breaks, fails, needs to be serviced, or otherwise shuts down, then the electronic device 102 (example, servers or racks in data center 104) will not immediately or contemporaneously overheat and shut down. In the event of such a failure or servicing in the cooling system 100, the liquid cooling loop 122 extending between the liquid cooling unit 103 and the electronic device 102 continues to cool the electronics 128 using alternative or redundant supply and return lines.
One exemplary embodiment uses a combination of a liquid transfer switch (LTS) and one or more redundant supply lines and return lines to provide fault tolerance for cooling system 100. For simplicity of illustration,
In one embodiment, the liquid transfer switch 150 includes one or more sensors 152 and valves 154. The liquid transfer switch provides an automated mechanism to manage fluid flow to and from the electronic device. The sensor 152 senses one or more conditions in the system in order to actuate (example, open or close) the one or more valves 154.
In one embodiment, one or more components in the liquid cooling unit 103 are controlled with an algorithm. For instance, information from the sensor 152 is used to open and control the valves 154 to regulate fluid flow to and from the electronic device 102.
One embodiment uses a mechanical valve that is controlled by the algorithm. When a failure occurs, the algorithm activates one or more valves to maintain continuous fluid supply to the electronic device. For example, if one of the supply or return lines fails, then the alternate or redundant supply or return line is utilized so cooling to the electronic device is not disrupted. The electronic device thus continuously operates and/or remains online while a fluid flow path to the electronic device is altered or adjusted. In one embodiment, the alternate or redundant supply or return line is opened to provide fluid. Alternatively, if the alternate or redundant supply or return line were already open, then fluid flow can be increased if necessary to compensate for the flow lost through the failed line.
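By way of illustration only, the valve-control behavior described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the algorithm of any particular embodiment: the names Line and rebalance, the litres-per-minute unit, and the policy of splitting the required flow equally across healthy lines are all hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """A supply or return line (hypothetical model, not from the description)."""
    name: str
    failed: bool = False
    flow_lpm: float = 0.0  # commanded flow in litres per minute (assumed unit)


def rebalance(lines: List[Line], required_lpm: float) -> List[Line]:
    """Close failed lines and spread the required flow over the healthy ones.

    Equal sharing is an assumed policy; the description only says that the
    redundant line is opened or its flow is increased as necessary.
    """
    healthy = [line for line in lines if not line.failed]
    if not healthy:
        raise RuntimeError("no healthy fluid line available")
    share = required_lpm / len(healthy)
    for line in lines:
        # A failed line is closed (zero flow); each healthy line carries a share.
        line.flow_lpm = 0.0 if line.failed else share
    return lines


if __name__ == "__main__":
    primary = Line("supply A", flow_lpm=10.0)
    redundant = Line("supply B", flow_lpm=10.0)
    primary.failed = True  # simulate a failed line
    for line in rebalance([primary, redundant], required_lpm=20.0):
        print(line)
```

In this sketch the redundant line was already open, so the failure is handled by increasing its flow to cover the flow lost through the failed line. Starting the redundant line at zero flow models the case in which it is opened only on failure.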
By way of example, failure includes, but is not limited to, loss or disruption of electrical power in the cooling system, loss of pressure in a liquid line, pump failure, chiller failure, loss in temperature control or any other failure that can shut down or put the supply fluid out of tolerance.
In one embodiment, the valve 154 can be open, semi-open or closed, and the liquid transfer switch monitors or senses the fluid in the multiple supply and return lines. By way of example, sensing is performed using one or more of fluid flow or flow rate, fluid pressure, fluid temperature, etc.
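As a hedged illustration of the sensing described above, the following Python sketch checks a set of readings against tolerance bands. The band values, the units, the key names, and the names Band, TOLERANCES, and out_of_tolerance are assumptions introduced for this example; the description does not specify numeric limits.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Band:
    """Inclusive tolerance band for one sensed quantity."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical limits, chosen only to make the example concrete.
TOLERANCES: Dict[str, Band] = {
    "flow_lpm": Band(15.0, 40.0),
    "pressure_kpa": Band(150.0, 400.0),
    "temperature_c": Band(10.0, 25.0),
}


def out_of_tolerance(reading: Dict[str, float]) -> List[str]:
    """Return the names of sensed quantities that fall outside their band."""
    return [
        name
        for name, band in TOLERANCES.items()
        if name in reading and not band.contains(reading[name])
    ]


if __name__ == "__main__":
    sample = {"flow_lpm": 5.0, "pressure_kpa": 200.0, "temperature_c": 30.0}
    print(out_of_tolerance(sample))
```

A non-empty result would be treated as a condition on which the liquid transfer switch actuates a valve. Which valve is actuated, and to which position, is left open here.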
In one exemplary embodiment, the liquid cooling unit 206 is a modular, self-contained unit. For example, the liquid cooling unit is removable, serviceable, and replaceable into and out of the server 202.
As noted, the liquid cooling unit and/or liquid transfer switch can be modular. As such, the rack 302 can continue to operate while the liquid cooling unit 310 or liquid transfer switch 306 is serviced, replaced, or otherwise repaired. For example, if the primary pump 320 or heat exchanger 324 is temporarily shut down or otherwise fails, the liquid transfer switch 306 senses the failure and automatically actuates the second pump to maintain uninterrupted fluid flow to the rack 302.
A modular liquid cooling unit 340 includes a heat exchanger 342 and a primary pump 344. As shown, the liquid supply and return lines 346 connect to both the rack 302 and liquid cooling unit 340. A liquid cooling loop 345 includes a supply line 347 and a return line 348 that circulate fluid to a second rack 360.
The rack 360 includes a liquid transfer switch 370, liquid cooled electronics 372, and a backup pump 374. Plural valves 380 and couplings 382 are used to actuate fluid flow through the backup pump 374.
In one exemplary embodiment, the primary pump 344 pumps fluid to cool the electronics 372 during normal operations. When the LTS 370 detects a failure, valves 380 are opened and backup pump 374 is activated. Coolant or fluid continues to circulate in rack 360 to cool electronics 372.
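The failover sequence for the rack 360 can be sketched, by way of example only, as a small Python state model. The class names Pump and RackCooling, the method on_failure_detected, and the choice to open the valves before starting the backup pump are assumptions made for illustration; the description states only that the valves are opened and the backup pump is activated.

```python
from dataclasses import dataclass


@dataclass
class Pump:
    name: str
    running: bool = False


@dataclass
class RackCooling:
    """Hypothetical model of the pumps and valves serving one rack."""
    primary: Pump
    backup: Pump
    backup_valves_open: bool = False

    def on_failure_detected(self) -> None:
        # Assumed ordering: open the valves first so the backup pump
        # has a flow path, then start the backup pump.
        self.backup_valves_open = True
        self.backup.running = True

    @property
    def circulating(self) -> bool:
        """True while coolant can still reach the electronics via the backup path."""
        return self.backup_valves_open and self.backup.running


if __name__ == "__main__":
    rack = RackCooling(primary=Pump("primary pump 344", running=True),
                       backup=Pump("backup pump 374"))
    rack.primary.running = False   # simulate loss of the primary pump
    rack.on_failure_detected()     # the liquid transfer switch reacts
    print(rack.circulating)
```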
The coolant converters receive source or building coolant (such as water, refrigerant, air, compressed air, coolanol or any other generally accepted coolant known in the art) and convert it to the desired coolant (such as water, refrigerant, air, compressed air, coolanol or any other generally accepted coolant known in the art) for use internal to the computer or computer system. For example, each coolant converter forms part of a separate and independent cooling loop. These coolant converters 410A and 410B perform several functions. First, they isolate internal electrical parts or components of the server from unconditioned building coolant. Second, they allow the server manufacturer to select an optimum cooling media internal to their equipment while using building coolant or other coolant supplies. Third, they control internal coolant temperatures, flow rates, and quality of the fluid that touches or cools the internal electrical parts or components of the server. Fourth, in conjunction with one or more liquid transfer switches, they provide redundant cooling to the server 400. For example, the liquid transfer switch can switch cooling load from one coolant converter to another and/or adjust the amount of cooling load at each coolant converter in response to a failure.
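The shifting of cooling load between coolant converters can be illustrated with the following Python sketch. It is a sketch under assumptions only: the function name redistribute_load, the kilowatt figures, and the rule of sharing load in proportion to each converter's capacity are hypothetical; the description says only that load is switched or adjusted in response to a failure.

```python
from typing import Dict, Set


def redistribute_load(total_kw: float,
                      capacities_kw: Dict[str, float],
                      failed: Set[str]) -> Dict[str, float]:
    """Assign the cooling load to the converters that have not failed.

    Load is shared in proportion to capacity (an assumed policy).
    Failed converters are assigned zero load.
    """
    available = {name: cap for name, cap in capacities_kw.items()
                 if name not in failed}
    total_capacity = sum(available.values())
    if total_kw > total_capacity:
        raise RuntimeError("remaining converters cannot carry the cooling load")
    shares = {name: 0.0 for name in capacities_kw}
    for name, cap in available.items():
        shares[name] = total_kw * cap / total_capacity
    return shares


if __name__ == "__main__":
    capacities = {"410A": 10.0, "410B": 10.0}  # hypothetical capacities in kW
    print(redistribute_load(8.0, capacities, failed=set()))
    print(redistribute_load(8.0, capacities, failed={"410A"}))
```

With two identical converters, the sketch shares the load equally during normal operation and moves the full load to the surviving converter when the other fails, provided that converter has enough capacity.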
The coolant converters 410A and 410B can utilize a redundant power supply, such as a dual grid power system coupled to the server 400. As shown, the server uses two independent power supplies, a bulk power supply A (430A) and bulk power supply B (430B). Specifically, power supply 430A couples to coolant converter 410A, and power supply 430B couples to coolant converter 410B. Each power supply has an independent power source, shown as alternating current AC source A for power supply 430A and AC source B for power supply 430B. The electronics and pumps of coolant converter 410A are powered from power supply 430A, while the electronics and pumps of coolant converter 410B are powered from power supply 430B.
Thus, the server 400 has both redundant power supplies and redundant cooling systems, and redundant sensing of fluid conditions using redundant liquid transfer switches. In one exemplary embodiment, the coolant converters 410A and 410B are identical. In another exemplary embodiment, the coolant converters are different (example, one is a primary coolant converter and one is a backup coolant converter). Further, one coolant converter is a liquid converter and one coolant converter is or utilizes air cooling. Further, since coolants A and B are independent and input separately, these coolants can be the same (example, both water or both refrigerants) or different.
The liquid transfer switch can be located in various locations in accordance with exemplary embodiments. By way of example, the switch can be located in or near the rack, in or on the data center floor, in the heat exchanger, any location in the data center, or any location remote from the data center.
According to block 920, a question is asked whether a failure is detected. If the answer to this question is “no” then flow proceeds back to block 910. If the answer to this question is “yes” then flow proceeds to block 930. By way of example, a sensor senses one or more conditions to indicate a failure with respect to temperature, pressure, flow rate, etc.
According to block 930, the fluid condition is automatically adjusted to maintain cooling to the electronic device. When a failure occurs, the liquid transfer switch opens or closes one or more valves to a redundant fluid line to maintain continuous fluid supply to the electronic device. For example, if one of the supply or return lines fails, then the alternate or redundant supply or return line is opened or adjusted (example, flow rate increased) so cooling to the electronic device is not disrupted.
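The flow through blocks 920 and 930 can be sketched, by way of example, as a simple Python loop. The sketch assumes that block 910 is the step in which a fluid condition is sensed and that flow returns to block 910 after an adjustment; neither assumption is stated in the passage above. The names run_monitor, failure_detected, and adjust, and the pressure threshold used in the usage example, are hypothetical.

```python
from typing import Callable, Dict, Iterable


def run_monitor(readings: Iterable[Dict[str, float]],
                failure_detected: Callable[[Dict[str, float]], bool],
                adjust: Callable[[Dict[str, float]], None]) -> int:
    """Process sensed conditions and return how many adjustments were made."""
    adjustments = 0
    for reading in readings:               # block 910 (assumed): sense a condition
        if not failure_detected(reading):  # block 920: is a failure detected?
            continue                       # "no": flow proceeds back to block 910
        adjust(reading)                    # block 930: adjust the fluid condition
        adjustments += 1
    return adjustments


if __name__ == "__main__":
    log = []
    samples = [{"pressure_kpa": 220.0},
               {"pressure_kpa": 90.0},   # simulated loss of pressure
               {"pressure_kpa": 210.0}]
    count = run_monitor(
        samples,
        failure_detected=lambda r: r["pressure_kpa"] < 150.0,  # assumed threshold
        adjust=lambda r: log.append("open redundant line"),
    )
    print(count, log)
```

Passing the detection and adjustment steps as callables keeps the loop independent of the particular sensed quantity (temperature, pressure, flow rate, etc.) and of the particular valve action taken.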
Although embodiments in accordance with the present invention are generally directed to liquid cooling systems, such systems can also use or combine airflow for cooling. For example, active heatsinks include one or more fans to assist in cooling.
The liquid transfer switch is a unit that is modular and replaceable. In some embodiments, each unit or module is constructed with standardized units or dimensions for flexibility and replaceability for use in the electronic devices. As such, the units can be connected to or removed from the electronic devices (example, a server) without connecting, removing, or replacing other components in the electronic device (example, the heat-generating components, other liquid cooling units, other coolant converters, heat exchangers, etc.). As such, the unit can be serviced (example, replaced or repaired) without shutting down or turning off the respective electronic device (example, server housing the unit or converter).
As used herein, the term “module” means a unit, package, or functional assembly of electronic components for use with other electronic assemblies or electronic components. A module may be an independently-operable unit that is part of a total or larger electronic structure or device. Further, the module may be independently connectable and independently removable from the total or larger electronic structure (such as liquid cooling units or coolant converters being modules and connectable to servers in data centers).
In one exemplary embodiment, one or more blocks or steps discussed herein are automated. In other words, the apparatus, systems, and methods operate automatically. As used herein, the terms “automated” or “automatically” (and like variations thereof) mean controlled operation of an apparatus, system, and/or process using computers and/or mechanical/electrical devices without the necessity of human intervention, observation, effort and/or decision.
The methods in accordance with exemplary embodiments of the present invention are provided as examples and should not be construed to limit other embodiments within the scope of the invention. For instance, blocks in diagrams or numbers (such as (1), (2), etc.) should not be construed as steps that must proceed in a particular order. Additional blocks/steps may be added, some blocks/steps removed, or the order of the blocks/steps altered and still be within the scope of the invention. Further, methods or steps discussed within different figures can be added to or exchanged with methods or steps in other figures. Further yet, specific numerical data values (such as specific quantities, numbers, categories, etc.) or other specific information should be interpreted as illustrative for discussing exemplary embodiments. Such specific information is not provided to limit the invention.
In the various embodiments in accordance with the present invention, embodiments are implemented as a method, system, and/or apparatus. As one example, exemplary embodiments and steps associated therewith are implemented as one or more computer software programs to implement the methods described herein (such as being implemented in a server, CDU, or liquid cooling unit). The software is implemented as one or more modules (also referred to as code subroutines, or “objects” in object-oriented programming). The location of the software will differ for the various alternative embodiments. The software programming code, for example, is accessed by a processor or processors of the computer or server from long-term storage media of some type, such as a CD-ROM drive or hard drive. The software programming code is embodied or stored on any of a variety of known media for use with a data processing system or in any memory device such as semiconductor, magnetic and optical devices, including a disk, hard drive, CD-ROM, ROM, etc. The code is distributed on such media, or is distributed to users from the memory or storage of one computer system over a network of some type to other computer systems for use by users of such other systems. Alternatively, the programming code is embodied in the memory and accessed by the processor using the bus. The techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein.
The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.