With the emergence of high performance centralized computing (e.g., cloud computing, artificial intelligence, big data computing), the computational demands placed on the underlying electronics cause them to generate significant amounts of heat. As such, engineers are focused on improving the ways in which heat can be removed from the electronics.
The immersion bath chamber 103 is also fluidically coupled through a “primary” loop 109 to a coolant distribution unit (CDU) 104 that includes a pump 105 and heat exchanger 106. During continued operation of the electronic components, the liquid's temperature rises as a consequence of the heat it receives from the operating electronics 101. The pump 105 draws the warmed liquid 102 from the immersion bath chamber 103 to the heat exchanger 106. The heat exchanger 106 transfers heat from the warmed liquid to another liquid within a secondary cooling loop 107 that is fluidically coupled to a cooling tower and/or chilling unit 108. The removal of the heat from the liquid 102 by the heat exchanger 106 reduces the temperature of the liquid, which is then returned to the chamber 103 as cooled liquid.
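The heat removed by such a loop follows from a simple energy balance: the removal rate equals the coolant's mass flow rate times its specific heat times its temperature rise across the bath. The sketch below illustrates this relationship only; the function name and all numbers are hypothetical and do not come from the embodiments described herein.

```python
def loop_heat_removal_w(mass_flow_kg_s, specific_heat_j_kg_k, t_return_c, t_supply_c):
    """Rate of heat carried out of the bath by the coolant, in watts."""
    return mass_flow_kg_s * specific_heat_j_kg_k * (t_return_c - t_supply_c)

# Hypothetical example: 2 kg/s of a dielectric coolant (c_p ~ 1100 J/kg-K)
# warmed from 35 C (supply) to 45 C (return) carries away 22 kW.
print(loop_heat_removal_w(2.0, 1100.0, 45.0, 35.0))  # prints 22000.0
```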
In a high computing environment, such as a data center, the respective CDUs of multiple immersion bath chambers are coupled to the secondary loop 107, and, the cooling tower and/or chilling unit 108 removes the heat generated by the electronics within the multiple immersion bath chambers from the data center.
Unfortunately, within the chamber 103, the flow of cooled liquid 102 over the chip packages 113 of the higher performance semiconductor chips that generate the most heat (e.g., CPUs, GPUs, etc.) is insufficient to adequately cool these chips. To alleviate the problem, the chip packages 113 can have heat sinks 114 mounted thereon to improve the transfer of heat from the chip packages 113 to the cooling liquid 102.
Unfortunately, the chip packages 113 and their mounted heat sinks 114 are commonly located towards the middle of their respective circuit boards 101, and, the circuit boards 101 themselves are arranged with narrow spacings between them.
As a consequence, the open space around the packages 113 and heat sinks 114 through which the liquid 102 is supposed to flow is narrow, which, in turn, presents a high fluidic impedance that causes the cooled liquid within the chamber 103 to flow substantially around the circuit boards 101 rather than through the narrow spaces between them. As such, there is reduced fluid flow through the heat sinks 114, resulting in reduced heat transfer from the chips to the liquid 102.
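The flow behavior described above can be illustrated with a hydraulic-resistance analogy: like parallel electrical resistors, two parallel flow paths divide the total flow in inverse proportion to their impedances. The following sketch is purely illustrative, with hypothetical resistance values.

```python
def flow_split(total_flow, r_bypass, r_narrow):
    """Return (bypass_flow, narrow_flow) for two parallel flow paths.

    As with parallel resistors, each branch carries a share of the total
    flow proportional to that branch's conductance (1/R).
    """
    g_bypass, g_narrow = 1.0 / r_bypass, 1.0 / r_narrow
    g_total = g_bypass + g_narrow
    return total_flow * g_bypass / g_total, total_flow * g_narrow / g_total

# A narrow path with 9x the impedance of the open bypass path receives only
# about 10% of the coolant -- hence the reduced heat transfer at the heat sinks.
bypass, narrow = flow_split(10.0, 1.0, 9.0)
print(bypass, narrow)  # bypass carries ~90% of the flow
```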
An improved approach, observed in
Here, as described in more detail below, the CDU 204 not only provides a first primary loop of cooled fluid 209_1 that cools the electronics 201 in the chamber 203 similar to the system described just above with respect to
The manifold 223 directs the second primary fluid 209_2 into fluidic channels 222 that are coupled to the respective cold plates 221 that are mounted to the high performance chip packages 213 within the immersion bath 202. The second primary fluid 209_2 removes heat from the high performance semiconductor chips as it flows through the cold plates 221, then exits the cold plates 221 and flows into the chamber immersion bath 202. The warmed first and second primary fluids 209_1, 209_2 are then directed from the chamber 203 back to the CDU 204 through a warmed fluid line 224. The CDU 204 then regenerates the cooled first and second primary loop fluids 209_1, 209_2 from the received warmed fluid and the process repeats.
Importantly, the second primary fluid 209_2, as provided by the CDU 204, is colder than the first primary fluid 209_1 and/or has a higher flow rate (and/or higher volumetric flow rate) than the first primary fluid 209_1. As such, the high performance semiconductor chips have their heat removed by a fluid flow 209_2 that is colder and/or faster than the fluid flow 209_1 that would otherwise cool them in the system of
Here, the manifold 223, fluidic channels 222 and cold plates 221 overcome the high fluidic impedance associated with the narrow regions of the immersion bath 202 in the vicinity of the high performance semiconductor chip packages. Because colder and/or faster fluid flow corresponds to greater heat removal capability, the system of
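The benefit of a colder and/or faster fluid can be illustrated with a rough convection model, Q = h·A·(T_chip − T_fluid), where for turbulent internal flow the convection coefficient h grows roughly with the flow rate raised to the 0.8 power (a Dittus-Boelter-style scaling). The sketch below is an illustration under that assumption; all names and numbers are hypothetical.

```python
def cold_plate_heat_w(h_base_w_m2k, area_m2, flow_ratio, t_chip_c, t_fluid_c):
    """Convective heat removal Q = h*A*dT, with h ~ flow^0.8 (assumed scaling)."""
    h = h_base_w_m2k * flow_ratio ** 0.8
    return h * area_m2 * (t_chip_c - t_fluid_c)

# First primary fluid: 40 C at baseline flow; second primary fluid: 25 C at 2x flow.
q1 = cold_plate_heat_w(5000.0, 0.01, 1.0, 80.0, 40.0)
q2 = cold_plate_heat_w(5000.0, 0.01, 2.0, 80.0, 25.0)
print(q1 < q2)  # prints True: the colder, faster fluid removes more heat
```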
As observed in
In the particular embodiment of
If pump 305_2 pumps the second primary loop 309_2 at a faster rate than pump 305_1 pumps the first primary loop 309_1, the liquid of the second primary loop 309_2 can be both colder and faster than the liquid of the first primary loop 309_1 if there exists a large enough temperature difference between the first and second secondary loops 307_1, 307_2.
Here, the faster a primary loop fluid is pumped through a heat exchanger, the less heat the heat exchanger will transfer to the corresponding secondary loop. Thus, increasing the pump speed for the second primary loop 309_2 will cause the second heat exchanger 306_2 to transfer less heat to the second secondary loop 307_2 than a slower pump speed would. Nevertheless, even if the second primary loop 309_2 is pumped at a faster flow rate than the first primary loop 309_1, more heat can still be removed from the second primary loop 309_2 than from the first primary loop 309_1 if the second secondary loop 307_2 is significantly colder than the first secondary loop 307_1. In this case, the second primary loop 309_2 will both be colder and have a faster flow rate than the first primary loop fluid 309_1, which might be ideal for cooling the high performance semiconductor chips.
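This trade-off can be illustrated with a simple heat-exchanger effectiveness model, in which the primary fluid leaves the exchanger at T_out = T_in − ε·(T_in − T_secondary) and pumping the primary loop faster lowers the per-pass effectiveness ε. The sketch below uses hypothetical effectiveness values and temperatures solely to illustrate the point.

```python
def primary_exit_temp_c(t_in_c, effectiveness, t_secondary_c):
    """Primary-loop supply temperature leaving the heat exchanger."""
    return t_in_c - effectiveness * (t_in_c - t_secondary_c)

# First primary loop: slower flow, so per-pass effectiveness 0.8, secondary at 30 C.
t1 = primary_exit_temp_c(50.0, 0.8, 30.0)
# Second primary loop: 2x flow drops effectiveness to 0.6, but secondary at 15 C.
t2 = primary_exit_temp_c(50.0, 0.6, 15.0)
print(t1, t2)  # the second loop supply is colder despite its faster flow
```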
Another parameter that can be set to effect more heat transfer from the second primary loop heat exchanger 306_2 than from the first primary loop heat exchanger 306_1 is the relative volumetric flow rate of the first and second secondary loops 307_1, 307_2. Specifically, if the second secondary loop 307_2 flows at a faster volumetric flow rate than the first secondary loop 307_1, more heat will be absorbed by the second secondary loop 307_2 as it flows through the second heat exchanger 306_2. As a consequence, the second heat exchanger 306_2 will remove more heat from the second primary loop 309_2 than the first heat exchanger 306_1 will remove from the first primary loop 309_1.
The CDU embodiment of
The CDU embodiments of
The above described CDU embodiments create first and second primary loops 309_1, 309_2 that are isolated from one another after the split in the warmed fluid return feed 324. By contrast, the CDU embodiment of
Comparing the approach of
The approach of
The CDU embodiment of
The CDU embodiments of
As observed in
Conceivably, the hook-ups of the conduit lines 422 between the manifold 423 and the cold plates 421 for the second primary loop 409_2 can be more easily made with the embodiment of
As observed in
A cooling system having first and second primary loops as described above can also be used to realize improved energy efficiency. For example, if the electronics in the immersion system are being underutilized, one of the primary loops can be stopped (its pump is shut down). For example, the first primary loop that provides cooled fluid to the immersion bath within the chamber can be shut down so that only the second primary loop that feeds the cold plates of the high performance semiconductor chips remains operative.
Here, because the liquid volume of the second primary loop can be much less than the liquid volume of the first primary loop, the second primary loop's pump(s) can be less powerful and therefore consume less energy than the first primary loop's pump(s). Thus, shutdown of the more powerful first primary loop pump(s) results in considerable energy consumption savings.
Likewise, both primary loops are placed in operation when the electronics are being more heavily utilized. Even so, the pump rates of the first and/or second primary loop pumps can be adjusted in view of the overall electronics' workload (first primary loop pump speed is adjusted) and/or the workload placed on the high performance semiconductor chips (second primary loop pump speed is adjusted).
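The energy impact of such pump-speed adjustments can be illustrated with the pump affinity laws, under which flow scales with pump speed and shaft power scales roughly with the cube of pump speed. The sketch below is illustrative only; the rated power and speed fraction are hypothetical.

```python
def pump_power_w(rated_power_w, speed_fraction):
    """Affinity-law estimate of pump power at a reduced speed (power ~ speed^3)."""
    return rated_power_w * speed_fraction ** 3

# Running a hypothetical 4 kW first primary loop pump at 50% speed during a
# light overall workload draws only 12.5% of its rated power.
print(pump_power_w(4000.0, 0.5))  # prints 500.0
```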
Network based computer services, such as those provided by cloud services and/or large enterprise data centers, commonly execute application software programs for remote clients. Here, the application software programs typically execute a specific (e.g., “business”) end-function (e.g., customer servicing, purchasing, supply-chain management, email, etc.). Remote clients invoke/use these applications through temporary network sessions/connections that are established by the data center between the clients and the applications. A recent trend is to strip down the functionality of at least some of the applications into finer grained, atomic functions (“micro-services”) that are called by client programs as needed. Micro-services typically strive to charge clients/customers based on their actual usage (function call invocations) of a micro-service application.
In order to support the network sessions and/or the applications' functionality, however, certain underlying computationally intensive and/or trafficking intensive functions (“infrastructure” functions) are performed.
Examples of infrastructure functions include routing layer functions (e.g., IP routing), transport layer protocol functions (e.g., TCP), encryption/decryption for secure network connections, compression/decompression for smaller footprint data storage and/or network communications, virtual networking between clients and applications and/or between applications, packet processing, ingress/egress queuing of the networking traffic between clients and applications and/or between applications, ingress/egress queuing of the command/response traffic between the applications and mass storage devices, error checking (including checksum calculations to ensure data integrity), distributed computing remote memory access functions, etc.
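As one concrete illustration of the checksum calculations mentioned above, the sketch below implements the standard 16-bit ones'-complement Internet checksum (per RFC 1071) used by IP and TCP headers. It is offered only as an example of this class of per-packet infrastructure function; nothing about it is specific to the embodiments described herein.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 style 16-bit ones'-complement checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                           # pad to a 16-bit boundary
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # sum 16-bit big-endian words
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back in
    return ~total & 0xFFFF                        # ones' complement of the sum

print(hex(internet_checksum(b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf7")))  # prints 0x220d
```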
Traditionally, these infrastructure functions have been performed by the CPU units “beneath” their end-function applications. However, the intensity of the infrastructure functions has begun to affect the ability of the CPUs to perform their end-function applications in a timely manner relative to the expectations of the clients, and/or, perform their end-functions in a power efficient manner relative to the expectations of data center operators.
As such, as observed in
As observed in
Notably, each pool 601, 602, 603 has an IPU 607_1, 607_2, 607_3 on its front end or network side. Here, each IPU 607 performs pre-configured infrastructure functions on the inbound (request) packets it receives from the network 604 before delivering the requests to its respective pool's end function (e.g., executing application software in the case of the CPU pool 601, memory in the case of memory pool 602 and storage in the case of mass storage pool 603).
As the end functions send certain communications into the network 604, the IPU 607 performs pre-configured infrastructure functions on the outbound communications before transmitting them into the network 604. The communication 612 between the IPU 607_1 and the CPUs in the CPU pool 601 can transpire through a network (e.g., a multi-nodal hop Ethernet network) and/or more direct channels (e.g., point-to-point links) such as Compute Express Link (CXL), Advanced Extensible Interface (AXI), Open Coherent Accelerator Processor Interface (OpenCAPI), Gen-Z, etc.
Depending on implementation, one or more CPU pools 601, memory pools 602, mass storage pools 603 and network 604 can exist within a single chassis, e.g., as a traditional rack mounted computing system (e.g., server computer). In a disaggregated computing system implementation, one or more CPU pools 601, memory pools 602, and mass storage pools 603 are separate rack mountable units (e.g., rack mountable CPU units, rack mountable memory units (M), rack mountable mass storage units (S)).
In various embodiments, the software platform on which the applications 605 are executed includes a virtual machine monitor (VMM), or hypervisor, that instantiates multiple virtual machines (VMs). Operating system (OS) instances respectively execute on the VMs and the applications execute on the OS instances. Alternatively or in combination, container engines (e.g., Kubernetes container engines) respectively execute on the OS instances. The container engines provide virtualized OS instances and containers respectively execute on the virtualized OS instances. The containers provide isolated execution environments for a suite of applications, which can include applications for micro-services.
Notably, the respective electronic boards/components of the data center components described above can be cooled according to the teachings described above with respect to
Embodiments of the invention may include various processes as set forth above. The processes may be embodied in program code (e.g., machine-executable instructions). The program code, when processed, causes a general-purpose or special-purpose processor to perform the program code's processes. Alternatively, these processes may be performed by specific/custom hardware components that contain hard wired interconnected logic circuitry (e.g., application specific integrated circuit (ASIC) logic circuitry) or programmable logic circuitry (e.g., field programmable gate array (FPGA) logic circuitry, programmable logic device (PLD) logic circuitry) for performing the processes, or by any combination of program code and logic circuitry.
Elements of the present invention may also be provided as a machine-readable storage medium for storing the program code. The machine-readable medium can include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, FLASH memory, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards or other type of media/machine-readable medium suitable for storing electronic instructions.
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Number | Date | Country | Kind |
---|---|---|---|
PCT/CN2023/123185 | Oct 2023 | WO | international |