How is the Industry Going to Cool the 400W Chip?
Unlocking the Power of Cooling in Data Center Retrofits & New Builds
IT cooling needs are outpacing facility infrastructure. High-performance applications like AI, VR, real-time visual imaging, pattern recognition, and predictive analytics drive higher densities, while top-end CPUs now reach 400W TDP. Air cooling is reaching its limits, and water cooling carries the risk of a meltdown. Scaling up cooling capability is essential.
Last month, 451 Research hosted a webinar to explore a new waterless liquid cooling technology capable of heat dissipation of 1,000W at the chip level and cabinet power densities in the tens of kW. Affiliated Engineers and Cupertino Electric joined manufacturer ZutaCore to discuss HyperCool2 direct-on-chip, closed-loop, phase-change liquid cooling. This approach catapults past the limitations of sensible heat transfer by leveraging the phenomenon of latent heat with a rapid-boiling and -evaporating dielectric refrigerant. The hardware system optimizes energy efficiency and performance, integrating the server load with the facility cooling system.
Data center capacity worldwide is projected to expand at a compound annual rate of roughly 7.5 percent. HyperCool2 is site- and climate-agnostic, able to optimize infrastructure performance and cost, whether in new builds or retrofits. Affiliated Engineers is currently providing independent verification and co-authoring a technical white paper with ZutaCore on HyperCool2 technology, and we will post that paper here. Until then, this link accesses a recording of the webinar, and the following elaborations by Affiliated Engineers, Cupertino Electric, 451 Research, and ZutaCore respond to webinar participants’ questions.
About the technology
How is the ZutaCore technology different from direct-on-chip water cooling?
Mechanically, the HyperCool2 is similar in that its ‘heat sink’ is a sealed device that interfaces with the chip. However, unlike circulating-liquid systems, which rely on specific (sensible) heat transfer, the HyperCool2 leverages latent heat, absorbing all the heat generated by the processor as the dielectric refrigerant boils and evaporates. ZutaCore currently uses 3M’s Novec 7000 fluid, which has a low boiling point and a relatively high heat of vaporization. The vapor carries the thermal energy away from the chip to a condenser, where it rejects the heat and returns to liquid before being pumped back at low pressure, typically under 1 Bar gauge. Lastly, and importantly, using a dielectric refrigerant instead of water eliminates the risk of IT meltdown.
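To make the latent-versus-sensible distinction concrete, here is a rough back-of-the-envelope sketch. It compares the coolant mass flow needed to carry away a 400W chip load by evaporation with the flow needed by a single-phase liquid that simply warms up. The property values are approximate assumptions chosen for illustration (a heat of vaporization of about 142 kJ/kg for the dielectric fluid, and water warming by 10 K); they are not figures from the webinar, and the function names are hypothetical.

```python
# Illustrative comparison of latent vs. sensible heat transport.
# All property values are approximate assumptions, not figures from the article.

LATENT_HEAT_J_PER_KG = 142_000.0  # assumed heat of vaporization of the dielectric fluid
WATER_CP_J_PER_KG_K = 4_186.0     # approximate specific heat of liquid water


def latent_mass_flow(heat_w: float, latent_heat_j_per_kg: float) -> float:
    """Mass flow (kg/s) that must evaporate to absorb heat_w watts."""
    return heat_w / latent_heat_j_per_kg


def sensible_mass_flow(heat_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow (kg/s) of a single-phase liquid that warms by delta_t_k kelvin."""
    return heat_w / (cp_j_per_kg_k * delta_t_k)


if __name__ == "__main__":
    chip_w = 400.0  # the 400W TDP figure discussed in the article
    boiling = latent_mass_flow(chip_w, LATENT_HEAT_J_PER_KG)
    water = sensible_mass_flow(chip_w, WATER_CP_J_PER_KG_K, delta_t_k=10.0)
    print(f"evaporating dielectric: {boiling * 1000:.2f} g/s")
    print(f"water with 10 K rise:   {water * 1000:.2f} g/s")
```

Under these assumptions, each kilogram of evaporating fluid carries more heat than a kilogram of water warming by 10 K, so less mass has to circulate per chip. The actual ratio depends on the real fluid properties and the allowed temperature rise.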
What makes fluid cooling so effective at capturing stranded power?
The solution allows added rack capacity to be installed, making use of stranded power in the data center. Its flexibility provides for a seamless transition in existing data centers without disruption to current HVAC systems, a lower PUE, and a reduction in equipment square footage.
Does this solution require water to support cooling?
No. The HyperCool2 system can work with either an air-based heat-rejection-unit (HRU) or a water-based one.
Discuss the pressure within the refrigerant lines connected to the chips.
Across the system, the pressure is self-regulated to be under 1 Bar gauge.
How do you deal with hybrid cooling - the DLC or ZutaCore cools the one chip, what about the rest of the electronics?
On a typical electronic board, the primary heat-generating devices such as CPUs, GPUs and FPGAs account for 70% or more of the board's overall heat generation. Those devices benefit from the ZutaCore HyperCool2 direct-on-chip cooling. The rest of the components, typically representing about 30% or less of the heat generated on the board, are cooled by air, typically using onboard fans. However, those components can typically operate at a higher onboard ambient temperature, which translates into significantly lower fan power, typically 10% or less of the available fan power. Furthermore, because the major heat contributors are removed from the air-cooled load, and because the primary devices are trending toward higher power, the liquid-cooled share will increase further, leaving very little for the fans to take care of.
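The arithmetic behind this split can be sketched as follows. The 70% chip share comes from the answer above; the 1,000W board and the function names are hypothetical, chosen only to illustrate how the air-cooled remainder shrinks in relative terms as chip power grows.

```python
def split_board_heat(board_w: float, chip_fraction: float = 0.70) -> tuple[float, float]:
    """Split a board's heat load into (liquid_cooled_w, air_cooled_w).

    chip_fraction is the share produced by the primary devices (CPUs, GPUs,
    FPGAs); the 0.70 default mirrors the 70% figure quoted in the text.
    """
    if not 0.0 <= chip_fraction <= 1.0:
        raise ValueError("chip_fraction must be between 0 and 1")
    liquid_w = board_w * chip_fraction
    return liquid_w, board_w - liquid_w


def chip_share(chip_w: float, other_w: float) -> float:
    """Fraction of total board heat produced by the primary devices."""
    return chip_w / (chip_w + other_w)


if __name__ == "__main__":
    # Hypothetical 1,000W board: how much is left for the fans?
    liquid_w, air_w = split_board_heat(1000.0)
    print(f"direct-on-chip: {liquid_w:.0f} W, air-cooled: {air_w:.0f} W")

    # If chip power rises while the other components stay the same,
    # the share handled by direct-on-chip cooling rises with it.
    print(f"share at 700 W chips:  {chip_share(700.0, 300.0):.2f}")
    print(f"share at 1000 W chips: {chip_share(1000.0, 300.0):.2f}")
```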
A single jacket covering the chip lends itself to a single point of failure. How do you deal with a fluid delivery failure?
Very much as in normal server operations, chips not receiving the necessary cooling will throttle their clock speed to accommodate the change in conditions. Also, as a reminder, because the coolant is a dielectric refrigerant rather than water, a leak does not carry the risk of IT meltdown.
When it comes to tier 4 continuous and redundant cooling, how does the solution fit what is traditionally deployed in full redundancy to allow for cooling failures and concurrent maintenance and n-1 failures? For tier 4 do you need two liquid systems to feed A or B racks etc.? If the system is down, then would many racks be affected? Perhaps new deployments need A and B separate IT equipment and deploy more like a cloud-based strategy?
The HyperCool2 technology can be tailored for various levels of redundancy. In its base form, the few mechanically operating parts in the system, namely pumps and fans, are redundant. Beyond that, subassemblies of the system such as heat-rejection-units (HRUs), refrigerant distribution units (RDUs) and mainlines can be set up in redundancy. And yes, on top of that, the equipment can be deployed in a ‘mesh’ topology, similar to a cloud-based strategy. I think a similar question arises when a server is taken down for maintenance. Today most servers are highly virtualized and can fail over to other servers.
Availability and deployment
When do you expect to have standard model production and pricing? A la carte ordering and incremental deployment?
System components are readily available for purchase. For both greenfield and retrofit projects, data center solution engineering is typically involved up front to ensure the most efficient system is defined for the realities of the specific project. Between ZutaCore, Affiliated Engineers Inc. (AEI), and Cupertino Electric Inc. (CEI), those services are readily available.
Can you give an example of your facility level deployment in brief?
Our current approach, depending on facility type, would be an initial overview of the client objectives followed by these steps:
1. Determine size of the system and logistical challenges
2. Site survey and 60% IFC drawing review
3. 100% IFC drawing review and permitting
4. Schedule development
5. Submittal review and approval
6. Development and review of live facility MOP
7. Equipment delivery and inspections
8. Installation commences per MOP
9. Installation complete
10. Commissioning complete
11. Go live with the system
What are some important considerations when designing ZutaCore into a project?
A few important considerations include overhead or under-floor pipe routing, coordination in the rear of the cabinet for the refrigerant distribution unit (RDU), ensuring there’s appropriate cooling to reject the remaining heat not captured by the Enhanced Nucleation Evaporators (ENEs), and maximizing system efficiency by reusing the facility cooling water through the water-cooled HRU.
How do you solve the issue of server warranty when installing the ENE?
With the benefit of being able to retrofit any server, IT system integrators are well suited to provide warrantied server retrofits to clients. On March 4th, 2019, UNIXPlus announced it is the first integrator to provide an extended warranty for servers retrofitted with ZutaCore’s HyperCool2™ technology. There will be more to come in this space.
Does the ZutaCore liquid cooling application require specific servers? Or is it universal to any server?
No, the ZutaCore HyperCool2 does not require specific servers. It is designed for ease of integration into any server.
How would this solution be deployed for a few high-density cabinets?
The ZutaCore HyperCool2 is built on the parallel deployment of Enhanced Nucleation Evaporators (ENEs), the direct-on-chip evaporators. Each one is self-regulated and provides instantaneous, on-demand cooling independently of the others, maintaining a constant temperature for each device. The ENEs are connected via an in-rack refrigerant distribution unit, which allows for scalability within the rack and across racks, connecting via mainlines as needed.
Benefits and validation
Elaborate on the CAPEX and OPEX benefits.
Essentially, by delivering a constant, industry-lowest Power Usage Effectiveness (PUE), the ZutaCore HyperCool2 provides for significant OPEX savings. On the CAPEX side, being a single, closed-loop system, the HyperCool2 allows for the elimination of standard cooling infrastructure and a reduction of overall surface area. At the same time, computing is densified, delivering significantly higher value per square foot. The economic and ROI model has been designed by data center engineering experts at Affiliated Engineers Inc. (AEI), and allows for running specific project scenarios from the chip level through the server, rack and data center levels.
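For readers less familiar with the metric, PUE is conventionally defined as total facility power divided by IT equipment power, so a value closer to 1.0 means less overhead spent on cooling and power distribution. The sketch below shows how a PUE difference translates into annual overhead energy. The load figures are hypothetical placeholders, not ZutaCore or AEI results, and the functions are not part of the AEI model.

```python
HOURS_PER_YEAR = 8760


def pue(it_kw: float, overhead_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_kw <= 0:
        raise ValueError("it_kw must be positive")
    return (it_kw + overhead_kw) / it_kw


def annual_overhead_kwh(it_kw: float, pue_value: float) -> float:
    """Energy per year spent on everything other than the IT load itself."""
    return it_kw * (pue_value - 1.0) * HOURS_PER_YEAR


if __name__ == "__main__":
    it_load_kw = 1000.0  # hypothetical IT load
    for overhead_kw in (500.0, 100.0):  # hypothetical cooling/distribution overheads
        value = pue(it_load_kw, overhead_kw)
        print(f"PUE {value:.2f}: {annual_overhead_kwh(it_load_kw, value):,.0f} kWh/year overhead")
```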
Are you pursuing nationally recognized test lab certifications for your product models and systems?
Yes, we will be pursuing national testing lab certifications. This could include ETL, UL, FCC, CE or others.