US has lagged behind China for years. Can HPE's new architecture help the DoE catch up?

With China threatening to build the world's first exascale supercomputer before the US, the US Department of Energy has awarded a research grant to Hewlett Packard Enterprise to develop an exascale supercomputer reference design based on technology gleaned from The Machine, a project that aims to "reinvent the fundamental architecture of computing."

The DoE has historically operated most of the world's top supercomputers, but in recent years China has taken over in dramatic fashion. China's top supercomputer, Sunway TaihuLight, currently has five times the peak performance (93 petaflops) of Oak Ridge's Titan (18 petaflops). The US has talked grandly about retaking the supercomputing crown with an exascale (1,000 petaflops, or 1 exaflops) supercomputer that would be operational by 2021 or so, but China seems to be forging ahead at a much faster clip: in January, China's national supercomputer centre said it would have a prototype exascale computer built by the end of 2017 and operational by 2020.

To create an effective exascale supercomputer from scratch, you must first solve three problems: the inordinate power usage (hundreds of megawatts, if you naively scaled up today's systems) and the cooling requirements that follow from it; developing the architecture and interconnects to efficiently weave together hundreds of thousands of processors and memory chips; and devising an operating system and application software that actually scale to one quintillion calculations per second.
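
To see why power is the headline problem, here's a back-of-the-envelope calculation using approximate TOP500 figures for Titan (17.6 petaflops at about 8.2 megawatts) and TaihuLight (93 petaflops at about 15.4 megawatts). This is purely illustrative naive scaling, not a projection of any actual exascale design:

```python
# Naive scaling: what would 1 exaflops draw at today's efficiency?
# Figures are approximate TOP500 numbers; real designs would differ.
systems = {
    "Titan":      {"pflops": 17.6, "mw": 8.2},
    "TaihuLight": {"pflops": 93.0, "mw": 15.4},
}

TARGET_PFLOPS = 1000  # 1 exaflops

for name, s in systems.items():
    scaled_mw = s["mw"] * TARGET_PFLOPS / s["pflops"]
    print(f"{name}: ~{scaled_mw:.0f} MW to reach 1 exaflops")

# Titan:      ~466 MW to reach 1 exaflops
# TaihuLight: ~166 MW to reach 1 exaflops
# Either way, far beyond the roughly 20-30 MW the DoE has
# suggested is practical for an exascale machine.
```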

You can still physically build an exascale supercomputer without solving all three problems—just strap together a bunch of CPUs until you hit the magic number—but it won't perform a billion-billion calculations per second, or it'll be untenably expensive to operate. That seems to be China's approach: plunk down most of the hardware in 2017, and then spend the next few years trying to make it work.

The DoE, on the other hand, is wending its way down a more sedate path by funding HPE (and other supercomputer makers) to develop an exascale reference design. The funding comes from a DoE programme called PathForward, which is part of its larger Exascale Computing Project (ECP). The ECP, which was set up under the Obama administration, has already awarded tens of millions of dollars to various exascale research efforts around the US. It isn't clear how much funding HPE has received.

Exascale computing in the USA

So, what's HPE's plan? And is there any hope that HPE can pass through three rounds of the DoE funding programme and build an exascale supercomputer before China?

HPE is proposing to build a supercomputer based on an architecture it calls Memory-Driven Computing, which is derived from parts of The Machine. Basically, HPE has developed a number of technologies that allow a massive amount of addressable memory—apparently up to 4,096 yottabytes, roughly 250,000 times the size of today's entire digital universe—to be pooled together by a high-speed, low-power optical interconnect driven by a new silicon photonics chip. For now this memory is volatile, but eventually—if HPE ever commercialises its memristor tech or embraces Intel's 3D XPoint—it'll be persistent.
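
To make the programming model concrete, here's a minimal sketch of what load/store access to a fabric-attached memory pool could look like. The device path, region size, and offsets are entirely hypothetical—this is not HPE's actual API—but the idea is the same: instead of passing messages between nodes, every processor maps and dereferences one giant shared address space:

```python
import mmap
import os
import struct

# Hypothetical device node exposing a slice of the fabric-attached
# memory pool; the path and size are made up for illustration.
FAM_DEVICE = "/dev/fam0"
REGION_BYTES = 1 << 30  # map 1 GiB of the shared pool

fd = os.open(FAM_DEVICE, os.O_RDWR)
pool = mmap.mmap(fd, REGION_BYTES)

# Because every node maps the same pool, "communicating" with
# another node is just reading or writing an agreed-upon offset,
# rather than packaging data into network messages.
RESULT_OFFSET = 0x1000
pool[RESULT_OFFSET:RESULT_OFFSET + 8] = struct.pack("<d", 3.14159)

(value,) = struct.unpack_from("<d", pool, RESULT_OFFSET)
print(value)  # another node could read these same bytes directly

pool.close()
os.close(fd)
```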

In addition, and perhaps most importantly, HPE says it has developed software tools that can actually use this huge pool of memory to derive intelligence or scientific insight from massive data sets—every post on Facebook; the entirety of the Web; the health data of every human on Earth; that kind of thing. Check out this quote from CTO Mark Potter, who apparently thinks HPE's tech can save humankind: “We believe Memory-Driven Computing is the solution to move the technology industry forward in a way that can enable advancements across all aspects of society. The architecture we have unveiled can be applied to every computing category—from intelligent edge devices to supercomputers.”

In practice I think we're some way from realising Potter's dream, but HPE's tech is certainly a good first step towards exascale. If we compare HPE's efforts against the three main issues outlined above, you'd probably award a score of about 1.5 out of 3: it has made inroads on software, power consumption, and scaling, but there's a long way to go, especially when it comes to computational grunt.

HPE's biggest gap is the processor itself. After the US government banned the export of Intel, Nvidia, and AMD chips to China, China's national chip design centre created a 256-core RISC chip specifically for supercomputing; all HPE can offer is the Gen-Z protocol for chip-to-chip communications and the hope that a logic chip maker steps forward. Still, this is just the first stage of funding, in which HPE is expected to research and develop core technologies that will help the US reach exascale; only if it gets to phases two and three will HPE have to design and then build an exascale machine.

Assuming all of the other parts fall into place, Intel's latest 72-core/288-thread Xeon Phi might just be enough for the US to get there before China—but with an RRP of $6,400 and roughly 300,000 chips required to hit 1 exaflops, it won't be cheap.
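
For a rough sense of the bill, assume the top-end Xeon Phi's published double-precision peak of about 3.46 teraflops per chip (sustained performance would be lower, so this is a best case):

```python
# Back-of-the-envelope: chips alone, at list price, ignoring
# memory, interconnect, cooling, and the fact that sustained
# performance is always well below peak.
PEAK_TFLOPS_PER_CHIP = 3.46   # Xeon Phi 7290 double-precision peak
RRP_USD = 6400
TARGET_TFLOPS = 1_000_000     # 1 exaflops

chips = TARGET_TFLOPS / PEAK_TFLOPS_PER_CHIP
print(f"~{chips:,.0f} chips, ~${chips * RRP_USD / 1e9:.2f} billion")
# ~289,017 chips, ~$1.85 billion -- and that's just the CPUs
```
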
Most of the DoE's exascale funding has so far gone towards software. Just before this story was published, we learnt that the DoE is also announcing funding for AMD, Cray, IBM, Intel, and Nvidia under the same PathForward programme. In total, the DoE is handing out $258 million over three years, with the funding recipients also committing to spend at least $172 million of their own money over the same period. What we don't yet know is what those companies are doing with that funding; hopefully we'll find out more soon.