22 February 2008

New Institute Plans Exascale Computing

by Kate Melville

The new Institute for Advanced Architectures, launched jointly at Sandia and Oak Ridge national laboratories, is planning to build exascale computers that will perform a million trillion (10^18) calculations per second.

The idea behind the new institute, according to Sandia project leader Sudip Dosanjh, is "to close critical gaps between theoretical peak performance and actual performance on current supercomputers. We believe this can be done by developing novel and innovative computer architectures." The institute is funded by congressional mandate and is supported by the National Nuclear Security Administration and the Department of Energy's Office of Science.

One key aim of the design process, Dosanjh says, is to reduce or eliminate the growing mismatch between data movement and processing speeds. Data movement refers to the act of getting data from a computer's memory to its processing chip and then back again. The larger the machine, the farther from a processor the data may be stored, and the longer it takes to move.

"In an exascale computer, data might be tens of thousands of processors away from the processor that wants it," says Sandia computer architect Doug Doerfler. "But until that processor gets its data, it has nothing useful to do. One key to scalability is to make sure all processors have something to work on at all times."
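The effect Doerfler describes can be put in rough numbers. The sketch below is a toy model rather than anything from the institute: it assumes a processor alternates between computing and sitting idle while it waits for data, and the timings used (100 nanoseconds of work, 400 nanoseconds of waiting) are hypothetical values chosen only for illustration.

```python
def utilization(compute_ns, wait_ns):
    """Fraction of time a processor spends doing useful work.

    Toy model (an assumption, not a measurement): the processor
    computes for compute_ns, then idles for wait_ns until its
    next batch of data arrives.
    """
    return compute_ns / (compute_ns + wait_ns)


def sustained_rate(peak_ops_per_second, compute_ns, wait_ns):
    """Actual throughput once waiting for data is accounted for."""
    return peak_ops_per_second * utilization(compute_ns, wait_ns)


# Hypothetical timings: the farther away the data, the longer the wait.
for wait_ns in (0, 100, 400):
    share = utilization(100, wait_ns)
    print(f"wait {wait_ns:>3} ns -> busy {share:.0%} of the time")
```

In this model, a processor that waits four times as long as it computes delivers only a fifth of its theoretical peak, which is the kind of gap between peak and actual performance the institute says it wants to close.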

Compounding the problem is new technology that has enabled designers to split a processor into first two, then four, and now eight cores on a single die. Some special-purpose processors have 24 or more cores on a die. Dosanjh suggests there might eventually be hundreds operating in parallel on a single chip. "In order to continue to make progress in running scientific applications at these [very large] scales," says Jeff Nichols, who heads the Oak Ridge branch of the institute, "we need to address our ability to maintain the balance between the hardware and the software. There are huge software and programming challenges and our goal is to do the critical R&D to close some of the gaps."

Operating in parallel means that each core can work on its part of the puzzle simultaneously with the other cores on a chip, greatly increasing the speed at which a processor operates on data. The method does not require faster clock speeds, measured in gigahertz, which would generate unmanageable amounts of heat as well as current leakage. The new method bolsters the continued relevance of Moore's Law, the observation first made in 1965 by Intel cofounder Gordon Moore, and later revised by him, that the number of transistors placed on a single computer chip doubles approximately every two years.
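A back-of-the-envelope calculation shows why adding cores, rather than raising clock speed, is the route to higher peak performance. The figures in this sketch are assumptions made for illustration only: a 2-gigahertz clock and four calculations per core per clock cycle are hypothetical values, not specifications from the institute.

```python
EXASCALE_OPS_PER_SECOND = 10**18  # a million trillion calculations per second


def peak_ops_per_second(cores, clock_hz, ops_per_cycle):
    """Theoretical peak: every core completes ops_per_cycle each clock tick."""
    return cores * clock_hz * ops_per_cycle


def cores_needed(clock_hz, ops_per_cycle, target=EXASCALE_OPS_PER_SECOND):
    """Smallest number of cores whose combined peak reaches the target."""
    per_core = clock_hz * ops_per_cycle
    return -(-target // per_core)  # integer ceiling division


# Hypothetical core: 2 GHz clock, 4 calculations per cycle.
CLOCK_HZ = 2_000_000_000
OPS_PER_CYCLE = 4

for cores in (2, 4, 8):
    peak = peak_ops_per_second(cores, CLOCK_HZ, OPS_PER_CYCLE)
    print(f"{cores} cores at 2 GHz -> peak of {peak:,} calculations per second")

print(f"cores needed for exascale: {cores_needed(CLOCK_HZ, OPS_PER_CYCLE):,}")
```

Doubling the core count doubles the peak while the clock stays put. This is peak performance only; it ignores the time cores spend waiting for data.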

Another problem for the institute is to reduce the amount of power needed to run a future exascale computer. "The electrical power needed with today's technologies would be many tens of megawatts - a significant fraction of a power plant. A megawatt can cost as much as a million dollars a year," says Dosanjh. "We want to bring that down."
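The arithmetic behind Dosanjh's concern is simple to sketch. The article gives only "many tens of megawatts," so the machine sizes below are hypothetical, and the cost per megawatt uses his upper figure of a million dollars a year.

```python
COST_PER_MEGAWATT_YEAR_USD = 1_000_000  # upper figure quoted by Dosanjh


def annual_power_cost_usd(megawatts, cost_per_mw_year=COST_PER_MEGAWATT_YEAR_USD):
    """Yearly electricity bill for a machine drawing the given power."""
    return megawatts * cost_per_mw_year


# Hypothetical power draws; the article does not give an exact figure.
for megawatts in (20, 30, 50):
    cost = annual_power_cost_usd(megawatts)
    print(f"{megawatts} MW -> ${cost:,} per year")
```

At these rates, every megawatt saved is worth up to a million dollars a year in running costs.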

Source: Sandia Corporation