Heterogeneous High-Performance Computer Engines
What I find amazing, personally, is that it appears that performance on the order of 10^12 floating-point operations per second—a teraflops or a teraops, depending on your culture—is really achievable by 1995 with known technology extrapolation. I can remember when we were first putting together the High Performance Computing Initiative back in the mid-1980s and asking ourselves what a good goal would be. We said that we would paste a really tough one up on the wall and go for a teraflops. Maybe we should have gone for a petaflops. The only way to achieve that goal is by parallel processing. Even today, at the high end, parallel processing is ubiquitous.
There isn't an American-made high-end machine that is not parallel. The emergence of commercially available massively parallel systems based on commodity parts is a key factor in the compute-engine market—another change since 1983. Notice that, roughly speaking, the same commodity parts that are driving the workstation evolution are driving these massively parallel systems.
We are still unsure of the tradeoffs—and there was a lively debate about this at this meeting—between fewer, faster processors and more, slower processors. Clearly the faster processors are more effective on a per-processor basis. On a system basis, the advantage is less clear. The payoff function is probably application dependent, and we are still searching for it. Fortunately, we have enough commercially available architectures to try out that this issue is out of the realm of academic discussion and into the realm of practical experiment.
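One way to see why the payoff is application dependent is a back-of-the-envelope comparison under Amdahl's law, which was not worked through at the meeting but makes the point. In the sketch below, the processor counts, speeds, and serial fractions are all invented for illustration; it pits one hundred fast processors against ten thousand processors each a tenth as fast:

    # Fewer/faster versus more/slower processors, compared under
    # Amdahl's law. All numbers here are invented for illustration.

    def system_speed(per_proc_speed, n_procs, serial_fraction):
        # Effective system speed relative to a single unit-speed
        # processor, assuming Amdahl-style scaling: the serial part
        # runs on one processor, the rest is perfectly parallel.
        speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)
        return per_proc_speed * speedup

    for s in (0.001, 0.01, 0.1):  # serial fraction of the application
        fast_few = system_speed(per_proc_speed=10.0, n_procs=100,
                                serial_fraction=s)
        slow_many = system_speed(per_proc_speed=1.0, n_procs=10_000,
                                 serial_fraction=s)
        print(f"serial fraction {s}: "
              f"100 fast -> {fast_few:6.0f}, "
              f"10,000 slow -> {slow_many:6.0f}")

For an almost perfectly parallel code (serial fraction 0.001) the two systems come out about even; as the serial fraction grows, the fewer-but-faster machine pulls well ahead. Which side of that line a given application falls on is exactly the practical experiment we now get to run.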
Related to that is an uncertain mapping of the various architectures available to us into the applications domain. A part of this meeting was the discussion of those application domains and what the suitable architectures for them might be. Over the next few years I'm sure we'll get a lot more data on this subject.
It was also brought out that it's important to develop balanced systems. You have to balance processor power, memory size, bandwidth, and I/O rates appropriately to have a workable system. By and large, it appears that there was consensus at this conference on what that balance should be, so at least we have some fairly good guidelines.
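As a concrete, deliberately crude illustration of what such a balance check might look like, the sketch below applies the classic rule of thumb often attributed to Amdahl: roughly one byte of memory and one bit per second of I/O for every operation per second. Both the rule and the machine parameters here are assumptions for illustration, not the consensus figures from the conference:

    # Check a hypothetical machine against a simple balance rule of
    # thumb (~1 byte of memory and ~1 bit/s of I/O per operation/s).
    # The rule and the machine parameters are illustrative
    # assumptions, not figures from this conference.

    def balance_report(ops_per_sec, memory_bytes, io_bits_per_sec):
        # Ratios near 1.0 indicate a balanced design under this rule.
        print(f"memory: {memory_bytes / ops_per_sec:.2f} "
              f"bytes per op/s (target ~1)")
        print(f"I/O:    {io_bits_per_sec / ops_per_sec:.2f} "
              f"bits/s per op/s (target ~1)")

    # A hypothetical 1-teraops design with 256 GB of memory and
    # 100 Gbit/s of aggregate I/O:
    balance_report(ops_per_sec=1e12,
                   memory_bytes=256e9,
                   io_bits_per_sec=100e9)

For this hypothetical design the check reports 0.26 bytes and 0.10 bits/s per operation per second, well short of the rule, so memory and I/O, not the processors, would be the bottlenecks. That is the kind of imbalance a guideline like this is meant to flag early.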
There was some discussion at this conference of new or emerging technologies—gallium arsenide, Josephson junction, and optical—which may allow further speedups. Unfortunately, as was pointed out, gallium arsenide is struggling, Josephson junction is Japanese, and optical is too new to call.