Is the Supercomputer Industry Hastening Its Own Demise?
The supercomputer industry and its patrons appear to be doing many things that hasten an early demise. Fundamentally, the market for supercomputers is only a billion dollars, and the R&D going into supers is also on the order of a billion. This simply means too many companies are attempting to build too many noncompatible machines for too small a market. Much of the R&D is redundant, and other parts are misdirected.
The basic architecture of the "true" supercomputer was clearly defined as a nonscalable, vector multiprocessor. Unfortunately, the larger it is made to get the highest peak or advertising speed, the less cost effective it becomes for real workloads. The tradeoff inherent in making a high-performance computer that is judged on the number of GFLOPS it can calculate, based on such a design, seems to be counterproductive. The supercomputer has several inconsistencies (paradoxes) in its design and use:
1. In providing the highest number of MFLOPS by using multiprocessors with multiple pipe vector units to support one to 1.5 times the number of memory accesses as the peak arithmetic speed, memory latency is increased. However, to have a well-balanced, general-purpose supercomputer that executes scalar code well, the memory latency needs to be low.
2. In building machines with the greatest peak MFLOPS (i.e., the advertising speed), many processors are required, raising the computer's cost and lowering per-processor performance. However, supercomputers are rarely used in a parallel mode with all processors; thus, supers are being built at an inherent dis-economy of scale to increase the advertising speed.
3. Having many processors entails mastering parallelism beyond that obtainable through automatic parallelization/vectorization. However, supercomputer suppliers aren't changing their designs to enable scaleability or to use massive parallelism.
4. In providing more than worst-case design of three pipelines to memory, or 1.5 times as many mega-accesses per second as the machine has MFLOPS, the cost effectiveness of the design is reduced at least 50 per cent. However, to get high computation rates, block algorithms are used that ensure memory is not accessed. The average amount of computation a super delivers over a month is only five to 10 per cent of the peak, indicating the memory switch is idle most of the time.
In addition to these paradoxes, true supers are limited in the following ways:
1. Not enough is being done to train users or to make the super substantially easier to use. Network access needs to be much faster and more transparent. The X-Terminal server interface can potentially show the super to have a Macintosh-like interface. No companies provide this at present.
2. The true supercomputer design formula seems flawed. The lack of caches, paging, and scaleability make it doomed to chase the clock. For example, paradox 4 above indicates that a super could probably deliver two to four times more power by doubling the number of processors but without increasing the memory bandwidth or the cost.
3. Cray Research describes a massively parallel attached computer. Cray is already quite busy as it attempts to enter into the minisupercomputer market. Teaming with a startup such as Thinking Machines Corporation (which has received substantial government support) or MasPar for a massively parallel facility would provide a significantly higher return on limited brain power.
4. The U.S. has enough massively parallel companies and efforts. These have to be supported in the market and through use before they perish. Because these computers are inherently specialized (note the figure), support via continued free gifts to labs and universities is not realistic in terms of establishing a real marketplace.