
Interconnect Performance, System Versatility, and Delivered Performance

Related to the number of users who can use a system at the same time is the number of different kinds of problems the system can solve. The more flexible a system is in the programming paradigms it supports, the more solutions per year it can deliver. As we learned while adapting a wide range of applications to the early Butterfly systems, it is much more productive to program using a paradigm that is natural to the problem at hand than to force-fit the code to a machine-dependent paradigm. Specialized architectures do have a place running certain applications in which the specialized system's architecture provides very high performance and the same code will be run a large number of times. However, many problems are not well suited to these systems.

Current mainstream systems provide a very flexible programming environment in which to develop algorithms. Based on shared-memory architectures, these systems have demonstrated their applicability in a wide range of applications, from scientific to commercial. Our experience with BBN's parallel systems indicates that shared-memory architectures are the best way to provide a multiparadigm environment comparable to that of mainstream systems. For example, the TC2000 uses the Butterfly® switch (BBN 1990) to provide a large, globally addressable memory space that is shared by the processors yet is physically distributed: "distributed shared memory." This provides the convenience of the shared-memory model for those applications to which it is suited while providing the scalable bandwidth of distributed memory. The TC2000's Butterfly switch also makes it an ideal system for programming with the message-passing paradigm, providing low message-transfer latencies.
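To illustrate why the shared-memory paradigm is convenient, the sketch below expresses a parallel reduction as plain loads and stores on a single global address space, with no explicit message traffic. It is illustrative only: it uses POSIX threads as a modern stand-in, and the TC2000's own programming interfaces (such as BBN's Uniform System library) differ in detail.

    /* Illustrative only: a shared-memory-style parallel sum using POSIX
     * threads as a stand-in for the TC2000's programming interfaces
     * (e.g., BBN's Uniform System library, whose API differs).  The
     * point is the paradigm: every worker reads and writes one global
     * address space directly, with no explicit message passing. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        (1 << 20)
    #define NWORKERS 4

    static double data[N];            /* globally addressable array */
    static double partial[NWORKERS];  /* one result slot per worker */

    static void *worker(void *arg)
    {
        long id = (long)arg;
        long lo = id * (N / NWORKERS), hi = lo + N / NWORKERS;
        double sum = 0.0;
        for (long i = lo; i < hi; i++)
            sum += data[i];           /* plain loads on shared memory */
        partial[id] = sum;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        for (long i = 0; i < N; i++) data[i] = 1.0;
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        double total = 0.0;
        for (long i = 0; i < NWORKERS; i++) {
            pthread_join(t[i], NULL);
            total += partial[i];
        }
        printf("sum = %g\n", total);
        return 0;
    }

Expressing the same computation with message passing would require explicitly distributing the array and gathering the partial sums, which is exactly the force-fitting the shared-memory model avoids when it suits the problem.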

Another key difference between mainstream systems and primitive massively parallel computers is the system's delivered performance on applications that tend to randomly access large amounts of memory. According to John Gustafson of Ames Laboratory, "Supercomputers will be rated by dollars per megabyte per second more than dollars per megaflop . . . by savvy buyers." A system's ability to access memory at random is called its random-access bandwidth, or RAB. High RAB is needed for such applications as data classification and retrieval, real-time programming, sparse matrix algorithms, adaptive grid problems, and combinatorial optimization (Celmaster 1990).

High-RAB systems can deliver high performance on a wider range of problems than systems with low RAB can, and they give the programmer more options for developing algorithms. This is one of the strengths of the traditional supercomputers and mainframes, and it is a key reason why most massively parallel systems do not run certain parallel applications as well as one would expect. The TC2000 is capable of RAB comparable to, and indeed higher than, that of mainstream systems. Table 1 compares the TC2000 with several other systems on the portable random-access benchmark (Celmaster undated). For even higher RAB, the Monarch project (Rettberg et al. 1990) at BBN explored advanced switching techniques and designed a very-large-scale MIMD computer with the potential for several orders of magnitude more RAB than modern supercomputers.



Table 1. Comparison of Random-Access Bandwidth

System             Number of Processors   RAB (kilo random-access words per second)
TC2000                      1                    258
                           40                  9,447
                          128                 23,587
i860 Touchstone             1                   ~2.5
                          128                   ~300
IRIS 4D/240                 1                    349
                            4                    779
CRAY Y-MP/832               1                 28,700
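To make the quantity in Table 1 concrete, the following sketch shows the kind of measurement a random-access bandwidth benchmark performs: timing scattered single-word accesses into a table far larger than the caches and reporting accesses per second. This is an illustrative stand-in, not the portable random-access benchmark itself; the table size, access count, and pseudorandom generator are arbitrary assumptions.

    /* A sketch of what a random-access bandwidth (RAB) measurement
     * does: time scattered single-word reads over a table much larger
     * than the caches, and report words accessed per second.
     * Illustrative only; the portable random-access benchmark cited
     * in Table 1 differs in detail. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define TABLE_WORDS (1 << 24)   /* 16 M words: large vs. caches */
    #define ACCESSES    (1 << 22)

    int main(void)
    {
        long *table = malloc(TABLE_WORDS * sizeof *table);
        if (!table) return 1;
        for (long i = 0; i < TABLE_WORDS; i++) table[i] = i;

        unsigned long x = 88172645463325252UL;  /* xorshift PRNG state */
        long sink = 0;
        clock_t t0 = clock();
        for (long i = 0; i < ACCESSES; i++) {
            x ^= x << 13; x ^= x >> 7; x ^= x << 17;  /* cheap PRNG  */
            sink += table[x % TABLE_WORDS];           /* random read */
        }
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("%.0f random-access words/s (sink=%ld)\n",
               ACCESSES / secs, sink);
        free(table);
        return 0;
    }

On a machine with high RAB, the reported rate degrades only modestly as the table grows; on low-RAB machines it collapses once the working set exceeds the caches.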

Lastly, BBN's experience using early Butterfly systems in real-time simulation and communications applications indicated that special capabilities were required for these areas. The real-time model places very demanding constraints on system latencies and performance, and it requires software and hardware beyond what typical non-real-time systems provide. These capabilities include a low-overhead real-time executive; low-latency access to shared memory; hardware support such as timers and globally synchronized real-time clocks; and support for the Ada programming language.
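To suggest what those demanding constraints mean in practice, here is a minimal sketch of the fixed-rate frame loop at the heart of many real-time simulations, built on POSIX absolute-deadline sleeps as a stand-in for a real-time executive and globally synchronized clocks. The 10 ms period and the miss-detection policy are illustrative assumptions, not features of any BBN system.

    /* Illustrative only: a fixed-rate frame loop of the kind real-time
     * simulations run, using POSIX clock_nanosleep with absolute
     * deadlines in place of a real-time executive and globally
     * synchronized clocks.  Missed deadlines are detected, not
     * silently absorbed. */
    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <time.h>

    #define PERIOD_NS 10000000L   /* assumed 10 ms frame */

    static void do_frame_work(void) { /* simulation step goes here */ }

    int main(void)
    {
        struct timespec next, now;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int frame = 0; frame < 1000; frame++) {
            do_frame_work();

            /* Advance the absolute deadline by one period. */
            next.tv_nsec += PERIOD_NS;
            if (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec  += 1;
            }
            clock_gettime(CLOCK_MONOTONIC, &now);
            if (now.tv_sec > next.tv_sec ||
                (now.tv_sec == next.tv_sec && now.tv_nsec > next.tv_nsec))
                fprintf(stderr, "frame %d missed its deadline\n", frame);

            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
        return 0;
    }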

