Future Supercomputing Elements
Bob Ewald
Robert H. Ewald is Executive Vice President for Development at Cray Research, Inc., in Chippewa Falls, Wisconsin, having joined the company in 1984. From 1977 to 1984, he worked at Los Alamos National Laboratory, serving during the last five years of his tenure as Division Leader in the Computing and Communications Division.
This discussion will focus on the high-performance computing environment of the future and will describe the software challenges we see at Cray Research, Inc., as we prepare our products for these future environments.
Figure 1 depicts the environment that Cray Research envisions for the future and is actively addressing in our strategic and product plans. This environment consists of machines that have traditional, or general-purpose, features, as well as special architectural features that provide accelerated performance for specific applications. It's currently unclear how these elements must be connected to achieve optimum performance—whether by some processor, memory, or network interconnect. What is clear, however, is that the successful supercomputer architecture of the future will be an optimal blend of each of these computing elements. Hiding the architectural implementation and delivering peak performance, given such a heterogeneous architecture, will certainly be a challenge for software. We currently have a number of architectural study teams looking at the best way to accomplish this.
Another key consideration in software development is what the future workstation architecture will look like. The view that we have today is something like that depicted in Figure 2. These will be very fast scalar machines with multiple superscalar and specialized processors combined to deliver enhanced real-time, three-dimensional graphics. Not only will tomorrow's software be required to optimize application performance on a heterogeneous-element supercomputer, as described earlier, but it will also be required to provide distributed functionality and integration with these workstations of tomorrow.
Figure 3 is Cray Research's view of what our customer networks look like today or will look like in the near future. It is a heterogeneous network with systems and network components from a broad array of vendors. The four key elements of the network are workstations, networks of networks, compute servers (either general purpose or dedicated), and file servers. The networks are of varying speed, depending on the criticality and bandwidth of the resources that are attached. The key point of this scenario is that every resource is "available" to any other resource. Security restrictions may apply and the network may be segmented for specialized purposes, but basically the direction of networking technology is toward more open architectures and network-managed resources.
Figure 4 portrays the conceptual model that we are employing in our product development strategy. The idea of the client-server model is that the workstation is the primary user interface, and it transparently draws upon other resources in the network to provide specialized services. These resources are assigned to optimize the delivery of a particular service to the user. Cray Research's primary interest is in providing the highest-performance compute and file servers. This hardware must be complemented with software that makes these systems transparently accessible to the user.
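As a simple illustration of the client-server idea, the sketch below shows a workstation-side client, written in C, that opens a connection to a compute server, submits a request, and prints the reply. The server name, port number, and one-line request format are assumptions made purely for illustration; they do not describe any particular Cray or vendor interface.

/* Illustrative client-server sketch: a workstation-side client sends a
 * request to a hypothetical compute server and prints whatever comes back.
 * The host name "compute-server", the port, and the request line are
 * assumptions made for this example only.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    const char *host = "compute-server";      /* hypothetical server name */
    const char *request = "RUN job=flow3d\n"; /* invented request format  */
    char reply[4096];
    struct addrinfo hints, *res;
    int fd;
    ssize_t n;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, "7100", &hints, &res) != 0) {  /* port is arbitrary */
        fprintf(stderr, "cannot resolve %s\n", host);
        return 1;
    }
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("connect");
        return 1;
    }
    freeaddrinfo(res);

    /* Submit the request and echo the server's reply to the user. */
    write(fd, request, strlen(request));
    while ((n = read(fd, reply, sizeof reply - 1)) > 0) {
        reply[n] = '\0';
        fputs(reply, stdout);
    }
    close(fd);
    return 0;
}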
Currently, Cray Research provides a rich production environment that incorporates the client-server model. Our UNICOS software is based on AT&T UNIX System V with Berkeley Standard Distribution extensions and is designed for POSIX compliance. This enables application portability and a common cross-system application development environment. Through the use of X Windows and the Network File System, CRAY-2s, X-MPs, and Y-MPs have been connected to a variety of workstations from Sun Microsystems, Inc., Silicon Graphics IRIS, IBM, Digital Equipment Corporation, Apollo (Hewlett-Packard), and other vendors. Coupled with our high-speed interconnects, this makes distributing an application across multiple Cray systems a practical possibility. In fact, during the summer of 1990, we achieved 3.3 × 10⁹ floating-point operations per second (3.3 GFLOPS) on a matrix multiply that was distributed between a CRAY-2 and a Y-MP. Later that summer, a customer prospect was able to distribute a three-dimensional elastic FEM code among three Cray systems at the Cray Research Computing Center in Minnesota and achieved 1.7 GFLOPS sustained performance. Aside from the performance, what was remarkable was that this scientist was running a real application, had only three hours to make the coding changes, and did it all without leaving his desk in Tokyo. The technology is here today to do this kind of work as a matter of course. I think
you'll see significant progress in the next year or two in demonstrating sustained high performance on real-world distributed applications.
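To make the idea of such a distributed computation concrete, the sketch below shows the row-block decomposition that underlies a distributed matrix multiply: each block of result rows depends only on the full B matrix and its own rows of A, so the blocks can be computed on different systems and merged. This is a minimal C illustration, not the code used in the runs described above; the matrix size, the two-way split, and the test values are assumptions made for the example.

/* Sketch: row-block partitioning of C = A * B, the kind of decomposition
 * used when a matrix multiply is distributed across two systems.  Here
 * both halves are computed locally; in a distributed run each call would
 * be shipped to a different machine.
 */
#include <stdio.h>

#define N 8

/* Compute rows [r0, r1) of C = A * B. */
static void multiply_rows(int r0, int r1,
                          const double A[N][N], const double B[N][N],
                          double C[N][N])
{
    for (int i = r0; i < r1; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

int main(void)
{
    static double A[N][N], B[N][N], C[N][N];

    /* Fill A and B with simple test values (B is twice the identity). */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + 1;
            B[i][j] = (i == j) ? 2.0 : 0.0;
        }

    /* "System 1" computes the top half, "system 2" the bottom half.
     * The two blocks are independent, so they can proceed in parallel. */
    multiply_rows(0, N / 2, A, B, C);
    multiply_rows(N / 2, N, A, B, C);

    printf("C[0][0] = %g, C[N-1][N-1] = %g\n", C[0][0], C[N - 1][N - 1]);
    return 0;
}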
These technologies present a number of software challenges to overcome. First, there is a growing need to improve automatic recognition capabilities. We must continue to improve our ability to recognize and optimize scalar, vector, and parallel constructs. Data-element recognition must exploit long vectors or massively parallel architectures as appropriate. Algorithm recognition must be able to examine software constructs and identify the hardware element that will deliver maximum performance. As an example, automatic recognition of a fast Fourier transform might invoke a special processor optimized for that function, analogous to our current arithmetic functional units.
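The following C fragment illustrates, in miniature, two of the constructs such recognition must distinguish. Loop (1) is a fully independent element-wise update that maps naturally onto long vectors or many processors; loop (2) is a reduction, which can be parallelized only after the compiler recognizes the summation pattern and generates partial sums. The routine names and the example itself are illustrative, not taken from any particular compiler.

/* Two loop forms an optimizing compiler must tell apart. */
#include <stddef.h>

void axpy(size_t n, double a, const double *x, double *y)
{
    /* (1) Element-wise update: every iteration is independent, so the
     *     loop can run as one long vector operation or be split across
     *     many processors. */
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

double dot(size_t n, const double *x, const double *y)
{
    /* (2) Reduction: the iterations share the accumulator, so a compiler
     *     must recognize the sum pattern and generate partial sums that
     *     are combined at the end. */
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}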
Second, because this environment is highly distributed, we must develop tools that automatically partition codes among the heterogeneous systems in the network. Of course, a heterogeneous environment also implies a mix of standard and proprietary software that must be taken into account. Security will be a major challenge as well and must extend from the supercomputer to the file server and out to the workstation.
Not only must we make it easy to distribute applications, but we must also optimize them to take advantage of the strengths of the various network elements. The size of the optimization problem is potentially immense because the workstation, network, compute, and file server elements all have different performance characteristics. The problem is exacerbated by performance discontinuities, such as a slow network gateway, and by functional discontinuities, such as an unavailable network element. Ultimately, we will have to develop expert systems to help distribute applications and run them.
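As a rough illustration of the kind of reasoning such an expert system might perform, the sketch below estimates, for each network element, the time to move the data plus the time to do the work, and treats an unavailable element as infinitely expensive. The element table, rates, and bandwidths are invented for the example; a real resource controller would obtain them from the network itself.

/* Sketch of a simple placement cost model: estimated time is transfer
 * time plus compute time, and an unavailable element or slow gateway
 * simply shows up as a poor (or effectively infinite) estimate.
 */
#include <stdio.h>

struct element {
    const char *name;
    double gflops;        /* sustained compute rate                   */
    double mbytes_per_s;  /* effective bandwidth from the workstation */
    int    available;     /* 0 if the element is currently down       */
};

/* Estimated seconds to move the data and do the work on one element. */
static double estimate(const struct element *e, double gflop_work, double mbytes_data)
{
    if (!e->available)
        return 1e30;      /* effectively "never" */
    return mbytes_data / e->mbytes_per_s + gflop_work / e->gflops;
}

int main(void)
{
    struct element net[] = {
        { "workstation",    0.01, 1e9, 1 },  /* data already local     */
        { "compute-server", 1.0,  4.0, 1 },
        { "backup-server",  0.5, 10.0, 0 },  /* currently unavailable  */
    };
    int n_elem = (int)(sizeof net / sizeof net[0]);
    double work = 5.0;    /* GFLOP of computation */
    double data = 40.0;   /* Mbytes to ship       */
    int best = 0;

    /* Pick the element with the lowest estimated completion time. */
    for (int i = 1; i < n_elem; i++)
        if (estimate(&net[i], work, data) < estimate(&net[best], work, data))
            best = i;

    printf("run on %s (about %.0f s)\n", net[best].name,
           estimate(&net[best], work, data));
    return 0;
}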
Finally, to operate in the computing environment of the future, the most critical software components will be the compilers and the languages that we use. If we cannot express parallelism through the languages we use, we will have limited success in simulation because the very things we are trying to model are parallel in nature. What we have been doing for the last 30 years is serializing the parallel world because we lack the tools to represent it in parallel form. We need to develop a new, non-von Neumann way of thinking so that we do not go through this parallel-to-serial-back-to-parallel computational gyration. A language based on physics or some higher-level abstraction is needed.
In terms of existing languages, Fortran continues to be the most important language for supercomputer users, and we expect that to continue. Unfortunately, its current evolution may have some problems. Because of the directions of the various standards organizations, we may see three different Fortran standards emerge. The American National Standards Institute (ANSI) itself will probably have two standards: the current Fortran 77 standard and the new Fortran 9X standard. The International Organization for Standardization (ISO) may become so frustrated with the ANSI developments that it develops yet another forward-looking Fortran standard. So what was once a standard language will become an unstandard set of standards. Nevertheless, we still envision that Fortran will be the most heavily used language for science and engineering through the end of this decade.
The C language is gaining importance and credibility in the scientific and engineering community. The widespread adoption of UNIX is partly responsible for this. We are seeing many new application areas utilizing C, and we expect this to continue. We can also expect to see additions to C for parallel processing and numeric processing. In fact, Cray Research is quite active with the ANSI numerical-extensions-to-C group, which is looking at improving the language's numeric processing capabilities.
Ada is an important language for a segment of our customer base. It will continue to be required in high-performance computing, in part because of Department of Defense mandates placed on some software developed for the U.S. government.
Lisp and Prolog are considered important because of their association with expert systems and artificial intelligence. To achieve the distributed-network optimization discussed earlier, an expert system might be employed on the workstation, acting as a resource controller on behalf of the user. We need to determine how to integrate symbolic and numeric computing to achieve optimal network resource performance at minimal cost. A few years ago we thought that most expert systems would be written in Lisp. We are seeing a trend, however, that suggests that C might become the dominant implementation language for expert systems.
In summary, today's languages will continue to have an important role in scientific and engineering computing. There is also, however, a need for a higher-level abstraction that enables us to express the parallelism found in nature in a more "natural" way. Nevertheless, because of the huge investment in existing codes, we must develop more effective techniques to prolong the useful life of these applications on tomorrow's architectures.
On the systems side, we need operating systems that are scalable and interoperable. The implementation of the system may change "under the hood" and, indeed, must change to take advantage of the new hybrid architectures. What must not change, however, is the user's view of the system: the underlying network architecture should remain hidden. The user interface must be consistent, transparent, and graphically oriented, with the tools needed to automatically optimize applications to take advantage of the network resources. This is the high-performance computing environment of the future.