Systems and Software
George A. Spix has been Director, Software Development, at Supercomputer Systems, Inc., since 1987. From 1983 to 1987, Mr. Spix was Chief Engineer, Software Development, MP Project, at Cray Research, Inc., responsible for the design and development of the user environment and operating system for the MP. From 1980 to 1983, he served in a variety of positions at Cray Research, at Los Alamos National Laboratory, and at Lawrence Livermore National Laboratory. Mr. Spix holds a B.S. in electrical engineering from Purdue University.
At Supercomputer Systems, Inc. (SSI), my main responsibility is software and systems. Our teams are responsible for the user environments; operating systems; peripherals and networks; design verification, diagnostics, testing, and quality assurance; and documentation, publications, and technical operations. The operating-systems activity, of course, has to support what we are doing in the compilation arena in terms of exposing the power of the machine to the user.
I have responsibility for peripherals and networks. This is a large area, especially because we change the balance and the resources in the machines. Every time we do that, we are basically asking the user to reprogram the application. Again, as the definition of the supercomputer changes in terms of those resources, the codes also change. Every time the codes have to be changed, we lose almost a year in work hours and resources spent.
Design verification and quality assurance are currently my most difficult areas because software is typically the first victim of a hardware problem. As a result, 1990 is the year that we finished testing the machine before we built it. Also, the system does not do you much good unless you have written up how to use it, and that is another part of the challenge.
Probably our main objective at SSI is what we call minimum-time solution. By that we mean minimizing the time from the moment the users of the instrument decide they have a problem they want to solve to the moment they understand the solution. Indeed, from the first days of the company, we have been focused on that objective, which reaches from the user arena, in terms of how you set up your applications, down through the operating-system level to the I/O level. We are not just trying to build a throughput machine; we are really trying to solve the general problem and to lower the response time for an individual application.
Our approach has been to build an architecture and hardware that deliver the highest application performance and that are parallel at every level, even by default. I think, as we review hardware issues, we will see that we have succeeded in a fairly rational fashion at exploiting parallelism at almost every layer of the machine's architecture.
Another objective at SSI is the visual laboratory paradigm. As you know, the early work of von Neumann at Los Alamos National Laboratory focused on the bandwidth match between the eyeball and the machine. A last, but not least, objective is to make sure that reliable software is delivered on time.
We believe we have a comprehensive parallel-processing strategy that does not leave too many stones unturned, although the massively parallel developers might look at the machine and say that we have not focused enough on massive parallelism. Our approach is to prepare for the customer an application base in terms of the relationships that SSI President and CEO Steve Chen and the company have set up with various important industrial customers or prospects, plus the relationships we have with the national laboratories. Unlike systems that were started 15 to 20 years ago, we are starting from quite a different point. We can focus our energies less on operating-system issues per se, in terms of building another type of operating system, and more on compiler problems.
There are a lot of penalties for parallel processing in the traditional world, some of which are perceptual, some architectural, and some very real. There is certainly a perception of high parallel-processing overhead: oftentimes you bring in a $20 or $30 million machine, and the administrative scheduling priorities are such that if you have 20 users and you freeze 19 of them out to support one user well, that is not acceptable. Accounting issues relative to parallel processing come back and bite you, especially if you are doing something "automagically." If you take a 10-hour job, and it runs in two hours because the system decided that was the best way to run it, and you charge the user for 20 hours, your customer will probably be quite upset.
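To make that accounting arithmetic concrete, here is a small, hypothetical sketch. The processor count and efficiency figures are illustrative assumptions, not SSI's actual billing scheme; the point is only how wall-clock time and billed CPU-hours diverge when the system parallelizes on the user's behalf.

```python
# Hypothetical illustration of the parallel accounting pitfall: a job
# that runs 10 wall-clock hours serially is spread across 10 processors.
serial_hours = 10.0   # wall-clock time of the serial run
processors = 10
efficiency = 0.5      # assumed parallel efficiency (overhead halves the speedup)

wall_hours = serial_hours / (processors * efficiency)  # 2.0 wall-clock hours
cpu_hours = wall_hours * processors                    # 20.0 CPU-hours billed

# The user waited only 2 hours but is billed 20 CPU-hours, twice the
# serial charge, even though the system, not the user, chose to parallelize.
print(wall_hours, cpu_hours)
```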
Another point is that the expertise required to optimize spans a lot of disciplines. If you take the massively parallel approach, you have a physicist who not only has to understand the problem very well but also has to understand the topology of the machine that the problem is being mapped to. In the case of classical parallel processing, you have to teach a practicing scientist or engineer what it means to deal with operating-systems-type asynchronous issues and all of the problems those issues cause. We have not done a lot in terms of Fortran development or standards development to alleviate those problems. On top of that, you get nonreproducible results. If you give a programmer asynchronous behavior in a problem, then you have to provide the tools to help with that.
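The nonreproducibility problem is easy to demonstrate: floating-point addition is not associative, so a system that combines partial sums in whatever order asynchronous workers happen to finish can return different answers for the same input. A minimal sketch of the underlying effect (illustrative only, not any vendor's tooling):

```python
# Floating-point addition is not associative. 2**53 is the point where
# consecutive double-precision values are 2 apart, so adding 1.0 there
# can be rounded away entirely, depending on the order of operations.
a = 2.0 ** 53

sum1 = (a + 1.0) - a   # the 1.0 is rounded away when added to a
sum2 = (1.0 - a) + a   # the 1.0 survives

# Same three numbers, two evaluation orders, two different "sums".
print(sum1, sum2)
```

An asynchronous machine effectively picks the evaluation order at run time, which is why the text argues the tools to diagnose such differences must come with the system.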
I think that although we talk about parallel processing, we are actually working it in traditional fashion, somewhat against the user's interest, in the sense that we are placing additional algorithmic constraints on the user's ability to get the work done. We have not provided standard languages and systems to the point where you can implement something and move it across systems. A whole different set of disciplines is required in terms of understanding and mapping a problem. In that, I think we are creating a full-employment role for system programmers, although I suspect that is against our interest as a country. We basically need to deliver tools that get the job done for the end user who is being productive and who is not a programmer.
In 1976, Seymour Cray said that one of the problems with being a pioneer is that you always make mistakes. In view of that remark, I never, never want to be a pioneer; it is always best to come second, when you can look at the mistakes the pioneers made. Put another way, the fast drives out the slow, even if the fast is wrong, which kind of goes with the
idea that good software is never noticed. As a software developer, I would not like to be in a position of saying that our software will never be noticed.
Lagging I/O is certainly a focus at SSI because as the machines change balances, you add compute power and memory power and do not necessarily deal with the I/O problem. You end up again bending the user's view of the machine and bending the user's ability to get an application to run.
I think Los Alamos National Laboratory is to be commended for Don Tolmie's efforts in the arena of the high-performance parallel interface (HIPPI). I think because of the 200-megabyte-per-second HIPPI, there is some chance that we will be able to bring I/O back into balance in the next generation of machines.
We can start looking at data-parallel applications and start talking about what it means to really deliver 10^12 floating-point operations per second (1 TFLOPS), or even hundreds of TFLOPS. And having done so, we will realize what it means when memory is the processor. The challenge of future architectures is less about the processor as an entity unto itself with some memory access and more about the memory actually being a processor.
My bottom line is that the massively parallel work that is going on—in which people actually look at algorithms in terms of data-parallel applications and look at vector machines and multiple-processor vector machines—does us nothing but good in the traditional world. Memory-as-processor is probably what will develop in the next five years as the real solution for getting a TFLOPS at a reasonable rate, perhaps bypassing the people who have focused on the floating-point, 32- or 64-bit domain.