Planning for a Supercomputing Future
Norm Morse
For most of the past decade, Norman R. Morse has served as leader of the Computing and Communications Division at Los Alamos National Laboratory. He has a bachelor's degree in physics from Texas A&I University, Kingsville, and a master of science in electrical engineering and computer science from the University of New Mexico, Albuquerque. Under his leadership, the Laboratory's computing facilities were expanded to include the newly constructed Data Communications Center and the Advanced Computing Laboratory. In addition, he promoted the use of massively parallel computers, and through his efforts, the facilities now house three Connection Machines—two CM-2s and one CM-5. Norm recently returned to a staff position, where he plans to pursue research in clustered workstation paradigms for high-performance computing.
Over the past few years, other people from Los Alamos National Laboratory and I have been examining the state of computing at Los Alamos and thinking about what our desired future state of computing might be. I would like to share with you some of the insights we have had, as well as some thoughts on the forces that may shape our future.
We used a strategic planning model to guide our thinking. Figure 1 shows that model. From a particular current state of computing, there are many possible future states into which institutional computing can evolve, some of them more desirable than others. There are drivers that determine which one of these possible future states will result. We have been interested in examining this process and trying to understand the various drivers and how to use them to ensure that we arrive at what we perceive to be our desired state of computing.
In the late 1960s and early 1970s, the current state of computing was batch. We had evolved from single-user systems into a batch-processing environment. The future state was going to evolve from that state of technology and from the influences of that time.
The evolution from batch computing to timesharing came from a variety of drivers (Figure 2). One major driver was technology: terminals were invented about that time, and rudimentary network capabilities were developed to support the need for remote computing. The mainframes essentially didn't change—they were von Neumann central supercomputers. Software was developed to support a timesharing model of computing. An important nontechnical factor was that the money for computing came in through the applications, that is, through the problems people were interested in solving. The people who were developing and running those applications were interested in individual productivity.
It may have been in this period that we stopped thinking that machine productivity was the most important issue we had to deal with and began to think instead that the productivity of the individuals using the machine should be maximized. So we evolved into a timesharing environment in the 1970s. We recognized the value of a centrally managed network and of central services to support timesharing. Mass storage and high-quality print services became an important part of the network.
In the 1980s we went from a timesharing environment to a distributed environment (Figure 3). And again, the important influences that drove us from timesharing to distributed computing included advances in technology. But there were also other factors: a large community of users required more control of their computing resources, and they valued the great increase in interactivity that came from having a dedicated computer on their desks.
The 1980s became the era of workstations—a lot of computational power sitting on your desk. Networks became more reliable and universal. We began to understand that networks were more than just wires that tied computers together. Users needed increased functionality, as well as more bandwidth, to handle both the applications and the user interfaces. Many of the centralized services began to migrate out to user networks and to be managed there. We started thinking about doing visualization. Von Neumann central supercomputers, along with departmental-class mainframes, were still the workhorses of this environment. Massively parallel supercomputers were being developed.
The next environment hasn't sorted itself out yet. The future picture, from a hardware and software technology viewpoint, is becoming much more complicated. We're calling the next environment the high-performance computing environment (Figure 4).
Again, there are N possible future states into which we could evolve. The drivers or enablers that are driving the vector from where we are now to where we want to be in the future are getting more complicated, and they're not, in all cases, intuitively obvious.
The high-performance computing model that I see evolving, at least at Los Alamos, is one composed of three major parts: parallel workstations, networks, and supercomputers. I think that general-purpose computing is going to be done on workstations. The supercomputers are going to end up being special-purpose devices for the numerically intensive
portion of client tasks. In fact, I personally think they've always been special purpose, regardless of how much work we've put in to try to make them general-purpose devices. I think in the long term, supercomputers have to be massively parallel to achieve the speeds required to solve the grand-challenge problems. That's not to say that the workhorses of today will go away; the von Neumann vector pipeline machines and the departmental-class machines have been around for a long time and will remain essential for a large class of problems.
This high-performance computing environment will evolve and will consist of these three major elements. The real questions are, what will be the balance among them, and how well does that balance satisfy the needs of a particular organization?
Constituency is one of the important drivers. The workstations sitting on people's desks and the computers that people typically learn to use in universities are a very important part of the high-performance computing environment. Of the 8000 clients who use our computing center, virtually every one of them uses a personal computer. Somewhere around 250 people use 90 per cent of our supercomputing cycles in any given month. So when we looked for people to argue for various parts of this high-performance computing environment, we could find 8000 people who would argue for the workstation part. The Laboratory internet is becoming increasingly important to a broad population in the Laboratory because of the need to communicate with colleagues locally and internationally. So we could find 8000 people who would argue for networks. But on the other hand, there are only a few hundred who will argue vehemently for the supercomputing environment. This kind of support imbalance can shift the future state to one in which there is a strong workstation and network environment but a very weak supercomputing capability.
I would guess that the statistics at most sites are similar to this. There are a small number of people who dominate the use of the supercomputing resources, doing problems that are important to the mission of the institution. And if the institution is in the business of addressing grand-challenge problems, it takes a lot of supercomputing cycles to address those problems.
Given the environment described, I feel that low-end technology is going to drive this evolution, including the top end. That is, the massively parallel supercomputers of the future will be made up of building blocks (hardware and software) developed for the workstation market. There are many reasons for this trend, but one of the most important is that the workstation market is huge compared with the supercomputer market. A tremendous effort is under way to develop hardware and software for the workstation market. If supercomputers are made from workstation building blocks, the remaining question is whether the supercomputing capability will be closely coupled and closely integrated or loosely coupled yet closely integrated. The marketplace will shake out the answer in the next few years.
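As a rough sketch of the loosely coupled end of that spectrum, the fragment below assumes a message-passing layer in the style of MPI (an assumption for illustration; the talk does not name a particular library) tying workstation-class nodes together into one parallel machine, with each node simply reporting which building block it is.

    /* Hypothetical sketch: workstation nodes as building blocks of a loosely
     * coupled parallel machine, assuming an MPI-style message-passing layer.
     * Build and launch with an MPI toolchain, for example:
     *   mpicc -o nodes nodes.c
     *   mpirun -np 8 ./nodes
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);                 /* join the parallel machine     */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which building block is this? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many blocks in total?     */
        MPI_Get_processor_name(host, &len);     /* which workstation runs it?    */

        printf("node %d of %d on %s\n", rank, size, host);

        MPI_Finalize();                         /* leave the parallel machine    */
        return 0;
    }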
Standards are going to be even more important in this new environment. For example, UNIX is going to run across the whole environment. It should be easy for people to do as much of their work as possible on workstations and, when they run out of power there, to use other, more powerful or less heavily used resources in the network to finish their jobs. This means that the UNIX systems must be compatible across a large variety of computing platforms. Computing vendors need to cooperate to build software systems that make this easy.
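As a small, hypothetical illustration of what that compatibility buys, the C fragment below uses only interfaces specified by POSIX (uname and sysconf), so the same source should compile and run unchanged on any conforming UNIX platform in the environment, from a desktop workstation to a supercomputer front end.

    /* Minimal sketch: sticking to POSIX interfaces so the same source runs
     * unchanged across the UNIX platforms in the environment. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/utsname.h>

    int main(void)
    {
        struct utsname u;

        if (uname(&u) == 0)                      /* identify this platform   */
            printf("%s %s (%s)\n", u.sysname, u.release, u.machine);

        printf("page size: %ld bytes\n",         /* query a system parameter */
               sysconf(_SC_PAGESIZE));

        return 0;
    }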
Another important driver, the funding method, may be different from what we've seen in the past. Traditionally, the money has come in through the applications and has driven the future state of computing: the applications people drive the development of the capabilities they need. With the High Performance Computing Initiative there is the potential, at least, for money to come in directly to build a computing capability. I think we need to be very careful that we understand what that capability is going to be used for. If we end up building a monument to computing that goes unused, I think we will not have been very successful in the High Performance Computing Initiative.
One last point I'll mention is that a number of social issues are pushing us toward our next state of computing. Local control seems to be the most important of these: people like to eliminate their dependencies on other folks to get their jobs done. At the same time, we need to keep the individual projects in an organization cognizant of the mission of the organization as a whole and to maintain the capabilities that the organization needs to secure its market niche.