The John von Neumann Computer Center:
An Analysis

Al Brenner

Alfred E. Brenner is Director of Applications Research at the Supercomputing Research Center, a division of the Institute for Defense Analyses, in Bowie, Maryland. He was the first president of the Consortium for Scientific Computing, the corporate parent of the John von Neumann Computer Center. Previously, he was head of the Department of Computing at Fermi National Accelerator Laboratory. He has a bachelor's degree in physics and a Ph.D. in experimental high-energy physics from MIT.

Introduction

I have been asked to discuss and analyze the factors involved in the demise of the NSF Office of Advanced Scientific Computing (OASC) center in Princeton, New Jersey: the John von Neumann Center (JVNC). My goal is to extract the factors that contributed to the failure and to see whether the experience can be used to avoid such failures in the future. Analysis is much easier in hindsight than before the fact, so I will try to be as objective as I can in my analysis.

The "Pre-Lax Report" Period

During the 1970s, almost all of the supercomputers installed were found in government installations and were not generally accessible to the university research community. For those researchers who could not gain access to these supercomputers, this was a frustrating period. A few found it was relatively easy to obtain time on supercomputers in Europe, especially in England and West Germany.

By the end of the decade, a number of studies, proposals, and other attempts had been made to generate funds for large-scale computational facilities that would serve some of the university research community. All of this was happening during a period when U.S. policy was tightening rather than relaxing the mechanisms for acquiring large-scale computing facilities.

The Lax Report

The weight of argument in the reports from these studies and proposals moved NSF to appoint Peter Lax, of New York University, an NSF Board Member, as chairman of a committee to organize a Panel on Large-Scale Computing in Science and Engineering. The panel was sponsored jointly by NSF and the Department of Defense in cooperation with the Department of Energy and NASA. The end product of this activity was the "Report of the Panel on Large-Scale Computing in Science and Engineering," usually referred to as the Lax Report, dated December 26, 1982.

The recommendations of the panel were straightforward and succinct. The overall recommendation was for the establishment of a national program to support the expanded use of high-performance computers. Four components to the program were

• increased access to supercomputing facilities for scientific and engineering research;

• increased research in computational mathematics, software, and algorithms;

• training of personnel in high-performance computing; and

• research and development for the implementation of new supercomputer systems.

The panel indicated that insufficient funds were being expended at the time and suggested an interagency and interdisciplinary national program.

Establishment of the Centers

In 1984, once the NSF acquired additional funding from Congress for the program, NSF called for proposals to establish national supercomputer centers. Over 20 proposals were received, and these were evaluated in an extension of the usual NSF peer-review process. In February 1985, NSF selected four of the proposals and announced awards to establish four national supercomputer centers. A fifth center was added in early 1986.


The five centers are organizationally quite different. The National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign and the Cornell Theory Center are formally operated by the universities in which those centers are located. The JVNC is managed by a nonprofit organization, the Consortium for Scientific Computing, Inc. (CSC), established solely to operate this center. The San Diego Supercomputer Center is operated by the for-profit General Atomics Corporation and is located on the campus of the University of California at San Diego. Finally, the Pittsburgh Supercomputing Center is run jointly by the University of Pittsburgh, Carnegie Mellon University, and Westinghouse Electric Corporation. NSF established the OASC, which reported directly to the Director of NSF, as the program office through which to fund these centers.

While the selected centers were being established (these centers were called Phase 2 centers), NSF supported an extant group of supercomputing facilities (Phase 1 centers) to start supplying cycles to the research community at the earliest possible time. Phase 1 centers included Purdue University and Colorado State University, both with installed CYBER 205 computers; and the University of Minnesota, Boeing Computer Services, and Digital Productions, Inc., all with CRAY X-MP equipment. It is interesting to note that all these centers, which had been established independent of the OASC initiative, were phased out once the Phase 2 centers were in operation. All Phase 1 centers are now defunct as service centers for the community, or they are at least transformed rather dramatically into quite different entities. Indeed, NSF "used" these facilities, supported them for a couple of years, and then set them loose to "dry up."

From the very beginning, it was evident there were insufficient funds to run all Phase 2 centers at adequate levels. In almost all cases, the centers from the beginning have been working within very tight budgets, which has resulted in difficult decisions to be made by management and a less aggressive program than the user community demands. However, with a scarce and expensive resource such as supercomputers, such limitations are not unreasonable. During the second round of funding for an additional five-year period, the NSF has concluded that the JVNC should be closed. The closing of that center will alleviate some of the fiscal pressure on the remaining four centers. Let us now focus on the JVNC story.


The John von Neumann Center

The Proposal

When the call for proposals went out in 1984 for the establishment of the national supercomputer centers, a small number of active and involved computational scientists and engineers, some very closely involved with the NSF process in establishing these centers, analyzed the situation very carefully and generated a strategy that had a very high probability of placing their proposal in the winning set. One decision was to involve a modest number of prestigious universities in a consortium such that the combined prominence of the universities represented would easily outweigh almost any competition. Thus, the consortium included Brown University, Harvard University, the Institute for Advanced Study, MIT, New York University, the University of Pennsylvania, Pennsylvania State University, Princeton University, Rochester Institute, Rutgers University, and the Universities of Arizona and Colorado. (After the establishment of the JVNC, Columbia University joined the consortium.) This was a powerful roster of universities indeed.

A second important strategy was to propose a machine likely to be different from most of the other proposals. At the time, leaving aside IBM and Japan, Inc., the only two true participants were Cray Research and Engineering Technology Associates Systems (ETA). The CRAY X-MP was a mature and functioning system guaranteed to be able to supply the necessary resources for any center. The ETA-10, a machine under development at the time, had much potential and was being designed and manufactured by an experienced team that had spun off from Control Data Corporation (CDC). The ETA-10, if delivered with the capabilities promised, would exceed the performance of the Cray Research offerings at the time. A proposal based on the ETA-10 was likely to be a unique proposal.

These two strategic decisions were the crucial ones. Also, there were other factors that made the proposal yet more attractive. The most important of these was the aggressive networking stance of the proposal in using high-performance communications links to connect the consortium-member universities to the center.

Also, the plan envisioned a two-stage physical plant, starting with temporary quarters to house the center at the earliest possible date, followed by a permanent building to be occupied later. Another feature was to contract out the actual operations functions to one of the firms experienced in the operation of supercomputing centers at other laboratories.


Finally, the proposal was nicely complemented with a long list of proposed computational problems submitted by faculty members of the 12 founding institutions. Although these additional attributes of the proposal were not unique, they certainly enhanced the strong position of a consortium of prestigious universities operating a powerful supercomputer supplied by a new corporation supported by one of the most prominent of the old-time supercomputer firms. It should surprise no one that on the basis of peer reviews, NSF found the JVNC proposal to be an attractive one.

I would like now to explore the primary entities involved in the establishment, operation, funding, and oversight of the JVNC.

Consortium for Scientific Computing

The CSC is a nonprofit corporation formed by the 12 universities of the consortium for the sole purpose of running the JVNC. Initially, each university was to be represented within the consortium by the technical representative who had been the primary developer of the proposal submitted to NSF. Early in the incorporation process, representation on the consortium was expanded to include two individuals from each university: one technical faculty member and one university administrator. The consortium Board of Directors elected an Executive Committee from its own membership. This committee of seven members, as in normal corporate situations, wielded the actual power of the consortium. The CSC had two principal functions: (1) the appointment of a Chief Operating Officer (the President) and (2) the establishment of policies guiding the activities of the center. As we analyze what went wrong with the JVNC, we will see that the consortium, in particular the Executive Committee, did not restrict itself to these functions but ranged broadly over many activities, to the detriment of the JVNC.

The Universities

The universities were the stable corporate entities upon which the consortium's credibility was based. Once the universities agreed to go forth with the proposal and the establishment of the consortium, they played a very small role.

The proposal called for the universities to share in the support of the centers. Typically, the sharing was done "in kind" and not in actual dollars, and the universities were involved in establishing the bartering chips that were required.


The State of New Jersey

The State of New Jersey supported the consortium enthusiastically. It provided the only, truly substantial, expendable dollar funding to the JVNC above the base NSF funding. These funds were funneled through the New Jersey State Commission for Science and Technology. The state was represented on the consortium board by one nonvoting member.

The NSF

NSF had moved forward on the basis of the proposals of the Lax Report and, with only modest previous experience with such large and complex organizations, established the five centers. The OASC reported directly to the Director of NSF to manage the cooperative agreements with the centers. Most of the senior people in this small office were tapped from other directorates within NSF to take on difficult responsibilities, and these people often had little or no prior experience with supercomputers.

ETA

In August 1983, ETA had been spun off from CDC to develop and market the ETA-10, a natural follow-on of the modestly successful CYBER 205 line of computers. The reason for the establishment of ETA was to insulate from the rest of CDC the ETA development team and its very large demands for finances. This was both to allow ETA to do its job and to protect CDC from an arbitrary drain of resources.

The ETA machine was a natural extension of the CYBER 205 architecture. The primary architect was the same individual, and much of the development team was the same team, that had been involved in the development of the CYBER 205.

Zero One

The JVNC contracted the actual daily operations of its center to an experienced facilitator. The important advantage to this approach was the ability to move forward as quickly as possible by using the resources of an already extant entity with an existing infrastructure and experience.

Zero One, originally Technology Development Corporation, was awarded the contract because it had experience in operating supercomputing facilities at NASA Ames, and it appeared to have an adequate, if not large, personnel base. As it turned out, apart from a small number of people, all of the personnel assigned to the JVNC were newly hired.


JVNC

During the first half of 1985, the consortium moved quickly and initiated the efforts to establish the JVNC. One of the first efforts was to find a building. Once all the factors were understood, rather than the proposed two-phase building approach, it was decided to move forward with a permanent building as quickly as possible and to use temporary quarters to house personnel, but not equipment, while the building was being readied.

The site chosen for the JVNC was in the Forrestal Research Center off Route 1, a short distance from Princeton University. The building shell was in place at the time of the commitment by the consortium, and only the interior "customer modification" was required. Starting on July 1, 1986, the building functioned quite well for the purposes of the JVNC.

A small number of key personnel were hired. Contracts were written with the primary vendors. The Cooperative Agreement to define the funding profile and the division of responsibility between the consortium and the NSF was also drawn up.

What Went Wrong?

The Analysis

The startup process at JVNC was not very different from the processes at the other NSF-funded supercomputing centers. Why are they still functioning today while the JVNC is closed? Many factors contributed to the lack of strength of the JVNC. As with any other human endeavor, if one does not push in all dimensions to make it right, the sum of a large number of relatively minor problems might mean failure, whereas a bit more strength or possibly better luck might make for a winner.

I will first address the minor issues that, I believe, without more detailed knowledge, may sometimes be thought of as being more important than they actually were. I will then address what I believe were the real problems.

Location

Certainly, the location of the JVNC building was not conducive to a successful intellectual enterprise. Today, with most computer accesses occurring over communications links, it is difficult to promote an intellectually vibrant community at the hardware site. If the hardware is close by, on the same campus or in the same building where many of the user participants reside, there is a much better chance of generating the collegial spirit and intellectual atmosphere for the center and its support personnel. The JVNC, in a commercial industrial park sufficiently far from even its closest university customers, found itself essentially in isolation.

Furthermore, because of the meager funding that allowed no in-house research-oriented staff, an almost totally vacuous intellectual atmosphere existed, with the exception of visitors from the user community and the occasional invited speaker. For those centers on campuses or for those centers able to generate some internal research, the intellectual atmosphere was certainly much healthier and more supportive than that at the JVNC.

Corporate Problems

Some of the problems the JVNC experienced were really problems that emanated from the two primary companies that the JVNC was working with: ETA and Zero One. The Zero One problem was basically one of relying too heavily on a corporate entity that actually had very little flex in its capabilities. At the beginning, it would have been helpful if Zero One had been able to better use its talent elsewhere to get the JVNC started, but it was not capable of doing that, with one or two exceptions. The expertise it had, although adequate, was not strong, so the relationship JVNC had with Zero One was not particularly effective in establishing the JVNC. Toward the end of June 1989, JVNC terminated its relationship with Zero One and took on the responsibility of operating the center by itself. Consequently, the Zero One involvement was not an important factor in the long-term JVNC complications.

The problems experienced in regard to ETA were much more fundamental to the demise of JVNC. I believe there were two issues that had a direct bearing on the status of the JVNC. The first was compounded by the inexperience of many of the board members. When the ETA-10 was first announced, the clock cycle time was advertised as five nanoseconds. By the time contractual arrangements had been completed, it was clear the five-nanosecond time was not attainable and that something more like seven or eight nanoseconds was the best goal to be achieved. As we know, the earliest machines were delivered with cycle times twice those numbers. The rancor and associated interactions concerning each of the entities' understanding of the clock period early in the relationship took what could have been a cooperative interaction and essentially poisoned it. Both organizations were at fault. ETA advertised more than they could deliver, and the consortium did not accommodate the facts.


Another area where ETA failed was in its inability to understand the importance of software to the success of the machine. Although the ETA hardware was first-rate in its implementation, the decision to make the ETA-10 compatible with the CYBER 205 had serious consequences. The primary operating-system efforts were to replicate the functionality of the CYBER 205 VSOS; any extensions would be shells around that native system. That decision, together with a less-than-modern approach to its implementation, bogged down the whole software effort. One example was the high-performance linkages; these were old, modified programs that gave rise to totally unacceptable communications performance. As the pressures mounted for a modern operating system, in particular UNIX, the efforts fibrillated, no doubt consuming major resources, and never attained maturity. The delays imposed by these decisions certainly were not helpful to ETA or to the survival of the JVNC.

NSF, Funding, and Funding Leverage

We now come to an important complication, not unique to the JVNC but common to all of the NSF centers. To be as aggressive as possible, NSF extended itself as far as the funding level for the OASC would allow and encouraged cost-sharing arrangements to leverage the funding. This collateral funding, which came from universities, states, and corporate associates interested in participating in the centers' activities, was encouraged, expected, and counted upon for adequate funding for the centers.

As the cooperative agreements were constructed in early 1985, the funding profiles for the five-year agreements were laid out for each individual center's needs. The attempt to meet that profile was a painful experience for the JVNC management, and I believe the same could be said for the other centers as well. For the JVNC, much of the support in kind from universities was paper; indeed, in some cases, it was closer to being a reverse contribution.

As the delivery of the primary computer equipment to JVNC was delayed while some of the other centers were moving forward more effectively, the cooperative agreements were modified by NSF to accommodate these changes and stay within the actual funding profile at NSF. Without a modern functioning machine, the JVNC found it particularly difficult to attract corporate support. The other NSF centers, where state-of-the-art supercomputer systems were operational, were in much better positions to woo industrial partners, and they were more successful. Over the five-year life of the JVNC, only about $300,000 in corporate support was obtained; that was less than 10 per cent of the proposed amount and less than three-quarters of one per cent of the actual NSF contribution.

One corporate entity, ETA, contributed a large amount to the JVNC. Because the delivery of the ETA-10 was so late, the payment for the system was repeatedly delayed. The revenue that ETA expected from delivery of the ETA-10 never came. Thus, in a sense, the hardware that was delivered to the JVNC—two CYBER 205 systems and the ETA-10—represented a very large ETA corporate contribution to the JVNC. The originally proposed ETA contribution, in discounts on the ETA-10, personnel support, and other unbilled services, was $9.6 million, which was more than 10 per cent of the proposed level of the NSF contribution.

A year after the original four centers were started, the fiscal stress in the program was quite apparent. Nevertheless, NSF chose to start the fifth center, thereby spreading its resources yet thinner. It is true that the NSF budgets were then growing, and it may have seemed to the NSF that it was a good idea to establish one more center. In retrospect, the funding level was inadequate for a new center. Even today, the funding levels of all the centers remain inadequate to support dynamic, powerful centers able to maintain strong, state-of-the-art technology.

Governance

I now come to what I believe to be the most serious single aspect that contributed to the demise of the JVNC: governance. The governance, as I perceive it, was defective in three separate domains, each defective in its own right but all contributing to the primary failure, which was the governance of the CSC. The three domains I refer to are the universities, NSF, and the consortium itself.

Part of the problem was that the expectations of almost all of the players far exceeded the possible realities. With the exception of the Director of NSF, there was hardly a person directly or indirectly involved in the governance of the JVNC who had any experience as an operator of such complex facilities as the supercomputing centers represented. Almost all of the technical expertise was as end users. This was true for the NSF OASC and for the technical representatives on the Board of Directors of the consortium. The expertise, hard work, maturation, and planning needed for multi-million-dollar computer acquisitions were unknown to this group. Their expectations both in time and in performance levels attainable at the start-up time of the center were totally unrealistic.

At one point during the course of the first year, when difficulties with ETA meeting its commitments became apparent, the consortium negotiated the acquisition of state-of-the-art equipment from an alternate vendor. To move along expeditiously, the plan included acquiring a succession of two similar but incompatible supercomputing systems from that vendor, bringing them up, networking them, educating the users, and bringing them down in sequence, all over a nine-month period! This was to be done in parallel with the running of the CYBER 205, which was then to be the ETA interim system, and all of this with the minuscule staff at JVNC. At a meeting where these plans were enunciated to NSF, the Director of NSF very vocally expressed his consternation at, and disbelief in, the viability of the proposal. The OASC staff, the actual line managers of the centers, had no sense of the difficulty of the process being proposed.

At a meeting of the board of the consortium, the board was frustrated by the denial of this alternate approach that had by then been promulgated by NSF. A senior member of the OASC, who had participated in the board meeting but had not understood the nuances of the problem, when given the opportunity to make clear the issues involved, failed to do so, thereby allowing to stand misconceptions that were to continue to plague the JVNC. I believe that incident, which was one of many, typified a failure in governance on the part of NSF's management of the JVNC Cooperative Agreement.

With respect to the consortium itself, the Executive Committee, which consisted of the small group of people who had initiated the JVNC proposal, insisted on managing the activities as they did their own individual research grants. On a number of occasions, the board was admonished by the nontechnical board members to allow the president to manage the center. At no point did that happen during the formation of the JVNC.

These are my perceptions of the first year of operation of the JVNC. I do not have first-hand information about the situation during the remaining years of the JVNC. However, leaving aside the temporary management provided by a senior Princeton University administrator on a number of occasions, the succession of three additional presidents of the consortium over the next three years surely supports the premise that the problems were not fixed.

Since NSF was not able to do its job adequately in its oversight of the consortium, where were the university presidents during this time? The universities were out of the picture because they had delegated their authority to their representatives on the board. In one instance, the president of Princeton University did force a change in the leadership of the Board of Directors to try to fix the problem. Unfortunately, that action was not coupled to a simultaneous change of governance that was really needed to fix the problem. One simple fix would have been to rotate the cast of characters through the system at a fairly rapid clip, thereby disengaging the inside group that had initiated the JVNC.

Although the other centers had to deal with the same NSF management during the early days, their governance typically was in better hands. Therefore, they were in a better position to accommodate the less-than-expert management within the NSF. Fortunately, by the middle of the second year, the NSF had improved its position. A "rotator" with much experience in operating such centers was assigned to the OASC. Once there was a person with the appropriate technical knowledge in place at the OASC, the relationship between the centers and the NSF improved enormously.

Conclusions

I have tried to expose the problems that contributed to the demise of the JVNC. In such a complex and expensive enterprise, not everything will go right. Certainly many difficult factors were common to all five centers. It was the concatenation of the common factors with the ones unique to the JVNC that caused its demise and allowed the other centers to survive. Of course, once the JVNC was removed, the funding profile for the other centers must have improved.

In summary, I believe the most serious problem was the governance arrangements that controlled the management of the JVNC. Here seeds of failure were sown at the inception of the JVNC and were not weeded out. A second difficulty was the lack of adequate funding. I believe the second factor continues to affect the other centers and is a potential problem for all of them in terms of staying healthy, acquiring new machines, and maintaining challenging environments.

I have tried as gently as possible to expose the organizations that were deficient and that must bear some responsibility for the failure of the JVNC. I hope that, when new activities in this vein are initiated in the future, these lessons will be remembered and the same paths will not be traveled once again.

