Preferred Citation: Ames, Karyn R., and Alan Brenner, editors. Frontiers of Supercomputing II: A National Reassessment. Berkeley: University of California Press, c1994. http://ark.cdlib.org/ark:/13030/ft0f59n73z/


 
The Demise of ETA Systems

Lloyd Thorndyke

Lloyd M. Thorndyke is currently the CEO and chairman of DataMax, Inc., a startup company offering disk arrays. Before joining DataMax, he helped to found and was President and CEO of Engineering Technology Associates Systems, the supercomputer subsidiary of Control Data Corporation. At Control Data he held executive and technical management positions in computer and peripheral operations. He received the 1988 Chairman's Award from the American Association of Engineering Societies for his contributions to the engineering professions.

In the Beginning

Engineering Technology Associates Systems, or just plain ETA, was organized in the summer of 1983, and as some of you remember, its founding was announced here at the first Frontiers of Supercomputing conference in August 1983. At the start, 127 of the 275 people in the Control Data Corporation (CDC) CYBER 205 research and applications group were transferred to form the nucleus of ETA. This was the first mistake—we started with too many people.

The original business plan called for moving the entire supercomputer business, including the CYBER 205 and its installed base of 40 systems, to ETA. That never happened, and the result was fragmentation of the product line strategies and a split in the management of the CYBER 200 and ETA product lines. As a consequence, the CDC CYBER 205 product line was left without dedicated management and direction, which undermined any upgrade strategy to the ETA-10. Another serious consequence was the lack of a migration path for CYBER 200 users to move to the ETA-10.

ETA was founded with the intention of eventually becoming an independent enterprise. Initially, we had our own sales and marketing groups because the CYBER 205 was planned as part of ETA. Because the CYBER 205 was retained by CDC, the responsibilities of the two companies' sales and marketing organizations became confused. It seemed that CDC repeatedly reorganized to find the formula for success. The marketing management at CDC took responsibility when ETA was successful and returned it to us when things did not go well. Without question, the failure to consolidate the CYBER 200 product line and its marketing and sales management at ETA was a major contributing factor in the ETA failure.

There was an initial major assumption that the U.S. government would support the entry of a new supercomputer company through a combination of R&D funding and orders for early systems. The blueprint of the 1960s was going to be used over again. We read the tea leaves wrong. No such support ever came forth, and we did not secure orders from the traditional leading-edge labs. Furthermore, we did not receive R&D support from the funding agencies for our chip, board, and manufacturing technologies. We had four meetings with U.S. government agencies, and they shot the horse four times: the only good result was that they missed me each time.

The lack of U.S. government support was critical to our financial image. The lack of early software and systems technical help also contributed to delays in maturing our system. Other vendors did, and still do, receive such support, but such was not the case with ETA. Our planning anticipated that the U.S. government would help a small startup. That proved to be a serious error.

Control Data played the role of venture capitalist at the start and owned 90 per cent of the stock, with the balance held by the principals. The CDC long-range intent was to dilute its holding to 40 per cent through a public offering or corporate partnering as soon as possible. The failure to consummate a corporate partnership, although we had a willing candidate, was a major setback in the financial area.

The first systems shipment was made to Florida State University (FSU) in December of 1986—three years and four months from the start. From that standpoint, we feel we reduced the development schedule of a complex supercomputer by almost 50 per cent.

At the time of the dynamiting of ETA on April 17, 1989, there were six liquid-nitrogen-cooled systems installed. Contrary to the bad PR you might have heard, the system at FSU was a four-processor G system operating at seven nanoseconds. In fact, the system ran for a year after the closing at high levels of performance and quality, as FSU faculty will attest.

We had about 25 air-cooled systems installed (you may know them as Pipers) at customer sites. Internally, there were a total of 25 processors, both liquid- and air-cooled, dedicated to software development. Those are impressive numbers if one considers the inventory-carrying and operating costs.

Hardware

From a technology viewpoint, I believe the ETA-10 was an outstanding hardware breakthrough and a first-rate manufacturing effort. We used very dense complementary metal oxide semiconductor (CMOS) circuits, reducing the supercomputer processor to a single 16- by 22-inch board. I'm sure many of you have seen that processor. The CMOS chips reduced the power consumption of the processor to about 400 watts—that's watts, not 400 kilowatts. Operating the CMOS chips in liquid nitrogen instead of ambient air doubled their speed. As a result of the two cooling methods and the configuration span from a single air-cooled processor to an eight-processor, liquid-cooled machine, we achieved a 27-to-one performance range. Across that range we used the same diagnostics, operating-system software, training, and manufacturing checkout. We had broad commonality of product line and inventory from top to bottom. We paid for the design only once, not many times. Other companies have proposed such a strategy—we executed it.
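
As an illustrative reconstruction of the 27-to-one figure (the 24-nanosecond cycle time below is an assumption for illustration, not a number from this talk), the range is roughly the ratio of aggregate peak rates between the largest and smallest configurations:

\[
  \frac{8\ \text{processors} \times \tfrac{1}{7\ \text{ns}}}{1\ \text{processor} \times \tfrac{1}{24\ \text{ns}}}
  \;=\; 8 \times \frac{24}{7} \;\approx\; 27 .
\]

The only cycle time actually quoted in this talk is the seven nanoseconds of the liquid-cooled G system; the air-cooled figure is assumed here only to show how such a range arises.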

The liquid-nitrogen cryogenic cooling was a critical part of our design, and I would suggest it as a technology other people should seriously consider. For example, a 20-watt computer will boil off one gallon of liquid nitrogen in an eight-hour period. Liquid nitrogen can be bought in bulk at a price cheaper than milk—as low as 25 cents a gallon in large quantities. Even assuming $0.40 per gallon, that is eight hours of operation for $0.40. Each gallon that boils off also yields about 90 cubic feet of -200°C nitrogen gas, which can help cool the rest of the computer room and greatly reduce the cooling requirements.
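
As a quick sanity check of that boil-off figure (the handbook values used here are additions for illustration, not numbers from the talk), take nitrogen's latent heat of vaporization as about 199 kJ/kg and liquid nitrogen's density as about 0.81 kg/L:

\begin{align*}
  \text{heat removed in 8 hours at 20 W} &\approx 20\ \text{W} \times 28{,}800\ \text{s} = 576\ \text{kJ},\\
  \text{heat absorbed per gallon of LN}_2 &\approx 199\ \text{kJ/kg} \times 0.81\ \text{kg/L} \times 3.79\ \text{L} \approx 610\ \text{kJ}.
\end{align*}

A 20-watt load therefore boils off roughly 576/610, or about 0.94, gallons in eight hours, consistent with the one-gallon figure above.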

The criticism that liquid nitrogen resulted in a long mean time to repair was erroneous because, at the time of the ETA closure, we could replace a processor in a matter of hours. The combination of CMOS and liquid-nitrogen cooling, coupled with the configuration range, provided a broad product family. These were good decisions—not everything we did was wrong.

The ETA-10 manufacturing process was internally developed and represented a significant advance in the state of the art. By the end, the yield of perfect processor boards was 65 per cent, for a board that was 16 by 22 inches with 44 layers and 50-ohm controlled impedance. Another 30 per cent were usable with surface ECO wires; the remaining five per cent were scrap. This automated line produced enough boards to build two computers a day with just a few people involved.

For board assembly, we designed and built a pick-and-place robot to set the CMOS chips onto the processor board, an operation it could perform in less than four hours. The checkout of the computer took a few more hours. We really did have a system designed for volume manufacturing.

Initially, the semiconductor vendor was critical to us because it was the only such vendor in the United States that would even consider our advanced CMOS technology. In retrospect, our technology requirements and schedule were beyond the capabilities of the vendor to develop and deliver. Also, this vendor was not a merchant semiconductor supplier and did not have the infrastructure or outside market to support the effort. We were expected to place enough orders and supply enough funding to keep them interested in our effort. Our mistake was teaming with a nonmerchant vendor needing our resources to stay in the commercial semiconductor business.

We believed that we should work with U.S. semiconductor vendors because we considered the health of the U.S. semiconductor business critical. I would hasten to point out that the Japanese were very willing to supply us with the technology, both logic and memory, that met or exceeded what we needed. Still, we stayed with the U.S. suppliers longer than good judgment warranted because we thought there was value in having a U.S.-made supercomputer with domestic semiconductors. We believed that our government encouraged such thinking, but ETA paid the price. In essence, we were trying to sell a computer with 100 per cent U.S. logic and memory components against a computer with 90 per cent Japanese logic and memory components, and we could not get any orders. I found the government's encouragement to us to use only U.S. semiconductor components, and its subsequent action of buying competitive computers with the majority of their semiconductor content produced in Japan, inconsistent and confusing.

Very clearly, the use of Japanese components does not affect the salability of a system in the U.S.—that message should be made clear to everyone. This error is not necessarily ETA's alone, but if the U.S. government wants healthy U.S. semiconductor companies, then it must create mechanisms that encourage products with high U.S. semiconductor content and support R&D to keep domestic suppliers up to the state of the art.

Software

It is difficult to say much good about the early ETA software and its underlying strategy, although it was settling down at the end. A major mistake was the early decision to develop a new operating system rather than port the CYBER 205 VSOS operating system. Since the CYBER 205 remained at CDC, we did not have product responsibility or direction for it, and the new operating system seemed the best way at the time.

In hindsight, we have been severely criticized for not porting UNIX to the ETA-10 at the beginning—that is, for not starting with UNIX only. But in 1983 it was not that clear. I now hear comments from people saying, "If ETA would have started with UNIX, I would have bought." It was only two years later that they said, "Well, you should have done UNIX." However, we did not get UNIX design help, advice, or early orders for a UNIX system.

After we completed a native UNIX system and debugged the early problems, the UNIX system stabilized and ran well on the air-cooled systems, and as a result, several additional units were ordered. While the ETA UNIX lacked many features needed for supercomputer operation, users knew that these options were coming, but we were late to market. In hindsight, we should have ported VSOS and then worked only on UNIX.

Industry Observations

To be successful in the commercial supercomputer world, one must have an array of application packages. While we recognized this early on, as a new entrant to the business we faced a classical problem that was talked about by other presenters: you can't catch up if you can't catch up.

We were not able to stimulate the applications vendors' interest because we didn't have a user base. Simultaneously, it's hard to build a user base without application packages. This vicious circle has to be broken because all the companies proposing new architectures are in the same boat. Somehow, we must figure out how to get out of it, or precious few new applications will be offered, except by the wealthiest of companies.

We need to differentiate the true supercomputer from the current situation, where everyone has a supercomputer of some type. The PR people have confiscated the supercomputer name, and we must find a new name. Therefore, I propose that the three or four companies in this business should identify their products as superprocessor systems. It may not sound sexy, but it does the job. We can then define a supercomputer system as being composed of one or more superprocessors.

The supercomputer pursuit is equivalent to a religious crusade. One must have the religion to pursue the superprocessors because of the required dedication and great, but unknown, risks. In the past, CDC and some of you here pioneered the supercomputer. Mr. Price had the religion, but CDC hired computer executives who did not, and in fact, they seemed to be supercomputer atheists. It was a major error by CDC to hire two and three levels of executives with little or no experience in high-performance or supercomputer development, marketing, or sales and place them in the computer division. Tony Vacca, ETA's long-time technologist, now at Cray Research, Inc. (see Session 2), observed that supercomputer design and sales are the Super Bowl of effort and are not won by rookies. It seems CDC has proved that point.

Today we all use emitter-coupled logic (ECL) chips, bipolar chips, memory chips, and cooling technologies in product design because of their performance and cost advantages. Please remember that these technologies were advanced by supercomputer developers and indirectly paid for by supercomputer users.

If Seymour Cray had received a few cents for every ECL chip and bipolar chip and licensing money for cooling technology, he wouldn't need any venture capital today to continue his thrust. However, that is not the case. I believe that Seymour is a national treasure, but he may become an endangered species if the claims I have heard at this conference about massively parallel systems are true. However, remember that claims alone do not create an endangered species.

I have learned a few things in my 25 years in the supercomputer business. One is that high-performance computers pioneer costly technology and bear the brunt of the startup costs. Customers must pay a high price partly because of these heavy front-end costs. Followers then use this developed technology without the heavy front-end costs and argue that supercomputers are too costly, without considering that the technology is low-cost only because supercomputers footed the early bills.

Somehow, some way, we in the U.S. must find a way to help pay the cost of starting up a very expensive, low-volume gallium arsenide facility so that all of us can reap the performance and cost benefits of the technology. As with silicon, use will develop when most companies can afford to use it. That occurs only after someone has paid to put the technology in production, absorbed the high learning-curve costs, proved the performance, and demonstrated the packaging. Today we are asking one company to support those efforts. Unfortunately, we hear complaints that supercomputers with new technology cost too much. We should all be encouraging Seymour's effort, not predicting doom, and we should be prepared to share in the expenses.

The Japanese supercomputer companies are vertically integrated—an organizational structure that has worked well for them. Except for IBM and AT&T, U.S. companies practice vertical cooperation. However, vertical cooperation must change so that semiconductor vendors underwrite a larger part of the development costs. The user cannot continue to absorb huge losses while the vendor makes a profit and still expect the relationship to flourish. This is not vertical cooperation; it is simply a buyer-seller relationship. To me, vertical cooperation means that the semiconductor vendors and the application vendors underwrite their own costs in return for part of the interest in the products. That is true cooperation, and the U.S. must evolve to this or else ignore costly technology developments and get out of the market.

I have been told frequently by the Japanese that they push the supercomputer because it drives their semiconductor technology to new components leading to new products that they know will be salable in the marketplace. In their case, vertical integration is a market-planning asset. I maintain that vertical cooperation can have similar results.

I believe that we have seen the gradual emergence of parallelism in the supercomputers offered by Cray and the Japanese—I define those architectures as Practical Parallelism. During the past two days, we have heard about the great expectations for massively parallel processors and the forecasted demise of the Cray dynasty. I refer to these efforts as Research Parallelism, and I want to add that Research Parallelism will become a practicality not when industry starts to buy them but when the Japanese start to sell them. The Japanese are attracted to profitable markets. Massively parallel systems will achieve the status of Practical Parallelism when the Japanese enter that market—that will be the sign that users have adopted the architecture, and the market is profitable.

I would like to close with a view of the industry. I lived through the late 1960s and early 1970s, when the U.S. university community was mesmerized by the Digital Equipment Corporation VAX and held the belief that VAXs could do everything and there was no need for supercomputers. A few prophets like Larry Smarr, at the University of Illinois (Session 10), kept saying that supercomputers were needed in universities. That they were right is clearly demonstrated by the large number of supercomputers installed in universities today.



Now I hear that same tune again. We are becoming mesmerized with superperformance workstations: they can do everything, and there is again no need for supercomputers. When will we learn that supercomputers are essential for leading-edge work? It is not a question of whether we need supercomputers or superperformance workstations; we need both, working in unison. The supercomputer will explore new ideas, new applications, and new approaches. Therefore, I believe very strongly that it is both, not one or the other. The supercomputer has a place in our industry, so let's start to hear harmonious words of support in place of the theme that supercomputers are too costly and obsolete while massively parallel systems are perfect, cheap, and the only approach.

