Overview of Industrial Supercomputing
Kenneth W. Neves
Kenneth W. Neves is a Technical Fellow of the Boeing Company (in the discipline of scientific computing) and Manager of Research and Development Programs for the Technology Division of Boeing Computer Services. He holds a bachelor's degree from San Jose State University, San Jose, California, and master's and doctorate degrees in mathematics from Arizona State University, Tempe, Arizona. He developed and now manages the High-Speed Computing Program dedicated to exploration of scientific computing issues in distributed/parallel computing, visualization, and multidisciplinary analysis and design.
Abstract
This paper summarizes both the author's views as panelist and chair and the views of other panelists expressed during presentations and discussions in connection with the Industrial Supercomputing Session convened at the second Frontiers of Supercomputing conference. The other panel members were Patric Savage, Senior Research Fellow, Computer Science Department, Shell Development Company; Howard E. Simmons, Vice President and Senior Advisor, du Pont Company; Myron Ginsberg, Consultant Systems Engineer, EDS Advanced Computing Center, General Motors Corporation; and Robert Hermann, Vice President for Science and Technology, United Technologies Corporation. Included in these remarks is an overview of the basic issues related to high-performance computing needs of private-sector industrial users. Discussions
that ensued following the presentations of individual panel members focused on supercomputing questions from an industrial perspective in areas that include cultural issues and problems, support issues, efficiency versus ease of use, technology transfer, impediments to broader use, encouraging industrial use, and industrial grand challenges.
Introduction
The supercomputer industry is a fragile industry. In 1983, when this conference first met, we were concerned with the challenge of international competition in this market sector. In recent times, the challenge to the economic health and well-being of this industry in the U.S. has come not from foreign competition but from technology improvements at the low end and confusion in its primary market, industry. The economic viability of the supercomputing industry will depend on its acceptance by private industrial users. Traditional industrial users of supercomputing have come to understand that using computing tools at the high end of the performance spectrum provides a competitive edge in product design quality. Yet the question is no longer one of computational power alone. The resource of "supercomputing at the highest end" is a very visible expense on most corporate ledgers.
In 1983 a case could be made that in sheer price/performance, supercomputers were leaders and, if used properly, could reduce corporate computing costs. Today, this argument no longer holds. Supercomputers remain at the leading edge of price/performance, but there are equally competitive alternatives in the workstation arena and in the midrange of price and performance. The issue, then, is not simply one of accounting but one of capability. With advanced computing capability, both in memory size and computational power, the opportunity exists to improve product designs (e.g., fuel-efficient airplanes), optimize performance (e.g., enhanced oil recovery), and shorten time from conceptual design to manufacture (e.g., finding a likely minimal-energy state for a new compound or medicine). Even in industries where these principles are understood, there still are impediments to the acquisition and use of high-performance computing tools. In what follows we attempt to identify these issues and look at aspects of technology transfer and collaboration among governmental, academic, and industrial sectors that could improve the economic health of the industry and the competitiveness of companies that depend on technology in their product design and manufacturing processes.
Why Use Supercomputing at All?
Before we can analyze the inhibitors to the use of supercomputing, we must have a common understanding of the need for supercomputing. First, the term supercomputer has become overused to the point of being meaningless, as was indicated in remarks by several at this conference. By a supercomputer we mean the fastest, most capable machine available by the only measure that is meaningful—sustained performance on an industrial application of competitive importance to the industry in question. The issue at this point is not which machine is best, but that some machines or groups of machines are more capable than most others, and it is this class that we shall refer to as "supercomputers." Today this class is viewed as large vector computers with a modest amount of parallelism, but the future promises to be more complicated, since no one general type of architecture is likely to dominate the market.
In the aerospace industry, there are traditional workhorse applications, such as aerodynamics, structural analysis, electromagnetics, circuit design, and a few others. Most of these programs analyze a design. One creates a geometric description of a wing, for example, and then analyzes the flow over the wing. We know that today supercomputers cannot handle this problem in its full complexity of geometry and physics. We use simplifications in the model and solve approximations as best we can. Thus, the traditional drivers for more computational power still exist. Smaller problems can be run on workstations, but "new insights" can only be achieved with increased computing power.
A new generation of computational challenges faces us as well (Neves and Kowalik 1989). We need not simply analysis programs but also design programs. Let's consider three examples of challenging computing processes. First, consider a program in which one could input a desired shock wave and an initial geometric configuration of a wing and have the optimal wing geometry calculated to most closely reproduce the desired shock (or pressure profile). With this capability we could greatly reduce the wing design cycle time and improve product quality. In fact, we could address serious flutter problems early in the design and reduce the risk of failure and fatigue in the finished product. This type of computation would have today's supercomputing applications as "inner loops" of a design system requiring much more computing power than is available today. A second example comes from manufacturing. It is not unusual for a finalized design to be forwarded to manufacturing only to find out that the design cannot be manufactured "as designed" for some unanticipated reason. Manufacturability, reliability, and maintainability constraints need to be "designed into" the product, not discovered downstream. This design/build concept opens a whole new aspect of computation that we can't touch with today's computing equipment or approaches. Finally, consider the combination of many disciplines that today are separate elements in design. Aerodynamics, structural analyses, thermal effects, and control systems all could and should be combined in design evaluation, not considered separately. To solve these problems, computing power of greater capability is required; in fact, the more computing power, the "better" the product! It is not a question of being able to use a workstation to solve these problems. The question is, can a corporation afford to allow products to be designed on workstations (with yesterday's techniques) while competitors are solving for optimal designs with supercomputers?
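To make the "inner loop" structure of such a design program concrete, here is a minimal Python sketch; it is not from the paper, and the three-parameter wing description, the pressure_profile stand-in for a flow solver, and the target profile are all invented for illustration. An optimizer adjusts the geometry until the computed profile matches the desired one; in a real design system each call to the inner analysis would itself be a supercomputer-class computation.

```python
# Toy inverse-design loop: choose wing-geometry parameters so the computed
# pressure profile matches a desired target profile. The "analysis" here is a
# cheap algebraic stand-in for a CFD solver; in practice each call would be a
# full supercomputer run, making it the expensive "inner loop" of the design.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 50)            # chordwise stations

def pressure_profile(geom):
    """Stand-in for a flow solver: maps (camber, thickness, twist) to a
    pressure-like profile over the chord."""
    camber, thickness, twist = geom
    return camber * np.sin(np.pi * x) - thickness * x**2 + twist * x

# The "desired shock/pressure profile" the designer would supply.
target = pressure_profile(np.array([0.6, 0.3, 0.1]))

def mismatch(geom):
    # Objective: squared difference between computed and desired profiles.
    return np.sum((pressure_profile(geom) - target) ** 2)

initial_geometry = np.array([0.2, 0.1, 0.0])
result = minimize(mismatch, initial_geometry, method="Nelder-Mead")
print("optimal geometry parameters:", result.x)
```

Each evaluation of the mismatch is one complete analysis, and an optimizer may need hundreds or thousands of them, which is the source of the additional computing demand.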
Given the rich demand for computational power to advance science and engineering research, design, and analysis as described above, it would seem that there would be no end to the rate at which supercomputers could be sold. Indeed, technically there is no end to the appetite for more power, but in reality each new quantum jump in computational power at a given location (user community) will satisfy needs for some amount of time before a new machine can be justified. The strength of the supercomputer market in the 1980s came from two sources: existing customers and "new" industries. Petrochemical industries, closely followed by the aerospace industry, were the early recruits. These industries were able to establish a direct connection between profit and/or productivity and computing power. Most companies in these industries not only bought machines but upgraded to next-generation machines within about five years. This alone established an upswing in the supercomputing market when matched by the already strong government laboratory market whence supercomputers sprang. Industry by industry, market penetration was made by companies like Cray Research, Inc. In 1983 the Japanese entered the market, and several of their companies did well outside the U.S. New market industries worldwide included weather prediction, automobiles, chemicals, pharmaceuticals, academic research institutions (state- and NSF-supported), and the biological and environmental sciences. The rapid addition of these "new" industry markets created a phenomenal growth rate.
In 1989 the pace of sales slackened at the high end. The reasons are complex and varied, partly because users with "less than supercomputer problems" now have cost-effective alternatives; but the biggest impact, in my opinion, is the inability to create new industry markets. Most of the main technically oriented industries are already involved in supercomputing, and the pace of sales has slowed to that of upgrades to support the traditional analysis computations alluded to above. This is critical to the success of these companies but has definitely slowed the rate of sales enjoyed in the 1980s. This might seem like a bleak picture if it weren't for one thing: as important as these traditional applications are, they are but the tip of the iceberg of scientific computing opportunities in industry. In fact, at Boeing well over a billion dollars is invested in computing hardware. Supercomputers have made a very small "dent" in this computing budget. One might say that even though supercomputer penetration in aerospace is nearly 100 per cent company by company, within each company that penetration is less than five per cent.
Certainly supercomputers are not fit for all computing applications in large manufacturing companies. However, the acceptance of any computing tool, or of a research tool such as a wind tunnel, is a function of its contribution to the "bottom line." The bottom line is profit margin and market share. To gain market share you must have the "best product at the least cost." Supercomputing is often associated with design and hence with product quality. The new applications of concurrent engineering (multidisciplinary analysis) and optimal design (described above) will achieve cost reduction by ensuring that manufacturability, reliability, and maintainability are included in the design. This story needs to be technically developed and understood by both scientists and management. The real untapped market, however, lies in bringing high-end computation to bear on manufacturing problems ignored so far by both technologists and management in private industry.
For example, recently at Boeing we established a Computational Modeling Initiative to discover new ways in which the bottom line can be helped by computing technology. In a recent pilot study, we examined the rivet-forming process. Riveting is a critical part of airplane manufacturing. A good rivet is needed if fatigue and corrosion are to be minimized. Little is known about this process beyond experimental data. By simulating the riveting process and animating it for slow-motion replay, we have used computing to display what cannot be seen experimentally. Improved rivet design to reduce strain during riveting has resulted in immediate payoff during manufacturing and greatly reduced maintenance cost over the life of the plane. Note that this contributes very directly to the bottom line and is an easily understood contribution. We feel that these types of applications (which in this case required a supercomputer to handle the complex structural analysis simulation) could fill many supercomputers productively once the
applications are found and implemented. This latent market for computation within the manufacturing sectors of existing supercomputer industries is potentially bigger than supercomputing use today. The list of opportunities is enormous: robotics simulation and design, factory scheduling, statistical tolerance analysis, electronic mockup (of parts, assemblies, products, and tooling), discrete simulation of assembly, spares inventory (just-in-time analysis of large, complex manufacturing systems), and a host of others.
We have identified three drivers of a successful supercomputing market, all of which are critical for U.S. industrial competitiveness: 1) traditional and more refined analysis; 2) design optimization, multidisciplinary analysis, and concurrent engineering (design/build); and 3) new applications of computation to manufacturing process productivity.
The opportunities in item 3 above are so varied, even at a large company like Boeing, that it is hard to be explicit. In fact, the situation requires those involved in the processes to define such opportunities. In many cases, the use of computation is traditionally foreign to the manufacturing process, which is often a "build and test" methodology, and this makes the discovery of computational opportunities difficult. What is clear, however, is that supercomputing opportunities exist (i.e., a significant contribution can be made to increased profit, market share, or quality of products through supercomputing). It is worthwhile to point out broadly where supercomputing has missed its opportunities in most industries, and certainly in the aerospace sector:
• manufacturing—e.g., rivet-forming simulation, composite material properties;
• CAD/CAM—e.g., electronic mockup, virtual reality, interference modeling, animated inspection of assembled parts;
• common product data storage—e.g., geometric-model to grid-model translation; and
• grand-challenge problems—e.g., concurrent engineering, data transfer: IGES, PDES, CALS.
In each area above, supercomputing has a role. That role is often not central to the area but critical in improving the process. For example, supercomputers today are not very good database machines, yet much of the engineering data stored in, say, the definition of an airplane is required for downstream analysis in which supercomputing can play a role. Because supercomputers are not easily interfaced to corporate data farms, much of that analysis is often done on slower equipment, to the detriment of cost and productivity.
With this as a basis, how can there be any softness in the supercomputer market? Clearly, supercomputers are fundamental to competitiveness, or are they?
Impediments to Industrial Use of Supercomputers
Supercomputers have been used to great competitive advantage throughout many industries (Erisman and Neves 1987). The road to changing a company from one that merely uses computers on routine tasks to one that employs the latest, most powerful machines as research and industrial tools to improve profit is a difficult one indeed. The barriers include technical, financial, and cultural issues that are often complex; and, even more vexing, once addressed they can often reappear over time. Resolving these issues requires a cooperative effort between management and technologists. We begin with what are probably the most difficult issues—cultural and financial barriers.
The cultural barriers that prevent supercomputing from taking its rightful place in the computing venue abound. Topping the list is management understanding of supercomputing's impact on the bottom line. Management education in this area is sorely needed, as most managers who have wrestled with these issues will attest. Dr. Hermann, one of the panelists in this session, suggested that a successful "sell" to management must include a financial-benefits story that very few people can develop. To tell this story one must be a technologist who understands the specific contributions computing can make both to a company's products and processes and to its competitive and profit goals. Of the few technologists who have this type of overview, how many would take on what could be a two-year "sell" to management? History can attest that almost every successful supercomputer placement in industry, government, or academia has rested on the shoulders of a handful of zealots or champions with that rare vision. This is often true of expensive research-tool investments, but for computing it is more difficult because of the relative infancy of the industry. Most upper-level managers have not personally experienced the effective use of research computing. When they came up through the "ranks," computing, if it existed at all, was little more than a glorified engineering calculator (slide rule). Managers in the aerospace industry fully understand the purpose of a $100 million investment in a wind tunnel, but only in the last few years have any of them had to grapple with a $20 million investment in a "numerical" wind tunnel. Continuing with this last aerospace example, how did the culture change? An indispensable ally in the aerospace
industry's education process has been the path-finding role of NASA, in both technology and collaboration with industry. We will explore government-industry collaboration further in the next section.
Cultural issues are not all managerial in nature. As an example, consider the increasing need for collaborative (design-build) work and multidisciplinary analysis. In these areas, supercomputing can be the most important tool in creating an environment that allows tremendous impact on the bottom line, as described above. However, quite often the disciplines that need to cooperate are represented by different (often large) organizations. Nontechnical impediments associated with change of any kind arise, such as domain protection, fear of loss of control, and career insecurities owing to unfamiliarity with computing technology. Often these concerns are never stated but exist at a subliminal level. In addition, organizations handle computing differently, often on disparate systems with incompatible geometric description models, and the technical barriers from years of cultural separation are very real indeed.
Financial barriers can be the most frustrating of all. Supercomputers, almost as part of their definition, are expensive. They cost from $10 to $30 million and thus are usually purchased at the corporate level. The expense of this kind of acquisition is often distributed by some financial mechanism that assigns that cost to those who use it. Therein lies the problem. To most users, their desk, pencils, paper, phone, desk-top computer, etc., are simply there. For example, there is no apparent charge to them, their project, or their management when they pick up the phone. Department-level minicomputers, while a visible expense, are controlled at a local level, and the expenses are well understood and accepted before purchase. Shared corporate resources, however, look completely different. They often cost real project dollars. To purchase X dollars of computer time from the company central resource costs a project X dollars of labor. This tradeoff applies pressure to use the least amount of central computing resources possible. This is like asking an astronomer to look through his telescope only when absolutely necessary for the shortest time possible while hoping he discovers a new and distant galaxy.
This same problem has another impact that is more subtle. Supercomputers like the Cray Research machines often involve multiple CPUs. Most charging formulas involve CPU time as a parameter. Consequently, if one uses a supercomputer with the mindset of keeping costs down, one will likely use only one CPU at a time. After all, a good technologist knows that if he uses eight CPUs, Amdahl's law will probably only let him get the "bang" of six or seven, and then only if he is clever. What is the result? A corporation buys an eight-CPU supercomputer to finally tackle corporate grand-challenge problems, and the users immediately bring only the power of one CPU to bear on their problems for financial reasons. Well, one-eighth of a supercomputer is not a supercomputer, and one might as well opt for a lesser technological solution. In fact, this argument is often heard in industry today from single-CPU users grappling with financial barriers. This is particularly frustrating since the cost-reduction analysis is often well understood, while the loss in product design quality from solving problems on less competitive equipment is often not even identified!
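The disincentive can be illustrated with a back-of-the-envelope Amdahl's-law calculation in Python. The 95 per cent parallel fraction and the ten-hour serial run below are assumed numbers for illustration, not figures from the panel.

```python
# Amdahl's law: speedup(N) = 1 / ((1 - p) + p / N), where p is the fraction of
# the work that can run in parallel. Illustrative numbers only.
def speedup(p, n_cpus):
    return 1.0 / ((1.0 - p) + p / n_cpus)

p = 0.95                 # assumed parallel fraction of the application
serial_hours = 10.0      # assumed wall-clock time of the one-CPU run

for n in (1, 4, 8):
    s = speedup(p, n)
    wall = serial_hours / s          # elapsed time on n CPUs
    billed_cpu_hours = wall * n      # typical CPU-time-based charge
    print(f"{n} CPUs: speedup {s:4.1f}, wall time {wall:5.2f} h, "
          f"billed CPU-hours {billed_cpu_hours:5.1f}")

# With p = 0.95, eight CPUs give a speedup of about 5.9 (the "bang" of six),
# but the job is billed roughly 1.35 times the CPU-hours of the one-CPU run.
# A cost-minimizing user is therefore pushed toward a single CPU even though
# the parallel run finishes far sooner.
```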
The technological barriers are no less challenging. In fact, one should point out that the financial billing question relative to parallel processing will probably require a technological assist from vendors in their hardware and operating systems. To manage the computing resource properly, accounting "hooks" in a parallel environment need to be more sophisticated. Providing the proper incentives to use parallel equipment when the overhead of parallelization is a real factor is not a simple matter. These are issues that vendors can no longer leave to the user; they must become partners in solving them.
Supercomputers in industry have not really "engaged" the corporate enterprise computing scene. Computers have had a long history in most companies and are an integral part of daily processes in billing, CAD/CAM, data storage, scheduling, etc. Supercomputers have been brought into companies by a select group and for a specific need, usually in design analysis. These systems, like the organizations that brought them in, are often placed "over there"—in a corner, an ivory tower, another building, another campus, or any place where they don't get in the way. Consequently, most of the life stream of the corporation, its product data, is out of reach of the high-performance computing complex, both electronically and culturally. The opportunities for supercomputing alluded to in the previous section suggest that supercomputers must be integrated into the corporate computing system. All contact with the central computing network begins at the workstation. From that point a supercomputer must be as available as any other computing resource. To accomplish this, a number of technical barriers must be overcome, such as
• transparent use,
• software-rich environment,
• visualization of results, and
• access to data.
If one delves into these broad and overlapping categories, a number of issues arise. Network topology, distributed computing strategy, and
standards for data storage and transport immediately spring to mind. Anyone who has looked at any of these issues knows that the solutions require management and political savvy as well as technical insight. At a deeper level of concern are the issues of supercomputer behavior. On the one hand, when a large analysis application is to be run, the supercomputer must bring as much of its resources to bear on the computation as possible (otherwise it is not a supercomputer). On the other hand, if it is to be an equal partner on a network, it must be responsive to the interactive user. These are conflicting goals. Perhaps supercomputers on a network need a network front end, for example, to be both responsive and powerful. Who decides this issue? The solution to this conflict is not solely the responsibility of the vendor. Yet, left unresolved, this issue alone could "kill" supercomputer usage in any industrial environment.
As supercomputer architectures become increasingly complex, the ability to transfer existing software to them becomes a pacing issue. If existing programs do not run at all or do not run fast on new computers, these machines simply will not be purchased. This, of course, is a classic problem of high-end computing. Vectorization and now parallelization are processes that we know we must contend with. The issue of algorithms and the like is well understood. There is a cultural issue for technologists, however. The need to be 100 per cent efficient on a parallel machine lessens as the degree of parallelism grows. For example, if we have two $20 million computers, and one runs a problem at 90 per cent efficiency at a sustained rate of four GFLOPS (billion floating-point operations per second), and the other runs a problem at 20 per cent efficiency at a sustained rate of 40 GFLOPS, which would you choose? I would choose the one that got the job done the cheapest! (That cannot be determined from the data given! For example, at 40 GFLOPS, the second computer might be using an algorithm that requires 100 times more floating-point operations to reach the same answer. Let us assume that this is not the case and that both computers are actually using the same algorithm.) The second computer might be favored. It probably is a computer that uses many parallel CPUs. How do we charge for the computer time? How do we account for the apparently wasted cycles? I ask these two questions to emphasize that, at all times, the corporate resource must be "accounted" for with well-understood accounting practices that are consistent with corporate and government regulations!
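A small time-to-solution calculation makes the choice concrete. The Python sketch below adopts the assumptions of the parenthetical above, that both machines run the same algorithm with the same operation count and that the quoted GFLOPS figures are sustained rates; the job size and the five-year amortization of each $20 million machine are invented for illustration.

```python
# Compare two $20M machines on the same job: which gets it done cheapest?
# Assumptions (illustrative): identical algorithm and operation count on both
# machines; quoted GFLOPS are sustained rates; cost amortized over five years.
machine_a = {"name": "A", "sustained_gflops": 4.0,  "efficiency": 0.90}
machine_b = {"name": "B", "sustained_gflops": 40.0, "efficiency": 0.20}

work_gflop = 1.0e6                        # assumed job size: 10^15 operations
cost_per_hour = 20e6 / (5 * 365 * 24)     # $20M over five years of wall time

for m in (machine_a, machine_b):
    hours = work_gflop / m["sustained_gflops"] / 3600.0
    print(f"machine {m['name']}: {m['efficiency']:.0%} efficient, "
          f"{hours:6.1f} h to solution, ~${hours * cost_per_hour:,.0f} per job")

# Despite running at only 20% efficiency, machine B finishes the same job ten
# times sooner and, at equal purchase price, costs roughly one-tenth as much
# per solution -- unless its "wasted" cycles cannot be put to other use.
```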
We have given short shrift to technological issues, owing to limits of time and space. It is hoped that one point has become clear—that the cultural, financial, and technical issues are quite intertwined. Their resolution and
the acceptance of high-end computing tools in industry will require collaboration and technology transfer among all sectors—government, industry, and academia.
Technology Transfer and Collaboration
Pending before Congress are several bills concerning tremendous potential advances in the infrastructure that supports high-performance computing. We at this meeting have a great deal of interest in cooperative efforts to further the cause of high-performance computing—to ensure the technological competitiveness of our companies, our research institutions, and, indeed, our nation. To achieve these goals we must learn to work together to share technological advances fruitfully. The definition of infrastructure is perhaps a good starting point for discussing technology transfer challenges. The electronic thesaurus offers the following substitutes for infrastructure:
• chassis, framework, skeleton;
• complex, maze, network, organization, system;
• base, seat; and
• cadre, center, core, nucleus.
The pending legislation has all these characteristics. In terms of a national network that connects high-performance computing systems and large data repositories of research importance, the challenge goes well beyond simply providing connections and hardware. We want a national network that is not a maze but an organized, systematized framework to advance technology. Research support is only part of the goal, for research must be transferred to the bottom line in a sense similar to that discussed in previous sections. No single part of the infrastructure can be singled out, or left out, if the result is to be truly effective. We have spoken often in this forum of cooperative efforts among government, academia, and industry. I would like to be more explicit. If we take the three sectors one level of differentiation further, we have Figure 1.
Just as supercomputers must embrace the enterprise-wide computing establishment within large companies, the national initiatives in high-performance computing must embrace the end-user sector of industry, as well. The payoff is a more productive economy. We need a national network, just like we needed a national highway system, an analogy often used by Senator Albert Gore. Carrying this further, if we had restricted the highway system to any particular sector, we would not have seen the birth of the trucking industry, the hotel and tourism industries, and so on. Much is to be gained by cooperative efforts, and many benefits cannot be predicted in advance. Let us examine two
examples of technology transfer that came about through an investment in infrastructure, one by government and another by industry.
First is an example provided by Dr. Riaz Abdulla, from Eli Lilly Research Laboratories, in a private communication. He writes:
For your information, supercomputing, and particularly network supercomputing at Eli Lilly became successful owing to a mutually supportive research and management position on the matter. Both the grass-roots movement here, as well as enlightened management committed to providing the best possible tools to the research staff made the enhancement of our research computer network among the best. . . . We are well on the way to establishing a network of distributed processors directly linked to the supercomputing system via high-speed links modeled after the National Center for Supercomputing Applications [at the University of Illinois, one of the NSF-funded supercomputer centers] and the vision of Professor Larry Smarr. Without the model of the NCSA, its staff of scientists, consultants, engineers and software and visualization experts, Lilly's present success in supercomputing would have been impossible.
Clearly, the government investment in supercomputing for the academic world paid off for Eli Lilly. While this was not an original goal of the NSF initiatives, it shows clearly that NSF's supercomputing program has become part of the national infrastructure.
In the second example, technology is transferred from the private sector to the academic and government sectors. Boeing Computer Services
has been involved in supercomputing for almost two decades, since before the term was coined. We purchased Serial No. 2 of Control Data Corporation's CDC 6600, for example—a supercomputer in its day. As such, we owned and operated a national supercomputer time sales service when the NSF Advanced Scientific Computing Program was launched. We responded to a request for proposals to provide initial supercomputer time in Phase I of this program. Under contract with NSF we were able to give immediate access to supercomputing cycles. We formed a team to train over 150 research users in access to our system. This was done on location at 87 universities across the country. We provided three in-depth Supercomputing Institutes, the model of which was emulated by the centers themselves after they were established. In subsequent years we helped form, and are a member of, the Northwest Academic Computing Consortium (NWACC), along with 11 northwest universities. In collaboration we have secured NSF funding to create NWNet, the northwest regional NSF network. Boeing designed and initially operated this network but has since turned the operation over to NWACC and the University of Washington in Seattle. In other business activities, Boeing has designed, installed, and operated supercomputer centers, and trained their users, in academia (the University of Alabama system) and in government laboratories (NASA and the Department of Energy). Indeed, technology transfer is often a two-way street. The private sector is taking some very aggressive steps to advance technology in our research laboratories as well. (For example, see the paper following in this session, by Pat Savage, Shell Development Company, discussing Shell's leadership in providing parallel computing tools and storage systems to the community.)
Conclusion
We are delighted to see that much of the legislation before Congress recognizes the importance of technology transfer and collaboration among the Tier I entities of Figure 1. We are confident that all elements of Tier II will be included, but we exhort all concerned that this collaboration be well orchestrated and not left to serendipity. Transferring technology among organizations or Tier I sectors is the most difficult challenge we have, and our approach must be aggressive. The challenges of the supercomputing industry are no less difficult. They too can only be overcome by cooperation. These challenges are both technical and cultural and present an awesome management responsibility.
References
A. Erisman and K. W. Neves, "Advanced Computing for Manufacturing," Scientific American 257 (4), 162-169 (1987).
K. W. Neves and J. S. Kowalik, "Supercomputing: Key Issues and Challenges," in NATO Advanced Research Workshop on Supercomputing, NATO ASI Series F, Vol. 62, J. S. Kowalik, Ed., Springer-Verlag, New York (1989).