Preferred Citation: Ames, Karyn R., and Alan Brenner, editors. Frontiers of Supercomputing II: A National Reassessment. Berkeley: University of California Press, c1994. http://ark.cdlib.org/ark:/13030/ft0f59n73z/


 
High-Performance Computing at the National Security Agency

George Cotter

George R. Cotter currently serves as Chief Scientist for the National Security Agency (NSA). From June 1988 to April 1990, he was the Chairman of the Director's Senior Council, a study group that examined broad NSA and community issues. From June 1983 to June 1988, Mr. Cotter served as Deputy Director of Telecommunications and Computer Services at NSA, in which position he was responsible for implementing and managing worldwide cryptologic communications and computer systems.

Mr. Cotter has a B.A. from George Washington University, Washington, DC, and an M.S. in numerical science from Johns Hopkins University, Baltimore, Maryland. He has been awarded the Meritorious and Exceptional Civilian Service medals at NSA and in 1984 received the Presidential Rank of Meritorious Cryptologic Executive. Also in 1984, he received the Department of Defense Distinguished Civilian Service Award.

Introduction

High-performance computing (HPC) at the National Security Agency (NSA) is multilevel and widely distributed among users. NSA has six major HPC complexes that serve communities having common interests. Anywhere from 50 to several hundred individuals are served by any one complex. HPC is dominated by a full line of systems from Cray Research, Inc., supplemented by a few other systems. During the past decade, NSA has been driving toward a high level of standardization among the computing complexes to improve support and software portability. Nevertheless, the standardization effort is still in transition. In this talk I will describe the HPC system at NSA. Certain goals of NSA, as well as the problems involved in accomplishing them, will also be discussed.

Characterization of HPC

NSA's HPC can handle enormous input data volumes. The division between scalar and vector operations is typically 30 per cent scalar to 70 per cent vector, although vector operations sometimes approach 90 per cent. Little natural parallelism is found in much of the code we are running because the code is rooted in designs and implementations for serial systems. The code has been ported and patched across CDC 6600s and 7600s, CRAY-1s, CRAY X-MPs, and right up the line. We would like to redo much of that code, but that would present a real challenge.
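
As an illustration of the difference (a minimal C sketch, illustrative only and not code from NSA's systems): the first loop's iterations are independent, so a vectorizing compiler can turn them into vector operations; the second carries each result into the next iteration, a pattern typical of code designed for serial machines, and it stays scalar.

    /* Minimal sketch, not NSA code: why serial-rooted code resists
     * vectorization. */

    /* Independent iterations: a vectorizing compiler can execute
     * this loop as vector operations. */
    void scale(float *a, const float *b, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = 2.0f * b[i];
    }

    /* Loop-carried dependence: each iteration needs the previous
     * result, so the loop runs in scalar mode. */
    void running_sum(float *a, int n)
    {
        for (int i = 1; i < n; i++)
            a[i] = a[i] + a[i - 1];
    }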

An important characteristic of our implementation is that both batch and interactive operations, along with much of the software development, are carried out concurrently in each complex. Some of these operations are permanent and long-term, whereas others are experimental. The complexes support a large research community. Although interactive efforts are basically day operations, many batch activities require operating the systems 24 hours a day, seven days a week.

HPC Architecture

At NSA, the HPC operating environment is split between UNIX and our home-grown operating system, Folklore, with its higher-level language, IMP. Folklore and IMP are still in use on some systems and will disappear only when those systems disappear.

The HPC architecture in a complex consists of the elements shown in Figure 1. As stated before, both Folklore and UNIX are in use. About five or six years ago, NSA detached users from direct connection to supercomputers by giving the users a rich variety of support systems and more powerful workstations. Thus, HPC is characterized as a distributed system because of the amount of work that is carried out at the workstation level and on user-support systems, such as CONVEX Computer Corporation machines and others, and across robust networks into supercomputers.

Figure 1.
NSA's HPC architecture in the 1990s.

NSA has had a long history of building special-purpose devices that can be viewed as massively parallel processors because most of them do very singular things on thousands of processors. Over the past few years, NSA has invested a great deal of effort to upgrade the networking and storage capacity of the HPC complexes. At present, a major effort is under way to improve the mass-storage system supporting these complexes. Problems abound in network support. Progress has been slow in bringing new network technology into this environment because of the need to work with a large number of systems, with new protocols, and with new interfaces. A great deal of work remains to be done in this field.

Software Environment

IMP, Fortran, and C are the main languages used in HPC at NSA. Although a general Ada support function is running in the agency (in compliance with Department of Defense requirements to support Ada), HPC users are not enthusiastic about bringing up Ada compilers on these systems. NSA plans to eliminate IMP because it has little vectorizing capability, leaving the user to deal with vectorization by hand.

Faster compilers are needed, particularly a parallelizing C compiler. HPC also requires full-screen editors, as well as interactive debuggers that allow partial debugging of software. Upgrading network support is a slow process because of the number of systems involved and new protocols and interfaces. Upgrading on-line documentation, likewise, has been slow. Software support lags three to five years behind the introduction of new hardware technology, and we don't seem to be gaining ground.

Mass-Storage Requirements

A large number of automatic tape libraries, known as near-line (10^12-bit) storage, have deteriorated and cannot be repaired much longer. Mass-storage systems must be updated to an acceptable level. Key items in the list of storage requirements are capacity, footprint, availability, and bit-error rate, and these cannot be overemphasized. In the implementation of new mass-storage systems, NSA has been driven by the need for standardization and by the use of commercial, supportable hardware, but the effort has not always been completely successful.

One terabyte of data can be stored in any one of the ways shown graphically in Figure 2. If stacked, the nine-track tape reels would reach a height of 500 feet, almost as high as the Washington Monument. Clearly, the size and cost of storage on nine-track tapes is intolerable if large amounts of data are to be fed into users' hands or into their applications. Therefore, this type of storage is not a solution.
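
As a rough check on that figure (a sketch using assumed values: a 2,400-foot, 6250-bits-per-inch nine-track reel holding roughly 160 megabytes and stacking about one inch thick; neither number is given in the talk):

    #include <stdio.h>

    int main(void)
    {
        /* Assumed, not from the talk: ~160 MB per nine-track reel
         * and about one inch of stack height per reel. */
        const double bytes_to_store  = 1.0e12;   /* one terabyte */
        const double bytes_per_reel  = 160.0e6;
        const double inches_per_reel = 1.0;

        double reels = bytes_to_store / bytes_per_reel;
        double feet  = reels * inches_per_reel / 12.0;

        printf("%.0f reels, about %.0f feet high\n", reels, feet);
        return 0;   /* prints: 6250 reels, about 521 feet high */
    }

The result, on the order of 500 feet, is consistent with the Washington Monument comparison.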

NSA is working toward an affordable mass-storage system, known as R1/R2, because the size is manageable and the media compact (see Figure 3). This goal should be achieved in the middle 1990s. Central to the system will be data management: a data-management system, a database system, and a storage manager, considered together as a server to a set of clients (Cray, CONVEX, and Unisys systems). The mass-storage system also includes StorageTek silos having capabilities approaching a terabyte in the full 16-silo configuration. In addition, E-Systems is developing, with NSA funding, a very large system consisting of D2 tape, eight-millimeter helical-scan technology, and 1.2 × 10^15 bits in a box that has a relatively small footprint. Unfortunately, data-transfer calls through this system to the clients being served take seconds to minutes, but the system nevertheless represents a fairly robust near-line storage capacity.
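
The talk does not spell out the client interface, but the flow it implies (stage a file from near-line media, then read it from disk) might look like the following sketch. All names here (ms_stage, ms_query) are hypothetical stand-ins, not NSA's actual storage-manager interface.

    #include <stdio.h>

    /* Hypothetical sketch of a near-line storage client.  Staging
     * from silo tape to disk can take seconds to minutes, so the
     * request is asynchronous and the client polls.  These stubs
     * simply report success so the sketch runs standalone. */
    typedef enum { MS_PENDING, MS_ONLINE } ms_state;

    static ms_state ms_stage(const char *path) { (void)path; return MS_PENDING; }
    static ms_state ms_query(const char *path) { (void)path; return MS_ONLINE; }

    int main(void)
    {
        const char *path = "archive/sample.raw";   /* hypothetical file */

        if (ms_stage(path) == MS_PENDING)          /* request the copy  */
            while (ms_query(path) != MS_ONLINE)
                ;                                  /* poll (or sleep)   */

        /* Once staged to disk, the file is read with ordinary I/O. */
        printf("%s is on line\n", path);
        return 0;
    }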

Why is this kind of storage necessary? Because one HPC complex receives 40 megabits of data per second, 24 hours a day, seven days a week—so one of these systems would be full in two days. Why is the government paying for the development of the box? Why is industry not developing it so that NSA might purchase it? Because the storage technology industry is far from robust, sometimes close to bankruptcy.
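
That fill rate is easy to verify (a sketch; it assumes "one of these systems" means the terabyte silo configuration, since the 1.2 × 10^15-bit box would take most of a year to fill at this rate):

    #include <stdio.h>

    int main(void)
    {
        /* Assumed: a one-terabyte store, the silo capacity above. */
        const double rate_bits_per_sec = 40.0e6;   /* 40 megabits/s */
        const double store_bits = 8.0e12;          /* 1 TB in bits  */

        double days = store_bits / (rate_bits_per_sec * 86400.0);
        printf("full in about %.1f days\n", days); /* ~2.3 days     */
        return 0;
    }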



Figure 2.
Storage requirements for one terabyte of data, by medium.

Figure 3.
Mass-storage system for R1/R2 architecture.



Summary of Issues

I have addressed the following important issues:

• cost/performance relationships;

• large memory;

• mass storage;

• software environments;

• network support; and

• new architecture.

NSA is driven by computing requirements that would demand a 40 per cent improvement each year in cost/performance if annual investment were to be held steady. Since we are far from getting that improvement—even though cost/performance has improved a great deal over the years—the complexes are growing. We have problems that are intractable today because sufficiently large memories are not available on the systems. Mass storage and software environments have been thoroughly discussed. The networking industry, which is lagging behind, has not worked well with either the storage industry or the HPC industry. A much tighter integration of developments in the networking area is necessary to satisfy the needs of NSA.
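
The compounding behind that 40 per cent figure is worth making explicit (simple arithmetic, not figures from the talk): with a flat budget, every dollar must buy 1.4 times more computing each year, which multiplies to roughly a 29-fold improvement over a decade.

    #include <stdio.h>

    int main(void)
    {
        /* Cumulative cost/performance improvement implied by a
         * 40 per cent annual requirement under a flat budget. */
        double factor = 1.0;
        for (int year = 1; year <= 10; year++)
            factor *= 1.40;
        printf("after 10 years: %.0fx per dollar\n", factor); /* ~29x */
        return 0;
    }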

HPC facilities issues include space, power, and cooling. We are seriously considering building an environmentally stable building that will allow the import of 40 megawatts of power to the systems. However, such outrageous numbers should drive the computer industry toward cooler systems, new technology, and in the direction of superconductivity.

