Object-Oriented Programming, Visualization, and User-Interface Issues[*]
David W. Forslund has served since 1989 as Deputy Director of the Advanced Computing Laboratory, Los Alamos National Laboratory. Dr. Forslund, a specialist in theoretical plasma physics, is credited with such accomplishments as the discovery of heat conduction instabilities in the solar wind and development of the first model to account for intense visible harmonics of laser light produced in CO2 laser plasmas. Much of Dr. Forslund's most original work has focused on simulation codes for applications in plasma physics. For example, he developed and has maintained the international laser fusion plasma simulation code WAVE on a multitude of operating systems and machine architectures. Further, he codeveloped the first implicit electromagnetic plasma simulation code, VENUS, with which he discovered the surface magnetic fields that have since explained many of the interactions between intense CO2 light and plasmas. Currently, he is investigating advances in human-computer interfaces and has recently demonstrated successful coupling of a scientific workstation and a supercomputer via the Network extensible Window System™.
Dr. Forslund holds an M.A. (1967) and a Ph.D. (1969) from Princeton University. During his tenure at Los Alamos, which began in 1969, he has served as a Staff Member at the Experimental Physics Division, the Associate Group Leader of the Laser Division, and the Alternate Group Leader of the Applied Theoretical Physics Division. In 1981, he was named a Laboratory Fellow. He has published widely on topics relating to plasma physics and plasma simulations and referees numerous professional journals, including the Journal of Geophysical Research, the Journal of Applied Physics, and the Journal of Computational Physics.
There are two important but disparate elements of the user interface that we will discuss in this presentation. The first is the object-oriented paradigm, which provides a useful framework for writing parallel applications, and the second is the use of visualization tools, which can provide an intuitive interface to complex applications.
Object-Oriented Parallel Programming
Object-oriented programming has become common and popular and has been used in numerous computer-science projects. However, it has not yet been utilized to any significant degree in large-scale scientific computing. Nevertheless, we believe it is well suited to scientific computing, which frequently deals with well-defined, loosely interacting physical objects. The paradigm is especially useful in distributed, parallel computing because the objects help to encapsulate and clearly define the movement of data. The message interface maps well to the distributed-memory model of parallel computing by constraining the mapping of data into memory. The additional features of inheritance and data abstraction also promise to significantly reduce the cost of software maintenance for large-scale scientific programs.
An important tool in defining the objects to be used in scientific computing comes from a mapping of the physical model into the computational environment. If the objects are made to correspond to physical elements in the underlying model, the paradigm fits very well. Since all of the information an object needs is stored internally, an object can also provide a good representation for a thread of control. This can greatly assist in the development of parallel applications in a way that is
independent of the hardware on which it is being run. One can logically define the number of threads (or processes) required for a given problem without worrying about the number of physical processors. This is analogous to not worrying about the size of vector registers in a supercomputer or the actual number of processors in a Thinking Machines Corporation Connection Machine.
A number of these ideas have been implemented in a distributed particle simulation code, reported in the 1990 USENIX C++ conference proceedings (Forslund et al. 1990). The distributed environment for this code is the ISIS programming environment developed at Cornell University by Ken Birman (1990) and his colleagues.
Data Parallel Programming
Another area in which object-oriented programming has had success is data parallel programming. Rob Collins (personal communication), from the University of California, Los Angeles, has built an efficient C++ library for the Thinking Machines Corporation CM-2, called CM++. It gives full access to the C/Paris functionality without any loss in performance. This allows writing in a higher-level abstraction without sacrificing speed. Collins and Steve Pope of the Advanced Computing Laboratory at Los Alamos National Laboratory (personal communication) have been working on porting this library to a more general environment, called DPAR. Thus, the data-parallel paradigm has been abstracted and now runs on a workstation with good optimization. We are optimistic that this library could be made to work on a Cray Research, Inc., supercomputer with efficiency comparable to that achieved on the CM-2.
We are also trying to combine this data-parallel programming paradigm with the distributed environment we mentioned before, following some of the ideas of Guy Steele. In a paper entitled "Making Asynchronous Parallelism Safe for the World," Steele (1990) describes a programming style that tries to unify SIMD and MIMD computing. The essence of the proposal is to allow asynchronous threads only a restricted form of shared-memory access that provides only commutative operations. This removes the dependence on the execution order of sibling parallel threads.
Threads then communicate (synchronize) only on their death. Complex operations are provided by a hierarchy of communications. This is the style we have used in our distributed particle code mentioned earlier. In one sense, this style is the simplest extension of the SIMD programming model.
Visualization Tools

As one solves large-scale problems on massively parallel machines, the data generated become very difficult to handle and analyze. For the scientist to comprehend the large volume of data, the resulting complex data sets need to be explored interactively with intuitive tools that yield realistic displays of the information. The form of display usually involves polygons and lines, image processing, and volume rendering. The desired interface is a simple, flexible, visual programming environment for which one does not have to spend hours writing code. This might involve a dynamic linking environment much like that provided by the Application Visualization System (AVS) from Stardent Computer or apE from the Ohio State University Supercomputer Center in Columbus.
The output need not always be precisely of physical variables but should match what we expect from our physical intuition and our visual senses. It also should not be just a collection of pretty artwork but should have physical meaning to the researcher. In this sense, we don't try to precisely match a physical system but rather try to abstract the physical system in some cases.
To handle the enormous computational requirements involved in visualization, we must also be able to do distributed processing of the data and the graphics. Besides being useful in the interpretation of significant physical and computational results, this visualization environment should be usable both in algorithmic development and debugging of the code that generates the data. The viewing should be available in both "real time" and in a postprocessing fashion, depending on the requirements and network bandwidth. To optimize the traversal of complex data sets, advanced database techniques such as object-oriented databases need to be used.
As mentioned above, there are two graphical environments available today (and possibly others) that attempt to provide the sort of capability described above: AVS and apE. The idea is to provide small, strongly typed, modular building blocks out of which one builds the graphical application. These are illustrated in Figure 1, which displays a schematic of the user's workspace in AVS. The data flows through the graphical "network" from the input side all the way through to the graphical display. In AVS, there are four basic types of components out of which one builds the application: inputs that supply the data, filters that modify the data, mappers that change the data from one format to another, and renderers that display the result on the screen. Figure 2 illustrates AVS's ability to interactively analyze data.
The only limit on network complexity is the memory and display capacity of the workstation. However, this can frequently be a major problem, as the size of the data sets produced on current supercomputers can far exceed the capabilities of this software, even on the most powerful graphics workstations.
Because this data-flow style is, in fact, object-oriented, the model can readily be distributed or parallelized, with each module running as a thread or distributed process placed on its own machine or processor. The apE environment already provides this kind of functionality. For high performance in a graphics environment, these nodes need to be connected by a very-high-speed (e.g., gigabit-per-second) network if they are not running out of shared memory on the same machine. The next generation of graphics environments of this type will hopefully operate in this manner.
A number of real physical applications are using this graphics environment, including problems that run on a CRAY Y-MP and the CM-2. For example, at the Institute of Geophysics and Planetary Physics, Los Alamos National Laboratory, a three-dimensional climate model has been run on the Cray (see Figure 3). A layer of the resulting temperature data has been taken and mapped onto a globe and displayed in an animated manner as a function of time. Using AVS, one can rotate the spherical globe while the data is being displayed, allowing one to investigate the polar regions, for example, in more detail. This is one simple example of how the data can be explored in a manner that is hard to anticipate ahead of time.
Figures 4 and 5 further illustrate the capability of high-performance graphics environments as applied to physical processes. Realistic displays like these, which can be explored interactively, are powerful tools for understanding complex data sets.
References

K. Birman, ISIS Reference Manual, Version 2.0, ISIS Distributed Systems, Inc., Ithaca, New York (1990).
D. Forslund, C. Wingate, P. Ford, J. Junkins, J. Jackson, and S. Pope, "Experience in Writing a Distributed Particle Simulation Code in C++," in Proceedings of the USENIX C++ Conference, USENIX Association, Berkeley, California, pp. 177-190 (1990).
G. Steele, Jr., "Making Asynchronous Parallelism Safe for the World," in The Conference Record of the Seventeenth Annual ACM Symposium on the Principles of Programming Languages, San Francisco, California, 17-19 January 1990, Association for Computing Machinery, New York, pp. 218-231 (1990).