Supercomputer Systems-Software Challenges

High-Speed Networks

High-speed networks (gigabit-per-second and higher bandwidth) cause fundamental changes in software at both the application and systems levels. The good news is that these networks can absorb data at supercomputer rates, but this moves the problem of coping with the high data rate to the recipient. To illustrate the scope of this challenge, consider a Cray Research, Inc., machine with a four-nanosecond cycle time. At one gigabit per second, a 64-bit word arrives every 64 nanoseconds, or once every 16 cycles, so this Cray can handle the network in software: at about one instruction per cycle, it can execute roughly 16 instructions per 64-bit word transmitted or received. This example illustrates two problem areas. The first is that a Cray is a rather expensive network controller; productive use of networks requires more cost-effective interface hardware. The second is that one gigabit per second is slow for high-speed networks; at least another order of magnitude in bandwidth will become available in the near future, leaving the Cray with fewer than two instructions per word.
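The instruction budget in this example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and the one-instruction-per-cycle default are assumptions made for the calculation, not properties of any particular Cray model.

```python
def instructions_per_word(cycle_ns, link_gbps, word_bits=64, instr_per_cycle=1.0):
    """Instructions a processor can issue while one word crosses the link.

    instr_per_cycle=1.0 is an assumed simplification; real machines differ.
    """
    # One gigabit per second is one bit per nanosecond.
    word_time_ns = word_bits / link_gbps
    return instr_per_cycle * word_time_ns / cycle_ns


if __name__ == "__main__":
    print(instructions_per_word(cycle_ns=4.0, link_gbps=1.0))   # 16.0
    print(instructions_per_word(cycle_ns=4.0, link_gbps=10.0))  # 1.6
```

Raising the link rate by an order of magnitude shrinks the budget by the same factor, which is why the software-only approach stops being viable.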

Existing local area networking practice does not extend to high-speed networks because local area networks (LANs) are fundamentally different from their high-speed counterparts. At the hardware level, high-speed networks are based on point-to-point links with active switching hardware rather than the common media access often used in LANs (e.g., Ethernet). This is motivated both by the needs of the telecommunications industry (which is at the forefront of development of these networks) and the fact that LAN media access techniques do not scale to the gigabit-per-second range. On a 10-megabit-per-second Ethernet, a bit is approximately 30 meters long (about 100 feet); since this is the same order of magnitude as the physical size of a typical LAN, there can only be a few bits in flight at any time. Thus, if the entire network is idled by a low-level media-management event (e.g., collision detection), only a few bits are lost. At a gigabit per second, a bit is 30 centimeters long (about one foot), so the number of bits lost to a corresponding media-management event on the same-size network is a few hundred; this can be a significant source of lost bandwidth and is avoided in high-speed network protocols. Using point-to-point links can reduce these management events to the individual link level (where they are less costly) at the cost of active switching and routing hardware.
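The bit-length figures follow from dividing the signal propagation speed by the bit rate. The sketch below assumes propagation at the rounded speed of light in vacuum (signals in real cable or fiber travel somewhat slower, so bits are somewhat shorter) and an assumed network span of 100 meters; both values and the function names are chosen here for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 3.0e8  # rounded vacuum value; an assumption for this sketch


def bit_length_m(bit_rate_bps, propagation_m_per_s=SPEED_OF_LIGHT_M_PER_S):
    """Physical length of one bit on the medium, in meters."""
    return propagation_m_per_s / bit_rate_bps


def bits_in_flight(span_m, bit_rate_bps, propagation_m_per_s=SPEED_OF_LIGHT_M_PER_S):
    """Number of bits occupying a medium of the given length at one instant."""
    return span_m / bit_length_m(bit_rate_bps, propagation_m_per_s)


if __name__ == "__main__":
    span_m = 100.0  # assumed LAN span, not a figure from the text
    for label, rate in [("10 Mbit/s Ethernet", 10e6), ("1 Gbit/s network", 1e9)]:
        print(f"{label}: bit length {bit_length_m(rate):.2f} m, "
              f"about {bits_in_flight(span_m, rate):.0f} bits in flight")
```

On the assumed 100-meter span this gives a handful of bits at Ethernet speed and a few hundred at a gigabit per second, matching the orders of magnitude in the text.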

The bandwidth of high-speed networks also raises issues in the areas of protocols and hardware interface design. The computational overhead of existing protocols is much more costly in high-speed networks because the bandwidth losses for a given amount of computation are orders of magnitude larger. In addition, the reduced likelihood of dropped packets may obviate protocol logic that recovers from such events. Bandwidth-related issues also occur in the design of hardware interfaces. The bandwidth from the network has to go somewhere; local buffering in the interface is a minimum requirement. In addition, the high bandwidth available from these networks has motivated a number of researchers to consider memory-mapped interface architectures in place of the traditional communication orientation. At the speeds of these networks, the overhead of transmitting a page of memory is relatively small, making this approach feasible.
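Both bandwidth arguments in this paragraph reduce to simple products and quotients. The sketch below uses two assumed figures that do not come from the text: 100 microseconds of protocol processing per packet, and a 4,096-byte memory page. The function names are likewise hypothetical.

```python
def bits_forgone(processing_s, bit_rate_bps):
    """Bits the link could have carried during a fixed amount of computation."""
    return processing_s * bit_rate_bps


def transfer_time_s(n_bytes, bit_rate_bps):
    """Time to move n_bytes across the link, ignoring protocol and framing overhead."""
    return n_bytes * 8 / bit_rate_bps


if __name__ == "__main__":
    processing_s = 100e-6  # assumed per-packet protocol cost
    page_bytes = 4096      # assumed page size
    for label, rate in [("10 Mbit/s", 10e6), ("1 Gbit/s", 1e9)]:
        print(f"{label}: {bits_forgone(processing_s, rate):,.0f} bits forgone, "
              f"page transfer {transfer_time_s(page_bytes, rate) * 1e6:,.1f} microseconds")
```

Under these assumptions the same computation costs about 1,000 bits of capacity at 10 megabits per second and about 100,000 bits at one gigabit per second, while the time to ship a page falls from a few milliseconds to a few tens of microseconds.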

The importance of network management is increased by high-speed networks because they complement rather than replace existing, slower networks. Ethernet is still very useful, and the availability of more expensive, higher-bandwidth networks will not make it obsolete. Supercomputing facilities are likely to have overlapping Ethernet, Fiber Distributed Data Interface (FDDI), and high-speed networks connected to many machines. Techniques for managing such heterogeneous collections of networks and subdividing traffic appropriately (e.g., controlling traffic via Ethernet, transferring data via something faster) are extremely important. Managing a single network is challenging enough with existing technology; new technology is needed for multinetwork environments.
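One way to picture traffic subdivision is as a small policy that maps each message to a network. The sketch below is purely hypothetical: the `Network` type, the `choose_network` function, and the 64-kilobyte threshold are invented for illustration and do not describe any existing management system. The bit rates are the nominal 10 megabits per second for Ethernet, 100 megabits per second for the Fiber Distributed Data Interface (FDDI), and one gigabit per second for the high-speed network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    name: str
    bit_rate_bps: float


ETHERNET = Network("ethernet", 10e6)
FDDI = Network("fddi", 100e6)
HIGH_SPEED = Network("high-speed", 1e9)


def choose_network(is_control, payload_bytes, bulk_threshold_bytes=64 * 1024):
    """Pick a network for one message under an assumed, simplistic policy."""
    if is_control:
        return ETHERNET      # control traffic stays on the slower network
    if payload_bytes >= bulk_threshold_bytes:
        return HIGH_SPEED    # bulk data goes over the fastest network
    return FDDI              # everything else takes the middle tier


if __name__ == "__main__":
    print(choose_network(is_control=True, payload_bytes=128).name)
    print(choose_network(is_control=False, payload_bytes=10 * 1024 * 1024).name)
    print(choose_network(is_control=False, payload_bytes=2048).name)
```

A real policy would also have to account for load, availability, and which machines are attached to which networks, which is part of why multinetwork management is hard.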

