Abstract:
Improvements in microprocessor and networking performance have made
networks of workstations a very attractive platform for high-end
parallel and distributed computing. However, the effective
deployment of such environments requires addressing two problems not
associated with dedicated parallel machines: heterogeneous resource
capabilities and dynamic availability. Achieving good performance
requires that application components be able to migrate between
cluster resources and efficiently adapt to the underlying resource
capabilities. An important component of the required support is
maintaining network connectivity, which directly affects the
transparency of migration to the application and its performance
after migration. Unfortunately, existing approaches rely on either
extensive operating system modifications or new APIs to maintain
network connectivity, both of which limit their wider
applicability.
This paper presents the design, implementation, and performance of a
transparent network connectivity layer for dynamic cluster
environments. Our design uses the techniques of API interception
and virtualization to construct a transparent layer in user space;
use of the layer requires no modification either to the application
or the underlying operating system and messaging layers. Our layer
enables the migration of application components without breaking
network connections, and additionally permits adaptation to the
characteristics of the underlying networking substrate. Experiments
with a persistent socket interface in two
environments---an Ethernet LAN on top of TCP/IP, and a Myrinet LAN
on top of Fast Messages---show that our approach incurs minimal
overheads and can effectively select the best substrate for
implementing application communication requirements.
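
As a rough illustration of the virtualization idea, the sketch below
keeps a stable handle in front of a replaceable underlying connection,
so the application-facing object survives a change of transport. It is
not the implementation described in this paper: the class
PersistentSocket, its rebind method, and the demo function are
hypothetical names introduced here, an explicit wrapper class stands in
for API interception, and a local socket pair stands in for a real
network connection.

```python
import socket


class PersistentSocket:
    """Hypothetical virtual socket: a stable handle over a swappable transport."""

    def __init__(self, transport):
        # The real socket currently backing this virtual handle.
        self._transport = transport

    def rebind(self, new_transport):
        # Assumed to be called by some migration runtime once the endpoint
        # has moved; the application keeps using the same handle.
        old, self._transport = self._transport, new_transport
        old.close()

    def sendall(self, data):
        self._transport.sendall(data)

    def recv(self, bufsize):
        return self._transport.recv(bufsize)

    def close(self):
        self._transport.close()


def demo():
    # First underlying connection (a local socket pair stands in for TCP).
    a1, b1 = socket.socketpair()
    vsock = PersistentSocket(a1)
    vsock.sendall(b"before")
    first = b1.recv(16)

    # Simulated migration: swap in a new underlying connection.
    a2, b2 = socket.socketpair()
    vsock.rebind(a2)
    vsock.sendall(b"after")
    second = b2.recv(16)

    for s in (b1, b2):
        s.close()
    vsock.close()
    return first, second
```

The same indirection is what would let a layer choose a different
substrate for the new connection; the sketch omits buffering of
in-flight data, which a real layer would have to handle.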