Research Summary

Research Goals, Ideas and Direction

The advantages of using Networks of Workstations (NOW) for compute-intensive applications are well understood: workstations are relatively cheap, widely available, and mostly underutilized. For example, here at NYU there are over 200 workstations that we can access. I have seen programs run all night on a single machine. If we used 100 (or so) of these workstations, the same program could finish in just a few minutes. Using NOW for parallel computation is not an original idea: much research has been done on this topic, and many software tools have been built. Given that only a small number of programs run on NOW, it is fair to ask "if the hardware is widely available (which it is), and if there are tools for building distributed computations (which there are), then why aren't most programs able to run on NOW?" Simply put, the answer is that most of us find distributed-program development too complicated. Although existing software tools make distributed programming possible, they do not make distributed programs easy to design and build. In general, the cost of developing a distributed program outweighs the hardware gains, which explains the small number of distributed programs.

My research has focused on making distributed computations easy to design and build. The complexity of distributed programming stems from the tight coupling between programs and their execution platform: NOW programs are supposed to execute on available resources, i.e., workstations as they become available, but this availability is not known at programming time. Thus, with traditional tools, programmers have to deal explicitly with the unpredictability of the execution environment. I have worked on a series of research projects to overcome this problem, both for workstation clusters and for the World Wide Web.

I have been involved with the MILAN meta-project from the start. It encompasses projects such as Calypso, Charlotte, and KnittingFactory, among others. I have worked on Distributed Shared Memory (DSM) systems, transparent fault tolerance and load balancing, resource allocation, and programming environments for the World Wide Web.

Calypso, 1994-

Calypso is a software environment for high-performance computing on workstation clusters. Calypso allows programmers to view a NOW as a single virtual metacomputing resource. I have been involved in the design of the software system and have led its development. Calypso is unique in that it separates the programming model from the execution environment: programs are written for a virtual shared-memory multiprocessor, but are executed on a network of dynamically changing workstations. Calypso gives the programmer the illusion of a reliable machine and realizes this reliable machine on a network of unreliable computers. This way, the complexities of distributed programming are hidden from programmers because
  • data movement and data coherence are transparent,
  • load balancing is performed by the runtime system,
  • machine and network failures are completely masked,
  • free machines are integrated into a running computation, and
  • crashed and slow machines are removed from a computation.
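The properties above can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not Calypso code: the `Worker` class, the `run_parallel_step` function, and the round-robin assignment are all hypothetical names and choices made for illustration, and tasks are assumed to be idempotent so that re-running one on another machine is harmless.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A hypothetical workstation; `fails_on` lists task ids it crashes on."""
    name: str
    fails_on: set = field(default_factory=set)
    alive: bool = True

    def run(self, task_id, func, arg):
        if task_id in self.fails_on:
            self.alive = False  # simulate a machine crash in mid-task
            raise RuntimeError(f"{self.name} crashed")
        return func(arg)


def run_parallel_step(func, args, workers):
    """Apply func to every arg, reassigning tasks whose worker crashed.

    Assumption: tasks are idempotent, so retrying one elsewhere is safe.
    """
    results = {}
    pending = list(range(len(args)))
    turn = 0
    while pending:
        # Recomputed on every iteration, so crashed machines drop out and
        # machines appended to `workers` meanwhile are picked up.
        live = [w for w in workers if w.alive]
        if not live:
            raise RuntimeError("no workstations left")
        task_id = pending.pop(0)
        worker = live[turn % len(live)]  # naive round-robin over live machines
        turn += 1
        try:
            results[task_id] = worker.run(task_id, func, args[task_id])
        except RuntimeError:
            pending.append(task_id)  # the caller never sees the failure
    return [results[i] for i in range(len(args))]
```

The caller simply writes `run_parallel_step(f, data, workers)` and receives the results in order; which machine ran which task, and whether any machine crashed along the way, stays hidden inside the runtime.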

The software system has demonstrated impressive performance results on a network of workstations with fluctuating workloads, network traffic, and failures. Calypso runs on Solaris, Linux, and Windows NT.

This project is rapidly expanding through NSF, DARPA, and Intel funding. For more information visit the official Calypso home page. This is joint work with Partha Dasgupta and Zvi Kedem.

Charlotte, 1996-

Charlotte leverages the fundamental ideas of Calypso and Java's support for heterogeneous platforms to experiment with the WWW as a metacomputing platform. Charlotte provides (a variation of) distributed threads and a distributed shared namespace in Java. Charlotte is unique in that it requires no administration: anybody with a Java-capable browser can contribute towards any Charlotte application on the Web. I have been involved in the design and the development of the software system.

You can read about Charlotte and other related research efforts in the May 1997 issue of Scientific American, in an article titled Cyber View.

For more information visit the official Charlotte home page. This is joint work with Mehmet Karaul, Zvi Kedem, and Peter Wyckoff.

Adaptive Resource Allocation, 1997-

Calypso programs are adaptive in that they can utilize workstations as they become idle, and release them as they are claimed by users. However, existing resource management systems are either incapable of handling adaptive programs, or they are tightly integrated with a single programming environment. I worked on mechanisms that let resource management systems support adaptive programs while still handling multiple programming environments. This enables adaptive (and unmodified) PVM, MPI, and Calypso applications to execute side by side on networks of time-shared machines.
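To make the idea of growing and shrinking concrete, here is a minimal sketch of a manager that hands idle machines to registered jobs and takes them back when a user claims them. It is an illustration under assumptions, not the actual mechanism: the `AdaptiveJob` and `ResourceManager` classes, their method names, and the "fewest machines first" rule are all hypothetical.

```python
class AdaptiveJob:
    """Hypothetical adaptive job, e.g. a PVM, MPI, or Calypso program."""

    def __init__(self, name):
        self.name = name
        self.machines = set()

    def grow(self, machine):
        self.machines.add(machine)

    def shrink(self, machine):
        self.machines.discard(machine)


class ResourceManager:
    """Hypothetical manager that is independent of the jobs' environment."""

    def __init__(self):
        self.jobs = []
        self.assigned_to = {}  # machine name -> job currently using it

    def register(self, job):
        self.jobs.append(job)

    def machine_idle(self, machine):
        """Give a newly idle machine to the job holding the fewest machines."""
        if not self.jobs or machine in self.assigned_to:
            return None
        job = min(self.jobs, key=lambda j: len(j.machines))
        job.grow(machine)
        self.assigned_to[machine] = job
        return job

    def machine_claimed(self, machine):
        """A user reclaimed the machine, so take it back from its job."""
        job = self.assigned_to.pop(machine, None)
        if job is not None:
            job.shrink(machine)
        return job
```

Because the manager talks to jobs only through `grow` and `shrink`, jobs from different programming environments can share the same pool of machines in this sketch.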

This is joint work with Ayal Itzkovitz, Zvi Kedem, and Yuanyuan Zhao.

KnittingFactory, 1997-

Working on Charlotte made me realize the restrictions within which applets must work, namely Java's host-of-origin policy. Under this policy, an applet can establish a network connection only to the host it was downloaded from. On one hand, this policy provides a certain level of security to end users, but on the other hand, it complicates the development of collaborative Web-based applications. Typical solutions have been either to rely on untrusted native code or to use a single forwarding agent. KnittingFactory is an infrastructure that alleviates the difficulties associated with building collaborative Web applications. It provides a set of integrated services that include
  • a registry and a lookup service so that applets can find other members of a collaborative session,
  • an embedded lightweight Java class server to remove the need for an external HTTP server program, and
  • direct applet-to-applet communication.
It can be, and has been, used as a flexible infrastructure for building collaborative Web applications such as shared whiteboards, editors, and calendars, as well as parallel computing systems.
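The host-of-origin rule and the registry and lookup service described above can be sketched in a few lines. This is a simplified illustration, not KnittingFactory's API: `may_connect` reduces the policy to a case-insensitive host-name comparison, and `SessionRegistry` with its `register` and `lookup` methods is a hypothetical stand-in for the real service.

```python
def may_connect(origin_host, target_host):
    """Simplified host-of-origin rule: connect only back to the origin."""
    return origin_host.lower() == target_host.lower()


class SessionRegistry:
    """Hypothetical registry mapping a session name to its members."""

    def __init__(self):
        self._sessions = {}

    def register(self, session, member_address):
        # Record that a member has joined the named collaborative session.
        self._sessions.setdefault(session, []).append(member_address)

    def lookup(self, session):
        # Return a copy so callers cannot modify the registry's own list.
        return list(self._sessions.get(session, []))
```

In this sketch, an applet joining a shared whiteboard would call `register("whiteboard", address)` and then `lookup("whiteboard")` to find the other members of the session.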

For more information visit the official KnittingFactory home page. This is joint work with Mehmet Karaul, Holger Karl, and Zvi Kedem.


Arash Baratloo
Last modified: Thu Nov 18 11:55:02 EST 1999