The goal of our work is to develop DataSlicer, a hosting platform for data-centric network services that addresses the challenge of improving service scalability and performance over a wide-area network. Traditional web caching infrastructures do little for accesses to such services because service responses are usually generated dynamically and are therefore considered "uncacheable."

We aim to build an alternative caching infrastructure by: i) dynamically detecting service usage locality across several dimensions (dataspace, network regions, and multiple timescales); ii) creating replicas of portions of an origin database at a small number of appropriate network intermediaries, based on the detected locality information; and iii) applying actions such as request redirection and admission control to reduce client-perceived response time.
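As an illustration of step i), the following is a minimal sketch of how access locality might be tracked per dataspace cell and network region over more than one timescale. The class name `LocalityTracker`, the window lengths, and the threshold are assumptions made for this example; they are not taken from the DataSlicer implementation.

```python
from collections import defaultdict, deque


class LocalityTracker:
    """Counts accesses per (dataspace cell, network region) over several time windows."""

    def __init__(self, windows=(60, 3600)):
        self.windows = windows            # window lengths in seconds (assumed values)
        self.events = defaultdict(deque)  # (cell, net_region) -> timestamps, oldest first

    def record(self, cell, net_region, now):
        """Note one access to `cell` that originated in `net_region` at time `now`."""
        self.events[(cell, net_region)].append(now)

    def count(self, cell, net_region, window, now):
        """Number of accesses within the last `window` seconds."""
        q = self.events[(cell, net_region)]
        horizon = now - max(self.windows)
        while q and q[0] <= horizon:      # discard events older than the longest window
            q.popleft()
        return sum(1 for t in q if t > now - window)

    def hot_keys(self, window, threshold, now):
        """Keys whose access count in `window` reaches `threshold`."""
        return sorted(k for k in list(self.events)
                      if self.count(k[0], k[1], window, now) >= threshold)
```

Keys reported by `hot_keys` would be natural candidates for the replica creation in step ii), although how the actual system selects them is not specified here.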

Our infrastructure assumes that service providers supply the required service-specific information, including i) the relations of the multi-attribute dataspace accessed by service requests, ii) a transformer function that converts the parameters specified in a request into the corresponding region in the dataspace, and iii) the locations in the network that can serve as service replicas.
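To make the notion of a transformer function concrete, here is a small sketch for an imaginary map-style service whose dataspace has two attributes, `lat` and `lon`. The service, the parameter names, and the box-shaped `Region` type are all hypothetical; a real provider would supply its own mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned box in a multi-attribute dataspace."""
    bounds: tuple  # (attribute, low, high) triples, sorted by attribute name


def make_region(**ranges):
    """Build a Region from keyword arguments of the form attribute=(low, high)."""
    return Region(tuple(sorted((name, lo, hi) for name, (lo, hi) in ranges.items())))


def transform(params):
    """Hypothetical transformer: maps a request's center point and radius
    to the bounding box it touches in the (lat, lon) dataspace."""
    lat = float(params["lat"])
    lon = float(params["lon"])
    r = float(params["radius"])
    return make_region(lat=(lat - r, lat + r), lon=(lon - r, lon + r))


def overlaps(a, b):
    """True if two regions defined over the same attributes intersect."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (_, lo1, hi1), (_, lo2, hi2) in zip(a.bounds, b.bounds))
```

Representing regions as boxes keeps overlap tests cheap, which matters if an intermediary must decide for every request whether a replica covers the data being asked for.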


System Overview


DataSlicer Intermediary Architecture

The architecture consists of a number of service-neutral "router" nodes that are distributed across the network and interact with one or more service replicas. For each particular service, the router nodes are organized into a service-oriented overlay that attempts to minimize latency between each pair of nodes.

The main functionality of such router nodes includes: i) relaying requests and responses between clients and the origin service; ii) keeping track of how portions of the dataspace are being accessed and maintaining metrics that summarize the performance of requests that access different portions; and iii) applying a distributed algorithm to determine an efficient service replica placement strategy that achieves and maintains given quality-of-service metrics.

To achieve such functionality, each router needs to i) associate a service request with the underlying logical dataspace for that service, ii) translate the parameters of a request into a region in this dataspace, iii) efficiently maintain usage statistics for that region, and iv) forward a request either to an upstream router or to a nearby replica. Finally, to host multiple services in our infrastructure, additional resource sharing policies are needed to accommodate competition among different services.
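The four router steps above can be sketched as follows. This is only an illustration under simplifying assumptions: the dataspace is one-dimensional, a region is a `(low, high)` pair, and a replica is used only when it fully covers the requested region. The `Router` class and all names in it are hypothetical.

```python
class Router:
    """Toy router: maps requests to regions, keeps counts, picks a next hop."""

    def __init__(self, upstream, transformers):
        self.upstream = upstream          # name of the next hop toward the origin
        self.transformers = transformers  # service name -> transformer function
        self.replicas = {}                # service name -> list of (low, high, replica name)
        self.stats = {}                   # (service, region) -> access count

    def add_replica(self, service, low, high, name):
        """Register a nearby replica holding the slice [low, high] of a service's dataspace."""
        self.replicas.setdefault(service, []).append((low, high, name))

    def route(self, service, params):
        # Steps i and ii: select the service's transformer and compute the region.
        region = self.transformers[service](params)
        # Step iii: update usage statistics for that region.
        key = (service, region)
        self.stats[key] = self.stats.get(key, 0) + 1
        # Step iv: forward to a replica that covers the region, else go upstream.
        low, high = region
        for r_low, r_high, name in self.replicas.get(service, []):
            if r_low <= low and high <= r_high:
                return name
        return self.upstream
```

A production router would need a more compact statistics structure than one counter per distinct region, as well as the resource sharing policies mentioned above; both are outside the scope of this sketch.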



The DataSlicer project is supported by National Science Foundation Grant CCR-0312956 and a gift from Microsoft University Relations.