The goal of our work is to develop DataSlicer, a hosting platform for data-centric network services that addresses the challenge of improving service scalability and performance over a wide-area network. Traditional web caching infrastructures do not benefit accesses to such services because service responses are usually generated dynamically and are hence considered ``uncacheable''.
We aim to build an alternative caching infrastructure by: i) dynamically detecting service usage locality across several dimensions (dataspace, network regions, and multiple timescales); ii) creating replicas of portions of an origin database at a few appropriate network intermediaries based on the detected locality information; and iii) applying actions such as request redirection and admission control to reduce client-perceived response time.
Our infrastructure assumes that service providers supply the required service-specific information, including i) the relations of the multi-attribute dataspace accessed by service requests, ii) a transformer function that converts the parameters specified in a request into the corresponding region in the dataspace, and iii) the locations in the network that can host service replicas.
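As an illustration of item ii), the following is a minimal sketch of what a provider-supplied transformer function might look like. It assumes a hypothetical service whose dataspace has two numeric attributes, `lat` and `lon`, and whose requests carry a center point and a `span`; the `Region` type, the parameter names, and the default span are assumptions made for this example, not part of DataSlicer.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned box in a multi-attribute dataspace.

    Each entry of `bounds` is (attribute name, low, high). Hypothetical type.
    """
    bounds: Tuple[Tuple[str, float, float], ...]

    def overlaps(self, other: "Region") -> bool:
        """True if the two boxes intersect on every attribute they share."""
        mine = {name: (lo, hi) for name, lo, hi in self.bounds}
        for name, lo, hi in other.bounds:
            if name in mine:
                my_lo, my_hi = mine[name]
                if hi < my_lo or lo > my_hi:
                    return False
        return True


def transform(params: Dict[str, str]) -> Region:
    """Hypothetical transformer: request parameters -> dataspace region."""
    lat = float(params["lat"])
    lon = float(params["lon"])
    half = float(params.get("span", "1.0")) / 2.0  # assumed default span
    return Region((
        ("lat", lat - half, lat + half),
        ("lon", lon - half, lon + half),
    ))
```

A router that only sees `Region` values can compare requests from the same service without understanding the service's own parameter syntax, which is one way the router nodes could remain service-neutral.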
The architecture consists of a number of service-neutral ``router'' nodes that are distributed across the network and interact with one or more service replicas. For each particular service, the router nodes are organized into a service-oriented overlay that attempts to minimize latency between each pair of nodes.
The main functions of these router nodes include: i) relaying requests and responses between clients and the origin service; ii) keeping track of how portions of the dataspace are being accessed and maintaining metrics that summarize the performance of requests that access different portions; and iii) applying a distributed algorithm to determine an efficient service replica placement strategy that achieves and maintains target quality-of-service metrics.
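The bookkeeping in item ii) can be sketched as follows. This is a simplified illustration under stated assumptions, not the data structure used by DataSlicer: it assumes a two-attribute dataspace partitioned into a fixed grid of square cells, and it keeps only an access count and a mean latency per cell. The class name `UsageTracker`, the grid partitioning, and the choice of metrics are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)
Cell = Tuple[int, int]


class UsageTracker:
    """Per-cell access counts and latency totals over a fixed grid (assumed)."""

    def __init__(self, cell_size: float = 1.0) -> None:
        self.cell_size = cell_size
        self.count: Dict[Cell, int] = defaultdict(int)
        self.total_latency_ms: Dict[Cell, float] = defaultdict(float)

    def _cells(self, box: Box) -> List[Cell]:
        """All grid cells that the request's region touches."""
        x_lo, x_hi, y_lo, y_hi = box
        cs = self.cell_size
        xs = range(math.floor(x_lo / cs), math.floor(x_hi / cs) + 1)
        ys = range(math.floor(y_lo / cs), math.floor(y_hi / cs) + 1)
        return [(i, j) for i in xs for j in ys]

    def record(self, box: Box, latency_ms: float) -> None:
        """Charge one request and its latency to every cell it touches."""
        for cell in self._cells(box):
            self.count[cell] += 1
            self.total_latency_ms[cell] += latency_ms

    def mean_latency_ms(self, cell: Cell) -> float:
        return self.total_latency_ms[cell] / self.count[cell]

    def hottest(self, n: int) -> List[Cell]:
        """The n most frequently accessed cells (ties broken by cell index)."""
        return sorted(self.count, key=lambda c: (-self.count[c], c))[:n]
```

A placement algorithm could then treat the hottest cells with poor mean latency as candidates for replication; tracking multiple timescales would require additional state, such as one tracker per time window, which this sketch omits.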
To achieve such functionality, the router needs to i) associate a service request with the underlying logical dataspace for that service, ii) translate parameters of a request into a region in this dataspace, iii) efficiently maintain usage statistics for that region, and iv) forward a request either to an upstream router or a nearby replica. Finally, to host multiple services in our infrastructure, additional resource sharing policies are needed to accommodate competition among different services.
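Step iv), the forwarding decision, might look like the sketch below. It assumes that each nearby replica advertises the box of the dataspace it holds and a measured latency, and that a request is sent to a replica only if that replica holds the entire requested region. The names `Replica`, `covers`, and `next_hop`, as well as the full-coverage rule, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)


@dataclass
class Replica:
    """A nearby service replica (hypothetical description)."""
    name: str
    holds: Box          # portion of the dataspace replicated here
    latency_ms: float   # measured latency from this router


def covers(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


def next_hop(request: Box, replicas: List[Replica], upstream: str) -> str:
    """Pick the lowest-latency replica that holds the whole requested region.

    Falls back to the upstream router when no nearby replica can answer.
    """
    candidates = [r for r in replicas if covers(r.holds, request)]
    if candidates:
        return min(candidates, key=lambda r: r.latency_ms).name
    return upstream
```

For example, with replicas `a` (holding the box `(0, 10, 0, 10)` at 30 ms) and `b` (holding `(0, 5, 0, 5)` at 10 ms), a request for `(1, 2, 1, 2)` would go to `b`, a request for `(6, 7, 6, 7)` would go to `a`, and a request for `(9, 12, 0, 1)` would be forwarded upstream.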