Title: An Analysis of Usage Locality for Data-Centric Web Services (NYU-CS-TR866) Author: Congchun He, Vijay Karamcheti Abstract:
The growing popularity of XML Web Services is resulting in a significant increase in the proportion of Internet traffic that involves requests to and responses from Web Services. Unfortunately, web service responses, because they are generated dynamically, are considered ``uncacheable" by traditional caching infrastructures. One way of remedying this situation is by developing alternative caching infrastructures, which improve performance using on-demand service replication, data offloading, and request redirection. These infrastructures benefit from two characteristics of web service traffic --- (1) the open nature of the underlying protocols, SOAP, WSDL, UDDI, which results in service requests and responses adhering to a well-formatted, widely known structure; and (2) the observation that for a large number of currently deployed data-centric services, requests can be interpreted as structured accesses against a physical or virtual database --- but require that there be sufficient locality in service usage to offset replication and redirection costs. This paper investigates whether such locality does in fact exist in current web service workloads. We examine access logs from two large data-centric web service sites, SkyServer and TerraServer, to characterize workload locality across several dimensions: data space, network regions, and different time epochs. Our results show that both workloads exhibit a high degree of spatial and network locality: 10\% of the client IP addresses in the SkyServer trace contribute to about 99.95\% of the requests, and 99.94\% of the requests in the TerraServer trace are directed towards regions that represent less than 10\% of the overall data space accessible through the service. Our results point to the substantial opportunity for improving Web Services scalability by on-demand service replication.