Abstract

Requests for dynamic and personalized content increasingly dominate present-day Internet traffic, driven both by the growth of dynamic web services and by a ``trickle-down'' effect stemming from the effectiveness of caches and content-distribution networks at serving static content. To serve this traffic efficiently, several server-side and cache-side techniques have recently been proposed. Although such techniques, which exploit different forms of reuse at the sub-document level, appear promising, significant impediments to their widespread deployment are (1) the absence of good models describing the characteristics of dynamic web content, and (2) the lack of effective synthetic content generators, which would reduce the effort involved in verifying the effectiveness of a proposed solution.

This paper addresses both of these shortcomings. Its primary contribution is a set of models that capture the characteristics of dynamic content in terms of both independent parameters, such as the distributions of object sizes and their freshness times, and derived parameters, such as content reusability across time and across linked documents. These models are derived from an analysis of the content of six representative news and e-commerce sites, using both size-based and level-based splitting techniques to infer document objects. A secondary contribution is a Tomcat-based dynamic content emulator, which uses these models to generate ESI-based dynamic content and to serve requests for both whole documents and individual objects. To validate both the models and the design of the content emulator, we compare the bandwidth requirements seen by an idealized cache simulator driven by the real traces and by the emulated content. Our simulation results verify that the output of the content emulator models real content effectively and efficiently.