Data Clustering for Autonomic Application Replication

Author: Jie Yang
Source: Master's thesis, Vrije Universiteit, August 2005.

Abstract

This thesis was carried out in the context of GlobeDB, a system for hosting Web applications that automatically replicates application data and maintains distributed consistency. GlobeDB uses partial replication to reduce network latency and traffic, and groups data into clusters to reduce the overhead of fine-grained replication. However, GlobeDB originally proposed only a naive clustering algorithm, which was a bottleneck for the system's performance. This thesis addresses the issue of data clustering in GlobeDB. The main challenges include evaluating the quality of clusters, selecting a clustering algorithm, and deciding on a suitable number of clusters. We systematically study various clustering algorithms and propose several new ones. Experiments show that the new algorithms improve the performance of GlobeDB. We also propose criteria for selecting the best clustering algorithm and parameters for a given situation. In addition, we find that periodic reclustering improves performance compared with a strategy that never reclusters, and that the best reclustering period depends on how stable the popularity of the application data is.

Download

The thesis, in PDF (646,640 bytes).

BibTeX Entry

@MastersThesis{Yang2005,
  author = 	 {Jie Yang},
  title = 	 {Data Clustering for Autonomic Application Replication},
  school = 	 {Vrije Universiteit},
  address = 	 {Amsterdam, The Netherlands},
  year = 	 {2005},
  month = 	 aug
}

gpierre@cs.vu.nl