Clustering Methodologies for Web Server Requests
Current Web Servers are built from many components that interact in dynamic and complex ways. Tasks such as performance analysis and problem solving are challenging, mostly because no high-level model of the observed system is available. Detailed low-level information does exist, but addressing these tasks successfully usually requires a human "expert" who examines the data manually and relies on experience to interpret it. Building tools that bridge the gap between low-level data and a higher-level model would ease this process, and doing so is both an immediate need and a research challenge.
In this talk we discuss the issues involved in building such tools, which cluster Web Server requests gathered using the Magpie tool-chain. We show that basic machine learning principles can guide the process. In particular, we present a two-step methodology that clusters requests progressively. In the first step, requests are separated according to the software components used to serve them. In the second step, similar requests are clustered further according to the total resources, such as CPU and network, consumed during their execution. The methodology (1) successfully captures the dynamics of different requests while (2) revealing detailed information about similar requests.
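The two-step idea can be illustrated with a minimal Python sketch. It assumes each request has already been summarized as a record holding the set of software components that served it plus its total CPU time and network bytes. The `Request` fields, the component names, and the use of k-means with max-scaling and `k=2` in the second step are all hypothetical choices made for illustration; the abstract does not specify the second-step algorithm, and none of this is Magpie's API.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical per-request summary; field names are illustrative only.
    req_id: str
    components: frozenset  # software components used to serve the request
    cpu_ms: float          # total CPU time consumed (assumed unit: ms)
    net_bytes: float       # total network traffic (assumed unit: bytes)


def step1_by_components(requests):
    """Step 1: group requests by the exact set of components that served them."""
    groups = defaultdict(list)
    for r in requests:
        groups[r.components].append(r)
    return dict(groups)


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    k = min(k, len(points))
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Recompute the centroid as the mean; keep the old one if the cluster is empty.
            new.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels


def step2_by_resources(group, k=2):
    """Step 2: within one component group, cluster by resource consumption."""
    raw = [(r.cpu_ms, r.net_bytes) for r in group]
    # Scale each feature by its maximum so bytes do not swamp milliseconds.
    maxes = [max(col) or 1.0 for col in zip(*raw)]
    points = [tuple(v / m for v, m in zip(p, maxes)) for p in raw]
    clusters = defaultdict(list)
    for r, lab in zip(group, kmeans(points, k)):
        clusters[lab].append(r.req_id)
    return sorted(clusters.values())


def two_step_cluster(requests, k=2):
    """Apply step 1, then step 2 inside every step-1 group."""
    return {comps: step2_by_resources(g, k) for comps, g in step1_by_components(requests).items()}


# Made-up sample requests with hypothetical component names.
DYNAMIC = frozenset({"http", "aspnet", "sql"})
STATIC = frozenset({"http", "static"})
SAMPLE = [
    Request("a", DYNAMIC, 10, 2000),
    Request("b", DYNAMIC, 12, 2100),
    Request("c", DYNAMIC, 80, 50000),
    Request("d", DYNAMIC, 85, 52000),
    Request("e", STATIC, 1, 500),
]

if __name__ == "__main__":
    for comps, clusters in two_step_cluster(SAMPLE).items():
        print(sorted(comps), clusters)
```

With this sample data, step 1 separates the dynamic requests from the static one, and step 2 splits the dynamic group into a light pair (`a`, `b`) and a heavy pair (`c`, `d`). Any per-group clustering method could replace k-means here; scaling the features matters mainly because CPU time and byte counts differ by orders of magnitude.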