University of Cambridge Computer Laboratory


Characterizing Network-based Applications



Data by Paper

  • Data used in "Efficient Application Identification and the Temporal and Spatial Stability of Classification Schema", Computer Networks, 2008 (PDF). If you use the data, please cite this paper.


    The data is in Weka .arff format. Each flow has 12 features and a hand-verified class label (ground truth). The features are exactly as described in the paper and are collected from only the first five packets of each flow (note: the first five packets of any kind, NOT the first five data packets). The feature-collection algorithm has also been implemented as a module in GTVS. Additionally, the original Day3 dataset with 249 features is available here.

  • For earlier data used in the following papers, please refer to the 2005 SIGMETRICS paper webpage:

    Andrew W. Moore and Denis Zuev, "Internet Traffic Classification Using Bayesian Analysis Techniques", in the Proceedings of ACM SIGMETRICS, June 2005, Banff, Canada (PDF);

    Denis Zuev and Andrew W. Moore, "Traffic Classification using a Statistical Approach", in the Proceedings of the Sixth Passive and Active Measurement Workshop (PAM 2005), March/April 2005, Boston, MA (PDF); and

    Andrew Moore, Denis Zuev and Michael Crogan, "Discriminators for use in flow-based classification", Technical Report RR-05-13, Department of Computer Science, Queen Mary, University of London, August 2005 (PDF).
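
For readers who want to inspect the Weka .arff files from the 2008 paper outside Weka, the following is a minimal Python sketch of a reader for simple dense ARFF data that counts flows per class. The attribute names and values in SAMPLE are invented for illustration and are not taken from the dataset. The sketch also assumes that the class label is the last attribute, and it does not handle quoted attribute names or sparse ARFF rows.

```python
import io
from collections import Counter


def read_arff(stream):
    """Parse a simple dense ARFF stream into (attribute_names, rows)."""
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name.
                names.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        # Every line after @data is one comma-separated instance (one flow).
        rows.append([value.strip() for value in line.split(",")])
    return names, rows


# Hypothetical excerpt: names and values are invented, not from the dataset.
SAMPLE = """\
% Illustrative ARFF content only.
@relation flows
@attribute feature_a numeric
@attribute feature_b numeric
@attribute class {WEB,MAIL}
@data
1024,0.5,WEB
80,1.5,MAIL
443,0.25,WEB
"""

names, rows = read_arff(io.StringIO(SAMPLE))

# Assumption: the class label is the last attribute of each row.
class_counts = Counter(row[-1] for row in rows)

print(names)
print(class_counts)
```

To read a downloaded file instead of the inline sample, pass an open file object to read_arff in place of the io.StringIO wrapper.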