Data used in
"Efficient Application Identification and the Temporal and Spatial Stability of Classification Schema", Computer Networks, 2008 (PDF). If you use the data, please cite this paper.
Day1.TCP.arff.gz
Day2.TCP.arff.gz
Day3.TCP.arff.gz
SiteB.TCP.arff.gz
Day1.UDP.arff.gz
Day3.UDP.arff.gz
SiteB.UDP.arff.gz
The data is in Weka .arff format. Each flow is described by 12 features and a hand-verified class label (ground truth). The features are exactly as described in the paper and are collected only from the first five packets of each flow (note: NOT the first five data packets). The feature collection algorithm has also been implemented as a module in GTVS. Additionally, the original Day3 dataset with 249 features is available here.
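As a rough illustration of how the gzipped .arff files might be read outside Weka, the sketch below parses a dense ARFF file using only the Python standard library. It rests on several assumptions not stated on this page: that the files use the dense (comma-separated) ARFF layout, that attribute names and values contain no quoted commas or spaces, and that the class label is the last attribute. The function names and the in-memory sample (including its attribute names and class values) are made up for illustration and are not taken from the datasets.

```python
import gzip
import io
from collections import Counter

# Tiny made-up sample; attribute names, values, and class labels are hypothetical.
SAMPLE = """\
% Hypothetical example, not real data from the datasets.
@relation flows
@attribute feature_a numeric
@attribute feature_b numeric
@attribute class {WWW,MAIL}
@data
80,1460,WWW
25,512,MAIL
443,1200,WWW
"""


def read_arff(stream):
    """Parse a dense ARFF text stream into (attribute_names, rows).

    Values are kept as strings; quoted fields and sparse ARFF are not handled.
    """
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif line.lower().startswith("@data"):
            in_data = True  # everything after this line is a data row
    return names, rows


def read_arff_gz(path):
    """Open a gzip-compressed ARFF file in text mode and parse it."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return read_arff(handle)


def class_counts(rows):
    """Count flows per class, assuming the label is the last column."""
    return Counter(row[-1] for row in rows)


if __name__ == "__main__":
    names, rows = read_arff(io.StringIO(SAMPLE))
    print(names)
    print(class_counts(rows))
```

After downloading one of the files listed above, something like `read_arff_gz("Day1.TCP.arff.gz")` should return the attribute names and the per-flow rows, provided the assumptions above hold for that file.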
For the earlier data used in the following publications, please refer to the 2005 SIGMETRICS paper webpage:
Andrew W. Moore and Denis Zuev, "Internet Traffic Classification Using Bayesian Analysis Techniques", in the Proceedings of ACM SIGMETRICS, June 2005, Banff, Canada (PDF)
Denis Zuev and Andrew W. Moore, "Traffic Classification using a Statistical Approach", in the Proceedings of the Sixth Passive and Active Measurement Workshop (PAM 2005), March/April 2005, Boston, MA (PDF)
Andrew Moore, Denis Zuev and Michael Crogan, "Discriminators for use in flow-based classification", Technical Report RR-05-13, Department of Computer Science, Queen Mary, University of London, August 2005 (PDF)