A Semi-Supervised Clustering Method For P2P Traffic Classification

doi:10.4304/jnw.6.3.424-431

Journal of Networks, Vol 6, No 3 (2011), 424-431, Mar 2011

Bin Liu

Abstract

In recent years, the use of P2P applications has increased significantly, and they now account for a large portion of Internet traffic. As a consequence of this growth, P2P traffic identification and classification are becoming increasingly important for network administrators and designers. However, this classification is not simple: P2P applications now explicitly try to camouflage their traffic in an attempt to go undetected. This paper presents a methodology based on a selection of three P2P traffic metrics and applies semi-supervised clustering to identify P2P applications. The three metrics proposed and used in this paper are IP Address Discreteness, Success Rate of Connections, and Bidirectional Connections Rate. The semi-supervised classification method for P2P traffic consists of two steps. First, a Particle Swarm Optimization (PSO) clustering algorithm is employed to partition a training dataset that mixes a few labeled samples with abundant unlabeled samples. Then, the available labeled samples are used to map the clusters to the application classes. Experimental results using campus traffic show that high P2P traffic classification accuracy is achieved with only a few labeled samples.
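The second step described above, mapping clusters to application classes with the few labeled samples, can be illustrated with a minimal sketch. The abstract does not state the mapping rule, so the majority vote used here is an assumption, as are the function names, the class names "p2p" and "non-p2p", and the "unknown" fallback for clusters that contain no labeled sample. The cluster assignments are taken as given; the PSO clustering step itself is not shown.

```python
from collections import Counter, defaultdict


def map_clusters_to_classes(cluster_ids, labels, default="unknown"):
    """Assign each cluster the most common label among its labeled samples.

    cluster_ids: cluster index of every training sample (from any clusterer).
    labels: class label per sample, or None for an unlabeled sample.
    default: class given to clusters without labeled samples (an assumption).
    """
    votes = defaultdict(Counter)
    for cid, label in zip(cluster_ids, labels):
        if label is not None:
            votes[cid][label] += 1

    mapping = {}
    for cid in set(cluster_ids):
        if votes[cid]:
            # Majority vote among the labeled samples in this cluster.
            mapping[cid] = votes[cid].most_common(1)[0][0]
        else:
            mapping[cid] = default
    return mapping


def classify(cluster_ids, mapping):
    """Label every sample with the class of the cluster it belongs to."""
    return [mapping[cid] for cid in cluster_ids]


# Six samples in three clusters; only three samples carry a label.
cluster_ids = [0, 0, 0, 1, 1, 2]
labels = ["p2p", None, "p2p", "non-p2p", None, None]

mapping = map_clusters_to_classes(cluster_ids, labels)
predicted = classify(cluster_ids, mapping)
```

In this toy input the unlabeled samples in clusters 0 and 1 inherit the class of their labeled neighbors, while cluster 2 has no labeled sample and falls back to the default.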

Keywords

P2P; Particle Swarm Optimization; P2P Traffic Classification; Semi-Supervised Clustering

Journal of Networks (JNW, ISSN 1796-2056)
