Text Clustering Using a Suffix Tree Similarity Measure
Abstract
Keywords
References
[1] Meadow, C. T., Boyce, B. R., Kraft, D. H. (2000), Text Information Retrieval Systems (second edition). Academic Press.
[2] Ko, Y., Park, J., Seo, J. (2004), ‘Improving Text Categorization Using the Importance of Sentences’, Information Processing & Management, vol. 40, pp. 65-79. http://dx.doi.org/10.1016/S0306-4573(02)00056-0
[3] Theobald, M., Siddharth, J., Paepcke, A. (2008), ‘SpotSigs: Robust and Efficient Near Duplicate Detection in Large Web Collections’, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, Singapore, pp. 563-570.
[4] Wang, D., Li, T., Zhu, S. (2008), ‘Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization’, Proceedings of the 31st Annual International ACM SIGIR Conference, ACM Press, Singapore, pp. 307-314.
[5] Maguitman, A., Menczer, F., Roinestad, H., Vespignani, A. (2005), ‘Algorithmic Detection of Semantic Similarity’, Proceedings of the 14th International World Wide Web Conference, ACM Press, Chiba, Japan, pp. 107-116. http://dx.doi.org/10.1145/1060745.1060765
[6] Salton, G., Wong, A., Yang, C. S. (1975), ‘A vector space model for automatic indexing’, Communications of the ACM, vol. 18, pp. 613-620. http://dx.doi.org/10.1145/361219.361220
[7] Deerwester, S., Dumais, S., Furnas, G. W. (1990), ‘Indexing by latent semantic analysis’, Journal of the American Society for Information Science, vol. 41, pp. 391-407. http://dx.doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[8] Zamir, O., Etzioni, O., Madani, O., Karp, R. M. (1997), ‘Fast and intuitive clustering of web documents’, Proceedings of the 3rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, ACM Press, Newport Beach, California, USA, pp. 287-290.
[9] Zamir, O., Etzioni, O. (1998), ‘Web document clustering: a feasibility demonstration’, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press, Melbourne, Australia, pp. 46-54.
[10] Li, Y., Chung, S. M., Holt, J. D. (2008), ‘Text document clustering based on frequent word meaning sequences’, Data & Knowledge Engineering, vol. 64, pp. 381-404. http://dx.doi.org/10.1016/j.datak.2007.08.001
[11] Shehata, S., Karray, F., Kamel, M. (2007), ‘A Concept-based Model for Enhancing Text Categorization’, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, San Jose, California, USA, pp. 629-637.
[12] Chim, H., Deng, X. (2007), ‘A new suffix tree similarity measure for document clustering’, Proceedings of the 16th International Conference on World Wide Web, ACM Press, Banff, Alberta, Canada, pp. 121-130. http://dx.doi.org/10.1145/1242572.1242590
[13] Hernandez-Reyes, E., Garcia-Hernandez, R. A., Carrasco-Ochoa, J. A., Martinez-Trinidad, J. F. (2006), ‘Document clustering based on maximal frequent sequences’, Proceedings of FinTAL 2006, LNAI, vol. 4139, pp. 257-267.
[14] Beil, F., Ester, M., Xu, X. W. (2002), ‘Frequent term-based text clustering’, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 436-442. http://dx.doi.org/10.1145/775047.775110
[15] Reuters-21578 (1997), text categorization test collection, Available at: http://www.daviddlewis.com/resources/testcollections/reuters21578/, Accessed on 17 December 2010.
[16] BBC Dataset (2010), Machine Learning Group, Available at: http://mlg.ucd.ie, Accessed on 17 December 2010.
[17] LingPipe (2010), Alias-i, Inc., Available at: http://www.alias-i.com, Accessed on 17 December 2010.
[18] Karypis, G. (2010), CLUTO–A Clustering Toolkit, Department of Computer Science, University of Minnesota, Available at: http://www.cs.umn.edu/~karypis/cluoto/, Accessed on 17 December 2010.