Construction of Lexicons to Perk Up Re-Clustering
DOI:
https://doi.org/10.51983/ajcst-2018.7.3.1891
Keywords:
Lexicon, Clustering, ATSCA, Keygraph, KBLCA
Abstract
Existing semantic methods cluster documents by comparing their terms, either in full or in abridged form. These terms are not preserved after clustering, so the entire cluster operation must be repeated whenever new documents arrive. In that sense, semantic clustering methods can be regarded as “on the go” methods, and re-clustering becomes unavoidable in both Iterative and Incremental Clustering Methods. It would be more appropriate to build and evolve a lexicon from the keywords derived from the documents and to refer to it in subsequent cluster operations. The rationale is to avoid re-clustering when new documents arrive and instead consult the lexicon to form clusters for as long as cluster quality remains intact; once the quality measure breaks above the threshold, the cluster operation is repeated. Because re-clustering is deferred until this breakeven point, the process of re-clustering becomes faster. Maintaining the lexicon may incur additional runtime complexity, but it greatly simplifies and speeds up re-clustering. This paper discusses the construction of lexicons and their application in clustering. The Keyword based Lexicon Construction Algorithm (KBLCA) is demonstrated for building lexicons, and a breakeven point for re-clustering is proposed and described. The theory of denying re-clustering is outlined, along with experimental results.
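The idea of consulting a lexicon before re-clustering can be illustrated with a minimal Python sketch. This is not KBLCA itself, whose steps the abstract does not give. The names `extract_keywords` and `LexiconClusters`, the frequency-based keyword extractor, and the breakeven rule (the fraction of new documents that match no lexicon entry) are all assumptions made for illustration.

```python
from collections import Counter


def extract_keywords(text, top_n=5):
    """Hypothetical stand-in extractor: the most frequent words longer than 3 letters."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 3)
    return {w for w, _ in counts.most_common(top_n)}


class LexiconClusters:
    """Assigns new documents through a keyword lexicon instead of re-clustering."""

    def __init__(self, threshold=0.3):
        self.lexicon = {}           # cluster id -> set of keywords
        self.seen = 0               # new documents processed so far
        self.unmatched = 0          # new documents that matched no cluster
        self.threshold = threshold  # assumed breakeven: tolerated unmatched fraction

    def seed(self, cluster_id, documents):
        """Build the lexicon entry for a cluster from its existing documents."""
        keywords = set()
        for doc in documents:
            keywords |= extract_keywords(doc)
        self.lexicon[cluster_id] = keywords

    def assign(self, document):
        """Return the cluster sharing the most keywords, or None if none overlap."""
        keywords = extract_keywords(document)
        best, best_overlap = None, 0
        for cluster_id, entry in self.lexicon.items():
            overlap = len(keywords & entry)
            if overlap > best_overlap:
                best, best_overlap = cluster_id, overlap
        self.seen += 1
        if best is None:
            self.unmatched += 1
        else:
            self.lexicon[best] |= keywords  # evolve the lexicon with the new keywords
        return best

    def needs_reclustering(self):
        """True once the unmatched fraction rises above the assumed threshold."""
        return self.seen > 0 and self.unmatched / self.seen > self.threshold


model = LexiconClusters(threshold=0.5)
model.seed("sports", ["football match goal football goal"])
model.seed("finance", ["stock market bond stock market"])
print(model.assign("the football goal was late"))
print(model.needs_reclustering())
```

In this sketch a full cluster operation would be triggered only when `needs_reclustering()` returns true; until then each new document costs one pass over the lexicon. A real system would replace the unmatched-fraction rule with whatever cluster quality measure it tracks.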
License
Copyright (c) 2018 The Research Publication
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.