Performance Analysis of Dimensionality Reduction Techniques in the Context of Clustering

Authors

  • T. Sudha, Professor, Department of Computer Science, Sri Padmavathi Mahila Visva Vidyalayam, Tirupati, Andhra Pradesh, India
  • P. Nagendra Kumar, Assistant Professor, Department of Computer Science, Geethanjali Institute of Science and Technology, Sri Potti Sreeramulu, Andhra Pradesh, India

DOI:

https://doi.org/10.51983/ajcst-2019.8.S3.2084

Keywords:

Clustering, Dimensionality Reduction, t-distributed Stochastic Neighbour Embedding, Probabilistic Principal Component Analysis

Abstract

Data mining is a major area of research, and clustering is one of its main functionalities. High dimensionality is a central difficulty in clustering, and dimensionality reduction can be used to address it. The present work compares two dimensionality reduction techniques, t-distributed stochastic neighbour embedding (t-SNE) and probabilistic principal component analysis (PPCA), in the context of clustering. High-dimensional data were reduced to low-dimensional data using each technique. Cluster analysis was then performed, with a varying number of clusters, on the high-dimensional data and on the low-dimensional data sets obtained through t-SNE and PPCA. Mean squared error, time, and space were used as the parameters for comparison. The results show that converting the high-dimensional data to low-dimensional data takes longer with PPCA than with t-SNE. The data set reduced through PPCA requires less storage space than the data set reduced through t-SNE.
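
The comparison procedure described in the abstract (reduce the data, cluster it with a varying number of clusters, and record mean squared error, time, and space) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`sq_dist`, `kmeans_mse`, `evaluate`, `keep_first_two`) are hypothetical, the clustering algorithm is assumed to be plain k-means because the abstract does not name one, the space figure assumes 8 bytes per stored value, and the data are synthetic. The reducer `keep_first_two` is only a placeholder that keeps two columns; it is neither t-distributed stochastic neighbour embedding nor probabilistic principal component analysis, either of which would be passed to `evaluate` in its place.

```python
import random
import time


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_mse(points, k, iters=20, seed=0):
    """Run plain Lloyd's k-means and return the mean squared distance
    from each point to its nearest centroid (assumed error measure)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return sum(min(sq_dist(p, c) for c in centroids)
               for p in points) / len(points)


def evaluate(reducer, data, ks):
    """Time the reduction, estimate storage of the reduced data, and
    compute the clustering error for each number of clusters in ks."""
    start = time.perf_counter()
    reduced = reducer(data)
    elapsed = time.perf_counter() - start
    # Assumption: each stored value occupies 8 bytes (double precision).
    space_bytes = sum(len(row) for row in reduced) * 8
    mse = {k: kmeans_mse(reduced, k) for k in ks}
    return {"seconds": elapsed, "space_bytes": space_bytes, "mse": mse}


def keep_first_two(data):
    """Placeholder reducer (hypothetical): keeps the first two columns.
    A real study would substitute a t-SNE or PPCA routine here."""
    return [row[:2] for row in data]


# Synthetic 10-dimensional data, for illustration only.
rng = random.Random(42)
data = [[rng.random() for _ in range(10)] for _ in range(60)]
report = evaluate(keep_first_two, data, ks=[2, 3, 4])
print(report)
```

Calling `evaluate` once per reduction technique, and once with an identity reducer for the original high-dimensional data, would yield the three sets of measurements that the abstract compares.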

Published

10-05-2019

How to Cite

Sudha, T., & Nagendra Kumar, P. (2019). Performance Analysis of Dimensionality Reduction Techniques in the Context of Clustering. Asian Journal of Computer Science and Technology, 8(S3), 66–71. https://doi.org/10.51983/ajcst-2019.8.S3.2084