Selection of Features on Mining Techniques for Classification

Authors

  • Gurrampally Kumar, Research Scholar, Department of Computer Science and Engineering, Annamalai University, Tamil Nadu, India
  • S. Mohan, Assistant Professor, Department of Computer Science and Engineering, Annamalai University, Tamil Nadu, India
  • G. Prabakaran, Assistant Professor, Department of Computer Science and Engineering, Annamalai University, Tamil Nadu, India

DOI:

https://doi.org/10.51983/ajcst-2018.7.S1.1793

Keywords:

Feature Selection, Data Mining, Classification, Correlation Coefficient

Abstract

Feature selection has been addressed by several mining techniques for classification. Some existing approaches fail to remove data that are irrelevant to a class from the dataset, so appropriate features must be selected to emphasize their role in classification. To this end, this paper considers a statistical measure, the correlation coefficient, to identify the features in the feature set whose data are most important for the existing classes. Several methods, such as Gaussian process, linear regression, and Euclidean distance, are taken into consideration for clarity of classification. The experimental results indicate that the proposed method identifies the relevant features for several classes.
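
The abstract does not spell out how the correlation coefficient is applied, so the following is only a minimal sketch of one common reading: compute the Pearson correlation between each feature column and a numeric class label, and keep the features whose absolute correlation reaches a cutoff. The function names `pearson` and `select_features`, the threshold of 0.5, and the toy data are all assumptions made for illustration, not details taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, threshold=0.5):
    """Return indices of features whose |r| with the label meets the threshold.

    The threshold is a hypothetical cutoff chosen for this sketch.
    """
    n_features = len(rows[0])
    selected = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        if abs(pearson(column, labels)) >= threshold:
            selected.append(j)
    return selected


# Toy data (hypothetical): feature 0 tracks the label, feature 1 is
# uncorrelated with it, and feature 2 is constant.
rows = [
    [1.0, 5.0, 3.0],
    [2.0, 1.0, 3.0],
    [3.0, 4.0, 3.0],
    [4.0, 2.0, 3.0],
]
labels = [0, 0, 1, 1]
selected = select_features(rows, labels)
print(selected)
```

On this toy data only the first feature passes the cutoff. Pearson correlation captures linear dependence only, so a feature related to the class in a non-linear way could be missed by a filter of this kind.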

Published

09-11-2018

How to Cite

Kumar, G., Mohan, S., & Prabakaran, G. (2018). Selection of Features on Mining Techniques for Classification. Asian Journal of Computer Science and Technology, 7(S1), 108–111. https://doi.org/10.51983/ajcst-2018.7.S1.1793