Improved Weighted Page Ranking Algorithm Based on Principal Component Analysis and Map Reduce Frame work for Web Access
DOI:
https://doi.org/10.51983/ajcst-2019.8.2.2144
Keywords:
Information Retrieval (IR), Search Engine, Hyperlinks, Elements, Page Ranking (PR), Principal Component Analysis (PCA), Map Reduce (MR) Framework, Weighted Page Ranking (WPR)
Abstract
The World Wide Web has become the most widely used resource for information retrieval and knowledge discovery, but the information on it keeps growing in size and density. Retrieving the required information efficiently and effectively is therefore a challenge, and this growth has created difficulties for search engine technology. Web mining is the area that applies data mining techniques to meet these requirements. Popular web mining algorithms such as Page Ranking (PR), Weighted Page Ranking (WPR), and Hyperlink-Induced Topic Search (HITS) are commonly used to sort and rank search results. Among these, page ranking algorithms use web structure mining and web content mining to estimate the relevance of a web site, but they do not address the scalability problem, nor do they consider the number of visits of the inlinks and outlinks of pages. A fast and efficient page ranking algorithm for web page retrieval therefore remains a challenge. This paper proposes an improved WPR algorithm, called PWPR, which uses Principal Component Analysis (PCA) and is based on the mean value of page ranks. The proposed PWPR algorithm takes into account the number of visits of both the inlinks and outlinks of pages and distributes rank scores according to the popularity of the pages. The weight values of the pages are computed from the inlinks and outlinks together with their mean values. However, because new data and updates arrive constantly, the results of PWPR, like those of other data mining applications, become stale and obsolete over time. The MapReduce (MR) framework is a promising approach to refreshing mining results on big data. The proposed MR algorithm reduces the time complexity of the PWPR algorithm by reducing the number of iterations needed to reach convergence.
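To make the ranking step more concrete, the following is a minimal sketch of the baseline Weighted Page Ranking iteration in the Xing and Ghorbani formulation, with each iteration split into a map step and a reduce step in plain Python. It is not the paper's PWPR method: the abstract does not specify the PCA step, the mean-value weighting, or the visit counts, so none of these are reproduced here. The function names, the toy graph, the damping factor `d = 0.85`, and the convergence tolerance are all assumptions made for illustration.

```python
from collections import defaultdict


def edge_weights(graph):
    """Return {(v, u): W_in(v, u) * W_out(v, u)} for every link v -> u.

    W_in and W_out compare the inlink and outlink counts of u with those
    of all pages that v links to (the standard WPR weighting).
    """
    in_deg = defaultdict(int)
    for outs in graph.values():
        for u in outs:
            in_deg[u] += 1
    weights = {}
    for v, outs in graph.items():
        total_in = sum(in_deg[p] for p in outs)
        total_out = sum(len(graph.get(p, ())) for p in outs)
        for u in outs:
            w_in = in_deg[u] / total_in if total_in else 0.0
            w_out = len(graph.get(u, ())) / total_out if total_out else 0.0
            weights[(v, u)] = w_in * w_out
    return weights


def map_phase(graph, ranks, weights):
    """Map: each page v emits (u, weighted share of its rank) per outlink."""
    for v, outs in graph.items():
        for u in outs:
            yield u, ranks[v] * weights[(v, u)]


def reduce_phase(pairs, pages, d):
    """Reduce: sum the contributions per target page and apply damping."""
    sums = defaultdict(float)
    for u, contribution in pairs:
        sums[u] += contribution
    return {p: (1.0 - d) + d * sums[p] for p in pages}


def weighted_pagerank(graph, d=0.85, tol=1e-6, max_iter=100):
    """Iterate map and reduce until the largest rank change is below tol."""
    pages = set(graph) | {u for outs in graph.values() for u in outs}
    ranks = {p: 1.0 for p in pages}
    weights = edge_weights(graph)
    for _ in range(max_iter):
        new_ranks = reduce_phase(map_phase(graph, ranks, weights), pages, d)
        delta = max(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Hypothetical three-page graph: page -> list of pages it links to.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
result = weighted_pagerank(toy_graph)
```

In a real MapReduce deployment the map and reduce functions would run as distributed tasks over partitions of the link graph; here they are ordinary generators and dictionaries so that the data flow of one iteration is easy to follow.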
License
Copyright (c) 2019 The Research Publication
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.