Key-Based Top-K Search in Multidimensional Databases
DOI:
https://doi.org/10.51983/ajcst-2012.1.1.1676
Abstract
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table, or joins of multiple tables, that contain a set of keywords. This project instead studies keyword search in a data cube with text-rich dimension(s), a so-called text cube. The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. Given a keyword query, the goal is to find the top-k most relevant cells in the text cube. When users retrieve information from a text cube using keyword queries, relevant cells, rather than relevant documents, are preferred as answers because (i) relevant cells are easy for users to browse, and (ii) relevant cells give users insight into the relationship between the values of relational attributes and the text data. The proposed algorithm uses a relevance scoring formula to find the top-k relevant cells by exploring only a small portion of the whole text cube (when k is small), and it enables early termination.
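To make the notions of a cell and of top-k cell ranking concrete, the following is a minimal Python sketch. It is not the algorithm proposed in the paper: it enumerates every cell by brute force, with no early termination, and the relevance score (the average, over a cell's documents, of the fraction of terms that match the query) is an assumed stand-in for the paper's scoring formula. The data in `ROWS` and the names `doc_score`, `enumerate_cells`, and `top_k_cells` are hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each row pairs structural dimensions with a document.
ROWS = [
    ({"brand": "Acme", "model": "X1"}, "battery life is long and screen is bright"),
    ({"brand": "Acme", "model": "X2"}, "screen is dim but battery is fine"),
    ({"brand": "Bolt", "model": "Z9"}, "fast processor with a loud fan"),
]


def doc_score(text, keywords):
    """Assumed relevance: fraction of the document's terms that are query keywords."""
    terms = text.lower().split()
    if not terms:
        return 0.0
    return sum(terms.count(k) for k in keywords) / len(terms)


def enumerate_cells(rows):
    """Map each cell to its documents.

    A cell is identified by attribute values fixed on a subset of dimensions;
    the empty subset gives the apex cell that aggregates every document.
    """
    dims = sorted(rows[0][0])
    cells = {}
    for size in range(len(dims) + 1):
        for subset in combinations(dims, size):
            for attrs, text in rows:
                key = tuple((d, attrs[d]) for d in subset)
                cells.setdefault(key, []).append(text)
    return cells


def top_k_cells(rows, keywords, k):
    """Score every cell (brute force) and return the k best as (score, cell) pairs."""
    keywords = [w.lower() for w in keywords]
    scored = []
    for cell, docs in enumerate_cells(rows).items():
        score = sum(doc_score(d, keywords) for d in docs) / len(docs)
        scored.append((score, cell))
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    for score, cell in top_k_cells(ROWS, ["battery"], k=2):
        print(round(score, 4), dict(cell))
```

Because the number of cells grows exponentially with the number of dimensions, exhaustive enumeration like this is practical only for toy inputs, which is the motivation for an approach that explores a small portion of the cube and terminates early.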
References
Cindy Xide Lin et al. Text Cube: Computing IR Measures for Multidimensional Text Database Analysis. In ICDM ’08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 905–910, Washington, DC, USA, 2008. IEEE Computer Society.
Yintao Yu et al. iNextCube: Information Network-Enhanced Text Cube. Proc. VLDB Endow., 2(2):1622–1625, 2009.
Duo Zhang et al. Topic Modeling for OLAP on Multidimensional Text Databases: Topic Cube and Its Applications. Stat. Anal. Data Min., 2(56):378–395, 2009.
S. Chaudhuri, R. Ramakrishnan, and G. Weikum, “Integrating db and ir technologies: What is the sound of one hand clapping?” in Proc. Conf. on Innovative Data Syst. Research (CIDR), 2005, pp. 1–12.
S. Amer-Yahia, P. Case, T. Rölleke, J. Shanmugasundaram, and G. Weikum, “Report on the db/ir panel at sigmod 2005,” SIGMOD Record, vol. 34, no. 4, pp. 71–74, 2005.
G. Weikum, “Db&ir: both sides now,” in Proc. ACM SIGMOD, 2007, pp. 25–30.
S. Agrawal, S. Chaudhuri, and G. Das, “Dbxplorer: A system for keyword-based search over relational databases,” in Proc. IEEE Int’l Conf. Data Eng. (ICDE), 2002, pp. 5–16.
F. Liu, C. T. Yu, W. Meng, and A. Chowdhury, “Effective keyword search in relational databases,” in Proc. ACM SIGMOD, 2006, pp. 563–574.
Y. Luo, X. Lin, W. Wang, and X. Zhou, “Spark: top-k keyword query in relational databases,” in Proc. ACM SIGMOD, 2007, pp. 115–126.
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan, “Keyword searching and browsing in databases using banks,” in Proc. IEEE Int’l Conf. Data Eng. (ICDE), 2002, pp. 431–440.
License
Copyright (c) 2012 The Research Publication
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.