STRUCTURE-BASED QUERY EXPANSION FOR XML SEARCH ENGINE
Wei-ning Qian, Hai-lei Qian, Li Wei, Yan Wang and Ao-ying Zhou
Computer Science Department, Fudan University, Shanghai 200433. E-mail: wnqian@fudan.edu.cn
Abstract: This paper introduces structure-based query expansion for XML search engines. Building on the query expansion techniques of information retrieval systems, it is designed to make XML data easier to query while keeping the power and flexibility of XML query languages. To enable structure expansion, a structure thesaurus must be built first. This involves constructing a weighted graph from the XML documents and applying a linkage-based clustering method to cluster the nodes into several groups. When a query arrives, the structure thesaurus is consulted, so that for each tag in the original query, the tags in the same group are retrieved. Unrelated tags are filtered out, and heuristic rules are applied to replace tags in the original query with related tags and to expand the structure. It is shown that, with structure-based query expansion, the system can return results with high precision and recall.
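The thesaurus-building steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the excerpt does not say how edges are weighted or which linkage criterion is used, so the sketch assumes that an edge weight is the number of times two tags appear as parent and child, and it substitutes a simple threshold-based single-linkage grouping. The names `build_tag_graph`, `cluster_tags`, `related_tags`, and the parameter `min_weight` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tag_graph(xml_docs):
    """Build an undirected weighted graph over tag names.

    Assumption: the weight of an edge is the number of parent-child
    occurrences of the two tags across all documents.
    """
    weights = defaultdict(int)
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for parent in root.iter():
            for child in parent:
                if parent.tag != child.tag:
                    edge = tuple(sorted((parent.tag, child.tag)))
                    weights[edge] += 1
    return dict(weights)


def cluster_tags(weights, min_weight):
    """Group tags by single linkage: tags joined by an edge whose
    weight is at least min_weight end up in the same group."""
    parent = {}

    def find(tag):
        # Union-find lookup with path halving.
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for (a, b), w in weights.items():
        root_a, root_b = find(a), find(b)
        if w >= min_weight:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for tag in list(parent):
        groups[find(tag)].add(tag)
    return list(groups.values())


def related_tags(tag, groups):
    """Return the other tags in the same group as `tag` (thesaurus lookup)."""
    for group in groups:
        if tag in group:
            return sorted(group - {tag})
    return []
```

In this sketch a query tag is looked up with `related_tags(tag, groups)`, and the returned candidates would still need the filtering and heuristic rewriting rules mentioned in the abstract, which the excerpt does not specify.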

1. INTRODUCTION
XML (Extensible Markup Language) is a W3C specification (Bray, 1998), developed to complement HTML for data exchange on the Web. In recent years, XML has been increasingly used in large information systems such as digital libraries and information centers. In most of these systems, the search engine is a major module. XML search engines have gained popularity over HTML search engines primarily because of two notable advantages. 1) They can query not only the content but also the structure. 2) They usually support more complex and powerful query languages, such as XML-QL (Deutsch) and XQL (Robie, 1999), which allow users to query elements satisfying certain conditions. However, these two advantages also bring the following shortcomings: 1) It is difficult for users to pose
