CoBase on the Web

|| Home || Projects || Publications || Members ||

Query Relaxation for XML Database

Project funded NSF Award # 0219442

1. Project Overview

1.1 Project Introduction & Motivation

As the World Wide Web becomes a major means of disseminating and sharing information, there has been an exponential increase in the amount of data in web-complaint format such as HyperText Markup Language (HTML) and Extensible Markup Language (XML). XML is essentially a text representation of hierarchical (tree-like) data where a meaningful piece of data is bounded by matching starting and ending tags, such as <name> and </name>. Due to its simplicity as compared to SGML and its relative expressiveness as compared to HTML, XML has become a popular format for information representation and data exchange.

To cope with the tree-like structures in the XML model, several XML-specific query languages have been proposed (e.g. XPath, XQuery, etc.). All these query languages aim at the exact matching of query conditions. Answers are found when those XML documents match the given query condition exactly. However, this may not always be the case in the XML model. To remedy this condition, we propose to develop a query relaxation framework for searching answers that approximately match the given query conditions. Query relaxation enables systems to relax the user query to a less restricted form to devise approximate answers.

In the XML domain, because of the flexible nature of the XML model allows varied structures and/or values and the non-rigid XML tag syntax enables to embed a wealth of meta information in XML documents, the need for such query relaxation increases. The following points illustrates that query relaxation is more important for the XML model than the relational model:

Unlike in the relational model where users are given a relatively small-sized schema to ask queries, the schema in the XML model is substantially bigger and more complex. Therefore, it is often unrealistic for users to understand the full schema and compose very complex queries at once. Therefore, it becomes critical to be able to relax the user's query when the original query yields null or not sufficient answers.
As the number of data sources available on the web increases, it becomes common to build a system where data is gathered from heterogeneous data sources and where the structures of the participating data sources are different, even though they are using the same ontology. Therefore, the capability to query against differently-structured data sources becomes more important. Query relaxation allows the query to be structurally relaxed and routed to diverse data sources with different structures.

We propose a cooperative XML (CoXML) system for approximate query answering. The objective of CoXML is to develop an XML query relaxation enabled search engine such that when users' queries have empty or insufficient answers, our system can relax the queries to get approximate answers.

1.2 Summary of Approaches

CoXML is based on the following two key techniques:

Extending the standard XQuery with relaxation constructs and controls. Such a relaxed XQuery (RLXQuery) language enable users to specify relaxation constructs (e.g., approximate, preferred, unacceptable query conditions etc.), and relaxation controls (e.g., relaxation orders, answer set size, etc.).
Using a knowledge-based relaxation index XML Type Abstraction Hierarchy (XTAH), to guide the XML query relaxation process. XTAH is a hierarchical tree-like multi-level knowledge structure where similar XML fragments are clustered based on the inter XML fragment distance measure. An XTAH has two node types: internal nodes and leaf nodes. An internal node has a representative tree that summarizes the characteristics, including the structure and semantics, of all the XML fragments in that cluster. A leaf node has a representative tree as well as a set of XML fragments that are similar to each other. Guided by the XTAH, based on the relaxation constructs in RLXQuery, CoXML processes the query and returns approximately matched answers.

1.3 CoXML System Overview

The diagram above illustrates the architecture of our CoXML system, which consists of a relaxation controller, a RLXQuery parser, a RLXQuery preprocessor, a database manager, a relaxation manager, an XTAH manager, an XTAH generator, an execution trace manager and a post-processor. The functionalities of each module are briefly described as follows:

XTAH Generator
Given a set of XML documents and some domain knowledge, the XTAH generator automatically generates XTAHs that summarize the content and structure characteristics of the underlying XML documents.
Relaxation Controller
The relaxation controller is the main interface to the outside world. The controller is the controlling unit of RLXQuery search engine and it dispatches the query to different modules to allow them to perform some specific task to the query. It is the module that decides when to stop query relaxation based on the criteria specified by the user.
RLXQuery Parser
The RLXQuery Parser transforms an RLXQuery into an XQueryRep object. The Query Parser performs two services. The first part is responsible for checking the syntax of the query. If the syntax is correct, then the query is parsed to the second part. The second part extracts all the in formation from the query and creates an XQueryRep object.
RLXQuery Preprocessor
The RLXQuery Preprocessor transforms relaxation constructs (if any) in the query into the standard XQuery constructs and communicates with the database manager to check whether there are enough answers available. If not, the relaxation controller will send the query to the relaxation manager.
Database Manager
The functionality of the database manager is to find exact matched answers to a standard XQuery.
Relaxation Manager
The relaxation manager performs the following functionalities: 1) building a relaxation structure based on the specified relaxation constructs and controls; 2)communicating with the XTAH manager to get relaxed conditions; and 3)modifying the query accordingly.
XTAH Manager
The functionality of the XTAH manager is to select an appropriate XTAH for guiding the query relaxation process.
Post-processor
The functionality of the post-processor is to accept unsorted answers and output these answers in a ranked order.
Execution Trace Manager
The Execution Trace keeps track of all the actions incurred on the XQueryRep by each module. All modules need to communicate with the Execution Trace to record their changes to the XQueryRep.
Ranking of approximate answers
A model has been developed to rank answers based on both content and structure relaxation.

2. Screenshots Examples

Example of using a GUI interface to help users to specify relaxations in queries

3. Members

Wesley W. Chu	Principal Investigator	wwc AT cs.ucla.edu
Shaorong Liu	Ph.D. Student	sliu AT cs.ucla.edu
Ruzan Shahinian	M.S. Student
Joseph Chen	M.S. Student	jccs05 AT gmail.com
Christian Cardenas	M.S. Student
Anna Putnam	M.S. Student
Eric Sung	M.S. Student
Tony Lee	M.S. Student

4. Publications

4.1 Papers

Shaorong Liu and Wesley W. Chu. CoXML: A Cooperative XML Query Answering System. In the 8th International Conference on Web-Age Information Management, 2007. (Paper in PDF format)
Wesley W. Chu and Shaorong Liu. CoXML: Cooperative XML Query Answering In The Encyclopedia of Computer Science and Engineering, Edit by B. Wah, John Wiley & Sons, Inc, 2007, to appear (Paper in PDF format)
Shaorong Liu, Wesley W. Chu, and Ruzan Shahinian. Vague Content and Structure (VCAS) Retreival for Document-centric XML Collections. In WebDB (in conjunction with SIGMOD), 2005. (Paper in PDF format )

Shaorong Liu, Qinghua Zou, and Wesley W. Chu. Configurable Indexing and Ranking for XML Information Retrieval.In Proceedings of 27th Annual International ACM SIGIR Conference, 2004. (Paper in PDF format , slides in power point format)
Shaorong Liu and Wesley W. Chu. Cooperative XML (CoXML) Query Answering at INEX 2003. In Proceedings of the 2^nd INitiative of the Evaluation of XML retrieval (INEX) Workshop, Schloss Dagstuhl, Germany,2003. PDF
Qinghua. Zou, Shaorong Liu, and Wesley W. Chu. Using a Compact Tree to Index and Query XML Data. In Proceedings of Thirteenth Conference on Information and Knowledge Management (CIKM), 2004. PDF
Qinghua. Zou, Shaorong Liu, and Wesley W. Chu. Ctree: A Compact Tree for Indexing XML Data. In Proceedings of 6th International Workshop on Web Information and Data Management (WIDM), 2004.PDF

4.2 Presentations & Posters

"CoXML: A Cooperative XML Query Answering System", UCLA Engineering Annual Research Review, 2006.(Poster abstract in pdf format)

"Query Relaxation for XML Databases", NSF IDM 04 Workshop, Cambridge, Massachusetts, Oct. 10 - Oct. 12th, 2004.(Poster in power point format)

"Query Relaxation for XML Databases", NSF IDM 03 Workshop, Seattle, Washington, Sept. 14- Sept. 16, 2003. (Presentation in power point format)

4.3 Master Comprehensive Exam Reports

Tony Lee. Indexing Technique for XML structure patterns. MaSters Thesis, Computer Science Department, UCLA, Winter 2003. PDF
Eric Sung. Parser for Relaxation-Enabled XML Query Language. Masters Thesis, Computer Science Department, UCLA, Fall 2004. PDF
Christian Cardenas. Parsing and Preprocessing of RLXQuery for XML Query Relaxation. Masters Thesis, Computer Science Department, UCLA, Spring 2004. PDF
Anna Putnam. The Relaxation Manager for XML Query Relaxation. Masters Thesis, Computer Science Department, UCLA, Spring 2004. PDF