Project Overview
Welcome to the XPRESS project home page! We, the CoBase research
group, aim to build a novel XML query processing and relaxation
system on relational storage. XPRESS stands for
"Xml Processing and Relaxation in rElational Storage System".
This project falls under the umbrella of a larger
effort to build an intelligent information system in the CoBase
project, supported by DARPA.
As the World-Wide Web becomes a major means of disseminating and sharing
information, the amount of data in web-compliant formats such as
HyperText Markup Language (HTML) and Extensible Markup Language (XML)
is increasing exponentially. XML in particular has recently received
a lot of attention as a candidate format because it is simpler than
SGML and more powerful than HTML.
Building an efficient information system to handle such XML documents
is a challenging task, involving many issues related to storage,
indexing, querying, user interfaces, etc.
Two promising ways of implementing such a system are:
- Building an XML-oriented database system from scratch.
- Using a relational database (RDB) as an underlying system.
The first approach includes projects such as Lore at Stanford
U. [SIGMOD 97], and the second approach includes projects such as
STORED/SilkRoute at AT&T Labs [SIGMOD 98] and Niagara at U. Wisconsin
[VLDB 99].
Each approach has its own strengths, but the second approach in
particular offers the following benefits:
- Given that the present market is mostly dominated by RDB
products, it is neither easy nor practical to abandon RDBs in order to
support XML. Industry would very likely be reluctant to adopt a
new technology that does not support existing RDB techniques, just as
it was reluctant to adopt object-oriented databases in the past.
- By using an RDB as the underlying system, mature RDB techniques
can be leveraged. That is, the vast number of sophisticated techniques
(e.g., OLAP, Data Mining, Data Warehousing, etc.) developed for RDBs
can be applied to XML data with minimal changes.
- A large amount of XML data on the web can be integrated with legacy
data in relational format. This enables various
input/output combinations for the information system:
- XML input - XML output,
- XML input - Relational output,
- Relational input - XML output, and
- Relational input - Relational output.
This approach also has drawbacks:
- The cost of converting between the XML and relational models can be
very high, which could cause heavy CPU computation and slow query
response times.
- The conversion process may not always be correct or
complete. Because of irregularities in XML, systems may lose some
information during the conversion.
In this research, we plan to investigate the second approach in a
detailed and complete manner. Furthermore, we would like to leverage
the mature CoBase techniques [ICDE 91, JIIS 94, JIIS 96] to cope with
situations in which no sufficient answers are found. Toward this goal,
we have identified the following four functionalities, as
illustrated in the figure below:
- Mapping XML to Relational Model
- Rewriting XML Query to SQL
- Query Relaxation in XML
- Mapping Relational to XML Model
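To make the first two functionalities more concrete, here is a minimal sketch in Python. It is not the XPRESS design: it assumes a generic "one row per element" mapping into a single hypothetical table named `node`, and it handles only simple child-axis paths such as `/library/book/title`. The function names `shred_xml` and `path_to_sql` are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_xml(xml_text, conn):
    """Map one XML document to rows of a hypothetical `node` table.

    Each element becomes one row: (id, parent, tag, text).
    Attributes and mixed content are ignored, so this mapping is lossy.
    """
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "tag TEXT, text TEXT)"
    )
    counter = 0

    def visit(elem, parent_id):
        nonlocal counter
        counter += 1
        node_id = counter  # document-order identifier
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, parent_id, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text), None)


def path_to_sql(path):
    """Rewrite a child-axis path into SQL with one self-join per step."""
    tags = path.strip("/").split("/")
    last = len(tags) - 1
    froms = ["node AS n0"]
    wheres = ["n0.parent IS NULL", "n0.tag = ?"]
    for i in range(1, len(tags)):
        froms.append(f"node AS n{i}")
        wheres.append(f"n{i}.parent = n{i - 1}.id")
        wheres.append(f"n{i}.tag = ?")
    sql = (
        f"SELECT n{last}.text FROM {', '.join(froms)} "
        f"WHERE {' AND '.join(wheres)} ORDER BY n{last}.id"
    )
    return sql, tags  # tag names are passed as bound parameters


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred_xml(
        "<library><book><title>A</title></book>"
        "<book><title>B</title></book></library>",
        conn,
    )
    sql, params = path_to_sql("/library/book/title")
    print([row[0] for row in conn.execute(sql, params)])
```

Because attributes and mixed content are dropped, this sketch also shows how a conversion can lose information. A real mapping would need to preserve them, along with element order, to support the reverse relational-to-XML direction.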
Last modified: Sun Sep 10 21:20:52 PDT 2000