From Databases to Web-Bases
Database Research Issues in Internet Information Systems
Giansalvatore Mecca
Databases in Internet Information Systems
Born as a uniform interface for sharing data, the World Wide Web has grown enormously and
is rapidly evolving into a standard, worldwide distributed computing platform on which
next-generation information systems are likely to be based. Several hints point to this
scenario. A relevant one is the radical change in users' attitude toward the Web:
according to a recent survey conducted by Wired magazine [Wir], a large share of the users
who a couple of years ago surfed the Web in search of new, exotic sites have now abandoned
exploration in favor of more selective access to a limited number of sites offering
information and services relevant to their work or their everyday life.
The focus of database research in this field is following a somewhat similar path. It
started as an attempt to extend database techniques to the exploration of Web data
[KS95, LSS96, MMM97] and to provide support for Web-wide computation, and it is now
evolving towards the definition of database support for Internet-based information
systems. In this respect, a key difference is that the Web is no longer seen as a large
graph of essentially unstructured objects to be browsed or explored using high-level
languages, but as a collection of sites offering data and services. Two main issues are
critical here:
* first, since Web data and services are open to the public, they may be used for
computing purposes; however, to process these data profitably, it is essential to
capture their semantics and to be able to manipulate them flexibly;
* second, the availability of a standard distributed platform such as the Web is
a powerful driving force towards the development of cooperative initiatives, i.e.,
systems that correlate and integrate data coming from Web data sources to generate new,
value-added services.
Database research and technology can
play an important role in investigating these two issues, that is, establishing new and
more sophisticated forms of repositories capable of handling data from the Web, and
supporting Web-based cooperation.
Research Issues in Web Data Management
Future-generation information systems are likely to be based on HTTP-like protocols,
hypertextual front-ends and platform-independent programming languages. This will clearly
have a strong impact on the role played by data in such systems. Traditional concepts,
such as the independence of data from applications, design methodologies, and even the
concept of a DBMS, need to be reconsidered in this new framework. Database management
systems will evolve into new forms of repositories capable of dealing with these new
requirements in data manipulation.
A first feature of such repositories is that they will be collections of data of a
heterogeneous nature: partly highly structured data, such as those typically stored in
relational or object-oriented database systems, and partly semistructured data [Abi97],
in the Web style. While structured data can be handled well using traditional database
know-how, semistructured data require new efforts. We list a number of relevant issues,
which also represent potentially interesting research topics.
* Data Models and Query Languages. New data models and languages [AQM+97,
BDHS96, AMM97, AM98] have been proposed; in this context, sophisticated text management
techniques [AM97, HGMC+97] are needed to wrap data sources and build a database
representation of them in a data model; methodologies and management systems [MW97,
FFK+97, AMM98] for Web data also need to be investigated. Several of these systems have
been implemented and are available on the Web.
* Inferring Structure in Web Documents. Another important issue is finding
structure in apparently unstructured data, that is, extending some notion of database
"scheme" to semistructured data (for several interesting references, see the relevant
sessions in [Suc97]).
* Complexity and Optimization. A critical point here is query optimization:
the Web imposes a radically new cost model for query evaluation, in which the cost of
local processing can be considered negligible with respect to that of network accesses;
this new model needs to be formalized [AV97a, MM97] and then used as a basis for efficient
query evaluation [AV97b, MMM98].
* Data Integration. In a world in which almost everything is accessible, it
is natural to think of integration. Integration is not a new issue for database
researchers [SL90, Kim95, KCGS95]. Interestingly, logic seems to play an important role
here [Ull97]. Logic has been proposed as a framework for querying the Web [LSS96, DBHT97,
HLLS97], although practical implementations are still under development. It also
provides a convenient formalism for reasoning about data integration [CGMH+94, GMPQ+95,
LRO96].
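As a rough illustration of the cost model mentioned under Complexity and Optimization, the
following Python sketch compares two ways of answering the same query over a toy site,
counting only page downloads and treating local processing as free. It is not taken from
any of the cited papers: the site, its URLs, the "role" attribute and both plans are
hypothetical, invented solely for this example.

```python
# Toy cost model: the cost of a plan is the number of distinct pages it downloads.
# The site below is invented; real Web sources would be reached over HTTP.
SITE = {
    "/": {"links": ["/dept/cs", "/dept/math", "/index/professors"]},
    "/dept/cs": {"links": ["/people/alice", "/people/bob"]},
    "/dept/math": {"links": ["/people/carol"]},
    "/index/professors": {"links": ["/people/alice", "/people/carol"]},
    "/people/alice": {"links": [], "role": "professor"},
    "/people/bob": {"links": [], "role": "student"},
    "/people/carol": {"links": [], "role": "professor"},
}


class CountingFetcher:
    """Simulates network access and remembers which pages were downloaded."""

    def __init__(self, site):
        self.site = site
        self.downloaded = set()

    def fetch(self, url):
        self.downloaded.add(url)
        return self.site[url]

    @property
    def cost(self):
        # Local work (filtering, set operations) is deliberately not counted.
        return len(self.downloaded)


def plan_navigate_departments(fetcher):
    """Plan A: walk the department pages and download every person page to test it."""
    result = set()
    for dept in fetcher.fetch("/")["links"]:
        if not dept.startswith("/dept/"):
            continue
        for person in fetcher.fetch(dept)["links"]:
            if fetcher.fetch(person).get("role") == "professor":
                result.add(person)
    return result


def plan_use_index(fetcher):
    """Plan B: read the index page, whose links already identify the professors."""
    fetcher.fetch("/")
    return set(fetcher.fetch("/index/professors")["links"])


if __name__ == "__main__":
    for plan in (plan_navigate_departments, plan_use_index):
        fetcher = CountingFetcher(SITE)
        answer = plan(fetcher)
        print(plan.__name__, sorted(answer), "page downloads:", fetcher.cost)
```

On this toy site both plans return the same two pages, but Plan A downloads six pages and
Plan B only two. Under a cost model dominated by network accesses, an optimizer would
prefer Plan B regardless of how much local filtering either plan performs.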
References
[Abi97] S. Abiteboul. Querying
semi-structured data. In Sixth International Conference on Data Base Theory, (ICDT'97),
Delphi (Greece), Lecture Notes in Computer Science, 1997.
[AM97] P. Atzeni and G. Mecca. Cut and
Paste. In Sixteenth ACM SIGMOD Intern. Symposium on Principles of Database Systems
(PODS'97), Tucson, Arizona, pages 144--153, 1997. http://poincare.inf.uniroma3.it:8080/Araneus/.
[AM98] G. O. Arocena and A. O.
Mendelzon. WebOQL: Restructuring documents, databases and Webs. In Fourteenth IEEE
International Conference on Data Engineering (ICDE'98), Orlando, Florida, 1998. To
Appear.
[AMM97] P. Atzeni, G. Mecca, and P.
Merialdo. To Weave the Web. In International Conf. on Very Large Data Bases (VLDB'97),
Athens, Greece, August 26-29, pages 206--215, 1997.
http://poincare.inf.uniroma3.it:8080/Araneus/.
[AMM98] P. Atzeni, G. Mecca, and P.
Merialdo. Design and maintenance of data-intensive Web sites. In VI Intl. Conference on
Extending Database Technology (EDBT'98), Valencia, Spain, March 23-27, 1998. To Appear.
[AQM+97] S. Abiteboul, D.
Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured
data. Journal of Digital Libraries, 1(1):68--88, April 1997.
[AV97a] S. Abiteboul and V. Vianu.
Queries and computation on the Web. In Sixth International Conference on Data Base
Theory, (ICDT'97), Delphi (Greece), Lecture Notes in Computer Science, 1997.
[AV97b] S. Abiteboul and V. Vianu.
Regular path queries with constraints. In Sixteenth ACM SIGMOD Intern. Symposium on
Principles of Database Systems (PODS'97), Tucson, Arizona, pages 122--133, 1997.
[BDHS96] P. Buneman, S. Davidson, G.
Hillebrand, and D. Suciu. A query language and optimization techniques for unstructured
data. In ACM SIGMOD International Conf. on Management of Data (SIGMOD'96), Montreal,
Canada, pages 505--516, 1996.
[CGMH+94] S. Chawathe, H.
Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. D. Ullman, and J. Widom. The
TSIMMIS project: Integration of heterogeneous information sources. In IPSJ Conference,
Tokyo, 1994.
[DBHT97] K. De Bosschere, M.
Hermenegildo, and P. Tarau, editors. Proceedings of the Workshop on Logic Programming
Tools for Internet Applications (in conjunction with ICLP'97), Leuven, July 11, 1997. http://clement.info.umoncton.ca/%7Elpnet/proceedings97/.
[FFK+97] M. Fernandez, D.
Florescu, J. Kang, A. Levy, and D. Suciu. STRUDEL -- a Web site management system. In ACM
SIGMOD International Conf. on Management of Data (SIGMOD'97), Tucson, Arizona, 1997.
Exhibits Program.
[GMPQ+95] H. Garcia-Molina,
Y. Papakonstantinou, D. Quass, A. Rajaraman, Y. Sagiv, J. D. Ullman, and J. Widom. The
TSIMMIS approach to mediation: Data models and languages (extended abstract). In NGITS
Workshop, 1995.
[HGMC+97] J. Hammer, H.
Garcia-Molina, J. Cho, R. Aranha, and A. Crespo. Extracting semistructured information
from the Web. In Proceedings of the Workshop on the Management of Semistructured Data
(in conjunction with ACM SIGMOD 1997) http://www.research.att.com/~suciu/workshop-papers.html,
1997.
[HLLS97] R. Himmeroeder, G. Lausen, B.
Ludaescher, and C. Schlepphorst. On a declarative semantics for Web queries. In Fifth
International Conference on Deductive and Object-Oriented Databases (DOOD'97), Montreux,
Switzerland, December 8-11, 1997.
[KCGS95] W. Kim, I. Choi, S. Gala, and
M. Scheevel. On resolving schematic heterogeneity in multidatabase systems. In W. Kim,
editor, Modern Database Systems, pages 521--550. ACM Press, 1995.
[Kim95] W. Kim, editor. Modern
Database Systems: the Object Model, Interoperability, and Beyond. ACM Press and Addison
Wesley, 1995.
[KS95] D. Konopnicki and O. Shmueli.
W3QS: A query system for the world-wide web. In International Conf. on Very Large Data
Bases (VLDB'95), Zurich, pages 54--65, 1995.
[LRO96] A. Y. Levy, A. Rajaraman, and
J. J. Ordille. Querying heterogeneous information sources using source descriptions. In
International Conf. on Very Large Data Bases (VLDB'96), Mumbai (Bombay), 1996.
[LSS96] L. Lakshmanan, F. Sadri, and I.
N. Subramanian. A declarative language for querying and restructuring the Web. In 6th
Intern. Workshop on Research Issues in Data Engineering: Interoperability of
Nontraditional Database Systems (RIDE-NDS'96), 1996.
[MM97] A. Mendelzon and T. Milo. Formal
models of Web queries. In Sixteenth ACM SIGMOD Intern. Symposium on Principles of Database
Systems (PODS'97), Tucson, Arizona, 1997.
[MMM97] A. Mendelzon, G. Mihaila, and
T. Milo. Querying the World Wide Web. Journal of Digital Libraries, 1(1):54--67, April
1997.
[MMM98] G. Mecca, A. Mendelzon, and P.
Merialdo. Efficient queries over Web views. In VI Intl. Conference on Extending Database
Technology (EDBT'98), Valencia, Spain, March 23-27, 1998. To Appear.
[MW97] J. McHugh and J. Widom.
Integrating dynamically-fetched external information into a DBMS for semistructured data.
In Proceedings of the Workshop on the Management of Semistructured Data (in conjunction
with ACM SIGMOD 1997) http://www.research.att.com/~suciu/workshop-papers.html,
1997.
[SL90] A. P. Sheth and J. A. Larson.
Federated database systems for managing distributed, heterogeneous, and autonomous
databases. ACM Computing Surveys, 22(3):183--236, September 1990.
[Suc97] D. Suciu, editor. Proceedings
of the Workshop on the Management of Semistructured Data (in conjunction with ACM SIGMOD
1997) http://www.research.att.com/~suciu/workshop-papers.html. 1997.
[Ull97] J. D. Ullman. Information
integration using logical views. In Sixth International Conference on Data Base Theory,
(ICDT'97), Delphi (Greece), Lecture Notes in Computer Science, pages 19--40, 1997.
[Wir] The Wired Magazine Web site.
http://www.wired.com.