Computational Logic

From Databases to Web-Bases


 


Database Research Issues in Internet Information Systems

Giansalvatore Mecca

Databases in Internet Information Systems

Conceived as a uniform interface for sharing data, the World Wide Web has grown enormously and is rapidly evolving into a standard, worldwide distributed computing platform on which next-generation information systems are likely to be based. Several signs point to such a scenario. A relevant one is the radical change in the attitude of users toward the Web: a recent survey conducted by Wired Magazine [Wir] reports that a large share of the users who, a couple of years ago, surfed the Web in search of new, exotic sites have now abandoned that exploration in favor of more selective access to a limited number of sites offering information and services relevant to their work or their everyday life.

The focus of database research in this field is following a somewhat similar path. It started as an attempt to extend database techniques to the exploration of Web data [KS95, LSS96, MMM97] and to provide support for Web-wide computation, and it is now evolving towards the definition of database support for Internet-based information systems. A key difference is that the Web is no longer seen as a large graph of essentially unstructured objects to be browsed or explored using high-level languages, but as a collection of sites offering data and services. Two main issues are critical here:

* first, since Web data and services are open to the public, they may be used for computing purposes; however, to process these data profitably, it is essential to capture their semantics and to be able to manipulate them flexibly;

* second, the availability of a standard distributed platform such as the Web is a powerful driving force towards the development of cooperative initiatives, i.e., systems that correlate and integrate data coming from Web data sources to generate new, value-added services.

Database research and technology can play an important role in investigating these two issues, that is, establishing new and more sophisticated forms of repositories capable of handling data from the Web, and supporting Web-based cooperation.

Research Issues in Web Data Management

Future generation information systems are likely to be based on HTTP-like protocols, hyper-textual front-ends and platform-independent programming languages. This will clearly have a strong impact on the role played by data in such systems. Traditional concepts, such as data-independence from applications, design methodologies, and even the concept of DBMS need to be reconsidered in this new framework. Database management systems will evolve into new forms of repositories, capable of dealing with these new requirements in data manipulation.

A first feature of such repositories is that they will be collections of heterogeneous data: partly highly structured data, such as those typically stored in relational or object-oriented database systems, and partly semistructured data [Abi97], in the Web style. While structured data can be handled well using traditional database know-how, semistructured data require new efforts. We list a number of relevant issues, which also represent potentially interesting research topics.

* Data Models and Query Languages. New data models and languages [AQM+97, BDHS96, AMM97, AM98] have been proposed; in this context, sophisticated text management techniques [AM97, HGMC+97] are needed to wrap data sources and build a database representation in a data model; also, methodologies and management systems [MW97, FFK+97, AMM98] for Web data need to be investigated. Several of these systems have also been implemented and are available on the Web.

* Inferring Structure in Web Documents. Another important issue is finding structure in apparently unstructured data, that is, extending some notion of database "scheme" to semistructured data (for several interesting references, see the relevant sessions in [Suc97]).

* Complexity and Optimization. A critical point here is query optimization: the Web imposes a radically new cost model for query evaluation, in which local processing can be considered of negligible cost compared with network accesses; this new model needs to be formalized [AV97a, MM97], and then used as a basis for efficient query evaluation [AV97b, MMM98].

* Data Integration. In a world in which almost everything is accessible, it is natural to think of integration. Integration is not a new issue for database scientists [SL90, Kim95, KCGS95]. Interestingly, logic seems to play an important role here [Ull97]. Logic has been proposed as a framework for querying the Web [LSS96, DBHT97, HLLS97], although practical implementations are still under development. It also provides a convenient formalism for reasoning about data integration [CGMH+94, GMPQ+95, LRO96].
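
The cost model mentioned under Complexity and Optimization can be illustrated with a small sketch. It is not taken from any of the cited papers: the toy site, the class FetchCounter and the two plan functions are hypothetical names introduced only for this example. The sketch assumes that each page access costs one unit and that all local processing is free, and it compares two plans that answer the same query.

```python
from dataclasses import dataclass, field


@dataclass
class FetchCounter:
    """Toy stand-in for the Web: a dict of pages where every access is counted."""
    pages: dict
    fetched: list = field(default_factory=list)

    def get(self, url):
        # Each call models one network access, the only operation that is charged.
        self.fetched.append(url)
        return self.pages[url]

    @property
    def cost(self):
        # Cost of a plan = number of page accesses; local work is not counted.
        return len(self.fetched)


# Hypothetical site: an index page links to department pages listing professors.
PAGES = {
    "/index": {"links": ["/dept/cs", "/dept/math"]},
    "/dept/cs": {"name": "CS", "professors": ["Rossi", "Bianchi"]},
    "/dept/math": {"name": "Math", "professors": ["Verdi"]},
}


def plan_navigate_all(web):
    """Fetch the index, follow every link, then filter locally (free)."""
    index = web.get("/index")
    depts = [web.get(url) for url in index["links"]]
    return [p for d in depts if d["name"] == "CS" for p in d["professors"]]


def plan_direct(web):
    """Assume the URL of the CS page is known and access it directly."""
    return list(web.get("/dept/cs")["professors"])


if __name__ == "__main__":
    for plan in (plan_navigate_all, plan_direct):
        web = FetchCounter(PAGES)
        print(plan.__name__, plan(web), "page accesses:", web.cost)
```

Both plans return the same answer, but under this simplified model the first is charged for every page it navigates, while the local filtering step adds nothing to its cost. An optimizer based on such a model would therefore prefer the plan with fewer page accesses, however much local computation it requires.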

References

[Abi97] S. Abiteboul. Querying semi-structured data. In Sixth International Conference on Data Base Theory, (ICDT'97), Delphi (Greece), Lecture Notes in Computer Science, 1997.

[AM97] P. Atzeni and G. Mecca. Cut and Paste. In Sixteenth ACM SIGMOD Intern. Symposium on Principles of Database Systems (PODS'97), Tucson, Arizona, pages 144--153, 1997. http://poincare.inf.uniroma3.it:8080/Araneus/.

[AM98] G. O. Arocena and A. O. Mendelzon. WebOQL: Restructuring documents, databases and Webs. In Fourteenth IEEE International Conference on Data Engineering (ICDE'98), Orlando, Florida, 1998. To Appear.

[AMM97] P. Atzeni, G. Mecca, and P. Merialdo. To Weave the Web. In International Conf. on Very Large Data Bases (VLDB'97), Athens, Greece, August 26-29, pages 206--215, 1997. http://poincare.inf.uniroma3.it:8080/Araneus/.

[AMM98] P. Atzeni, G. Mecca, and P. Merialdo. Design and maintenance of data-intensive Web sites. In VI Intl. Conference on Extending Database Technology (EDBT'98), Valencia, Spain, March 23-27, 1998. To Appear.

[AQM+97] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured data. Journal of Digital Libraries, 1(1):68--88, April 1997.

[AV97a] S. Abiteboul and V. Vianu. Queries and computation on the Web. In Sixth International Conference on Data Base Theory, (ICDT'97), Delphi (Greece), Lecture Notes in Computer Science, 1997.

[AV97b] S. Abiteboul and V. Vianu. Regular path queries with constraints. In Sixteenth ACM SIGMOD Intern. Symposium on Principles of Database Systems (PODS'97), Tucson, Arizona, pages 122--133, 1997.

[BDHS96] P. Buneman, S. Davidson, G. Hillebrand, and D. Suciu. A query language and optimization techniques for unstructured data. In ACM SIGMOD International Conf. on Management of Data (SIGMOD'96), Montreal, Canada, pages 505--516, 1996.

[CGMH+94] S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. D. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. In IPSJ Conference, Tokyo, 1994.

[DBHT97] K. De Bosschere, M. Hermenegildo, and P. Tarau, editors. Proceedings of the Workshop on Logic Programming Tools for Internet Applications (in conjunction with ICLP'97), Leuven, July 11, 1997. http://clement.info.umoncton.ca/%7Elpnet/proceedings97/.

[FFK+97] M. Fernandez, D. Florescu, J. Kang, A. Levy, and D. Suciu. STRUDEL -- a Web site management system. In ACM SIGMOD International Conf. on Management of Data (SIGMOD'97), Tucson, Arizona, 1997. Exhibits Program.

[GMPQ+95] H. Garcia-Molina, Y. Papakonstantinou, D. Quass, A. Rajaraman, Y. Sagiv, J. D. Ullman, and J. Widom. The TSIMMIS approach to mediation: Data models and languages (extended abstract). In NGITS Workshop, 1995.

[HGMC+97] J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, and A. Crespo. Extracting semistructured information from the Web. In Proceedings of the Workshop on the Management of Semistructured Data (in conjunction with ACM SIGMOD 1997), 1997. http://www.research.att.com/~suciu/workshop-papers.html.

[HLLS97] R. Himmeroeder, G. Lausen, B. Ludaescher, and C. Schlepphorst. On a declarative semantics for Web queries. In Fifth International Conference on Deductive and Object-Oriented Databases (DOOD'97), Montreux, Switzerland, December 8-11, 1997.

[KCGS95] W. Kim, I. Choi, S. Gala, and M. Scheevel. On resolving schematic heterogeneity in multidatabase systems. In W. Kim, editor, Modern Database Systems, pages 521--550. ACM Press, 1995.

[Kim95] W. Kim, editor. Modern Database Systems: the Object Model, Interoperability, and Beyond. ACM Press and Addison Wesley, 1995.

[KS95] D. Konopnicki and O. Shmueli. W3QS: A query system for the world-wide web. In International Conf. on Very Large Data Bases (VLDB'95), Zurich, pages 54--65, 1995.

[LRO96] A. Y. Levy, A. Rajaraman, and J. J. Ordille. Querying heterogeneous information sources using source descriptions. In International Conf. on Very Large Data Bases (VLDB'96), Mumbai (Bombay), 1996.

[LSS96] L. Lakshmanan, F. Sadri, and I. N. Subramanian. A declarative language for querying and restructuring the Web. In 6th Intern. Workshop on Research Issues in Data Engineering: Interoperability of Nontraditional Database Systems (RIDE-NDS'96), 1996.

[MM97] A. Mendelzon and T. Milo. Formal models of Web queries. In Sixteenth ACM SIGMOD Intern. Symposium on Principles of Database Systems (PODS'97), Tucson, Arizona, 1997.

[MMM97] A. Mendelzon, G. Mihaila, and T. Milo. Querying the World Wide Web. Journal of Digital Libraries, 1(1):54--67, April 1997.

[MMM98] G. Mecca, A. Mendelzon, and P. Merialdo. Efficient queries over Web views. In VI Intl. Conference on Extending Database Technology (EDBT'98), Valencia, Spain, March 23-27, 1998. To Appear.

[MW97] J. McHugh and J. Widom. Integrating dynamically-fetched external information into a DBMS for semistructured data. In Proceedings of the Workshop on the Management of Semistructured Data (in conjunction with ACM SIGMOD 1997), 1997. http://www.research.att.com/~suciu/workshop-papers.html.

[SL90] A. P. Sheth and J. A. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3):183--236, September 1990.

[Suc97] D. Suciu, editor. Proceedings of the Workshop on the Management of Semistructured Data (in conjunction with ACM SIGMOD 1997), 1997. http://www.research.att.com/~suciu/workshop-papers.html.

[Ull97] J. D. Ullman. Information integration using logical views. In Sixth International Conference on Data Base Theory, (ICDT'97), Delphi (Greece), Lecture Notes in Computer Science, pages 19--40, 1997.

[Wir] The Wired Magazine Web site. http://www.wired.com.



