Spatial Databases
and Spatial Logic Jan Paredaens
Introduction
Spatial databases basically investigate
that kind of databases that manage information about one, two or more dimensional real
space. In classical database theory the content of databases is abstract and has no
property except the equality. This is why genericity [16] is so important here. In spatial
databases, the
content has some interpretation. All the laws of real geometry hold. This induces a far
richer class of datastructures. It augments the possibilities to represent these
databases. The content of the latter is indeed infinite, since it contains, mostly, an
infinite number of points. There are however, different possibilities to represent this
information in a finite way.
The different geometrical properties
give rise to many kinds of genericity [49]. They express the importance of the special
semantic properties of real spatial databases. Therefore, the search for complete
languages for expressing these generic queries is a hard challenge, especially for the
generic classes that cannot be described in a finite way. We are not going to discuss this
topic but refer to [38] for an elaborated overview.
Algebras and Graphs
The geo-relational algebra was a first
design of an extended relational algebra including spatial data types with a comprehensive
set of operations [33]. This algebra was implemented within a large
prototype system, the Gral System. The ROSE algebra offers more general datatypes for a 2D
spatial
database [34, 35]. The concept of a realm is fundamental for this kind of work. An
important problem in spatial databases that is not solved through the introduction of
spatial data types and algebras is the representation and querying of networks, such as
roads, rivers or public transport systems. This is handled in the GraphDB model [36, 37].
More information on these topics can be found in [2].
First order in Spatial Databases
The framework of constraint
databases, introduced by Kanellakis, Kuper and Revesz [41], provides an elegant and
powerful model of spatial databases [49, 51]. In this setting, a spatial database, viewed
as a possibly infinite set of points in real space, is represented as a Boolean
combination of polynomial inequalities. For example, the unit disk in the real plane
except its upper left quadrant would be represented as
{(x,y) *x2
+ y2 ? 1*? (x >= 0 * y
>= 0)}.
By extending the relational calculus
with polynomial inequalities one obtains a simple spatial query
language. For example, the query whether the database S contains a circle can be
expressed as
(*x0)(*y0)(*r)(*x)(*y)((x-x0)2+(y-y0)2=r2
*S(x,y)).
Although variables now range over the
whole of the real numbers, queries expressed in the calculus can still be effectively
computed using methods from symbolic computation.
The expressiveness of the calculus is
limited, however. For example, the combined results of Benedikt et al. [12] and Grumbach
and Su [29] imply that one cannot express in the calculus that the database is
topologically connected. The topological connectivity query can be viewed as the spatial
analogue of the standard relational query of graph connectivity, which is also not
expressible in the standard relational calculus.
4 Deductive Spatial Databases
Consider the following trivial program
R(0)*
R(y)*R(x),
y=x+1.
This program does not terminate,
regardless of the input database (which is not used at all in the
program), because the resulting set R grows unboundedly. Let us therefore see what
happens if a fixed bound is put on the result, by requiring that the result must stay
within some fixed bounded region.
The following example shows that under
this requirement programs may still loop forever:
R(1) *
R(x) * R(x'),
x = x'/2.
Relation R remains in the
interval [0,1] but the program does not terminate; the interval is repeatedly cut in half
and this goes on indefinitely.
In search of termination guarantees, we
therefore put the drastic restriction on programs that no arithmetic (polynomial
inequalities) is allowed in rule bodies. In other words, we concentrate on pure Datalog
programs. Clearly, under this restriction, programs that do not rely on the database, such
as the two
programs given above, can no longer loop forever.
The classical example in pure Datalog
is the transitive closure program TC:
R(x,y) * S(x,y)
R(x,y) * R(x,z),
S(x,y).
On finite inputs S, TC always
terminates. On infinite inputs, however, this is of course no longer true. For example, TC
will loop forever on the semi-algebraic set S defined by y=x+1; in
the n-th iteration all points of the form (x,x+n) are added.
It is a basic fact of standard
relational database theory that on finite relations, pure Datalog programs always
terminate. One might hope that this property carries over to infinite but bounded, semi-algebraic
sets. For example, by bounding the above-mentioned set y=x+1 as
y=x+1*0 ? x ? 10,
TC will terminate after ten iterations.
This hope is unjustified, however. On the bounded input
S= {(x,y) *y=2x
* 0 ? x ? 1/2},
TC will loop forever; every iteration
adds a line segment of increasing slope.
The previous example also shows that,
even if we require not only the input set but also the output set to be
bounded, programs still can loop forever. Indeed, the lines of increasing slope all remain
in the unit square.
As a last resort in our search for
termination guarantees, we therefore investigate the stronger requirement that the output
set is always contained in the input set. In other words, we concentrate on selection
queries. The output of a selection query on a bounded input is clearly also bounded.
Whether or not a program expresses a
selection query is a semantic property, but we can put a strong
syntactic range restriction on the rules so that only selection queries can be expressed:
every rule has to be of the form
R(x,y) * S(x,y)
, B,
where S is as always the EDB
predicate denoting the database and B is the rest of the rule body. A program in
which every rule is of this form is called a selection program. The
transitive-closure program is not a selection program.
The following program, a
range-restricted variation of TC, is an example of a selection program:
R(x,y)*S(x,y),
¬ S(z,x)
R(x,y)*S(x,y),
S(x,z), R(z,y).
On unbounded semi-algebraic inputs,
selection programs can loop forever, as witnessed by the set
S = {(x,y)*y=x+1* x >= 0} E {(0,0)}.
On this S, the above program
will not terminate; in the first iteration the point (0,0) is added, and in
subsequent iterations subsequent points of the form (n,n+1) are added. However, if
we bound S, e.g., by adding the constraint y ? 10, the program will
terminate.
Hence, our last hope is that
selection programs on bounded
semi-algebraic inputs always terminate.
The following selection program
destroys also this last hope:
R(x,y) * S(x,y),
S(x,x)
R(x,y) * S(x,y),
R(z,x).
On the bounded input
S = {(x,y)* y = x/2
* 0 x ? 1}E{(1,1)},
this program does not terminate; in the
first iteration the point (1,1) is added, in the second iteration the endpoint of
the line segment in the database is added, and in subsequent iterations the line segment
is
repeatedly cut in half, going on indefinitely. [44] presents some results in this context.
5 Topological Aspects
Motivated by applications in
geographical information systems where only relative positions of spatial data are of
importance, a number of researchers have investigated topological properties of spatial
databases and query languages in which to express them. Topological queries are those
whose result only depends on the topological aspects of the spatial data.
In the polynomial model, the natural
query language, as mentioned before, is first-order logic over the reals (expanded with a
symbol to address the database). Relative to this query language, we say that two
spatial databases are called topologically elementary equivalent if they cannot be
distinguished by such topological first-order queries. Kuijpers, Paredaens and Van den
Bussche [50] have given an
effective characterization of topological elementary equivalence of closed databases in
the real plane. This characterization is based on a known topological property of
semi-algebraic sets, namely that locally around each point they are "conical''. The
points in the database are partitioned according to the types of their cones. Roughly, the
characterization then says that two databases are topologically elementary equivalent if
and only if the cardinalities of the equivalence classes of their partitions match.
The same authors have also investigated
the question of which topological properties can be expressed in first-order logic.
They have introduced a two-tiered logic for expressing topological properties, which is
subsumed by first-order logic over the reals. and they prove on certain classes of
databases that the two
logics are actually equivalent.
Another approach to investigating
topological properties of planar spatial databases was followed by Papadinitriou, Suciu
and Vianu [48]. They consider labeled regions in the plane and investigate topological
relationships between them that were introduced by Egenhofer and Franzosa [27]. The query
languages that they consider are not coordinate or point based, but rather region based,
in the sense that quantifiers range over regions. They have shown that the closure of
these relationships under appropriate logical operations yields a language that is
complete for topological properties of spatial databases.
Some queries only involve properties of
the database that are topological in nature. In this class of queries, concepts such as
adjacency, connectivity, and containment are important. Queries like "Is there a
highway connecting Brussels to Paris?'' or "Give all countries in Europe adjacent to
the Atlantic'' are typical in this respect. Characteristic of topological properties is
that they do not distinguish between two databases that can be obtained from each other by
a topological deformation. We will call such databases topologically equivalent.
In applications in which only
topological properties are under consideration, it may be desirable to be able to work
with a representation of the database which is topologically invariant, meaning
that two topologically equivalent databases will be represented identically. Ideally, as
we will illustrate later, a representation should also be lossless, in the sense
that two databases that are not topologically equivalent will be represented
differently.
This topological property is elaborated
in the context of a limited class of spatial databases consisting of points in the plane R2,
lines between these points, and areas formed by these lines. This model is commonly
referred to as the topological data model [36, 46]. An example application is a
subway or railroad map in which only relative positions of spatial objects such as
stations and tracks are depicted without, for instance, taking the actual length of the
trajectory into account. A survey of application domains that can be modeled in this
manner was given by Laurini and Thompson [46]. Another approach of topological issues in
the context of spatial databases can be found in [4].
6 Some Spatial Database Systems
The World Wide Web hosts a multitude of
commercial spatial database products and scientific prototypes. Geographic Information
Systems (GIS) constitute the most well-known applications of spatial databases, though
other applications (e.g. CAD and robotics) also exist. In this paragraph, the first three
products are well-known implementations of spatial databases, whereas the last two are
examples of numerous commercial GIS products. For an exhaustive list of GIS resources on
the Web, we refer to [1].
1.The ROSE algebra (Gueting and
Schneider, 1995) [6].
The ROSE algebra offers general spatial
data types in 2D, is based on finite resolution geometry, has a complete formal definition
of semantics, and interfaces with any DBMS data model. The main techniques used are
(parallel) traversal of objects, plane-sweep, and graph algorithms. The ROSE algebra has
also been implemented and the Modula-2 source code is freely available from the authors
for study or use.
2. Dedale (Stephane Grumbach et al.)
[7]
Dedale is a prototype of a constraint
database for multidimensional data, conceived for spatio-temporal applications. It is
implemented on top of the O2 DBMS, and uses a graphical user interface for querying
purposes. Its main characteristics are (i) to offer a real abstraction of geometric data
allowing the development of high level extensible query languages with a potential for
optimization, while (ii) allowing the use of optimal computation techniques for spatial
queries.
3. ARC/INFO (ESRI) [8]
ARC/INFO, running on UNIX or Windows
NT, is a powerful GIS server for large or small organizations. ARC/INFO is implemented as
a client to ESRI's Spatial Database Engine (SDE ) universal spatial data server. It
enables the user to create and maintain geographic information, manage large, multi-user
spatial databases, perform sophisticated spatial analysis, etc.
4. TNT-mips (Micro Images)[9]
TNTmips is the Map and Image Processing
System from MicroImages, Inc. for geospatial analysis. Some of the analysis features of
the product include: Elevation Mapping and Orthoimage Creation, Surface Fitting,
Classification and Interpretation, Statistical Measurements, Polygon Fitting, etc.
5. Laser Scan [10]
Laser-Scan provides an advanced
object-oriented GIS application development and spatial database environment together with
a range of solutions built on this software platform.
7 Conclusions and Future Directions
I hope that the reader realizes that we
still do not really understand the interaction between deductive databases and spatial
databases. A lot of investigation is still needed in this domain. It could be that we need
another data model to represent spatial information. On the other hand a lot of knowledge
is available in geometry and computational geometry. Spatial databases are mainly based on
some spatial logic. This spatial logic should be studied deeply. A fruitful
marriage between computational geometry, deduction, computational logic, the spatial logic
and some new data modelling techniques can give birth to a better understanding and a more
efficient threatement of spatial information.
Acknowledgement
I especially thank Bart Kuijpers and
Stijn Dekeyser for their help during the production of this text.
References
[1] Geographical Information Systems
WWW Resource List,http://www.geo.ed.ac.uk/home/giswww.htmly (last visited 9-dec-97)
[2] Ralf Hartmut Gueting's Home Page,
http://voss.fernuni-hagen.de/gebiete/pi4/gueting/home2.html (last visited 9-dec-97)
[3] Spatial Databases,
http://www.luc.ac.be/research/ group s/theocomp/spatial.html (last visited 9-dec-97)
[4] Spatial Database Research Group,
http://spatial. maine.edu/ max/max3.html (last visited 9-dec-97)
[5] Database Group UIA,
http://pucky.uia.ac.be/ makke/Database/index.phtml (last visited 9-dec-97)
[6] The ROSE algebra,
http://voss.fernuni-hagen.de/gebiete/pi4/gueting/home.html (last visited 9-dec-97)
[7] Dedale prototype,
http://www-rocq.inria.fr/verso/dedale/ (last visited 9-dec-97)
[8] ESRI's ARC/INFO, professional GIS,
http://www.esri.com/base/products/arcinfo/arcinfo.html (last
visited 9-dec-97)
[9] TNTmips, The Map and Image
Processing System, http://tnt.microimages.com/tntmips.htm (last visited 9-dec-97)
[10] Laser Scan, http://www.lsl.co.uk/ (last visited 9-dec-97)
[11] Alexandroff P., Elementary
Concepts of Topology. Dover Pub., Inc., New York, 1961.
[12] Benedikt M., Dong G., Libkin L.,
Wong L., Relational expressive Power of Constraint Query Languages. In Proceedings 15th
ACM Symposium on Principles of Database Systems, pages 5-16, 1996.
[13] Bochnak J., Coste M., Roy M.-F., Geometry
algebrique reelle. Springer-Verlag, 1987.
[14] Burton W., Representation of
many-sided polygons and polygonal lines for rapid processing.
Comm. of the ACM, 20, 3, 166--171, 1977.
[15] Chan E., Zhu R., QL/G - A Query
language for Geometric Data Bases.TR CS-94-25, Univ. of Waterloo, 1994.
[16] Chandra A., Harel D., Computable
queries for relational database systems. Journal of Comp. and System Sciences, 21,
2, 156--178, 1980.
[17] Cole P., King C., Quantitative
Geography. John Wiley, London, 1968.
[18] Collins G.E., Quantifier
elimination for real closed fields by cylindrical algebraic decomposition. LNCS 33,
134--183, 1975.
[19] Corbett J.P., Topological
Principles in Cartography. Technical paper 48, US Bureau of the Census, Washington DC,
1979.
[20] Dumortier F., Gyssens M.,
Vandeurzen L., Van Gucht D., On the decidability of Semi-Linearity for Semi-Algebraic Sets
and its Implications for Spatial Databases In Proceedings 16th ACM Symposium on
Principles of Database Systems, pages 68-77. ACM Press, 1997.
[21] Egenhofer M., Frank A., Towards a
spatial query language: User interface considerations. Proc. of VLDB, 1988.
[22] Egenhofer M., A Formal definition
of Binary Topological Relationships. LNCS 367, 457--472, 1989.
[23] Egenhofer M., Reasoning about
Binary topological Relations. \newblock LNCS 525, 143--160, 1991.
[24] Egenhofer M., Topological
Relations between Regions in R2 and Z2. LNCS 692, 316--336,
1993.
[25] Egenhofer M., Franzosa R.,On the
equivalence of topological relations. Int. J. Geographical Information Systems,
523--542, 1994.
[26] Egenhofer M., Spatial SQL: a Query
and Presentation Language. IEEE Transactions on Knowledge and Data Engineering,
Vol. 6, N o.1, 1994.
[27] Egenhofer M., Franzosa R.,
Point-set topological spatial relations. Int. J. Geographical Information Systems,
5(2):161--174, 1991.
[28] Egenhofer M., Herring J., Advances
in Spatial Databases, Lecture Notes of Computer Science, Volume 951, 1995
[29] Grumbach S., Su J., First-order
definability over constraint databases In Montanari and Rossi, Principles and practice
of constraint programming, pages 121-136, 1995.
[30] Guenther O., Efficient structures
for geometric data management. LNCS 377, Springer-Verlag, 1988.
[31] Guenther O., Buchmann A., Research
Issues in Spatial Databases. Sigmod Record Vol 19,4, 61--68, 1990.
[32] Guting R., Geo-Relational Algebra:
A model and query language for geometric database systems. Proc of Extending Data Base
Technology, Venice, 506--527, 1988.
[33] Guting R., Gral: An Extensible
Relational Database System for Geometric Applications. Proc 15th VLDB, Amsterdam,
33-44, 1989.
[34] Guting R., Schneider M.,
Realm-based Spatial Data Types: The ROSE Algebra. TR 141-3-93 of Fern Universitat, Hagen,
1993.
[35] Guting R., Schneider M., Realms: A
Foundation for Spatial Data Types in Database Systems. 3rd Int. Symp. on Large Spatial
Databases, 14-35, Singapore, 1993.
[36] Guting R., GraphDB: A Data Model
and Query Language for Graphs in Databases. Informatik Berichte 155-2/1994,
FernUniversitat, Hagen, 1994.
[37] Guting R., An Introduction to
Spatial Database Systems. The VLDB Journal, 3, 4, 1994.
[38] Gyssens M., Van den Bussche J.,
Van Gucht D., Complete geometrical query languages, In Proceedings 16th ACM Symposium
on Principles of Database Systems, pages 62-67. ACM Press, 1997.
[39] Hirst T., Harel D., Completeness
Results for Recursive Data Bases. Proc. 12th Symp. on Princ. of Database Systems,
Washington, DC, 244--252, 1993.
[40] Hoel E., Performance of
Data-parallel Spatial Operations. VLDB94, 156--167, Chili, 1994.
[41] Kanellakis P., Kuper G., Revesz
P., Constraint query languages. Proc. 9th Symp. on Princ. of Database Systems,
299--313, 1990.
[42] Kemper A., Wallrath M., An
analysis of geometric modeling in database systems. Computing Surveys, 19, 1,
47--91, 1987.
[43] Kuijpers B., Paredaens J., Van den
Bussche J., Lossless Representation of Topological Spatial Data. To appear in LNCS,
Proceedings of the 4th Symposium on Large Spatial Data Bases, Portland (Maine) , 1995.
[44] Kuijpers B., Paredaens J., Smits
M, Van den Bussche J., Termination properties of Spatial Database Programs, Proc. Logic
in Databases, LNCS 1154, pages 95-110, 1996
[45] Kuijpers B., Paredaens J., Data
Models and Query Languages for Spatial Databases, Accepted in DKE
[46] Laurini R., Thompson D., Fundamentals
of Spatial Information Systems. Academic Press, APIC Series 37, 1992.
[47] Moise E.E., Geometric Topology
in Dimensions 2 and 3. Graduate Texts in Mathematics, 47, Springer-Verlag, 1977.
[48] Papadimitriou C., Suciu D., Vianu
V., Topological queries in spatial databases. In Proceedings 15th ACM Symposium on
Principles of Database Systems, pages 81--92. ACM Press, 1996.
[49] Paredaens J., Spatial Databases,
the Final Frontier, ICDT95, LNCS 893, pages 18-32, 1995
[50] Paredaens J., Kuijpers B., Van~den
Bussche J., On topological elementary equivalence of spatial databases. In F. Afrati and
Ph. Kolaitis, editors, Database Theory---ICDT'97, volume 1186 of Lecture Notes
in Computer Science, pages 432--446. Springer, 1997.
[51] Paredaens J., Van den Bussche J.,
Van Gucht D., Towards a Theory of Spatial Database Queries. In Proceedings 13th ACM
Symposium on Principles of Database Systems, pages 279-288, 1994.
[52] Paredaens J., Van den Bussche J.,
Van Gucht D., Full paper of "Towards a theory of spatial database queries''. In
preparation.
[53] Peano G., Sur une courbe qui
remplit toute une aire plane. Mathematische Annalen, 36(a), 157--160, 1890.
[54] Preparata F., Shamos M., Computational
Geometry. Springer-Verlag, 1985.
[55] Rigaux Ph., Scholl M., Multiple
Representation Modelling and Querying. IGIS94, 1994.
[56] Samet H., The Design and
Analysis of Spatial Data Structures. Addison-Wesley, 1990.
[57] Scholl M., Voisard A., Thematic
Map Modeling. Proc. SSD, 167--192, also in LNCS 409, Springer-Verlag, 1989.
[58] Stolboushkin A., Taitslin M.,
Linear vs. Order Constraint Queries over Relational Databases In Proceedings 15th ACM
Symposium on Principles of Database Systems, pages 17-27, 1996.
[59] Tarski A., A decision method
for elementary algebra and geometry. University of California Press, 1951.
Jan Paredaens
University of Antwerp, Belgium
pareda@uia.ua.ac.be |