LECTURE 10
Chee Yap
Computational Geometry II: Window Queries


OVERVIEW
1. Introduction
2. Preprocessing Problems
3. Orthogonal Range Searching
4. Range Trees
5. Interval Trees
6. Priority Search Trees
7. Point Stabbing
8. R-Trees


Introduction


Preprocessing Problems


Window Query Problems


Orthogonal Range Searching, I


Orthogonal Range Searching, II


Interval Tree


Window Querying of Aligned Segments


Spatial Queries and Databases

We now take a slightly different point of view: suppose we are dealing with very large data sets involving geometric objects such as points, lines, segments, polytopes and paths. In the database world, such collections are called spatial databases. The difference from what we have discussed above is that we can no longer assume that the full data set fits into main memory. This has two immediate consequences:

In practice, the most important spatial data applications arise in 2 dimensions. Most modern database systems provide efficient support for such objects. In particular, PostgreSQL supports a very rich set of such operators (see Lecture 9). The standard data structures used here are R-trees and their many variants. R-trees are a generalization of B-trees, one of the most important data structures in databases (the other is hashing). Traditional data structure design criteria seek to optimize query time alone, as the data is seen as static. New applications and demands require more dynamic operations. Moreover, sublinear size data structures are expected to grow in importance.

R-trees were introduced by A. Guttman [4]. Since their introduction, many variants of R-trees have been investigated. Until recently, there were no general worst-case complexity bounds for such data structures. This has begun to change [3,1]. The trick lies in exploiting additional complexity parameters of the input, including stabbing numbers and aspect ratios of rectangles.

So the basic problem is this: given a set S of d-boxes in $\mathbb{R}^d$, preprocess S so that, given any query box q, we can efficiently retrieve all $b \in S$ that intersect q. Call this the box-box query problem. A special case arises when q is a single point; this is the point-box query problem.
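To make the problem statement concrete, here is a minimal Python sketch of the two queries, answered by brute force. The representation of a d-box as a pair `(lo, hi)` of coordinate tuples, the choice of closed boxes, and the function names are assumptions made for illustration; they are not part of the lecture. The linear scan is only the baseline that the preprocessing is meant to beat.

```python
from typing import List, Sequence, Tuple

# Assumed representation: a closed d-box is (lo, hi), two tuples of d
# coordinates with lo[i] <= hi[i] for every i.
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def intersects(a: Box, b: Box) -> bool:
    """Return True if the closed boxes a and b share at least one point."""
    (alo, ahi), (blo, bhi) = a, b
    # Two boxes intersect iff their projections overlap on every axis.
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(len(alo)))


def box_box_query(S: Sequence[Box], q: Box) -> List[Box]:
    """Brute-force baseline: scan all of S, O(n) time per query."""
    return [b for b in S if intersects(b, q)]


def point_box_query(S: Sequence[Box], p: Tuple[float, ...]) -> List[Box]:
    """Point-box query, treated as a box-box query with the degenerate box (p, p)."""
    return box_box_query(S, (p, p))
```

For example, with `S = [((0, 0), (2, 2)), ((3, 3), (5, 5)), ((1, 1), (4, 4))]`, the point query at `(1.5, 1.5)` returns the first and third boxes.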

When we wish to perform this query for a set of geometric objects in $\mathbb{R}^d$ that are not boxes, it is still possible to use this data structure if we first replace each object by its minimum bounding box (MBB). In this case, the box-box query problem can be regarded as a preliminary "filtering step" that produces a superset of what we really need. The subsequent "refinement step" can be a straightforward sequential search through the output of the filtering step. This will be efficient if the filtering step does not produce too many false hits. See Zhou and Suri [xx] for such examples.
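The filter-and-refine idea can be sketched as follows, using planar disks as the non-box objects. The disk type, the function names and the exact disk-box test are all illustrative assumptions. The filtering step here is a plain scan over the MBBs; in an actual system it would be answered by the box-box query structure.

```python
from typing import List, Tuple

Box = Tuple[Tuple[float, float], Tuple[float, float]]  # closed box (lo, hi)
Disk = Tuple[float, float, float]  # hypothetical object type: (cx, cy, radius)


def mbb(d: Disk) -> Box:
    """Minimum bounding box of a disk."""
    cx, cy, r = d
    return ((cx - r, cy - r), (cx + r, cy + r))


def boxes_intersect(a: Box, b: Box) -> bool:
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(2))


def disk_intersects_box(d: Disk, q: Box) -> bool:
    """Exact test: the point of q nearest to the center lies within distance r."""
    cx, cy, r = d
    nx = min(max(cx, q[0][0]), q[1][0])  # clamp the center into q
    ny = min(max(cy, q[0][1]), q[1][1])
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def filter_refine(objects: List[Disk], q: Box) -> Tuple[List[Disk], List[Disk]]:
    """Return (candidates, answers); candidates is a superset of answers."""
    # Filtering step: keep objects whose MBB meets q (may include false hits).
    candidates = [d for d in objects if boxes_intersect(mbb(d), q)]
    # Refinement step: sequential exact test over the filter output.
    answers = [d for d in candidates if disk_intersects_box(d, q)]
    return candidates, answers
```

A disk whose MBB touches the query near a corner of the MBB is a typical false hit: it survives the filter but is discarded by the refinement.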

BB Hierarchy.   The concept of a bounding-box hierarchy (or BB Hierarchy) for S is quite generic. Most of the search structures we are interested in are special forms of such a hierarchy.

Let |S|=n. A BB hierarchy for S is a search tree T with n leaves, where each leaf v of T has an associated box $B_v \in S$. Moreover, every box in S is associated with some leaf v. For a node $u \in T$, let $C_u$ denote the set of boxes at the leaves of the subtree $T_u$ rooted at u. The smallest enclosing box for the cluster $C_u$ is denoted $B_u$. Call $B_u$ and $C_u$ the bounding box and cluster at u, respectively. Since each node stores a single box, the space used by T is linear, provided every internal node has at least two children (so that T has O(n) nodes). Note that T is not restricted to binary trees. However, we assume in the following that each node u has at most b children, where b is a constant.

We can use a BB Hierarchy to answer any box query q on S in the obvious way. Let q be a box. Starting at the root of T, suppose we are at a general node u. If $q \cap B_u = \emptyset$, we are done with u. If $B_u \subseteq q$, we report every box in $C_u$. Otherwise, if u is a leaf we report its box $B_u$ (which intersects q), and if u is an internal node we recursively visit all the children of u.
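The query procedure just described can be sketched in Python as below. The `Node` class, the `(lo, hi)` box representation and all function names are assumptions introduced for this sketch; how the tree is shaped (which boxes are clustered together) is left entirely to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # assumed closed box (lo, hi)


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    box: Optional[Box] = None  # B_u; for a leaf, its box from S


def fill_boxes(u: Node) -> Box:
    """Compute B_u bottom-up as the smallest box enclosing the children's boxes."""
    if u.children:
        bs = [fill_boxes(c) for c in u.children]
        d = len(bs[0][0])
        u.box = (tuple(min(b[0][i] for b in bs) for i in range(d)),
                 tuple(max(b[1][i] for b in bs) for i in range(d)))
    return u.box


def disjoint(a: Box, b: Box) -> bool:
    return any(a[0][i] > b[1][i] or b[0][i] > a[1][i] for i in range(len(a[0])))


def contains(q: Box, a: Box) -> bool:
    """True if box a lies inside box q."""
    return all(q[0][i] <= a[0][i] and a[1][i] <= q[1][i] for i in range(len(a[0])))


def report_all(u: Node, out: List[Box]) -> None:
    """Report the whole cluster C_u by walking the subtree T_u."""
    if not u.children:
        out.append(u.box)
    for c in u.children:
        report_all(c, out)


def query(u: Node, q: Box, out: List[Box]) -> None:
    if disjoint(u.box, q):
        return                      # q misses B_u: nothing below u qualifies
    if contains(q, u.box):
        report_all(u, out)          # B_u inside q: report all of C_u
    elif not u.children:
        out.append(u.box)           # crossed leaf: its box still intersects q
    else:
        for c in u.children:        # crossed internal node: recurse
            query(c, q, out)
```

A caller would build the tree from leaves of the form `Node(box=...)`, run `fill_boxes(root)` once, and then call `query(root, q, out)` with an empty list `out` for each query.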

Let us analyze the query time for this algorithm. We introduce the notion of crossing: say q crosses a node u of T if $B_u \cap q$ is non-empty and $B_u$ is not contained in q. We define the crossing number c(T) of T to be the maximum, over all query boxes q, of the number of nodes of T that q crosses. It follows that the time to answer any query (using conventions established at the beginning) is
Q(n,k) = O(c(T) + k)
where c(T) is the crossing number of T and k is the output size. Note that the constant in O(c(T) + k) depends on b, which we suppress. To make this dependence on b explicit, we can write
Q(n,k) = O(b·c(T) + k).
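One way to see where this bound comes from is sketched below. It relies on an extra assumption, namely that every internal node has at least two children, so that walking the subtree at u costs $O(|C_u|)$. The nodes $u_1, u_2, \ldots$ are notation introduced only for this sketch: they are the visited nodes whose bounding boxes lie inside q. Every visited node other than the root is a child of a crossed internal node, and the clusters $C_{u_i}$ are pairwise disjoint subsets of the output because the recursion stops at each $u_i$.

```latex
\begin{align*}
\#\{\text{visited nodes}\} &\le 1 + b \cdot c(T), \\
\sum_{i} |C_{u_i}| &\le k, \\
Q(n,k) &= O\bigl(1 + b \cdot c(T)\bigr) + O\Bigl(\sum_{i} |C_{u_i}|\Bigr)
        = O\bigl(b \cdot c(T) + k\bigr).
\end{align*}
```

The additive constant accounts for the visit to the root and is absorbed into the O-notation.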

R-Trees.   An R-Tree is one form of a BB Hierarchy.

To understand R-Trees, we must first understand the concept of a B-Tree. These are search trees parametrized by a number b. Every node has at most b children. Every internal node also has at least b/2 children, except that the root is allowed to have as few as 2 children. The distinguishing feature is that all the leaves of a B-Tree lie on the same level. Given these constraints, there are relatively simple and natural algorithms for insertion into and deletion from B-Trees. We will not elaborate on them here. The constraints also mean that a B-Tree has height at most (lg n)/(lg(b/2)) + O(1).
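The height bound can be checked with a short calculation. The sketch assumes that n counts the leaves, that the height h is the number of edges from the root to a leaf (with $h \ge 1$), and that $b \ge 4$ so that $\lg(b/2) \ge 1$; these conventions are assumptions of the sketch and are not fixed by the text. The root has at least 2 children, every other internal node has at least b/2 children, and all leaves lie at depth h, so:

```latex
\begin{align*}
n &\ge 2 \left(\frac{b}{2}\right)^{h-1}, \\
\lg n &\ge 1 + (h-1)\,\lg\frac{b}{2}, \\
h &\le 1 + \frac{\lg n - 1}{\lg (b/2)} \;\le\; \frac{\lg n}{\lg (b/2)} + 1 .
\end{align*}
```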

What distinguishes one BB Hierarchy from another is the way clusters are formed (viewed from a bottom-up perspective) or split (viewed from a top-down perspective). Various heuristics are discussed in the literature; see for example [2].


R-Trees with Low Stabbing Number

Until recently, there were no general worst-case complexity bounds for R-trees. This has begun to change [3,1]. The trick lies in exploiting additional complexity parameters of the input, including stabbing numbers and aspect ratios of rectangles.

[FIGURE with strips]


References

[1]
P. K. Agarwal, M. de Berg, J. Gudmundsson, M. Hammar, and H. J. Haverkort. Box-trees and R-trees with near-optimal query time. In Symposium on Computational Geometry, pages 124-133, 2001.

[2]
S. Brakatsoulas, D. Pfoser, and Y. Theodoridis. Revisiting R-tree construction principles. In Proc. 6th East-European Conf. on Advances in Databases and Information Systems (ADBIS'02), Bratislava, Slovakia, September 2002. Springer-Verlag, Lecture Notes in Computer Science No. 2435.

[3]
M. de Berg, J. Gudmundsson, M. Hammar, and M. H. Overmars. On R-trees with low stabbing number. In European Symposium on Algorithms, pages 167-178, 2000.

[4]
A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proc. ACM-SIGMOD Intl. Conf. on Management of Data, pages 47-57, 1984.




File translated from TEX by TTH, version 3.01.
On 30 Apr 2003, 17:47.