Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
Introduction
Querying large datasets to locate some selected records is a common task in data
warehousing applications. However, answering these queries efficiently is often
difficult due to the complex nature of both the data and the queries. The most
straightforward way of evaluating a query is to sequentially scan all data records
to determine whether each record satisfies the specified conditions. A typical query
condition is as follows: ???Count the number of cars sold by producer P in the time
interval T???. This search procedure could usually be accelerated by indices, such as
variations of B-Trees or kd-Trees (Comer, 1979; Gaede & Guenther, 1998). Generally,
as the number of attributes in a dataset increases, the number of possible
indexing combinations increases as well. To answer multidimensional queries effi-
ciently, one faces a difficult choice. One possibility is to construct a separate index
for each combination of attributes, which requires an impractical amount of space.
Another possibility is to choose one of the multidimensional indices, which is only
efficient for some of the queries. In the literature, this dilemma is often referred to
as the curse of dimensionality (Berchtold, Boehm, & Kriegl, 1998; Keim & Hinneburg,
1999).
Pages:
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313