Chan and Ioannidis (1999) showed that range encoding is the best bitmap
encoding technique for range-queries. However, range encoding might not always
be practical for high-cardinality attributes or for a large number of bins. As we will
show in the next section, range-encoded bitmap indices do not compress as well as
equality-encoded bitmap indices.
The general rule for choosing the number of bins is as follows: The more bins, the
less work during the candidate check. The reason is fairly straightforward. In general,
as the number of bins increases, the number of candidates per bin decreases.
Let us consider the following example. Assume the base data follows a uniform
random distribution. With a typical page size of 8KB, using the projection index,
Bitmap Indices for Data Warehouses 6
Copyright ?© 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
a page could hold 2048 4-byte words. If one in 1000 words is accessed during the
candidate check, it is likely that every page containing the attribute values would
be touched (Stockinger et al., 2004). We, thus, suggest using 1000 bins or more.
For equality encoding there is an additional trade-off, namely using more bins may
also increase the cost of the index scan. For range encoding the cost of the index scan
is not significantly affected by the number of bins because one needs to access no
more than two bitmaps to evaluate a range query (Chan & Ioannidis, 1999).
Pages:
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334