The number of false hits is large for KEBI
under uniform data distribution, but reduces sharply as the skew in data distribution
increases. As compared to the SBI, KEBI saves index storage space, reducing the
number of index pages retrieved, although filtering excess tuples requires additional
processing time and could increase the number of data pages retrieved. We do not
consider KEBI for comparison here as it has only been studied for equality or point
queries and DSS queries typically include complex range queries.
Range-Based.Bitmap.Indexing
SBIs can cause significant storage overhead if the attributes to be indexed have
high cardinality. This is the motivation for range-based bitmap indexing (RBBI)
(Wu & Yu, 1998). In RBBI, the attribute values are partitioned into a small number
of ranges and a bitmap vector is constructed to represent each range and not each
distinct value. A bit (1 or 0) is used to indicate whether an attribute in a tuple is
within a specific range or not.
Unless the distribution of the attribute values is known, ranges can be unevenly
populated resulting in highly unbalanced query access times. RBBI utilizes the
distribution of attribute values in the data to construct a range-based bitmap index
for high cardinality attributes with skew. It uses a dynamic bucket expansion and
contraction approach in which data are first scanned into the buffer to construct the
Table 3.
Pages:
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359