SEARCH
0-9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Prev | Current Page 325 | Next

Robert Wrembel and Christian Koncilia

"Data Warehouses and Olap: Concepts, Architectures and Solutions"

To
make the discussion more concrete, let us assume that a machine word is 32 bits.
In this case, a WAH tail contains exactly 31 bits from the input bitmap and a WAH
fill contains a multiple of 31 bits that are all the same (either 0 or 1). Because the
bitmap index is known to be efficient for low cardinality attributes, we further restrict
our discussion to high cardinality attributes only, say, c > 100. In an equalityencoded
bitmap index, there are c keys (distinct values of the attribute) and thus
c bitmaps. We do not know exactly how many bits are set to 1 in each individual
bitmap. However, we know that the total number of bits that are 1 is exactly N (the
number of rows in the dataset). In the worst case, there are (N+c) WAH runs in the
bitmaps, where N refers to maximum number of tail words (each containing a single
bit set to 1) and c refers to the maximum number of runs at the end of each bitmap
that are not terminated with a tail word. Each WAH run is encoded by two machine
words. Therefore, we need a total of 2(N+c) words to represent the bitmaps. Assuming
each key is encoded by one word along with one additional word to associate
the key with the bitmap, the total index size is 2N+4c words. In most cases, the attribute
cardinality c is much smaller than N. In these cases, the WAH compressed
equality-encoded bitmap index size is at worst 2N words.


Pages:
313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337