For example, the attribute mass
fraction from a supernova simulation is expected to be in the range of 0 and 1. If,
for some reason, the mass fraction is not known, scientists typically enter the value
-999 to represent a bad or missing value. In this example, equi-width binning would
produce bins starting from -999. This results in too many empty bins and thus cannot
be combined to produce well-balanced equi-depth bins. In contrast, the approach
of sampled histograms is generally more reliable in detecting this type of unusual
outliers and typically produces well-balanced bins.
Space. Complexity:.........................
Sizes of Compressed Bitmap Indices
The space complexity of uncompressed bitmap indices was studied in Chan and
Ioannidis (1998, 1999). In this section, we analyze the size of compressed bitmap
indices. Our discussion mainly focuses on the WAH compression method since BBC
0 Stockinger & Wu
Copyright ?© 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
compression was extensively studied in Johnson (1999). We give an upper bound
of the worst-case size and provide an experimental study of compressed bitmap
indices for various application datasets.
Index Size: Worst-Case Behavior
In the previous section we defined a WAH run to be a fill followed by a tail.
Pages:
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336