Figure 7 shows the partitioning that results from applying the WBP strategy to the
TPC-H query set. For the execution plan of Figure 4b, this strategy allows the joins
LI to O and LI-O-P-S to PS to be processed as LocalJ; repartitioning is necessary
only for the intermediate dataset LI-O.
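The decision of whether a join can run as LocalJ or needs repartitioning can be sketched as follows. This is a minimal illustration, not the layout of Figure 7. It assumes that each dataset is either replicated on every node or hash-partitioned on a single attribute with one hash function shared by all nodes. It also assumes, as one plausible reading of the plan, that LI and O are co-partitioned on the order key, PS on the part key, and P and S replicated. The names `Placement`, `join_mode` and `repartition` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """Hypothetical description of how a dataset is laid out across the nodes."""
    name: str
    hash_key: Optional[str] = None  # attribute the dataset is hash-partitioned on
    replicated: bool = False        # True if every node stores a full copy


def join_mode(left: Placement, right: Placement, left_attr: str, right_attr: str) -> str:
    """Classify an equi-join as LocalJ (no data movement) or as needing repartitioning."""
    # A replicated input is complete on every node, so each node joins its own fragment.
    if left.replicated or right.replicated:
        return "LocalJ"
    # Both inputs hashed (with the same function) on the join attributes:
    # matching rows already live on the same node.
    if left.hash_key == left_attr and right.hash_key == right_attr:
        return "LocalJ"
    return "Repartition"


def repartition(rows: list[dict], attr: str, n_nodes: int) -> dict[int, list[dict]]:
    """Hash rows on attr into one outgoing buffer per destination node.

    Python's hash() is only stable within one process; a real cluster would
    need a deterministic hash function shared by all nodes.
    """
    buffers: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        buffers[hash(row[attr]) % n_nodes].append(row)
    return dict(buffers)


if __name__ == "__main__":
    li = Placement("LI", hash_key="l_orderkey")
    o = Placement("O", hash_key="o_orderkey")
    ps = Placement("PS", hash_key="ps_partkey")
    li_o = Placement("LI-O", hash_key="l_orderkey")           # join output keeps LI's placement
    li_o_p_s = Placement("LI-O-P-S", hash_key="l_partkey")    # after LI-O is repartitioned on l_partkey
    print(join_mode(li, o, "l_orderkey", "o_orderkey"))       # LocalJ
    print(join_mode(li_o, ps, "l_partkey", "ps_partkey"))     # Repartition
    print(join_mode(li_o_p_s, ps, "l_partkey", "ps_partkey")) # LocalJ
    buffers = repartition([{"l_partkey": k} for k in range(10)], "l_partkey", 4)
    print({n: len(b) for n, b in sorted(buffers.items())})    # {0: 3, 1: 3, 2: 2, 3: 2}
```

Under these assumptions, only LI-O has to be shipped, because it is placed on the order key but joined on the part key. Once it has been hashed on `l_partkey`, the replicated P and S and the part-keyed PS all join locally.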
• WBP with bitmap join indexes (WBP+JB): We have materialized join bitmaps
in every node for the attributes (p_brand, n_name, o_orderpriority, ps_availqty) to
speed up the query of Figure 5. For instance, before the LI relation is scanned,
the associated bitmap join index, such as the one for Brand x, is scanned.
This way, only the LI rows associated with Brand x are processed any further,
including during repartitioning.
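The bitmap join index filtering described above can be sketched as follows for a single node's fragment. This is a minimal illustration with several assumptions. The node's LI rows are addressed by position. Plain Python integers stand in for compressed bitmaps. The index is built here by joining LI with P on the part key. A real bitmap join index would be precomputed, compressed and read from secondary memory. The function names and sample rows are hypothetical.

```python
from typing import Iterator


def build_bitmap_join_index(li_rows: list[dict], p_rows: list[dict],
                            attr: str = "p_brand") -> dict[str, int]:
    """Map each value of a P attribute to a bitmap over LI row positions.

    Bit i is set in index[v] when LI row i joins (on the part key) with a
    part whose attr equals v.
    """
    value_of_part = {p["p_partkey"]: p[attr] for p in p_rows}
    index: dict[str, int] = {}
    for pos, li in enumerate(li_rows):
        value = value_of_part.get(li["l_partkey"])
        if value is not None:
            index[value] = index.get(value, 0) | (1 << pos)
    return index


def set_positions(bitmap: int) -> Iterator[int]:
    """Yield the positions of the set bits, lowest first."""
    while bitmap:
        lowest = bitmap & -bitmap       # isolate the lowest set bit
        yield lowest.bit_length() - 1   # report its position
        bitmap ^= lowest                # clear it and continue


def select_rows(li_rows: list[dict], index: dict[str, int], value: str) -> list[dict]:
    """Fetch only the LI rows flagged in the bitmap for value; other rows are skipped."""
    return [li_rows[pos] for pos in set_positions(index.get(value, 0))]


if __name__ == "__main__":
    p_rows = [{"p_partkey": 1, "p_brand": "Brand#12"},
              {"p_partkey": 2, "p_brand": "Brand#34"}]
    li_rows = [{"l_orderkey": 10, "l_partkey": 1},
               {"l_orderkey": 10, "l_partkey": 2},
               {"l_orderkey": 11, "l_partkey": 1}]
    index = build_bitmap_join_index(li_rows, p_rows)
    print({v: bin(b) for v, b in index.items()})  # {'Brand#12': '0b101', 'Brand#34': '0b10'}
    print(select_rows(li_rows, index, "Brand#12"))
```

Only the rows returned by `select_rows` would continue into further processing, including being hashed and sent during repartitioning. This is why a selective predicate on p_brand can shrink both the local work and the data shipped between nodes.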
Figure 7. WBP partitioning
Efficient and Robust Node-Partitioned Data Warehouses 2
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
In the next section, we review a generic cost model for the strategies that takes into
account factors such as the number of nodes and the network bandwidth.
Cost Model
The main processing costs (listed next) are repartitioning, data communication, local
processing, and merging:
a. Repartitioning cost (RC): Partitioning a relation consists of retrieving the
relation from secondary memory, dividing it into fragments by applying a
hash function to a join attribute, and assigning buffers for the data to be sent to
other nodes.