With PRS, the join execution plan of Figure 5(b) would be executed without
any data exchange between nodes, but each node would need to process the full
O and PS relations, which are 18 GB and 7.5 GB in size, respectively, for
TPC-H at scale factor 100 (100 GB).
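
The cost of replication can be made concrete with a small calculation. The sketch below takes the O and PS sizes from the text and compares the volume one node must process when a relation is replicated with the volume it processes when the relation is hash-partitioned. The node count (`N_NODES`) and the assumption of a perfectly uniform hash distribution (no skew) are illustrative and not taken from the text.

```python
# Relation sizes from the text (TPC-H, scale factor 100), in GB.
RELATION_SIZE_GB = {"O": 18.0, "PS": 7.5}


def per_node_volume_gb(size_gb: float, n_nodes: int, replicated: bool) -> float:
    """Return the data volume a single node must process for one relation.

    Assumption: hash-partitioning spreads the rows evenly over the nodes.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    # A replicated relation is stored (and scanned) in full on every node.
    return size_gb if replicated else size_gb / n_nodes


N_NODES = 20  # hypothetical cluster size, chosen only for illustration

for name, size in RELATION_SIZE_GB.items():
    full = per_node_volume_gb(size, N_NODES, replicated=True)
    part = per_node_volume_gb(size, N_NODES, replicated=False)
    print(f"{name}: replicated {full:.2f} GB/node, partitioned {part:.3f} GB/node")
```

Under these assumptions, the per-node volume of a replicated relation stays constant as nodes are added, whereas the per-node volume of a partitioned relation shrinks in proportion to the number of nodes.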
In order to avoid replicating very large relations, a modified strategy is to replicate
dimensions and partition every fact, while also co-locating LI and O:
• Hash-partition fact and replicate dimensions strategy (PFRD-H): Partition
relations identified as facts by the user (LI, O, and PS in TPC-H), co-locating
LI and O. With PFRD-H, the execution plan of Figure 4b requires repartitioning
of only two datasets: the intermediate result LI-O-P-S and relation PS. The
join between LI and O is a LocalJ.
• Workload-based partitioning (WBP): A workload-based strategy where
hash-partitioning attributes are determined based on schema and workload
characteristics. We use the strategy proposed in Furtado (2004c). The partitioning
algorithm is:
1. Dimensions: Small dimensions are replicated to every node (and optionally
cached in memory). Nonsmall dimensions can simply be hash-partitioned
by their primary key. This is because that attribute is expected
to be used in every equi-join with facts, as the references from facts to
dimensions correspond to foreign keys.
The determination of whether a dimension is small can be cost-based or,
for simplicity, based on a user-defined threshold (e.
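
The dimension rule of step 1 can be sketched as a simple placement function. This is a minimal illustration of the threshold-based variant, not the implementation from Furtado (2004c): the `Dimension` and `Placement` types, the function name, the threshold value, and the dimension sizes are all hypothetical and are not taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dimension:
    name: str
    primary_key: str
    size_mb: float  # illustrative size, not a measured value


@dataclass(frozen=True)
class Placement:
    strategy: str               # "replicate" or "hash"
    key: Optional[str] = None   # partitioning attribute when strategy == "hash"


def place_dimension(dim: Dimension, small_threshold_mb: float) -> Placement:
    """Apply the step-1 rule with a user-defined size threshold."""
    if dim.size_mb <= small_threshold_mb:
        # Small dimension: keep a full copy on every node.
        return Placement("replicate")
    # Nonsmall dimension: hash-partition by primary key, the attribute
    # expected to appear in every equi-join with the facts.
    return Placement("hash", dim.primary_key)


SMALL_THRESHOLD_MB = 100.0  # hypothetical threshold, chosen only for illustration

dimensions = [
    Dimension("NATION", "n_nationkey", 0.01),      # sizes are made up
    Dimension("CUSTOMER", "c_custkey", 2400.0),
]

for dim in dimensions:
    print(dim.name, place_dimension(dim, SMALL_THRESHOLD_MB))
```

A cost-based variant would replace the size comparison with an estimate of replication cost against repartitioning cost; the threshold form is shown here only because it is the simpler of the two options mentioned.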