The focus of the workload-based
algorithms in Furtado (2004a, 2004b, 2004c) is to partition the whole relations in
the join graph in a way that results in reduced repartitioning cost and redistribution
requirements. Given this JGQW graph, a simple partitioning algorithm can assign
partitioning attributes starting by the most frequent join (R3.A2, R2.A2 in the fig-
Figure 1. Join and query graphs for the example: (a) join graph and (b) join graph
for query workload (JGQW)
R1.A1 R2.A1 R3.A1
R3.A2
R5.A3 R3.A3
R4.A2
c1
c2
c3
R1.A1 R2.A1 R3.A1
R3.A2
R5.A3 R3.A3
R4.A2
6
8
3
R2.A2 R3.A2
10
(a)
(b)
208 Furtado
Copyright ?© 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
ure), to reduce the amount of repartitioning. More complex algorithms may search
in the space of possible alternative execution plans, evaluating a cost for each possible
plan (Kossman & Stocker, 2000).
In a data warehouse, there are an additional set of patterns that are important to
guide the partitioning algorithm. Data warehouses have facts and dimensions, facts
reference dimensions, and queries almost always join dimensions by their primary
key. As a result, the algorithm we proposed in Furtado (2004c) partitions dimensions
by primary keys (except small ones, which are replicated) and applies workloadbased
partitioning to facts.
Pages:
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398