(1991): a partitioning cost weight, β, and a local processing cost
weight, α, so that β/α denotes the ratio of partitioning costs to local processing costs,
for example, β/α ≈ 2 (Sasha et al., 1991). A cost-based optimizer is used to determine
the most appropriate execution plan (Kossman & Stocker, 2000; Steinbrunn et al.,
1997).
2 8 Furtado
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
A join order determines the order in which relations are joined. Assuming the datasets
are joined using an algorithm such as parallel hybrid hash-join, at each step one
additional relation is joined to the current intermediate result set IRi (selection and
projection operators are applied as early as possible to reduce the size of the datasets
that need to be processed). Given the result set IRi and a relation Rj, equations (1)
and (2) give the processing costs for a single-server system and for a node-partitioned
system in which Rj is replicated to all nodes and IRi is partitioned:
$$\text{one-server system:}\quad \alpha \times (IR_i + R_j) \tag{1}$$
$$\text{replicated join:}\quad \alpha \times \left(\frac{IR_i}{N} + R_j\right) \tag{2}$$
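Equations (1) and (2) can be evaluated directly. The Python sketch below is a minimal illustration under stated assumptions that do not come from the chapter. Sizes are expressed in abstract units, such as pages. N is the number of nodes and α defaults to 1. The function names and the example sizes IRi = 1000 and Rj = 50 are hypothetical. The partitioning weight β does not appear in these two equations, so the sketch does not model it.

```python
"""Minimal sketch of cost equations (1) and (2).

Assumptions (not from the chapter): sizes are in abstract units such as
pages, n is the number of nodes N, and the example sizes below are made up.
"""


def one_server_cost(ir_i: float, r_j: float, alpha: float = 1.0) -> float:
    """Equation (1): a single server processes IR_i and R_j in full."""
    return alpha * (ir_i + r_j)


def replicated_join_cost(ir_i: float, r_j: float, n: int, alpha: float = 1.0) -> float:
    """Equation (2): IR_i is partitioned over n nodes and R_j is replicated,
    so each node processes IR_i / n plus all of R_j."""
    if n < 1:
        raise ValueError("the number of nodes must be at least 1")
    return alpha * (ir_i / n + r_j)


def replicated_speedup(ir_i: float, r_j: float, n: int) -> float:
    """Ratio of cost (1) to cost (2); the weight alpha cancels out."""
    return one_server_cost(ir_i, r_j) / replicated_join_cost(ir_i, r_j, n)


if __name__ == "__main__":
    ir_i, r_j = 1000.0, 50.0  # hypothetical sizes of IR_i and R_j
    for n in (1, 4, 10):
        print(
            f"N={n:>2}  one-server={one_server_cost(ir_i, r_j):.0f}  "
            f"replicated={replicated_join_cost(ir_i, r_j, n):.0f}  "
            f"speedup={replicated_speedup(ir_i, r_j, n):.2f}"
        )
    # N= 1  one-server=1050  replicated=1050  speedup=1.00
    # N= 4  one-server=1050  replicated=300  speedup=3.50
    # N=10  one-server=1050  replicated=150  speedup=7.00
```

Under these two equations, the gain from the replicated join is bounded. As N grows, cost (2) tends to α × Rj, because every node still processes the whole replicated relation. With the sizes above, the cost ratio therefore never exceeds (1000 + 50)/50 = 21.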
Equations (3) and (4) represent the local processing cost (LC) and repartitioning
cost (RC) when both datasets are partitioned. The repartitioning cost in (4) is incurred
only when the datasets are not co-located.
Pages: 392-416