Repartitioning is similar but involves a fragment in each node.
Multiple nodes can rehash and exchange relation fragments simultaneously.
b. Data communication cost (DC): The data communication cost is monotonically
increasing with the size of the data transferred. We assume a switched
network, as this allows different pairs of nodes to send data simultaneously
(with no collisions). This, in turn, allows the repartitioning algorithm to be
implemented more efficiently.
c. Local processing cost (LC): The local processing cost for the join operation
typically depends on whether the join is supported by fast access paths such
as indexes, and on the size of the relations participating in the join. For simplicity,
we assume these costs also increase monotonically with the relation sizes,
although, in practice, they depend on several parameters, including memory
buffer size.
d. Merging cost (MC): The merging cost is related to applying a final query to
the collected partial results at the merging node. We do not consider this cost
as it is similar in every case and independent of the other ones.
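To make the repartitioning step and its link to the data communication cost concrete, the following is a minimal sketch under stated assumptions, not the algorithm as given in the cited work. The function name `repartition`, the representation of each node's fragment as a list of tuples, and the use of `hash(key) % num_nodes` as the partitioning function are all illustrative choices. The returned count of shipped tuples stands in for the amount of data transferred, which the DC item assumes the cost grows with.

```python
def repartition(fragments, key_index, num_nodes):
    """Rehash every node's fragment on the join attribute.

    fragments[i] is the list of tuples currently held by node i
    (hypothetical in-memory stand-in for a relation fragment).
    Returns (new_fragments, shipped), where shipped is the number of
    tuples whose target node differs from the node that held them.
    """
    new_fragments = [[] for _ in range(num_nodes)]
    shipped = 0
    for source, fragment in enumerate(fragments):
        for row in fragment:
            # Assumed partitioning function: hash of the join key modulo
            # the number of nodes. Integer keys hash deterministically in
            # Python; string keys would need a stable hash instead.
            target = hash(row[key_index]) % num_nodes
            new_fragments[target].append(row)
            if target != source:
                shipped += 1  # this tuple crosses the network
    return new_fragments, shipped


if __name__ == "__main__":
    # Three nodes, each holding two tuples of the form (join_key, payload).
    fragments = [
        [(1, "a"), (2, "b")],
        [(3, "c"), (4, "d")],
        [(5, "e"), (6, "f")],
    ]
    new_fragments, shipped = repartition(fragments, key_index=0, num_nodes=3)
    print(new_fragments)
    print("tuples shipped:", shipped)
```

In a real system each node would run the inner loop over its own fragment concurrently, which is why the switched-network assumption matters: the per-node sends do not have to wait for one another.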
Given these items, the next objective is to express the total cost in terms of
the local processing and repartitioning costs (the data communication cost is
treated here as part of the repartitioning cost). We define weighting parameters
as in Sasha et al.