Operation tasks are submitted
as required, and the resource utilization for disk access, memory and bus, processor, and
network send/receive is used to determine the completion time of those tasks. For
instance, the cost of a hybrid hash-join is related to the cost of scanning the relations
from secondary storage, bucketizing them, building a hash table, and probing
into the hash table. Specifically, the cost of joining relations R1 and R2, expressed in
terms of the individual scan costs, is scanR1 + scanR2 + 2(scanR1 + scanR2)(1 - q), where q
denotes the fraction of R1 whose hash table fits in memory (Steinbrunn et al., 1997).
Disk access rates (measured in MB/sec) are then used to complete the evaluation
of the cost. Similar strategies are applied to evaluate the repartitioning cost, which
involves scanning the datasets, operating on them, assigning buffers, and sending to
destination nodes (with given network bandwidth in MB/sec). A typical number of
instructions used to process different low-level operations and to send and receive
messages (Network) was included as a parameter to the simulator (Stöhr, Märtens
& Rahm, 2000). For these experiments we used a 100 GB TPC-H database and the generic
query Qa of Figure 5a, with default selectivities for the attribute values (x, y, w, z) of
(0.7, 0.7, 0.2, 0.2), respectively.
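The hybrid hash-join cost expression quoted above can be illustrated with a minimal sketch. It assumes that a scan cost is simply the relation size divided by the disk access rate; the function names, relation sizes, and disk rate below are hypothetical and are not taken from the simulator.

```python
def scan_cost(size_mb: float, disk_rate_mb_s: float) -> float:
    """Seconds needed to scan a relation (assumed model: size / disk rate)."""
    if disk_rate_mb_s <= 0:
        raise ValueError("disk rate must be positive")
    return size_mb / disk_rate_mb_s


def hybrid_hash_join_cost(scan_r1: float, scan_r2: float, q: float) -> float:
    """Cost scanR1 + scanR2 + 2(scanR1 + scanR2)(1 - q).

    q is the fraction of R1 whose hash table fits in memory.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return scan_r1 + scan_r2 + 2.0 * (scan_r1 + scan_r2) * (1.0 - q)


# Hypothetical inputs: R1 = 1000 MB, R2 = 4000 MB, disk rate = 100 MB/sec.
scan_r1 = scan_cost(1000.0, 100.0)
scan_r2 = scan_cost(4000.0, 100.0)

for q in (0.0, 0.5, 1.0):
    cost = hybrid_hash_join_cost(scan_r1, scan_r2, q)
    print(f"q = {q:.1f}: estimated join cost = {cost:.1f} sec")
```

When q = 1 the whole hash table of R1 fits in memory and the cost reduces to the two scans; as q decreases, the term 2(scanR1 + scanR2)(1 - q) grows in proportion to the fraction that does not fit.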
Figure 9 shows the response time (a) and speedup (b) vs.