Partitioning,.Parallel.Join,.and.Cost.Models
Several strategies have been proposed for efficient distributed placement and query
processing. The semi-join operator (Bernstein & Chiu, 1981) applies selection and
projection operations before sending data through the network. Other proposed strategies
for efficient distributed query processing include placement dependency (Liu,
Chen, & Krueger, 1996), which uses dependency relationships between relations
to co-locate fragments for faster processing. This and other alternative strategies
are compared experimentally in Liu and Yu (1993). The most promising solutions
to extra join overheads that characterize many successful parallel and distributed
database systems in shared-nothing environments involve hash-partitioning large
relations into nodes in order to minimize data exchange requirements (DeWitt &
Gerber, 1985; Kitsuregawa, Tanaka, & Motooka, 1983). Parallel hash-join algorithms,
also reviewed in Yu and Meng (1998), consider partitioning and allocating
intervening relation fragments into processors or computer nodes for fast join processing.
These strategies typically allocate a hash range to each processor, which
builds a hash table and hashes relation fragments accordingly. In a shared-nothing
environment, it often becomes necessary to exchange data between nodes in order
to send tuples into the node that has been allocated the corresponding hash-value
range for the join attribute.
Pages:
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395