The most common primitives are: Linear sum (LS=SUM(X));
Sum of squares (SS=SUM(X2)); Number of elements (N); and Extremes (MAX
and MIN).
Although in Figure 3 every node computes partial aggregations for all aggregation
groups, aggregation can also be computed by assigning the computation of specific
aggregation groups to specific nodes (Shatdal & Naughton, 1995). A detailed study
and evaluation of query processing issues in the NPDW is available in Furtado
(2005a).
The repartitioning operation of step 3R in Figure 2(b) is necessary whenever a partitioned
dataset needs to participate in a join but is not partitioned by the join attribute.
Each node is assigned a hash range for the join key, and every node needs to send to
Figure 2. Query processing steps in NPDW: (a) example query (b) query processing
steps
(a) (b)
2 2 Furtado
Copyright ?© 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
every other node the tuples it has that belong to the hash-range corresponding to that
node. It should be implemented as efficiently as possible to minimize the cost. We
assume a switched network (the cost of repartitioning would be larger on a sharedmedia
hub-based network). A simple parallel repartitioning algorithm would be:
Number the N nodes sequentially;
For (i=1;i
Parallel: every node j sends data to node (j+i) mod N;
The objective of this algorithm is for nodes to exchange data in parallel, to reduce
the repartitioning overhead.
Pages:
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406