Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
most appropriate partitioning attributes, which should be related to typical query
access patterns. But while they are targeted at generic parallel databases and may
require tight integration with a specific cost predictor and optimizer (Rao et al., 2002),
we discuss generic data partitioning that is independent of the underlying database
server and targeted at node partitioned data warehouses (Furtado, 2004a, b, c). Consider
the query Q = { R1.A2, R2.A4 | R1.A1=R2.A1 ?› R2.A1=R3.A1 ?› R3.A2=R4.A2 ?›
R3.A3=R5.A3}, where Ri are relations and Ai are attributes. The join graph of Figure
1a is a graph where vertices correspond to attributes R.A participating in equi-joins
and the edges depict the set of equi-joins between those attributes. A component is
a set of interconnected vertices and its edges. The join graph of the query workload
(JGQW) shown in Figure 1b adds every join pattern occurring in the workload??”the
set of historical or expected queries??”with a weight on each edge representing the
frequency of occurrence of the corresponding join on the query workload (either a
percentage or number of occurrences) (Furtado, 2004a).
Nodes of a component of a join graph form a set of relations that can be joined
without requiring repartitioning and redistribution.
Pages:
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397