In the following figures, we depict the contents of each node: filled relation boxes represent
replicated relations, and partially filled ones represent partitioned relations.
The following alternatives are considered:
• Partition and replicate strategy (PRS): Partition the largest relation (LI in
TPC-H) and replicate all the other ones, as shown in Figure 6. Each node stores
Relation         Size (GB)
Lineitem (LI)    78
Partsupp (PS)    7.5
Orders (O)       18
Part (P)         2
Supplier (S)     0.1
Customer (C)     1.5
Figure 4. Summary of TPC-H schema: (a) TPC-H schema and (b) relation sizes
(100GB)
Figure 5. Example query and possible execution plan (TPC-H): (a) generic query
Qa and (b) part of execution plan for Qa
(a)
    select
        sum(sales), sum(costs), sum(sales)-sum(costs), n_nation, o_year
    from
        part, supplier,
        lineitem, partsupp, orders, nation
    where
        and p_brand = x
        and n_name like y
        and o_orderpriority = 'w'
        and ps_availqty > z
    group by
        n_nation, o_year;
Efficient and Robust Node-Partitioned Data Warehouses
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
only a fraction of the largest relation (LI) and replicas of all other relations. All
the joins are ReplicaJ in this case. This strategy allows joins to be processed
without any data exchange between nodes, but the overhead of processing
large replicated relations can be prohibitive.
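
To make the storage cost of PRS concrete, the following minimal sketch computes what each node would hold when the largest relation is partitioned evenly and every other relation is fully replicated. The relation sizes are those listed in Figure 4(b); the function names, the even split of LI, and the node count of 10 used in the example are illustrative assumptions, not part of the original text.

```python
# Relation sizes in GB at the 100GB TPC-H scale, as listed in Figure 4(b).
SIZES_GB = {
    "LI": 78.0,  # Lineitem
    "PS": 7.5,   # Partsupp
    "O": 18.0,   # Orders
    "P": 2.0,    # Part
    "S": 0.1,    # Supplier
    "C": 1.5,    # Customer
}


def prs_layout(sizes_gb, n_nodes):
    """Per-node size (GB) of each relation under PRS.

    Assumption: the largest relation is split evenly across nodes;
    every other relation is stored in full on each node.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    largest = max(sizes_gb, key=sizes_gb.get)
    layout = {}
    for relation, size in sizes_gb.items():
        if relation == largest:
            layout[relation] = size / n_nodes  # partitioned fragment
        else:
            layout[relation] = size            # full replica
    return layout


def prs_node_storage(sizes_gb, n_nodes):
    """Total GB stored on a single node under PRS."""
    return sum(prs_layout(sizes_gb, n_nodes).values())


if __name__ == "__main__":
    nodes = 10  # hypothetical cluster size, chosen only for illustration
    for relation, gb in prs_layout(SIZES_GB, nodes).items():
        print(f"{relation}: {gb:.2f} GB per node")
    print(f"total: {prs_node_storage(SIZES_GB, nodes):.1f} GB per node")
```

Under these assumptions, a 10-node system would keep about 7.8 GB of LI on each node but still carry 29.1 GB of replicas, so roughly 36.9 GB per node. The replicated part does not shrink as nodes are added, which is the overhead the text warns about.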