The actual decision on whether to partition or
replicate relations requires a cost model that we review later.
Partitioning Strategies
In this section we define a set of strategies that combine partitioning
and replication. A generic cost model is presented in the following section.
Consider the TPC-H data warehouse schema of Figure 4 (TPC, 1999). It contains
several large relations that are frequently involved in joins. The schema
represents ordering and selling activity (LI-lineitem, O-orders, PS-partsupp, P-part,
S-supplier, C-customer), where relations such as LI, O, PS, and even P are quite
large. Two further relations, NATION and REGION, are not depicted in the figure
because they are very small and can be readily replicated into all nodes.
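To make the distinction between replicated and partitioned relations concrete, the following is a minimal sketch rather than the chapter's method. The row counts are approximate TPC-H cardinalities at scale factor 1; the size cutoff `SMALL_RELATION_ROWS`, the helper names, and the modulo-based node assignment are assumptions made purely for illustration. The real decision depends on a cost model, for which the cutoff is only a stand-in.

```python
# Approximate TPC-H row counts at scale factor 1.
ROWS_SF1 = {
    "LINEITEM": 6_000_000,
    "ORDERS": 1_500_000,
    "PARTSUPP": 800_000,
    "PART": 200_000,
    "CUSTOMER": 150_000,
    "SUPPLIER": 10_000,
    "NATION": 25,
    "REGION": 5,
}

# Hypothetical cutoff: relations at or below this size are copied to every node.
# This is a placeholder for a cost model, not a recommended value.
SMALL_RELATION_ROWS = 1_000


def choose_placement(rows, small_limit=SMALL_RELATION_ROWS):
    """Label each relation 'replicate' or 'partition' by row count alone."""
    return {
        name: "replicate" if count <= small_limit else "partition"
        for name, count in rows.items()
    }


def node_for_key(key, n_nodes):
    """Assign a row to a node by its integer partitioning key (simple modulo)."""
    return key % n_nodes


placement = choose_placement(ROWS_SF1)
for name, decision in placement.items():
    print(f"{name:9s} {decision}")
```

With this cutoff only NATION and REGION are replicated, and every other relation is spread across the nodes by `node_for_key`. A size-only rule ignores join patterns entirely, which is why a cost model is needed for the larger relations.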
24 Furtado
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
Figure 5(a) shows a generic query Qa, and a possible "star-join" execution plan for
that query is shown in Figure 5(b).
Given this example schema, the challenge is how to partition the data, process
queries, and provide availability so as to obtain an efficient, low-cost,
platform-independent shared-nothing data warehouse. We wish to determine a good
partitioning strategy for processing queries, considering that each relation could
be either fully partitioned or replicated.