For queries that access or scan whole relations, round-robin partitioning
is best suited. However, if only a subset of a relation is accessed, hash or
range partitioning is preferable to round-robin partitioning because it allows
accessing only the data needed (assuming that the tuples are partitioned on
the same attributes used in the selection condition).
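To make the comparison concrete, the following is a minimal sketch of the three partitioning schemes over a toy relation with an integer attribute. The function names, the modulo "hash" function, and the range boundaries are assumptions chosen for illustration; they are not taken from any particular system.

```python
from bisect import bisect_right


def round_robin_partition(rows, num_nodes):
    """Assign the i-th row to node i mod num_nodes, ignoring row content."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def hash_node(key, num_nodes):
    """Toy hash function for integer keys (an assumption for this sketch)."""
    return key % num_nodes


def hash_partition(rows, num_nodes, attr):
    """Place each row on the node selected by hashing its attr value."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash_node(row[attr], num_nodes)].append(row)
    return nodes


def range_node(key, boundaries):
    """Node index for key; boundaries are sorted, exclusive upper bounds."""
    return bisect_right(boundaries, key)


def range_partition(rows, boundaries, attr):
    """Place each row on the node whose key range contains its attr value."""
    nodes = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        nodes[range_node(row[attr], boundaries)].append(row)
    return nodes


def nodes_for_range_query(low, high, boundaries):
    """Nodes that can hold keys in [low, high] under range partitioning."""
    first = range_node(low, boundaries)
    last = range_node(high, boundaries)
    return list(range(first, last + 1))


rows = [{"id": i} for i in range(12)]
rr = round_robin_partition(rows, 4)         # even spread, good for full scans
hp = hash_partition(rows, 4, "id")          # point lookups touch one node
rp = range_partition(rows, [3, 6, 9], "id")  # range lookups touch few nodes
```

With round-robin placement, a selection on id gives no hint about where a tuple lives, so every node has to be scanned. With hash partitioning on id, a point query is routed to the single node `hash_node(value, num_nodes)`; with range partitioning, `nodes_for_range_query` names the only nodes a range predicate can touch.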
• Data replication: The other basic alternative is full replication, that is, each
cluster node holds a copy of the whole database. Queries are served by a single
cluster node; several queries can be evaluated in parallel on different nodes. A
big advantage is that even for complex multijoin queries, no communication
or data shipping between cluster nodes is needed; such communication might
not be an issue with OLTP, but it is a massive problem with OLAP workloads
(Röhm, 2000). However, full data replication limits scalability with large
datasets, as the whole dataset is stored several times. Given the capacity and
the low cost of today's hard disks, this might not be a problem storagewise;
but full replication speeds up higher workloads rather than larger datasets,
and it induces high maintenance costs (updates have to be executed on several
copies instead of just once).
• Hybrid designs: Warehouse schemata are often of a regular form with a central
fact table, which is connected to several dimension tables by foreign key
relationships.
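The trade-off of full replication described above (reads are local to one node, updates must reach every copy) can be illustrated with a small sketch. The class name, the in-memory dictionaries standing in for database copies, and the round-robin routing of queries are assumptions made for this example only; the text does not prescribe a routing policy.

```python
class FullyReplicatedCluster:
    """Toy model: every node holds a complete copy of the data."""

    def __init__(self, num_nodes):
        # One dictionary per node stands in for a full database copy.
        self.replicas = [dict() for _ in range(num_nodes)]
        self._next_node = 0
        self.writes_applied = 0

    def update(self, key, value):
        """An update has to be executed on every copy."""
        for replica in self.replicas:
            replica[key] = value
            self.writes_applied += 1

    def query(self, key):
        """A query is answered by a single node, with no data shipping."""
        node = self._next_node
        # Hypothetical routing policy: rotate over the nodes.
        self._next_node = (self._next_node + 1) % len(self.replicas)
        return node, self.replicas[node].get(key)


cluster = FullyReplicatedCluster(num_nodes=3)
cluster.update("sales_2000", 42)
first = cluster.query("sales_2000")
second = cluster.query("sales_2000")
```

One logical update costs three physical writes here, while the two queries are served by different nodes and could run in parallel. Each query still reads a full-size copy, which is why adding nodes raises throughput but does not shorten a single query over a larger dataset.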