Discussion: Data partitioning is a very popular approach to the physical design
of parallel databases. It scales well with larger datasets,
especially if the partitioning scheme is free of data skew, that is, all partitions
are of about the same size, and if each subquery can be evaluated locally on just
one data partition. Otherwise, if for example two partitioned relations are joined
on noncollocated join attributes, large amounts of data must be shipped between
the cluster nodes. This might not be an issue in an OLTP environment,
but because complex OLAP queries typically access large parts of a
database, it is important to avoid any kind of noncollocated join processing.
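
The effect of skew and of noncollocated joins can be illustrated with a small sketch. It is not taken from the text: the relation `orders`, its attributes, the modulo-based partitioning function, and the helper names `partition`, `skew`, and `tuples_to_ship` are all hypothetical and chosen only for illustration. The sketch partitions one relation in two ways and counts how many tuples would have to move to another node before a join on `customer_id` could run locally.

```python
# Illustrative sketch only; all names and the modulo partitioning are assumptions.

def partition(rows, key, n_nodes):
    """Partition rows (dicts with integer keys) across n_nodes by key modulo n_nodes."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[row[key] % n_nodes].append(row)
    return parts

def skew(parts):
    """Largest partition size divided by the average size (1.0 means no skew)."""
    sizes = [len(p) for p in parts]
    return max(sizes) / (sum(sizes) / len(sizes))

def tuples_to_ship(parts, join_key, n_nodes):
    """Count rows stored on a node other than the one a local join on join_key needs."""
    return sum(
        1
        for node, part in enumerate(parts)
        for row in part
        if row[join_key] % n_nodes != node
    )

N_NODES = 4
orders = [{"order_id": i, "customer_id": (i * 7) % 10} for i in range(100)]

# Collocated: orders are partitioned on the join attribute itself.
by_customer = partition(orders, "customer_id", N_NODES)
# Noncollocated: orders are partitioned on a different attribute.
by_order = partition(orders, "order_id", N_NODES)

shipped_collocated = tuples_to_ship(by_customer, "customer_id", N_NODES)
shipped_noncollocated = tuples_to_ship(by_order, "customer_id", N_NODES)

print("skew (by order_id):", skew(by_order))
print("tuples shipped, collocated:", shipped_collocated)
print("tuples shipped, noncollocated:", shipped_noncollocated)
```

In this toy setting the collocated layout ships nothing, while the layout partitioned on `order_id` must move part of the relation before the join can run locally. Real systems use more elaborate partitioning functions and cost models.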
The main advantage of replication over partitioning is that no distributed query
processing is necessary at all. Instead of intraquery parallelism, replication optimises interquery
parallelism: different queries can be evaluated in parallel and without
interference on separate cluster nodes. The disadvantages of full replication are
its limited scalability with the data size and its maintenance costs. Updates must
be propagated to all replicas in the cluster, which gives rise to a number of problems
with regard to correctness, update performance, and scalability. Hence, this is
typically taken as a "rule-out" argument against replication.
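
The trade-off can be sketched as follows. This is a minimal illustration under simplifying assumptions, not a method described in the text: the class `ReplicatedCluster`, its round-robin routing, and the synchronous write-all update are hypothetical, and the sketch ignores failures and transactional correctness entirely. It shows that each query is served by a single node, while each update costs one write per replica.

```python
# Illustrative sketch only; class name, routing policy and write-all updates are assumptions.
from itertools import cycle

class ReplicatedCluster:
    """Toy model of full replication: every node holds a complete copy of the data."""

    def __init__(self, n_nodes):
        self.replicas = [dict() for _ in range(n_nodes)]
        self._next_node = cycle(range(n_nodes))  # round-robin query routing
        self.writes = 0  # total number of physical writes performed

    def update(self, key, value):
        """Propagate an update to all replicas (synchronous write-all)."""
        for replica in self.replicas:
            replica[key] = value
            self.writes += 1

    def query(self, key):
        """Answer a query on one node only; return (node index, value)."""
        node = next(self._next_node)
        return node, self.replicas[node].get(key)

cluster = ReplicatedCluster(3)
cluster.update("x", 1)
cluster.update("y", 2)

# Six independent queries are spread over the three nodes.
served_by = [cluster.query("x")[0] for _ in range(6)]

print("nodes serving the queries:", served_by)
print("physical writes for 2 updates:", cluster.writes)
```

With three replicas, the two logical updates cause six physical writes, which is the maintenance cost that grows with the cluster size.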