We have seen that replication is a key factor for query performance and how query
routing can optimise its utilisation, for example, by approximating the state of the
caches of the cluster nodes. Until now, we have left aside the problem of updating
our dataset. It may be state of the art for data-intensive applications such as
OLAP to strictly separate query and update phases. However, a data
warehouse offering data that is a week old will not be acceptable much longer.
Rather, the goal should be to run queries over up-to-date data when
needed, and a cluster-based approach to OLAP can provide this.
The simultaneous admission of queries and updates in the presence of replication
necessitates a transaction management and replication control component. Transaction
management is responsible for a correct interleaved execution of concurrent
queries and updates, while replication control ensures that updates eventually
reach all copies of the data.
Replication and Correctness
A naïve approach to global correctness would use synchronous replication, also
referred to as eager replication, where each update immediately goes to all replicas.
However, such approaches necessitate a distributed atomic commit protocol, which
is not feasible for a large number of nodes (Gray, Helland, O'Neil, & Shasha, 1996).
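To make the cost of eager replication concrete, the following is a minimal sketch of an update that must reach every replica atomically, using a simplified two-phase commit. The `Replica` class, the `eager_update` function, and the in-memory dictionaries are hypothetical names introduced for illustration only; a real protocol would also need logging, timeouts, and recovery, all of which are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica holding one copy of the data."""
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, key, value) -> bool:
        # Phase 1: vote. An unreachable replica cannot vote yes.
        if not self.available:
            return False
        self.staged[key] = value
        return True

    def commit(self) -> None:
        # Phase 2: make the staged write visible to queries.
        self.data.update(self.staged)
        self.staged.clear()

    def abort(self) -> None:
        # Discard the staged write; the visible data stays unchanged.
        self.staged.clear()


def eager_update(replicas, key, value) -> bool:
    """Apply an update to all replicas or to none (simplified 2PC)."""
    votes = [r.prepare(key, value) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


nodes = [Replica(f"node{i}") for i in range(3)]
ok_all_up = eager_update(nodes, "fact_rows", 100)    # every copy is updated
nodes[1].available = False
ok_one_down = eager_update(nodes, "fact_rows", 200)  # one node blocks the update
```

In this sketch a single unreachable node is enough to abort the update everywhere. If one assumes, purely for illustration, that each of n nodes is independently available with probability p, all of them vote yes with probability p^n, which shrinks quickly as the cluster grows.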