Partitioning, that is, distributing the data of a relation across different
nodes, typically results in intraquery parallelism. Replication, in
turn, leads to interquery parallelism, as different nodes can evaluate queries in parallel.
Using these design primitives, we have the following basic alternatives for
physical design in a database cluster:
• Data partitioning: The most common form of data partitioning in a parallel
database environment is horizontal partitioning. With horizontal partitioning,
the tuples of a relation are divided (or declustered) among many or all nodes
of the cluster such that each tuple resides on only one node. Several
partitioning strategies exist for deciding which tuple is stored on which
node: round robin partitioning, hash partitioning, and range partitioning. Round
robin partitioning is the only partitioning strategy that is not based on the
actual values of the data. Instead, assuming a cluster consisting of n nodes, the
ith tuple is simply stored on the (i mod n)-th node. In contrast, with the other
partitioning strategies one or more attributes from the given relational schema
are designated as partitioning attributes. Hash partitioning applies a hash function
with range [1, ..., n] to the partitioning attributes of each tuple. Range
partitioning assigns value ranges of the partitioning attributes to certain cluster
nodes.
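
The three strategies above can be illustrated with a minimal Python sketch. It is not taken from any particular system: the function names (round_robin_node, hash_node, range_node) are hypothetical, CRC32 is merely one possible choice of hash function, and nodes are numbered 0 to n-1 throughout for consistency with the (i mod n) rule, rather than 1 to n.

```python
import bisect
import zlib


def round_robin_node(i: int, n: int) -> int:
    """Node for the i-th tuple; ignores the tuple's values entirely."""
    return i % n


def hash_node(key: tuple, n: int) -> int:
    """Node for a tuple, derived from its partitioning attribute values.

    CRC32 over the key's textual form is an assumption made here to get a
    hash that is stable across runs; any deterministic hash would do.
    """
    digest = zlib.crc32(repr(key).encode("utf-8"))
    return digest % n


def range_node(value, upper_bounds: list) -> int:
    """Node for a partitioning attribute value under range partitioning.

    upper_bounds is a sorted list of n-1 split points (an assumed encoding):
    node j holds the values v with upper_bounds[j-1] < v <= upper_bounds[j],
    and the last node holds everything above the final split point.
    """
    return bisect.bisect_left(upper_bounds, value)


if __name__ == "__main__":
    n = 3
    # Hypothetical relation (id, name); id is the partitioning attribute.
    relation = [(5, "a"), (12, "b"), (20, "c"), (31, "d")]
    for i, (tid, name) in enumerate(relation):
        print(
            tid,
            round_robin_node(i, n),
            hash_node((tid,), n),
            range_node(tid, [10, 20]),
        )
```

Note that only hash_node and range_node allow a query on the partitioning attribute to be routed to a specific node; with round_robin_node the location of a tuple cannot be derived from its values.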