The node-partitioned data warehouse (NPDW) is a generic architecture
for partitioning and processing over the data warehouse in such an environment.
The objective of this chapter is to discuss and analyze partitioning, processing, and
availability issues in the design of the NPDW.
Background
Typical data warehouse schemas have some distinctive properties. First, they are
mostly read-only, with periodic loads. This characteristic minimizes consistency
issues, which are a major concern when parallelizing transactional schemas and
workloads. Second, data warehouse schemas usually have multidimensional
characteristics (Kimball, Reeves, Ross, & Thornthwaite, 1998), with large central
fact relations and dimensions (e.g., shop, client, product, supplier). The fact
relations contain several measurements (e.g., the amount of sales) and can reach
hundreds or thousands of gigabytes in size. Each measurement is recorded for each
individual combination of dimension values (e.g., sales of a product from a
supplier, in one shop and for an individual client). While there are specific
analysis-oriented data marts stored and analyzed using some nonrelational
multidimensional engine (Kimball, Reeves, Ross, & Thornthwaite, 1998), our focus
is on the large central repository warehouses stored in a relational engine.
Finally, warehouses are used for online analytical processing (OLAP), including
reporting and ad-hoc analysis patterns.

Efficient and Robust Node-Partitioned Data Warehouses
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
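
The multidimensional layout described in this section can be illustrated with a minimal sketch, here using Python's built-in sqlite3 module as a stand-in relational engine. The table names (sales, shop, product), the columns, and the sample rows are hypothetical and chosen only for illustration; they are not taken from the chapter, and a real warehouse would have more dimensions (e.g., client, supplier) and a far larger fact relation.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact relation, two dimensions."""
    conn.executescript(
        """
        CREATE TABLE shop (shop_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
        -- Fact relation: one measurement (amount) per combination of dimension keys.
        CREATE TABLE sales (
            shop_id INTEGER REFERENCES shop(shop_id),
            product_id INTEGER REFERENCES product(product_id),
            amount REAL
        );
        """
    )
    conn.executemany("INSERT INTO shop VALUES (?, ?)", [(1, "Lisbon"), (2, "Porto")])
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)", [(10, "books"), (11, "music")]
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, 10, 20.0), (1, 11, 5.0), (2, 10, 7.5), (2, 10, 2.5)],
    )


def sales_by_city_and_category(conn):
    """OLAP-style query: join the fact relation to its dimensions and aggregate."""
    return conn.execute(
        """
        SELECT s.city, p.category, SUM(f.amount)
        FROM sales AS f
        JOIN shop AS s ON s.shop_id = f.shop_id
        JOIN product AS p ON p.product_id = f.product_id
        GROUP BY s.city, p.category
        ORDER BY s.city, p.category
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for row in sales_by_city_and_category(connection):
        print(row)
```

The query shape (a large fact relation joined to its dimensions and then aggregated) is typical of the reporting and ad-hoc analysis patterns mentioned above.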
392