OLAP involves complex query patterns, with joins over multiple relations and
aggregations. These query patterns can degrade performance in shared-nothing
partitioned environments, especially when nodes need to exchange massive
quantities of data.
While very small dimensions can be replicated into every node and kept in memory
to speed up joins involving them, much more severe performance problems appear
when many large relations need to be joined and processed to produce an answer.
We use the schema and query set of the decision support performance benchmark
TPC-H (TPC) as an example of such a complex schema and query workload and
also as our experimental testbed. Performance and availability are relevant issues
in data warehouses in general and pose specific challenges in the NPDW context
(standard computer nodes and nonspecialized interconnects).
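
The contrast between replicated small dimensions and partitioned large relations can be illustrated with a minimal sketch. It is not taken from the chapter's system: the function names, the toy `orders` and `nation` tables, and the choice of hash partitioning on `order_id` are all assumptions made for illustration. Each simulated node joins its own fact partition against a full in-memory copy of the small dimension and produces a partial aggregate, so only the small partial results need to be merged.

```python
from collections import defaultdict

# Hypothetical toy data: a small dimension (replicated on every node)
# and a larger fact relation (partitioned across nodes).
nation = {1: {"name": "PORTUGAL"}, 2: {"name": "SPAIN"}}
orders = [
    {"order_id": 1, "nation_id": 1, "amount": 10.0},
    {"order_id": 2, "nation_id": 2, "amount": 20.0},
    {"order_id": 3, "nation_id": 1, "amount": 5.0},
    {"order_id": 4, "nation_id": 2, "amount": 7.5},
]


def partition_fact(rows, num_nodes, key):
    """Hash-partition fact rows across simulated nodes (assumed layout)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes


def local_join_aggregate(fact_part, dim_by_key):
    """Join one fact partition with the replicated dimension, locally.

    No rows leave the node: the dimension copy is already in memory.
    """
    partial = defaultdict(float)
    for row in fact_part:
        dim_row = dim_by_key[row["nation_id"]]
        partial[dim_row["name"]] += row["amount"]
    return partial


def merge_partials(partials):
    """Combine per-node partial sums into the final answer."""
    total = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


parts = partition_fact(orders, num_nodes=3, key="order_id")
result = merge_partials(local_join_aggregate(p, nation) for p in parts)
print(result)
```

When both sides of a join are large and neither is partitioned on the join key, this local strategy no longer applies, and rows must be moved between nodes before the join can run. That is the data-exchange cost the text refers to.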
Some research in recent years has focused on ad-hoc star join processing in data
warehouses. Specialized structures such as materialized views (Rousopoulos, 1998)
and specialized indexes (Chan & Ioannidis, 1998; O'Neil & Graefe, 1995) have
been proposed to improve response time. Although materialized views are useful
when queries are known in advance, that usefulness is lost when ad-hoc
queries are posed. Parallel approaches are therefore important as they can be used
alone or in conjunction with specialized structures to provide efficient processing
for any query pattern at any time.