First, the designer is
responsible for defining an Execution Plan for the scenario. An execution
plan can be viewed from several perspectives. The Execution Sequence
involves the specification of (a) which process runs first, second, and so on;
(b) which processes run in parallel; or (c) when a semaphore is defined so that
several processes are synchronized at a rendezvous point. ETL processes normally
run in batches, so the designer needs to specify an Execution Schedule,
that is, the time points or events that trigger the execution of the workflow as
a whole. Finally, because system crashes can occur, a recovery plan must
exist, specifying the sequence of steps to be taken if a certain process
fails (e.g., retrying the process or undoing any intermediate results
produced so far). In the ETL case, due to the data-centric nature
of the process, the designer must deal with the relationship of the involved
processes with the underlying data. This involves the definition of a primary
data flow that describes the route of data from the sources towards their
final destination in the data warehouse, as they pass through the processes of
the workflow.
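
The execution sequence and recovery plan described above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the names `Process`, `run_plan`, `flaky_load`, and the in-memory `staging` list are hypothetical, and the sketch assumes that a plan is a list of stages, where the stages run one after another and the processes inside a stage run in parallel. The end of each stage plays the role of the rendezvous point, and a failed process is retried after its intermediate results are undone.

```python
from concurrent.futures import ThreadPoolExecutor


class Process:
    """One ETL process plus its recovery settings (hypothetical structure)."""

    def __init__(self, name, action, undo=lambda: None, max_retries=1):
        self.name = name
        self.action = action          # the work the process performs
        self.undo = undo              # removes intermediate results after a failure
        self.max_retries = max_retries

    def run(self):
        # Recovery plan: undo partial results, then retry up to max_retries times.
        for attempt in range(self.max_retries + 1):
            try:
                self.action()
                return self.name
            except Exception:
                self.undo()
                if attempt == self.max_retries:
                    raise


def run_plan(stages):
    """Run stages in order; processes within one stage run in parallel."""
    completed = []
    for stage in stages:
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            futures = [pool.submit(p.run) for p in stage]
            # Collecting every result is the rendezvous: the next stage
            # starts only after all processes of this stage have finished.
            completed.append(sorted(f.result() for f in futures))
    return completed


# --- Illustrative scenario (all data and failures are simulated) ---
staging = []                # stands in for intermediate results
attempts = {"load": 0}


def extract_a():
    staging.append("a")


def extract_b():
    staging.append("b")


def flaky_load():
    attempts["load"] += 1
    staging.append("row")
    if attempts["load"] == 1:
        raise RuntimeError("simulated crash on the first attempt")


def undo_load():
    while "row" in staging:
        staging.remove("row")


plan = [
    [Process("extract_a", extract_a), Process("extract_b", extract_b)],
    [Process("load", flaky_load, undo=undo_load, max_retries=2)],
]
result = run_plan(plan)
print(result)
```

In this sketch the execution schedule is left out; a scheduler or an event handler would simply call `run_plan(plan)` at the chosen time points or trigger events.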