An ETL application that does not manage interrupted DW loads efficiently cannot
be called "robust." Some works in the literature discuss the problems of resuming
interrupted DW loads (Labio, Wiener, Garcia-Molina, & Gorelik, 2000).
The simplest way to guarantee data consistency after a failure is a global
rollback of all loaded and modified data. This is quite easy for loaded data
(all the loaded partitions are truncated) and a bit more difficult for modified
data (tables must be restored from the previously saved data). This approach
works for serious and complex failures, but when a problem involves only one
portion of the ETL process (e.g., an update of a dimensional table), it may be
unacceptable to throw away all the work already done, especially when the entire
process takes hours.
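
The global rollback described above can be illustrated with a minimal sketch.
It assumes a toy in-memory warehouse (plain dictionaries) rather than a real
database, and it assumes that each run loads into partitions that were empty
beforehand. The names GlobalRollbackLoad, load_partition, update_table, and
run_steps are hypothetical and introduced only for illustration.

```python
import copy


class GlobalRollbackLoad:
    """Tracks everything one ETL run loads or modifies (illustrative only)."""

    def __init__(self, partitions, tables):
        self.partitions = partitions  # partition name -> list of rows
        self.tables = tables          # table name -> {key: row}
        self._loaded = []             # partitions filled by this run
        self._saved = {}              # table name -> copy taken before first update

    def load_partition(self, name, rows):
        # Remember the partition so it can be truncated on rollback.
        if name not in self._loaded:
            self._loaded.append(name)
        self.partitions.setdefault(name, []).extend(rows)

    def update_table(self, name, key, row):
        # Save the table once, before its first modification in this run.
        if name not in self._saved:
            self._saved[name] = copy.deepcopy(self.tables[name])
        self.tables[name][key] = row

    def rollback(self):
        # Loaded data: truncate every partition touched by this run
        # (assumes the partitions were empty before the run started).
        for name in self._loaded:
            self.partitions[name] = []
        # Modified data: restore each table from its saved copy.
        for name, snapshot in self._saved.items():
            self.tables[name] = snapshot
        self._loaded.clear()
        self._saved.clear()


def run_steps(load, steps):
    """Run all steps; on any failure undo the whole run. Returns True on success."""
    try:
        for step in steps:
            step(load)
        return True
    except Exception:
        load.rollback()
        return False
```

Note that a late failure in a single step discards every earlier step as well,
which is exactly the cost discussed in the paragraph above.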
A better way to manage partial failures is to organize the ETL process into
functional components (also useful for documentation and modularization
purposes) that can be individually recovered. Previously, we described
processUnits (small blocks of code) and the synchronization engine that starts
them according to predefined rules. This modularization is too fine-grained for
recovery purposes (how can one recover from the failure of a single processUnit
instance?), so we built on top of them a logical container of processUnits
called a component.
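
One way to picture component-level recovery is the following sketch. It is not
the chapter's implementation: it assumes that processUnits are plain callables
run sequentially (the real synchronization engine and its rules are not
modeled), that each component carries its own undo action, and that recovery
means undoing and re-running only the failed component up to a fixed number of
attempts. The names Component, run_with_recovery, and max_attempts are
hypothetical.

```python
class Component:
    """Logical container of processUnits; the unit of recovery (hypothetical)."""

    def __init__(self, name, process_units, undo):
        self.name = name
        self.process_units = process_units  # callables that take a state dict
        self.undo = undo                    # reverts this component's effects only

    def run(self, state):
        for unit in self.process_units:
            unit(state)


def run_with_recovery(components, state, max_attempts=2):
    """Run components in order, recovering each one individually.

    Returns (names of completed components, name of the failed one or None).
    The retry policy is an assumption made for this sketch.
    """
    completed = []
    for component in components:
        for attempt in range(1, max_attempts + 1):
            try:
                component.run(state)
                completed.append(component.name)
                break
            except Exception:
                # Undo only this component; earlier components stay in place.
                component.undo(state)
                if attempt == max_attempts:
                    return completed, component.name
    return completed, None
```

In this sketch a failure while updating a dimensional table leaves the
previously completed components untouched, so the work already done is kept.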