In other
words, we have to optimize the sequence of the ETL operations involved in
the overall process.
Up to now, the research community has treated the optimization of data
warehouse refreshment as the problem of finding the optimal strategy for view
maintenance. This is not sufficient for the mechanisms employed in real-world
settings. In real-world data warehouse environments, the refreshment procedure
differs to the point that the execution of the operational processes (which
export data from the operational data sources, transform them into the format
of the target tables, and finally load them into the data warehouse) does not
look like a single "big" query; it is more realistically considered a complex
transaction. Thus, the problem must be approached from a different perspective,
taking into account the characteristics of an ETL process presented in the
previous subsection. One could argue that all ETL operations can be expressed
in terms of relational algebra and the resulting expression then optimized as
usual. But traditional logic-based algebraic query optimization can be
blocked, mainly because of the presence of data manipulation functions.
However, if we study the optimization of ETL workflows from a logical point of
view, we can identify several interesting research problems and optimization
opportunities.
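To make the blocking effect of data manipulation functions concrete, here is a minimal sketch under a deliberately simplified model (an assumption, not the model of this text): every activity is either a row-level filter or a transformation that yields exactly one output row per input row. The names `Filter`, `Transform`, `can_push_filter_before`, and `run`, as well as the dollar-to-euro rate, are hypothetical. A filter may be moved ahead of a transformation only if the optimizer knows that the transformation does not write any attribute the filter reads; when the function is an opaque black box, the optimizer must conservatively refuse the reordering.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Union

Row = Dict[str, object]


@dataclass(frozen=True)
class Filter:
    """Row-level selection: keeps the rows for which `pred` is true."""
    name: str
    pred: Callable[[Row], bool]
    reads: FrozenSet[str]  # attributes the predicate inspects


@dataclass(frozen=True)
class Transform:
    """Row-level function producing exactly one output row per input row.
    `writes` is None when the function is opaque (a black box)."""
    name: str
    fn: Callable[[Row], Row]
    writes: Optional[FrozenSet[str]]


Activity = Union[Filter, Transform]


def can_push_filter_before(t: Transform, f: Filter) -> bool:
    """True if running `f` before `t` is guaranteed to give the same result."""
    if t.writes is None:
        # Unknown semantics: the reordering cannot be proven safe.
        return False
    return not (f.reads & t.writes)


def run(activities: List[Activity], rows: List[Row]) -> List[Row]:
    """Execute a linear sequence of activities over a list of rows."""
    for a in activities:
        if isinstance(a, Filter):
            rows = [r for r in rows if a.pred(r)]
        else:
            rows = [a.fn(r) for r in rows]
    return rows


# Hypothetical activities (the 0.9 exchange rate is illustrative only).
to_euro = Transform("dollar2euro", lambda r: {**r, "cost": r["cost"] * 0.9},
                    frozenset({"cost"}))
opaque = Transform("udf", lambda r: {**r, "cost": r["cost"] * 0.9}, None)
recent = Filter("recent", lambda r: r["year"] >= 2004, frozenset({"year"}))
cheap = Filter("cheap", lambda r: r["cost"] < 100, frozenset({"cost"}))

rows = [{"year": 2003, "cost": 50}, {"year": 2005, "cost": 200}]
```

Here `recent` can safely run before `to_euro`, which shrinks the input of the conversion, whereas `cheap` cannot, because it reads the attribute that the conversion rewrites. The function `opaque` computes exactly the same thing as `to_euro`, yet the swap is rejected, since nothing is declared about what it modifies.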