In Figure 1, we abstractly describe the general framework for ETL processes. On
the left side, we can observe the original data stores (sources) that are involved in
the overall process. Typically, data sources are relational databases and files. The
data from these sources are extracted by specialized routines or tools, which provide
either complete snapshots or differentials of the data sources. Then, these data are
propagated to the data staging area (DSA) where they are transformed and cleaned
before being loaded into the data warehouse. Intermediate results, again mostly in the
form of files or relational tables, are part of the data staging area. The data warehouse
(DW) is depicted in the right part of Figure 1 and comprises the target data
stores, that is, fact tables for the storage of information and dimension tables with
the description and the multidimensional rollup hierarchies of the stored facts. The
central warehouse is loaded by the loading activities depicted on the right side of
the figure, immediately before the data warehouse data store.
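
The flow described above (extract a differential, transform and clean it in the staging area, then load fact and dimension tables) can be illustrated with a minimal Python sketch. It is not taken from the chapter: the row layout (id, product, amount), the function names, and the Warehouse class are all hypothetical, and in-memory lists stand in for the source stores, the DSA, and the warehouse.

```python
from dataclasses import dataclass, field


def extract_differential(previous, current):
    """Extraction: return rows of `current` that are new or changed
    relative to the `previous` snapshot (rows are matched on 'id').
    Deleted rows are not reported in this simplified sketch."""
    old = {row["id"]: row for row in previous}
    return [row for row in current if old.get(row["id"]) != row]


def transform(rows):
    """Staging-area step: drop rows without an amount, normalise names."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue  # cleaning rule (assumed): reject incomplete rows
        cleaned.append({
            "id": row["id"],
            "product": row["product"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return cleaned


@dataclass
class Warehouse:
    """Hypothetical target: one dimension table and one fact table."""
    product_dim: dict = field(default_factory=dict)  # name -> surrogate key
    sales_fact: list = field(default_factory=list)   # (source id, key, amount)

    def load(self, rows):
        for row in rows:
            # Reuse the surrogate key if the product is already known.
            key = self.product_dim.setdefault(
                row["product"], len(self.product_dim) + 1
            )
            self.sales_fact.append((row["id"], key, row["amount"]))


# Two snapshots of an assumed source table.
previous = [{"id": 1, "product": "Widget", "amount": "10.0"}]
current = [
    {"id": 1, "product": "Widget", "amount": "10.0"},
    {"id": 2, "product": " Gadget ", "amount": "5.5"},
    {"id": 3, "product": "Widget", "amount": None},
]

dw = Warehouse()
dw.load(transform(extract_differential(previous, current)))
```

Here only rows 2 and 3 are extracted, row 3 is rejected during cleaning, and a single fact row is loaded. A real workflow would also have to handle deletions, rejected-row logging, and recovery from failures, which this sketch leaves out.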
Despite the plethora of commercial solutions that offer ad-hoc capabilities for the
creation of an ETL scenario, a designer/administrator needs a concrete method to
develop an efficient, robust, and evolvable ETL workflow. Therefore, this chapter
intends to point out the main challenges and issues concerning the generic construction
of ETL workflows.