We will describe some techniques related to physical database design,
pipelining, and parallelism, which are crucial for the whole ETL process.
We will also propose our practical approach, "infrastructure-based ETL": it is not a
tool but a set of functionalities or services that experience has shown to be useful
and common enough in ETL scenarios, and on top of which one can build the
application.
Extraction, Transformation, and Loading Processes 8
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
Introduction
ETL stands for extraction, transformation, and loading; in other words, it is the data
warehouse backstage. A variety of commercial ETL tools are on the market (IBM,
2005; Informatica, 2005; Microsoft, 2005; Oracle, 2005), and Gartner Research has
recently reviewed that market (Gartner, 2005). A considerable body of research also
exists (Golfarelli & Rizzi, 1998; Husemann, Lechtenborger, & Vossen, 2000; Tryfona,
Busborg, & Christiansen, 1999; Vassiliadis, Simitsis, & Skiadopoulos, 2002, May;
Vassiliadis, Simitsis, & Skiadopoulos, 2002, November), mostly targeting modeling
(conceptual and logical) and methodology issues, such as the logical modeling of ETL workflows.
Other works focus on the end-to-end methodology for warehouse and
ETL projects (Kimball & Caserta, 2004; Kimball, Reeves, Ross, & Thornthwaite,
1998; Vassiliadis, Simitsis, Georgantas, & Terrovitis, 2003). They cover the complete
life cycle of the data warehouse (DW) project, describing how to plan, design, build,
and run the DW and its ETL backstage.