g., DB2 and Oracle), the data almost always need to be transformed and must be
stored in a staging table, but storing data in a table is more expensive than storing
them in files. Another important issue is that data sometimes come from a DBMS and
sometimes do not, or they come from a closed system in which interface types,
emission criteria, naming rules, and so on are predefined and usually not negotiable.
Extraction, Transformation, and Loading Processes
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.
An ETL acquisition process, even if limited to file acquisition, must be able to
manage a great variety of situations, for example, one big file once a day (zipped,
otherwise compressed, or uncompressed) or many small files every few minutes. Some
files are positional, others are CSV or XML; some are plain files, others are
structured in master-detail fashion, in ASCII or binary format. Naming and location
rules are limited only by human imagination.
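To make this variety concrete, the following Python sketch shows one way an acquisition step might unwrap an incoming file and guess its layout. It assumes only gzip or single-member zip compression and uses simple heuristics (a leading `<` means XML, a constant non-zero delimiter count per line means CSV, a constant line length means positional). The functions `decompress` and `classify` are hypothetical illustrations, not part of the method described here; a real process would normally read the layout from configuration rather than guess it.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"         # first two bytes of any gzip stream
ZIP_MAGIC = b"PK\x03\x04"        # local file header of a zip archive


def decompress(raw: bytes) -> bytes:
    """Unwrap gzip or single-member zip payloads; pass others through unchanged."""
    if raw.startswith(GZIP_MAGIC):
        return gzip.decompress(raw)
    if raw.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(raw)) as archive:
            members = archive.namelist()
            if len(members) != 1:
                raise ValueError(f"expected one archive member, found {len(members)}")
            return archive.read(members[0])
    return raw


def classify(payload: bytes) -> str:
    """Heuristically guess the layout of a decompressed payload."""
    if b"\x00" in payload:
        return "binary"
    try:
        text = payload.decode("ascii")
    except UnicodeDecodeError:
        return "binary"
    if text.lstrip().startswith("<"):
        return "xml"
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "empty"
    # CSV: some delimiter appears the same non-zero number of times on every line.
    for delimiter in (",", ";", "|", "\t"):
        counts = {line.count(delimiter) for line in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return "csv"
    # Positional (fixed-width): every record has the same length.
    if len({len(line) for line in lines}) == 1:
        return "positional"
    return "unknown"


sample = gzip.compress(b"id,name\n1,alpha\n2,beta\n")
print(classify(decompress(sample)))  # csv
```

Because the sketch decodes as ASCII, non-ASCII text files (for example EBCDIC or Latin-1 extracts) would be reported as binary; handling them would require an explicit encoding per file type.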
It is very difficult to formalize or generalize the context closely enough to define a
precise model for data acquisition in ETL, but one can define a loose schema as a
basis for application software. As Figure 5 shows, the highest-level entity is the
flow, a set of logically correlated data; an application may need more than one
flow.
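The top of this schema can be sketched as plain data classes. Only two facts come from the text: the flow is the highest entity and groups logically correlated data, and an application may depend on several flows. The `FileSpec` level and its fields are hypothetical, chosen to mirror the file properties listed earlier (naming, location, layout, compression); they do not reproduce the lower levels of Figure 5.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List


@dataclass
class FileSpec:
    """Hypothetical leaf entity: one kind of file expected within a flow."""
    name_pattern: str          # shell-style pattern, e.g. "SALES_*.csv.gz"
    location: str              # directory or drop zone where files arrive
    layout: str                # "csv", "positional", "xml", ...
    compression: str = "none"  # "gzip", "zip", or "none"


@dataclass
class Flow:
    """Highest entity of the schema: a set of logically correlated data."""
    name: str
    files: List[FileSpec] = field(default_factory=list)

    def claims(self, filename: str) -> bool:
        # A flow claims a file if any of its specs matches the file name (case-sensitive).
        return any(fnmatchcase(filename, spec.name_pattern) for spec in self.files)


@dataclass
class Application:
    """Consumer of the schema; may need more than one flow."""
    name: str
    flows: List[Flow] = field(default_factory=list)

    def route(self, filename: str) -> List[str]:
        """Return the names of all flows that claim an incoming file."""
        return [flow.name for flow in self.flows if flow.claims(filename)]


app = Application("dwh", [
    Flow("sales", [FileSpec("SALES_*.csv.gz", "/incoming/sales", "csv", "gzip")]),
    Flow("stock", [FileSpec("STK??.dat", "/incoming/stock", "positional")]),
])
print(app.route("SALES_20070101.csv.gz"))  # ['sales']
```

Keeping file-level details as data rather than code is one way to absorb the naming and location variety described above without changing the acquisition software itself.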