It is not possible to define a single
language for ETL and build software around it (as happened, e.g., with SQL); it is always
(or almost always) necessary to write from scratch the code that performs the extraction
and the transformation. Only in the loading phase, for a specific target DBMS,
is it possible to establish some general rules. On the other hand, ad-hoc scripts and programs
have poor flexibility, reusability, and maintainability. An infrastructure suited to
the specific business or application context seems to us the best solution.
The infrastructure is not a tool; it is a set of functionalities or services that experience
has proved to be useful and common enough in ETL scenarios, and on top of which one can
build the application, thus saving coding and debugging time. The
infrastructure layer implements, for example, an API to access the DBMS and manage
partitions, functions for lookup operations, file acquisition, parallel read/write, and
so on, as shown in Figure 4. Then, on top of this infrastructural layer, one can write
simpler and more readable code that implements the application-specific logic,
handling each special case not covered by the lower layer. The infrastructure is,
in effect, a library, so one can pick only what is needed; some constraints on how to
organize the application may exist, but in general they are not as strict as those a
well-structured tool imposes.
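The layering described above can be illustrated with a minimal sketch. It is not the infrastructure discussed in this chapter: the function names (`read_rows`, `parallel_read`, `build_lookup`, `transform_orders`), the CSV format, and the sample order data are all hypothetical, chosen only to show reusable services below and short application-specific code above.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# --- Infrastructure layer (hypothetical names): reusable, application-agnostic ---

def read_rows(source):
    """File acquisition: parse one CSV text stream into a list of dict rows."""
    return list(csv.DictReader(source))

def parallel_read(sources, max_workers=4):
    """Parallel read: acquire several sources concurrently, keeping their order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = list(pool.map(read_rows, sources))
    return [row for chunk in chunks for row in chunk]

def build_lookup(rows, key, value):
    """Lookup operation: map a key column to a value column."""
    return {row[key]: row[value] for row in rows}

# --- Application layer: only the business-specific logic lives here ---

def transform_orders(orders, country_by_code):
    """Resolve country codes to names. Unknown codes are a special case the
    lower layer does not cover, so the application decides how to handle them."""
    result = []
    for order in orders:
        result.append({
            "order_id": int(order["order_id"]),
            "amount": float(order["amount"]),
            "country": country_by_code.get(order["country_code"], "UNKNOWN"),
        })
    return result

# Sample data (invented for illustration); in-memory streams stand in for files.
file_a = "order_id,country_code,amount\n1,IT,10.5\n2,FR,20\n"
file_b = "order_id,country_code,amount\n3,XX,5\n"
countries = "code,name\nIT,Italy\nFR,France\n"

orders = parallel_read([io.StringIO(file_a), io.StringIO(file_b)])
lookup = build_lookup(read_rows(io.StringIO(countries)), "code", "name")
result = transform_orders(orders, lookup)
```

In this sketch the application picks only the three services it needs, and the single piece of business logic (how to treat an unknown country code) stays in the upper layer, where it is easy to read and change.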