Extraction, Transformation, and Loading Processes
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission
of Idea Group Inc. is prohibited.

This "row pipelined" model, although very clean in theory, has a
drawback in implementation: passing rows through many stages requires a great deal of
synchronization, whose cost can become significant compared with the cost of a single
stage itself. A better solution is to extend the "row pipeline" model to a "block pipeline"
model, in which each block contains a certain number of rows. In this way, we
keep the parallel/pipeline mechanism while reducing the overhead of synchronization,
memory passing, and routine invocation. Working with blocks of data also
paves the way for handling master-detail and other unusual formats in which
records are correlated. In summary, the block pipeline model limits the cost of
synchronization, preserves a general parallel/pipeline mechanism, and leaves open the
possibility of managing some form of sequential order.
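The effect of grouping rows into blocks can be illustrated with a minimal sketch in C. It is deliberately single-threaded: each stage is a plain function that receives a whole block, so one hand-off (one routine invocation, and in a threaded pipeline one synchronization point) is paid per block rather than per row. The names `block_t`, `stage_fn`, `run_block_pipeline`, the two sample stages, and the block size `BLOCK_ROWS` are assumptions made for illustration only, not the chapter's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_ROWS 4   /* hypothetical block size; to be tuned per workload */

/* A block carries up to BLOCK_ROWS rows; here a "row" is just one long. */
typedef struct {
    long   rows[BLOCK_ROWS];
    size_t count;              /* number of valid rows in this block */
} block_t;

/* A stage transforms a whole block in place. */
typedef void (*stage_fn)(block_t *blk);

/* Two toy stages standing in for real transformations. */
void stage_double(block_t *blk)
{
    for (size_t i = 0; i < blk->count; i++)
        blk->rows[i] *= 2;
}

void stage_add_one(block_t *blk)
{
    for (size_t i = 0; i < blk->count; i++)
        blk->rows[i] += 1;
}

/* Push n input rows through all stages, one block at a time.
 * Results are written to out in the original row order.
 * Returns the number of stage hand-offs that were needed. */
size_t run_block_pipeline(const long *in, long *out, size_t n,
                          const stage_fn *stages, size_t n_stages)
{
    size_t handoffs = 0;

    assert(in != NULL && out != NULL && stages != NULL);

    for (size_t start = 0; start < n; start += BLOCK_ROWS) {
        block_t blk;
        size_t left = n - start;

        /* Fill the block; the last one may be only partly full. */
        blk.count = left < BLOCK_ROWS ? left : BLOCK_ROWS;
        for (size_t i = 0; i < blk.count; i++)
            blk.rows[i] = in[start + i];

        /* One hand-off per stage per block, not per row. */
        for (size_t s = 0; s < n_stages; s++) {
            stages[s](&blk);
            handoffs++;
        }

        for (size_t i = 0; i < blk.count; i++)
            out[start + i] = blk.rows[i];
    }
    return handoffs;
}
```

With 10 rows, two stages, and blocks of 4 rows, this sketch performs 6 hand-offs (3 blocks times 2 stages), whereas a row-by-row pipeline would perform 20. In a real parallel pipeline each stage would presumably run in its own thread or process with a queue of blocks between stages; the per-block accounting stays the same. Because rows inside a block keep their order, correlated records such as a master row and its detail rows can travel together.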
Transforming a record essentially means working with strings and performing
lookup operations in memory (to assign values to foreign keys). One can use the
well-known string functions available in Unix, but a more efficient way is to write
simple ad hoc macros that copy tokens from the input buffer to the output buffer, save
them in local variables, pad the output string with blanks, and so on, extending the
set whenever some new basic functionality is needed.
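As an illustration of such ad hoc macros, the following C sketch turns a delimited input record into a fixed-width output record. The macro names `COPY_TOKEN` and `PAD_BLANKS`, the function `format_record`, the `|` delimiter, and the field widths are all hypothetical choices made for this example; the chapter does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strcmp/strlen, handy when checking the output */

/* Copy one token from src to dst, stopping at delim or end of string.
 * Characters that do not fit before limit are dropped, not written.
 * Both pointers advance; the delimiter itself is skipped.
 * Assumes delim is not '\0'. */
#define COPY_TOKEN(dst, src, delim, limit)                \
    do {                                                  \
        while (*(src) != '\0' && *(src) != (delim)) {     \
            if ((dst) < (limit))                          \
                *(dst)++ = *(src);                        \
            (src)++;                                      \
        }                                                 \
        if (*(src) != '\0')                               \
            (src)++;                                      \
    } while (0)

/* Fill the output with blanks until dst reaches limit. */
#define PAD_BLANKS(dst, limit)                            \
    do {                                                  \
        while ((dst) < (limit))                           \
            *(dst)++ = ' ';                               \
    } while (0)

/* Hypothetical fixed-width layout of the output record. */
enum { ID_W = 6, NAME_W = 10, CITY_W = 8,
       REC_W = ID_W + NAME_W + CITY_W };

/* Convert "id|name|city" into a REC_W-character fixed-width record.
 * out must have room for REC_W + 1 bytes (record plus terminator). */
void format_record(const char *in, char *out)
{
    const char *src = in;
    char *dst = out;
    char *limit;

    assert(in != NULL && out != NULL);

    limit = out + ID_W;                 /* end of the id field */
    COPY_TOKEN(dst, src, '|', limit);
    PAD_BLANKS(dst, limit);

    limit += NAME_W;                    /* end of the name field */
    COPY_TOKEN(dst, src, '|', limit);
    PAD_BLANKS(dst, limit);

    limit += CITY_W;                    /* end of the city field */
    COPY_TOKEN(dst, src, '|', limit);
    PAD_BLANKS(dst, limit);

    *dst = '\0';
}
```

Calling `format_record("42|Smith|Rome", out)` leaves `out` holding the three fields left-aligned and blank-padded to 6, 10, and 8 characters. The macros expand inline and make a single pass over the input, which is the kind of saving the text refers to compared with chaining general-purpose library calls; the usual macro caveat applies, namely that arguments are evaluated more than once and should be plain pointer variables.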