The ability to package jobs and synchronize them is fundamental to an ETL
infrastructure, for both performance and recovery reasons.
We have chosen to define the "processUnit" as an elementary piece of code that
can be implemented either as a thread or as a process, in a specified number of
instances, with no formal parameters but with an exit code. The processUnit is
the basic element of the synchronization graph; each processUnit has a dependence
list based on its exit code. These processUnits and their dependences are all
declared (via a simple API) in a main module called "master", and at startup these
definitions are stored in a memory segment shared across all the jobs
(processes/threads) that constitute the application. The master supervises all
application activities and forks processes/threads when necessary, according to
the graph maintained in the shared memory segment. Each processUnit registers its
status in the appropriate structure; at each polling interval the master checks
these statuses, verifies the dependences, and starts or cancels the appropriate
processUnits, continuing until the execution of the defined graph is complete. For
recovery reasons, as detailed in the Operation and Maintenance Issues section, the
processUnit status is also registered on the DBMS.
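The mechanism described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the source. processUnits run only as threads. Plain dictionaries guarded by one lock stand in for the shared memory segment. A `history` list stands in for the DBMS status table. A unit with several instances takes the largest instance exit code as its own. The names `ProcessUnit`, `Master`, `declare`, and the status strings are hypothetical, not the authors' API.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical status names for a processUnit.
PENDING, RUNNING, DONE, CANCELLED = "PENDING", "RUNNING", "DONE", "CANCELLED"


@dataclass
class ProcessUnit:
    name: str
    body: Callable[[], int]  # no formal parameters; returns an exit code
    instances: int = 1
    # Each dependence is (predecessor name, exit code the predecessor must return).
    depends_on: List[Tuple[str, int]] = field(default_factory=list)


class Master:
    def __init__(self, poll_interval: float = 0.01) -> None:
        # These dicts, guarded by one lock, stand in for the shared memory segment.
        self.units: Dict[str, ProcessUnit] = {}
        self.status: Dict[str, str] = {}
        self.exit_codes: Dict[str, int] = {}
        self.history: List[Tuple[str, str]] = []  # stand-in for the DBMS status table
        self.lock = threading.Lock()
        self.poll_interval = poll_interval

    def declare(self, unit: ProcessUnit) -> None:
        self.units[unit.name] = unit
        self._set_status(unit.name, PENDING)

    def _set_status(self, name: str, status: str) -> None:
        # Caller holds the lock (or is still single-threaded, as in declare()).
        self.status[name] = status
        self.history.append((name, status))

    def _run(self, unit: ProcessUnit) -> None:
        codes = [0] * unit.instances

        def instance(i: int) -> None:
            try:
                codes[i] = unit.body()
            except Exception:
                codes[i] = 1  # assumption: an unhandled error counts as exit code 1

        workers = [threading.Thread(target=instance, args=(i,)) for i in range(unit.instances)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        with self.lock:
            # Assumption: the unit's exit code is the worst (largest) instance code.
            self.exit_codes[unit.name] = max(codes)
            self._set_status(unit.name, DONE)

    def _verdict(self, unit: ProcessUnit) -> str:
        # Decide whether a pending unit can start, must wait, or must be cancelled.
        for pred, required in unit.depends_on:
            state = self.status[pred]
            if state == CANCELLED:
                return "cancel"
            if state != DONE:
                return "wait"
            if self.exit_codes[pred] != required:
                return "cancel"
        return "start"

    def run(self) -> Tuple[Dict[str, str], Dict[str, int]]:
        # Polling loop: check statuses, start or cancel units, until nothing is left.
        while True:
            with self.lock:
                pending = [n for n, s in self.status.items() if s == PENDING]
                busy = any(s == RUNNING for s in self.status.values())
                if not pending and not busy:
                    return dict(self.status), dict(self.exit_codes)
                for name in pending:
                    verdict = self._verdict(self.units[name])
                    if verdict == "start":
                        self._set_status(name, RUNNING)
                        threading.Thread(target=self._run, args=(self.units[name],)).start()
                    elif verdict == "cancel":
                        self._set_status(name, CANCELLED)
            time.sleep(self.poll_interval)


if __name__ == "__main__":
    master = Master()
    master.declare(ProcessUnit("extract", lambda: 0, instances=2))
    master.declare(ProcessUnit("load", lambda: 0, depends_on=[("extract", 0)]))
    master.declare(ProcessUnit("alert", lambda: 0, depends_on=[("extract", 1)]))
    print(master.run())
    # ({'extract': 'DONE', 'load': 'DONE', 'alert': 'CANCELLED'}, {'extract': 0, 'load': 0})
```

In this sketch a unit waiting on a cancelled unit is itself cancelled, so the polling loop terminates for any acyclic graph. A process-based variant would need a real shared memory segment (or the DBMS) in place of the in-process dictionaries.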
Whenever possible, useful, and reasonably simple, we have adopted a "declarative
approach" for implementing the defined functionality.
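One way to apply a declarative approach to the synchronization graph is to describe processUnits and dependences as data rather than as code. The graph can then be checked before the master runs it. The sketch below assumes a hypothetical one-line-per-unit text format. The functions `parse_spec` and `check_graph` are illustrative names, not part of the described system. The validation step uses Kahn's topological sort to reject undeclared predecessors and cycles, either of which would leave a polling master waiting forever.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical declaration format: one processUnit per line,
#   name  instances  [predecessor:required_exit_code ...]
SPEC = """
# name     instances  dependences
extract    2
transform  1          extract:0
load       1          transform:0
alert      1          extract:1
"""

Units = Dict[str, Tuple[int, List[Tuple[str, int]]]]


def parse_spec(text: str) -> Units:
    """Turn the declarative text into {name: (instances, [(pred, code), ...])}."""
    units: Units = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, instances, *deps = line.split()
        parsed = []
        for dep in deps:
            pred, code = dep.split(":")
            parsed.append((pred, int(code)))
        units[name] = (int(instances), parsed)
    return units


def check_graph(units: Units) -> List[str]:
    """Validate the graph and return one dependence-respecting order.

    Uses Kahn's topological sort; raises ValueError for an undeclared
    predecessor or a cycle.
    """
    indegree = {name: 0 for name in units}
    successors: Dict[str, List[str]] = {name: [] for name in units}
    for name, (_, deps) in units.items():
        for pred, _code in deps:
            if pred not in units:
                raise ValueError(f"{name} depends on undeclared unit {pred}")
            indegree[name] += 1
            successors[pred].append(name)
    ready = deque(n for n, d in indegree.items() if d == 0)
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for succ in successors[current]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if len(order) != len(units):
        raise ValueError("dependence graph contains a cycle")
    return order


if __name__ == "__main__":
    declared = parse_spec(SPEC)
    print(check_graph(declared))  # ['extract', 'transform', 'alert', 'load']
```

Keeping the declaration as data means the same text can be validated offline, loaded into the shared structures at startup, and compared against the statuses recorded on the DBMS during recovery.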