). This is exactly a "group by" operation, but performed on the fly.
The aggregation operation is conceptually simple but complex in practice:
memory allocation and indexing, for example, are far from trivial when the volume
grows and the aggregation keys are compound. Building a complete set of functions
to perform these operations with the necessary flexibility is hard and expensive
compared with the ease of the SQL "group by" clause. This is generally true in
02 Adzic, Fiore, & Sisto
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
batch systems where the loading phase occurs in the night hours, and spending one
or two hours to perform all the required aggregation is acceptable. However, in near
real-time (NRT) contexts, volumes are small but timing constraints are strict; furthermore,
it is also necessary to manage alarm tables, which often require some aggregation. In this
area, a set of functions to perform simple aggregation on the fly can be useful and
not too hard to implement. Saving some queries after the loading and avoiding
useless reads and writes can give a significant advantage in NRT, where ETL processing
has a time slot of 5-10 minutes.
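As an illustration of what such a simple on-the-fly aggregation function might look like, the following Python sketch keeps a running sum and row count per compound key while records stream through, so that no "group by" query is needed after the loading. The record layout (node, counter name, interval, value) and the names `aggregate_stream` and `Record` are assumptions made for this example; they do not come from the system described in the chapter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (node, counter_name, interval_start, value).
Record = Tuple[str, str, str, float]
# Compound aggregation key: (node, counter_name).
Key = Tuple[str, str]


def aggregate_stream(records: Iterable[Record]) -> Dict[Key, Dict[str, float]]:
    """Aggregate records in a single pass, grouping by (node, counter_name)."""
    # key -> [running sum, row count]; a hash map replaces the DBMS index.
    groups: Dict[Key, List[float]] = defaultdict(lambda: [0.0, 0])

    for node, counter, _interval, value in records:
        slot = groups[(node, counter)]
        slot[0] += value  # running SUM
        slot[1] += 1      # running COUNT

    # Derive the average once, after every record has been seen.
    return {
        key: {"sum": total, "count": count, "avg": total / count}
        for key, (total, count) in groups.items()
    }


if __name__ == "__main__":
    sample = [
        ("node1", "dropped_calls", "00:00", 10.0),
        ("node1", "dropped_calls", "00:05", 20.0),
        ("node2", "dropped_calls", "00:00", 5.0),
    ]
    for key, stats in aggregate_stream(sample).items():
        print(key, stats)
```

Memory use in this sketch grows with the number of distinct keys rather than with the number of records, which fits the small volumes of the NRT case; for batch volumes the SQL "group by" clause remains the simpler choice.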
Acquire, Transform, and Load Hub
The core of every ETL system is the engine that brings together the various data flows,
performs the transformations, and loads the results into the DBMS.