
Robert Wrembel and Christian Koncilia

"Data Warehouses and OLAP: Concepts, Architectures and Solutions"


Simitsis, Vassiliadis, Skiadopoulos, & Sellis
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.

In addition, these data-intensive workflows are quite complex in nature, involving
dozens of sources, cleaning and transformation activities, and loading facilities.
Bouzeghoub, Fabret, and Matulovic (1999) mention that the data warehouse refreshment
process can consist of many different subprocesses, like data cleaning,
archiving, transformations, and aggregations, interconnected through a complex
schedule. For instance, Adzic and Fiore (2003) report a case study for mobile network
traffic data, involving around 30 data flows and 10 sources, while the volume of data
rises to about 2 TB, with the main fact table containing about 3 billion records. The
throughput of the (traditional) population system is 80 million records per hour for
the entire process (compression, FTP of files, decompression, transformation, and
loading), on a daily basis, with a loading window of only 4 hours. The demand for
performance is so pressing that some processes are hard-coded in low-level DBMS
calls to avoid the extra step of storing data to a target file to be loaded to the data
warehouse through the DBMS loader. In general, Strange (2002a) notes that the
complexity of the ETL process, as well as the staffing required to implement it, depends
mainly on the following variables: (a) the number and variety of data sources;
(b) the complexity of transformation; (c) the complexity of integration; and (d) the
availability of skill sets.
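
To give a sense of scale for the case study figures quoted above, the following minimal sketch works out how many records the population system could move in one daily loading window. It assumes, purely for illustration, that the reported throughput of 80 million records per hour is sustained for the full 4-hour window; the source does not state this, and the constant and function names are hypothetical.

```python
# Figures quoted in the text for the Adzic and Fiore (2003) case study.
THROUGHPUT_RECORDS_PER_HOUR = 80_000_000   # end-to-end population throughput
LOADING_WINDOW_HOURS = 4                   # daily loading window
FACT_TABLE_RECORDS = 3_000_000_000         # approximate size of the main fact table


def records_per_window(throughput_per_hour: int, window_hours: int) -> int:
    """Upper bound on records processed in one window.

    Assumption (not from the source): the throughput is constant and the
    whole window is used for loading.
    """
    return throughput_per_hour * window_hours


daily_capacity = records_per_window(THROUGHPUT_RECORDS_PER_HOUR, LOADING_WINDOW_HOURS)
share_of_fact_table = daily_capacity / FACT_TABLE_RECORDS

print(f"Records per daily window: {daily_capacity:,}")
print(f"Share of main fact table: {share_of_fact_table:.1%}")
```

Under these assumptions, one daily window covers at most 320 million records, roughly a tenth of the main fact table, which helps explain why the extra step of writing an intermediate target file is considered worth avoiding.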

