
Robert Wrembel and Christian Koncilia

"Data Warehouses and Olap: Concepts, Architectures and Solutions"

Dividing a job into many sequential stages, each reading its input from disk
and writing its output to disk, simplifies coding and debugging, but reading
and writing the same data many times is very expensive. Processing (or
extracting) many files one after another is the simplest approach, but it makes
poor use of computational resources.
To achieve high performance, there are only two ways:
• execute the minimum number of machine instructions and avoid useless I/O
• do not waste the wait time that occurs in I/O operations
These elementary rules imply a trade-off that is not always easy to strike:
readability and maintainability on one hand, efficient but complex code on the other.
To avoid reading and writing the same data many times, split the workload along the
lines of the application logic so that it can exploit the parallel features of the
machine. Treat pipelining and parallelism as primary design objectives in ETL.
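
The idea of passing records from stage to stage without intermediate files can be sketched in Python with generators. This is only a minimal illustration under stated assumptions: the stage names (extract, transform, load), the column names and the sample data are hypothetical and do not come from the book, and the stages here stream records in a single thread rather than running concurrently.

```python
import csv
import io

def extract(lines):
    """Stage 1: parse raw CSV text into dict records, one at a time."""
    yield from csv.DictReader(lines)

def transform(records):
    """Stage 2: clean each record as it arrives (no intermediate file)."""
    for rec in records:
        if rec["amount"]:  # drop rows with a missing amount
            yield {
                "customer": rec["customer"].strip().upper(),
                "amount": float(rec["amount"]),
            }

def load(records):
    """Stage 3: aggregate into a dict (stand-in for a warehouse table)."""
    totals = {}
    for rec in records:
        totals[rec["customer"]] = totals.get(rec["customer"], 0.0) + rec["amount"]
    return totals

def run_pipeline(source):
    # Each record flows through all three stages before the next one is read,
    # so the data set is read once and never written back to disk in between.
    return load(transform(extract(source)))

# Hypothetical sample input; io.StringIO stands in for an open source file.
SAMPLE = "customer,amount\n alice ,10.5\nbob,\nalice,4.5\nbob,2\n"
totals = run_pipeline(io.StringIO(SAMPLE))
print(totals)
```

Each stage stays small and can still be tested on its own, which keeps some of the debugging convenience of separate stages while avoiding the repeated reads and writes.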
Parallelism is the ability to split a workload into many tasks that run concurrently
and synchronize with each other. Splitting the workload is obviously useful when there
is more than one processor, but it is useful even on a single-processor machine: there,
the I/O idle time (orders of magnitude longer than the CPU time) can be used to do other
useful work. In ETL, parallelism primarily means fully utilizing multiprocessor
machines and minimizing the time wasted waiting on I/O operations.
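
The benefit of using I/O idle time can be illustrated with a small Python sketch that extracts several sources first sequentially and then concurrently with a thread pool. Everything specific here is an assumption made for illustration: the source names are hypothetical, and time.sleep stands in for a blocking read from a file, database or network.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_DELAY = 0.2  # assumed, simulated I/O latency per source, in seconds

def extract_source(name):
    """Hypothetical extractor: time.sleep stands in for a blocking read."""
    time.sleep(IO_DELAY)
    return name, len(name)

def extract_sequential(names):
    # Each extraction waits for the previous one, so the waits add up.
    return [extract_source(n) for n in names]

def extract_concurrent(names):
    # While one thread waits on I/O, the others proceed, so the waits overlap.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(extract_source, names))  # map keeps input order

def timed(fn, names):
    start = time.perf_counter()
    result = fn(names)
    return result, time.perf_counter() - start

SOURCES = ["orders", "customers", "products", "stores"]  # hypothetical names
seq_result, seq_time = timed(extract_sequential, SOURCES)
par_result, par_time = timed(extract_concurrent, SOURCES)
print(f"sequential: {seq_time:.2f}s, concurrent: {par_time:.2f}s")
```

Both runs return the same records; only the elapsed time differs, because the concurrent version overlaps the waits instead of adding them up. Threads suit I/O-bound work like this, whereas CPU-bound transformations on a multiprocessor machine would more likely call for separate processes.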

