In terms of the transformation tasks, we distinguish two main classes of
problems (Lenzerini, 2002): (a) conflicts and problems at the schema level
(e.g., naming and structural conflicts) and (b) data-level transformations
(i.e., at the instance level). The main schema-level problems are (i) naming
conflicts, where the same name is used for different objects (homonyms) or
different names are used for the same object (synonyms), and (ii) structural
conflicts, where one must deal with different representations of the same object
in different sources. In addition, there are many variations of data-level
conflicts across sources: duplicate or contradictory records, different value
representations (e.g., for marital status), different interpretations of the
values (e.g., currency units such as dollar vs. euro), different aggregation
levels (e.g., sales per product vs. sales per product group), or references to
different points in time (e.g., current sales as of yesterday for one source vs.
as of last week for another). The list is extended by low-level technical problems such as
data type conversions, applying format masks, assigning fields to a sequence
number, substituting constants, setting values to NULL or DEFAULT based on a
condition, or using simple SQL operators such as UPPER, TRUNC, and SUBSTR.
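
To make the data-level and low-level transformations above more concrete, the following is a minimal Python sketch of a row-level transformation step. It is an illustration only: the field names, the `MARITAL_STATUS` lookup table, the fixed exchange rate `USD_PER_EUR`, and the `transform` function are all hypothetical and do not come from any particular ETL tool.

```python
from datetime import datetime
from decimal import Decimal
from itertools import count

# Hypothetical lookup table harmonizing source-specific marital-status codes.
MARITAL_STATUS = {
    "M": "married", "MARRIED": "married",
    "S": "single", "SINGLE": "single",
}

# Hypothetical fixed exchange rate, for illustration only.
USD_PER_EUR = Decimal("1.10")

# Generator of sequence numbers used as surrogate keys.
_sequence = count(1)


def transform(row):
    """Apply a few row-level transformations to one source record (a dict)."""
    name = row["name"].strip().upper()            # counterpart of SQL UPPER
    amount = Decimal(row["amount"])               # data type conversion
    if row["currency"] == "EUR":                  # resolve differing value interpretations
        amount = amount * USD_PER_EUR
    # Unknown codes map to None, which plays the role of NULL here.
    status = MARITAL_STATUS.get(row["marital_status"].strip().upper())
    # Apply a format mask (DD/MM/YYYY) when parsing the date string.
    sale_date = datetime.strptime(row["sale_date"], "%d/%m/%Y").date()
    return {
        "id": next(_sequence),                    # assign a sequence number
        "name": name,
        "name_prefix": name[:3],                  # counterpart of SUBSTR(name, 1, 3)
        "amount_usd": int(amount),                # counterpart of TRUNC
        "marital_status": status,
        "country": row.get("country") or "UNKNOWN",  # DEFAULT when missing or empty
        "sale_date": sale_date.isoformat(),
    }
```

In a real ETL workflow each of these steps would typically be a separate, configurable activity; they are combined in one function here only to keep the sketch short.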
The integration and transformation programs perform a wide variety of
functions, such as reformatting, recalculating, modifying key structures, adding
an element of time, identifying default values, supplying logic to choose
between multiple sources, summarizing, merging data from multiple sources,
and so forth.
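
Two of the functions listed above, choosing between multiple sources and summarizing, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the source names in `SOURCE_PRIORITY`, the record layout, and both helper functions are hypothetical, and a fixed priority order is only one of many possible rules for choosing between sources.

```python
from collections import defaultdict

# Hypothetical source priority: a lower number wins when sources disagree.
SOURCE_PRIORITY = {"erp": 0, "crm": 1}


def merge_sources(records):
    """Keep one record per product, preferring the higher-priority source."""
    best = {}
    for rec in records:
        key = rec["product"]
        current = best.get(key)
        if (current is None
                or SOURCE_PRIORITY[rec["source"]] < SOURCE_PRIORITY[current["source"]]):
            best[key] = rec
    return list(best.values())


def summarize_by_group(records, product_to_group):
    """Roll sales per product up to sales per product group."""
    totals = defaultdict(int)
    for rec in records:
        totals[product_to_group[rec["product"]]] += rec["sales"]
    return dict(totals)
```

Merging first and summarizing afterwards ensures that contradicting records for the same product are resolved before they are aggregated, so that no product is counted twice.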