The intuition is that the underlying
data clusters are located naturally during the chunking process, exactly because
hierarchy value combinations form the dense and sparse data areas. For example, a
sparse area would be formed in a 3-dimensional cube, along the subspace (Sept00-Dec00,
Books, Italy) if we did not sell any books during Sept00-Dec00 in Italy.

Figure 6. (a) A cube hierarchically chunked; (b) the whole subtree up to the data
chunks under chunk 0|0 (corresponding to the grayed cells on the left figure)

48 Karayannidis, Tsois, & Sellis
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.

A subtree at chunking-depth D corresponds to a "family" (i.e., a subspace) of hierarchy-
related data points. In fact, the taller this subtree is (i.e., the smaller D is), the
larger is the subspace of hierarchy-related data points that it "covers." Based on this
observation, the CUBE File construction algorithm tries to "pack" into buckets (i.e.,
disk pages) whole subtrees of the smallest possible depth. This is the basic heuristic
exploited by the CUBE File for achieving hierarchical clustering of the data. Note
that the packing of chunks into buckets, so as to preserve hierarchical clustering, is
an NP-Hard problem (Karayannidis, 2003).
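
The packing heuristic described above can be illustrated with a small sketch. This is not the CUBE File construction algorithm itself, only a simplified greedy approximation under stated assumptions: the `Chunk` class, its `size` field, the chunk identifiers, and the function `pack_subtrees` are all hypothetical names introduced for illustration. Starting from the root, the sketch stores a whole subtree in one bucket as soon as it fits, so the subtrees that end up in buckets are rooted as high in the tree (at as small a depth) as the bucket size allows.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A node of the chunk tree (hypothetical structure, for illustration only)."""
    chunk_id: str                      # illustrative identifier, e.g. "0|0"
    size: int                          # storage cost of this chunk alone
    children: list[Chunk] = field(default_factory=list)

    def subtree_size(self) -> int:
        """Total storage cost of this chunk plus all of its descendants."""
        return self.size + sum(c.subtree_size() for c in self.children)

    def subtree_ids(self) -> list[str]:
        """Identifiers of this chunk and all descendants, in pre-order."""
        ids = [self.chunk_id]
        for child in self.children:
            ids.extend(child.subtree_ids())
        return ids


def pack_subtrees(root: Chunk, bucket_size: int) -> tuple[list[list[str]], list[str]]:
    """Greedy top-down packing (an assumed simplification, not the published method).

    Returns (buckets, unpacked): each bucket holds one whole subtree; `unpacked`
    lists chunks whose subtree did not fit and that need separate handling.
    """
    buckets: list[list[str]] = []
    unpacked: list[str] = []

    def visit(node: Chunk) -> None:
        if node.subtree_size() <= bucket_size:
            # The whole subtree fits: keep it together in a single bucket.
            buckets.append(node.subtree_ids())
        else:
            # Too large: set this chunk aside and try each child subtree.
            unpacked.append(node.chunk_id)
            for child in node.children:
                visit(child)

    visit(root)
    return buckets, unpacked
```

For a tree in which the subtree under chunk `0|0` fits in a bucket but the root's subtree does not, the sketch places the entire `0|0` subtree in one bucket and descends only where it must. It makes no attempt to combine several small subtrees into a shared bucket, which is where the hard part of the real problem lies.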