
Robert Wrembel and Christian Koncilia

"Data Warehouses and OLAP: Concepts, Architectures and Solutions"

Copying or distributing in print or electronic forms without written permission of
Idea Group Inc. is prohibited.
called TID. A subset Y of I where k=|Y| is referred to as a k-itemset (or simply an
itemset), and k is called the length of Y. A transaction database TD is the whole set
of transactions. A set X ⊆ TD is called a tidset, while the fraction of transactions in
TD that contain an itemset Y is called the support of Y and is denoted by supp(Y).
Thus, an itemset is frequent (or large) when supp(Y) is at least a user-specified
minimum threshold called minsupp.
An association rule r is an implication of the form Y1 ⇒ Y2, where Y1 and Y2 are
subsets of I, Y1 ∪ Y2 is a frequent itemset, and Y1 ∩ Y2 = ∅. The support of the rule r
is equal to supp(Y1 ∪ Y2), while its confidence is computed as the ratio supp(Y1 ∪ Y2)/
supp(Y1).
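
The definitions of support and confidence given above can be illustrated with a minimal Python sketch. The toy transaction database TD, its item names, and the helper names supp and rule_measures are hypothetical and chosen only for illustration; they do not come from the chapter's running example.

```python
from fractions import Fraction

# Hypothetical toy transaction database: TID -> set of items.
TD = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}

def supp(itemset, td):
    """Fraction of transactions in td that contain every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for items in td.values() if itemset <= items)
    return Fraction(hits, len(td))

def rule_measures(y1, y2, td):
    """Return (support, confidence) of the rule Y1 => Y2."""
    y1, y2 = frozenset(y1), frozenset(y2)
    if y1 & y2:
        raise ValueError("Y1 and Y2 must be disjoint")
    antecedent = supp(y1, td)
    if antecedent == 0:
        raise ValueError("Y1 never occurs, so confidence is undefined")
    support = supp(y1 | y2, td)      # supp(Y1 ∪ Y2)
    confidence = support / antecedent  # supp(Y1 ∪ Y2) / supp(Y1)
    return support, confidence

support, confidence = rule_measures({"a"}, {"b"}, TD)
print(support, confidence)  # 3/5 3/4
```

Exact fractions are used here instead of floating-point numbers so that comparisons against a threshold such as minsupp are not affected by rounding.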
In our running example, an association rule could be the following:
• Female = 1 and Internal = 1 ⇒ Govern = 3 [10%, 52%]. The rule means that
if there are females on the board and if the proportion of Top Management
sitting on the board is less than 10% (i.e., Internal = 1), then the quality of
governance is good (at least equal to 70%) with a confidence of 52%.
In Apriori-like algorithms (Agrawal & Srikant, 1994), rule mining is conducted as
follows. For every frequent itemset Y, all nonempty subsets of Y are extracted.
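
The subset-extraction step can be sketched in Python as follows. The text on this page stops after the extraction of subsets, so the filtering step in rules_from_itemset (keep Y1 ⇒ Y minus Y1 when its confidence reaches a threshold minconf) is an assumption based on the usual Apriori-style rule generation, not something stated here. The function names, the minconf parameter, and the support table SUPP are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def nonempty_subsets(itemset):
    """Yield every nonempty subset of itemset, smallest first."""
    items = sorted(itemset)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def rules_from_itemset(y, supp, minconf):
    """Assumed continuation: emit Y1 => (Y minus Y1) when confidence >= minconf.

    supp maps frozensets to their (precomputed) support values.
    """
    y = frozenset(y)
    rules = []
    for y1 in nonempty_subsets(y):
        y2 = y - y1
        if not y2:  # Y1 == Y would leave an empty consequent
            continue
        conf = supp[y] / supp[y1]
        if conf >= minconf:
            rules.append((y1, y2, conf))
    return rules

# Hypothetical supports for a frequent 2-itemset and its subsets.
SUPP = {
    frozenset({"a"}): Fraction(4, 5),
    frozenset({"b"}): Fraction(4, 5),
    frozenset({"a", "b"}): Fraction(3, 5),
}

rules = rules_from_itemset({"a", "b"}, SUPP, minconf=Fraction(7, 10))
for y1, y2, conf in rules:
    print(sorted(y1), "=>", sorted(y2), conf)
```

Because every subset of a frequent itemset is itself frequent, the supports needed for the confidence ratio are already available once the frequent itemsets have been mined, which is why the sketch takes them as a lookup table.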

