Association Rules | ML Method

Association rule learning is a machine learning method for discovering interesting relations between variables in large databases, e.g. the transaction database of a store. It expresses frequently co-occurring variables as rules, called association rules.

Two widely used association rule learning algorithms are Apriori and Eclat.

Apriori: 

The Apriori algorithm mines frequent itemsets from a database of transactions and uses them to generate association rules. These rules quantify how strongly or how weakly two items are related.

The associations are measured using three common metrics: support, confidence, and lift.

Support

Support gives the fraction of transactions that contain both item A and item B. It tells us which items, or combinations of items, are bought frequently.

support(A, B) = freq(A, B) / N, where N is the total number of transactions

Confidence

It tells us how often items A and B occur together, given the number of times A occurs.

confidence(A → B) = freq(A, B) / freq(A)

Lift

It indicates the strength of a rule relative to the random co-occurrence of A and B: a lift above 1 means the items appear together more often than expected by chance.
lift(A → B) = support(A, B) / (support(A) × support(B))
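
To make these metrics concrete, here is a minimal Python sketch that computes all three on the nine-transaction dataset used in the worked example below. The function names are illustrative, not from any particular library.

```python
# Transactions as plain Python sets (the dataset from the worked example below).
transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """freq(A, B) / freq(A): how often the consequent appears when the antecedent does."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """support(A, B) / (support(A) * support(B)); above 1 means positive association."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions))

print(support({"A", "B"}, transactions))       # 4/9 ≈ 0.444
print(confidence({"A"}, {"B"}, transactions))  # (4/9) / (6/9) ≈ 0.667
print(lift({"A"}, {"B"}, transactions))        # (4/9) / ((6/9) * (7/9)) ≈ 0.857
```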

Steps for Apriori Algorithm

Step-1: Compute the support of the itemsets in the transactional database, and choose a minimum support and a minimum confidence.

Step-2: Select all itemsets whose support is greater than or equal to the chosen minimum support.

Step-3: From these frequent itemsets, find all rules whose confidence is at least the minimum confidence.

Step-4: Order the rules in decreasing order of lift.

Apriori Algorithm Working

For example, suppose we have the following dataset of transactions. From it, we will find the frequent itemsets and generate the association rules using the Apriori algorithm, with a minimum support count of 2 and a minimum confidence of 50%.

TID | Itemset
T1  | A, B
T2  | B, D
T3  | B, C
T4  | A, B, D
T5  | A, C
T6  | B, C
T7  | A, C
T8  | A, B, C, E
T9  | A, B, C
Step-1: Calculating C1 and L1:
  • In the first step, we create a table that contains the support count (the frequency of each individual item in the dataset) of every item. This table is called the candidate set, or C1.
Itemset | Support count
A | 6
B | 7
C | 5
D | 2
E | 1
  • Now, we keep all the itemsets whose support count is greater than or equal to the minimum support count (2). This gives us the table for the frequent itemset L1.
  • All itemsets meet the minimum support except E, so the E itemset is removed.
Itemset | Support count
A | 6
B | 7
C | 5
D | 2
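
Step-1 is easy to verify in a few lines of Python. A minimal sketch using only the standard library, with the dataset hard-coded:

```python
# Count each individual item (C1) and keep those meeting the minimum
# support count of 2 (L1).
from collections import Counter

transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
MIN_SUPPORT_COUNT = 2

# C1: frequency of every individual item.
c1 = Counter(item for t in transactions for item in t)
print(c1)  # Counter({'B': 7, 'A': 6, 'C': 5, 'D': 2, 'E': 1})

# L1: drop items below the minimum support count; only E is eliminated.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT_COUNT}
print(l1)  # A: 6, B: 7, C: 5, D: 2
```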
Step-2: Candidate Generation C2, and L2:
  • In this step, we generate C2 with the help of L1, by forming every possible pair of the frequent items from L1.
  • After generating the pairs, we again find the support counts from the main transaction table, i.e., how many times each pair occurs together in the given dataset. This gives the table below for C2:
Itemset | Support count
{A, B} | 4
{A, C} | 4
{A, D} | 1
{B, C} | 4
{B, D} | 2
{C, D} | 0
  • Again, we compare each C2 support count with the minimum support count; the itemsets with a smaller support count are eliminated from C2. This gives the table below for L2:
Itemset | Support count
{A, B} | 4
{A, C} | 4
{B, C} | 4
{B, D} | 2
Step-3: Candidate generation C3, and L3:

For C3, we repeat the same two steps, but now we form the C3 table from itemsets of three items and calculate their support counts from the dataset. This gives the table below:

Itemset | Support count
{A, B, C} | 2
{A, B, D} | 1
{A, C, D} | 0
{B, C, D} | 0

Now we create the L3 table. In the C3 table above, only one itemset has a support count that meets the minimum support count, so L3 contains a single itemset: {A, B, C}.
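
Steps 2 and 3 follow the same generate-count-prune pattern. Here is a minimal sketch, assuming candidates at level k are simply all k-item combinations of the items that survived the previous level (adequate for a dataset this small, though a full Apriori implementation joins and prunes more carefully):

```python
from itertools import combinations

transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]
MIN_SUPPORT_COUNT = 2

def frequent_itemsets(k, items):
    """Count every k-item candidate over `items`; keep those meeting min support."""
    counts = {
        c: sum(set(c) <= t for t in transactions)
        for c in combinations(sorted(items), k)
    }
    return {c: n for c, n in counts.items() if n >= MIN_SUPPORT_COUNT}

l1_items = {"A", "B", "C", "D"}  # the frequent single items from Step-1
l2 = frequent_itemsets(2, l1_items)
print(l2)  # {('A','B'): 4, ('A','C'): 4, ('B','C'): 4, ('B','D'): 2}

# Candidates for L3 come from the items that survived into L2.
l3 = frequent_itemsets(3, {i for pair in l2 for i in pair})
print(l3)  # {('A','B','C'): 2}
```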

Step-4: Determining the association rules for the subsets:

To get the association rules, we first create another table with the possible rules from the frequent itemset {A, B, C}. For each rule X → Y, we calculate the confidence as sup(X ∧ Y) / sup(X). After calculating the confidence of every rule, we exclude the rules whose confidence is below the minimum threshold (50%).

Consider the below table:

Rule | Support | Confidence
A ∧ B → C | 2 | sup(A ∧ B ∧ C) / sup(A ∧ B) = 2/4 = 0.5 = 50%
B ∧ C → A | 2 | sup(A ∧ B ∧ C) / sup(B ∧ C) = 2/4 = 0.5 = 50%
A ∧ C → B | 2 | sup(A ∧ B ∧ C) / sup(A ∧ C) = 2/4 = 0.5 = 50%
C → A ∧ B | 2 | sup(A ∧ B ∧ C) / sup(C) = 2/5 = 0.4 = 40%
A → B ∧ C | 2 | sup(A ∧ B ∧ C) / sup(A) = 2/6 ≈ 0.33 = 33.33%
B → A ∧ C | 2 | sup(A ∧ B ∧ C) / sup(B) = 2/7 ≈ 0.29 = 28.57%

Since the given threshold, i.e. the minimum confidence, is 50%, the first three rules, A ∧ B → C, B ∧ C → A, and A ∧ C → B, can be considered strong association rules for the given problem.
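
As a check on the table above, the following sketch enumerates every rule that splits {A, B, C} into an antecedent and a consequent, computes its confidence, and flags the strong rules at the 50% threshold:

```python
from itertools import combinations

transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]

def count(itemset):
    """Support count: number of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

frequent = {"A", "B", "C"}
MIN_CONFIDENCE = 0.5

for r in range(1, len(frequent)):
    for antecedent in combinations(sorted(frequent), r):
        consequent = frequent - set(antecedent)
        conf = count(frequent) / count(antecedent)  # sup(X ∧ Y) / sup(X)
        status = "strong" if conf >= MIN_CONFIDENCE else "weak"
        print(f"{set(antecedent)} -> {consequent}: {conf:.2%} ({status})")
```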

Advantages of Apriori Algorithm

  • The algorithm is easy to understand.
  • The join and prune steps of the algorithm can be efficiently implemented on large datasets.

Disadvantages of Apriori Algorithm

  • The Apriori algorithm is slow compared to other frequent-itemset algorithms.
  • Overall performance suffers because the algorithm scans the database multiple times.
  • The time and space complexity of the Apriori algorithm is O(2^D), which is very high; here D is the horizontal width (the number of distinct items) in the database.

ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal):

The Eclat algorithm explores itemsets from the bottom up, in a depth-first manner, and is a very simple algorithm for finding frequent itemsets. Unlike Apriori, it uses a vertical database, in which each item is mapped to the list of transaction IDs (TIDs) that contain it; it cannot work on a horizontal database directly.

If the database is horizontal, it must first be converted into a vertical one. After that, there is no need to scan the database again and again: Eclat scans it only once, and the support of an itemset is obtained by intersecting TID lists.

It is similar to the Apriori algorithm, but there are no confidence and lift factors; only support is counted. It is much faster than Apriori.

There are several steps:

  1. Set minimum support.
  2. Take all the itemsets in the transactions that have support greater than or equal to the minimum support.
  3. Order those itemsets by decreasing support.
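
A minimal sketch of the Eclat idea, assuming the usual vertical layout in which each item maps to the set of transaction IDs (TIDs) that contain it; the dataset is the same nine transactions used for Apriori above:

```python
from itertools import combinations

transactions = {
    "T1": {"A", "B"}, "T2": {"B", "D"}, "T3": {"B", "C"},
    "T4": {"A", "B", "D"}, "T5": {"A", "C"}, "T6": {"B", "C"},
    "T7": {"A", "C"}, "T8": {"A", "B", "C", "E"}, "T9": {"A", "B", "C"},
}
MIN_SUPPORT_COUNT = 2

# Single scan: convert the horizontal database into a vertical one.
vertical = {}
for tid, items in transactions.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)

# Support of a pair = size of the intersection of the two TID sets;
# no further scans of the original database are needed.
for a, b in combinations(sorted(vertical), 2):
    tids = vertical[a] & vertical[b]
    if len(tids) >= MIN_SUPPORT_COUNT:
        print(f"{{{a}, {b}}}: support count {len(tids)}")
```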

Conclusion on Association Rule:

Association rule mining uncovers interesting associations and correlations among large sets of data items. Association rules show attribute-value conditions that occur together frequently in a given dataset. A simple example of association rule mining is Market Basket Analysis.
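
In practice, market basket analysis is usually done with a library rather than by hand. As an illustration, the sketch below uses mlxtend (an assumption; it must be installed separately, e.g. with pip install mlxtend) to mine the same nine transactions end to end:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["A", "B"], ["B", "D"], ["B", "C"], ["A", "B", "D"], ["A", "C"],
    ["B", "C"], ["A", "C"], ["A", "B", "C", "E"], ["A", "B", "C"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# A minimum support of 2/9 matches the minimum support count of 2 used above.
frequent = apriori(onehot, min_support=2 / 9, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```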

Written By: Prudvi Raj

Reviewed By: Rushikesh Lavate

