Finding relationships among stores using the Apriori Algorithm
As we saw in our previous article, Association Rule Mining is a great way to solve such problems, but it is computationally expensive. A simple way to reduce this expense is the Apriori Algorithm. In this article, we will see how this algorithm works with an example.
Association Rule Mining generates rules based on the items present. In our previous article we had 11 items (Bread, Milk, Eggs, Yogurt, Cheese, Coke, Snickers, Ice Cream, Ketchup, Potatoes, Chips), which means we would have to generate every possible rule over those 11 items.
The total number of rules is given by TR = 3^d - 2^(d+1) + 1, where d is the number of items.
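As a quick sanity check, here is that formula in code for the 11 items above (a small Python sketch; the function name is mine):

```python
def total_rules(d):
    """Total possible association rules for d items: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1

print(total_rules(11))  # 173052 rules for 11 items -- far too many to enumerate by hand
```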
To reduce the number of rules, we first generate a dataset of itemset frequencies using a brute-force approach (2^11 combinations in our case), which is still computationally expensive, though less so than the previous method. To further reduce this expense, we can apply pruning based on the Apriori Principle, which states that "If an itemset is frequent, then all of its subsets must also be frequent".
The principle rests on the following support property: ∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y), which says that the support of an itemset never exceeds the support of any of its subsets.
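A tiny illustration of that property, using a made-up list of transactions (not the store dataset used later in this article):

```python
# Support of a superset never exceeds the support of its subsets.
toy_transactions = [{"A", "B", "C"}, {"A", "B"}, {"A"}, {"B", "C"}]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

print(support({"A"}, toy_transactions))       # 0.75
print(support({"A", "B"}, toy_transactions))  # 0.5, never more than s({"A"})
```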
The next step is to set a suitable minimum support count to reduce the dataset. A suitable value depends on the number of rows (transactions) in the dataset.
Now let's see how Apriori works. The following is a randomly generated sample dataset.
Customer ID | Stores
----------- | ------
1 | Walmart, ShopRite, Indian Store
2 | Walmart, Wegmans, Target
3 | Wegmans, Costco, Indian Store
4 | ShopRite, Home Depot, Walmart
5 | ShopRite, Walmart, Indian Store
6 | Costco, ShopRite, Walmart
7 | Walmart, Indian Store
8 | ShopRite, Wegmans, Indian Store
9 | Walmart, ShopRite
The above dataset tells us where each customer has shopped; for example, customer 9 shopped at Walmart and ShopRite.
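If you want to follow along in code, one possible way to represent the table above (a Python sketch; the variable name is my own):

```python
# The transactions from the table above, keyed by customer ID.
transactions = {
    1: {"Walmart", "ShopRite", "Indian Store"},
    2: {"Walmart", "Wegmans", "Target"},
    3: {"Wegmans", "Costco", "Indian Store"},
    4: {"ShopRite", "Home Depot", "Walmart"},
    5: {"ShopRite", "Walmart", "Indian Store"},
    6: {"Costco", "ShopRite", "Walmart"},
    7: {"Walmart", "Indian Store"},
    8: {"ShopRite", "Wegmans", "Indian Store"},
    9: {"Walmart", "ShopRite"},
}
```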
Setting minimum support count = 2: an itemset qualifies only if its support count >= 2, and itemsets that fall below this threshold are pruned and not carried forward.
Finding the count for itemsets containing 1 item each.
Stores | Count
------ | -----
Walmart | 7
ShopRite | 6
Indian Store | 5
Costco | 2
Target | 1
Wegmans | 3
Home Depot | 1
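A sketch of this counting step, assuming the transactions dictionary from the earlier snippet:

```python
from collections import Counter

MIN_SUPPORT_COUNT = 2

# Count how many transactions each store appears in.
item_counts = Counter(store for stores in transactions.values() for store in stores)

# Keep only the stores that meet the minimum support count.
frequent_items = {store for store, count in item_counts.items()
                  if count >= MIN_SUPPORT_COUNT}
print(item_counts)     # Walmart: 7, ShopRite: 6, Indian Store: 5, ...
print(frequent_items)  # Target and Home Depot are pruned
```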
Finding the count for itemsets containing 2 items each. There is no need to generate any itemset containing Target or Home Depot, since both failed to meet the minimum support count.
Stores | Count
------ | -----
Walmart, ShopRite | 5
Walmart, Indian Store | 3
Walmart, Costco | 1
Walmart, Wegmans | 1
ShopRite, Indian Store | 3
ShopRite, Costco | 1
ShopRite, Wegmans | 1
Indian Store, Costco | 1
Indian Store, Wegmans | 2
Costco, Wegmans | 1
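Continuing the same sketch, candidate pairs are built only from the frequent 1-itemsets, so Target and Home Depot never appear:

```python
from collections import Counter
from itertools import combinations

# Count co-occurrences of every pair of frequent stores within each transaction.
pair_counts = Counter()
for stores in transactions.values():
    for pair in combinations(sorted(stores & frequent_items), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair for pair, count in pair_counts.items()
                  if count >= MIN_SUPPORT_COUNT}
print(frequent_pairs)
# (order may vary)
# {('ShopRite', 'Walmart'), ('Indian Store', 'Walmart'),
#  ('Indian Store', 'ShopRite'), ('Indian Store', 'Wegmans')}
```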
Finding the count for itemsets containing 3 items each. There is no need to generate any candidate that contains a pruned pair; in particular, the pair (Indian Store, Wegmans) cannot be combined with Walmart or ShopRite, because (Walmart, Wegmans) and (ShopRite, Wegmans) both fall below the minimum support count.
Stores | Count
------ | -----
Walmart, ShopRite, Indian Store | 2
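The same pruning idea handles the 3-itemsets: a triple is a candidate only if every pair inside it is frequent. A sketch of that step, continuing from the earlier snippets:

```python
from itertools import combinations

# Candidate triples: every 2-item subset must itself be frequent (Apriori pruning).
stores_in_pairs = sorted({store for pair in frequent_pairs for store in pair})
candidate_triples = [
    triple for triple in combinations(stores_in_pairs, 3)
    if all(sub in frequent_pairs for sub in combinations(triple, 2))
]
triple_counts = {
    triple: sum(set(triple) <= stores for stores in transactions.values())
    for triple in candidate_triples
}
print(triple_counts)  # {('Indian Store', 'ShopRite', 'Walmart'): 2}
```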
If we had considered every subset, we would have 7C1 + 7C2 + 7C3 = 63 combinations, but with support-based pruning we have only 18.
If we had followed the previous method, we would have generated 3^7 - 2^(7+1) + 1 = 1932 rules, but now we only need to consider 3^3 - 2^(3+1) + 1 = 12 rules.
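Both of these counts are easy to verify in code, reusing the total_rules function from the earlier sketch:

```python
from math import comb

print(sum(comb(7, k) for k in (1, 2, 3)))  # 63 candidate itemsets without pruning
print(total_rules(7), total_rules(3))      # 1932 vs. 12 rules
```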
Let's answer a simple question: where should ShopRite open their next store? In other words, how often does ShopRite appear in transactions that contain Walmart and/or the Indian Store?
Finding the Support and Confidence for the following rules. (Assume these stores are not located within a 5-mile radius of each other.)
Rules | Support | Confidence
----- | ------- | ----------
Walmart -> ShopRite | 5 / 9 = 55.5% | 5 / 7 = 71%
Indian Store -> ShopRite | 3 / 9 = 33.3% | 3 / 5 = 60%
Walmart, Indian Store -> ShopRite | 2 / 9 = 22.2% | 2 / 3 = 66.6%
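One way to compute these numbers from the transactions, again assuming the snippets above (the helper function is my own):

```python
def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence for the rule antecedent -> consequent."""
    n = len(transactions)
    antecedent, both = set(antecedent), set(antecedent) | set(consequent)
    antecedent_count = sum(antecedent <= stores for stores in transactions.values())
    both_count = sum(both <= stores for stores in transactions.values())
    return both_count / n, both_count / antecedent_count

print(rule_metrics({"Walmart"}, {"ShopRite"}, transactions))                  # ~0.556, ~0.714
print(rule_metrics({"Indian Store"}, {"ShopRite"}, transactions))             # ~0.333, 0.6
print(rule_metrics({"Walmart", "Indian Store"}, {"ShopRite"}, transactions))  # ~0.222, ~0.667
```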
This means that 60% of the customers who shop at the Indian Store also shop at ShopRite; the same interpretation applies to the other two rules. So opening a new ShopRite store near the Indian Store could increase sales, as a majority of those customers like to shop at both stores.