Finding relationships among stores using the Apriori Algorithm
As we saw in our previous article, Association Rule Mining is a great way to solve such problems, but it is computationally expensive. A simple way to reduce this expense is the Apriori Algorithm. In this article, we will see how this algorithm works with an example.
Association Rule Mining generates rules based on the items present. In our previous article we had 11 items (Bread, Milk, Eggs, Yogurt, Cheese, Coke, Snickers, Ice Cream, Ketchup, Potatoes, Chips), which means we would have to generate every possible rule over those 11 items.
The total number of rules is given by TR = 3^d - 2^(d+1) + 1, where d is the number of items.
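As a quick sanity check, here is that formula in code for the 11 items above (a small Python sketch; the function name is mine):

```python
def total_rules(d):
    """Total possible association rules for d items: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1

print(total_rules(11))  # 173052 rules for 11 items -- far too many to enumerate by hand
```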
To reduce the number of rules, we first generate a dataset of itemset frequencies using a brute-force approach (2^11 combinations in our case), which is still computationally expensive, though less so than the previous method. To further reduce this expense, we can apply pruning based on the Apriori Principle, which states that "If an itemset is frequent, then all of its subsets must also be frequent".
The principle rests on the following support property: ∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y), which says that the support of an itemset never exceeds the support of any of its subsets.
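A tiny illustration of that property, using a made-up list of transactions (not the store dataset used later in this article):

```python
# Support of a superset never exceeds the support of its subsets.
toy_transactions = [{"A", "B", "C"}, {"A", "B"}, {"A"}, {"B", "C"}]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

print(support({"A"}, toy_transactions))       # 0.75
print(support({"A", "B"}, toy_transactions))  # 0.5, never more than s({"A"})
```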
The next step is to set a suitable minimum support count to reduce the dataset. A suitable value depends on the number of rows (transactions) in the dataset.
Now let's see how Apriori works. The following is a randomly generated sample dataset.
Customer ID | Stores
----------- | ------
1 | Walmart, ShopRite, Indian Store
2 | Walmart, Wegmans, Target
3 | Wegmans, Costco, Indian Store
4 | ShopRite, Home Depot, Walmart
5 | ShopRite, Walmart, Indian Store
6 | Costco, ShopRite, Walmart
7 | Walmart, Indian Store
8 | ShopRite, Wegmans, Indian Store
9 | Walmart, ShopRite
The above dataset tells us where each customer has shopped; for example, customer 9 shopped at Walmart and ShopRite.
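If you want to follow along in code, one possible way to represent the table above (a Python sketch; the variable name is my own):

```python
# The transactions from the table above, keyed by customer ID.
transactions = {
    1: {"Walmart", "ShopRite", "Indian Store"},
    2: {"Walmart", "Wegmans", "Target"},
    3: {"Wegmans", "Costco", "Indian Store"},
    4: {"ShopRite", "Home Depot", "Walmart"},
    5: {"ShopRite", "Walmart", "Indian Store"},
    6: {"Costco", "ShopRite", "Walmart"},
    7: {"Walmart", "Indian Store"},
    8: {"ShopRite", "Wegmans", "Indian Store"},
    9: {"Walmart", "ShopRite"},
}
```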
Setting minimum support count = 2: an itemset qualifies only if its support count >= 2, and itemsets that fall below this threshold are pruned and not carried forward.
Finding the count for itemsets containing 1 item each.
Stores | Count
------ | -----
Walmart | 7
ShopRite | 6
Indian Store | 5
Costco | 2
Target | 1
Wegmans | 3
Home Depot | 1
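A sketch of this counting step, assuming the transactions dictionary from the earlier snippet:

```python
from collections import Counter

MIN_SUPPORT_COUNT = 2

# Count how many transactions each store appears in.
item_counts = Counter(store for stores in transactions.values() for store in stores)

# Keep only the stores that meet the minimum support count.
frequent_items = {store for store, count in item_counts.items()
                  if count >= MIN_SUPPORT_COUNT}
print(item_counts)     # Walmart: 7, ShopRite: 6, Indian Store: 5, ...
print(frequent_items)  # Target and Home Depot are pruned
```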
Finding the count for itemsets containing 2 items each. There is no need to generate any itemset containing Target or Home Depot, since both failed to meet the minimum support count.
Stores | Count
------ | -----
Walmart, ShopRite | 5
Walmart, Indian Store | 3
Walmart, Costco | 1
Walmart, Wegmans | 1
ShopRite, Indian Store | 3
ShopRite, Costco | 1
ShopRite, Wegmans | 1
Indian Store, Costco | 1
Indian Store, Wegmans | 2
Costco, Wegmans | 1
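Continuing the same sketch, candidate pairs are built only from the frequent 1-itemsets, so Target and Home Depot never appear:

```python
from collections import Counter
from itertools import combinations

# Count co-occurrences of every pair of frequent stores within each transaction.
pair_counts = Counter()
for stores in transactions.values():
    for pair in combinations(sorted(stores & frequent_items), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair for pair, count in pair_counts.items()
                  if count >= MIN_SUPPORT_COUNT}
print(frequent_pairs)
# (order may vary)
# {('ShopRite', 'Walmart'), ('Indian Store', 'Walmart'),
#  ('Indian Store', 'ShopRite'), ('Indian Store', 'Wegmans')}
```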
Finding the count for itemsets containing 3 items each. There is no need to generate any candidate that contains a pruned pair; in particular, the pair (Indian Store, Wegmans) cannot be combined with Walmart or ShopRite, because (Walmart, Wegmans) and (ShopRite, Wegmans) both fall below the minimum support count.
Stores | Count
------ | -----
Walmart, ShopRite, Indian Store | 2
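The same pruning idea handles the 3-itemsets: a triple is a candidate only if every pair inside it is frequent. A sketch of that step, continuing from the earlier snippets:

```python
from itertools import combinations

# Candidate triples: every 2-item subset must itself be frequent (Apriori pruning).
stores_in_pairs = sorted({store for pair in frequent_pairs for store in pair})
candidate_triples = [
    triple for triple in combinations(stores_in_pairs, 3)
    if all(sub in frequent_pairs for sub in combinations(triple, 2))
]
triple_counts = {
    triple: sum(set(triple) <= stores for stores in transactions.values())
    for triple in candidate_triples
}
print(triple_counts)  # {('Indian Store', 'ShopRite', 'Walmart'): 2}
```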
If we had considered every subset, we would have 7C1 + 7C2 + 7C3 = 63 combinations, but with support-based pruning we have only 18.
If we had followed the previous method, we would have generated 3^7 - 2^(7+1) + 1 = 1932 rules, but now we only need to consider 3^3 - 2^(3+1) + 1 = 12 rules.
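Both of these counts are easy to verify in code, reusing the total_rules function from the earlier sketch:

```python
from math import comb

print(sum(comb(7, k) for k in (1, 2, 3)))  # 63 candidate itemsets without pruning
print(total_rules(7), total_rules(3))      # 1932 vs. 12 rules
```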
Let's answer a simple question: where should ShopRite open their next store? In other words, how often does ShopRite appear in transactions that contain Walmart and/or the Indian Store?
Finding the Support and Confidence for the following rules. (Assume these stores are not located within a 5-mile radius of each other.)
Rules | Support | Confidence
----- | ------- | ----------
Walmart -> ShopRite | 5 / 9 = 55.5% | 5 / 7 = 71%
Indian Store -> ShopRite | 3 / 9 = 33.3% | 3 / 5 = 60%
Walmart, Indian Store -> ShopRite | 2 / 9 = 22.2% | 2 / 3 = 66.6%
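One way to compute these numbers from the transactions, again assuming the snippets above (the helper function is my own):

```python
def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence for the rule antecedent -> consequent."""
    n = len(transactions)
    antecedent, both = set(antecedent), set(antecedent) | set(consequent)
    antecedent_count = sum(antecedent <= stores for stores in transactions.values())
    both_count = sum(both <= stores for stores in transactions.values())
    return both_count / n, both_count / antecedent_count

print(rule_metrics({"Walmart"}, {"ShopRite"}, transactions))                  # ~0.556, ~0.714
print(rule_metrics({"Indian Store"}, {"ShopRite"}, transactions))             # ~0.333, 0.6
print(rule_metrics({"Walmart", "Indian Store"}, {"ShopRite"}, transactions))  # ~0.222, ~0.667
```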
This means that 60% of the customers who shop at the Indian Store also shop at ShopRite; the same interpretation applies to the other two rules. So opening a new ShopRite store near the Indian Store could increase sales, as a majority of those customers like to shop at both stores.