Apriori¶
Importing the libraries¶
In [10]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Data Preprocessing¶
In [11]:
df = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
df.head(10)
Out[11]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | chutney | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | turkey | avocado | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | low fat yogurt | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | whole wheat pasta | french fries | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | soup | light cream | shallot | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | frozen vegetables | spaghetti | green tea | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | french fries | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
In [12]:
# Number of rows and columns
df.shape
Out[12]:
(7501, 20)
In [13]:
# Build one list of item strings per row (transaction) of the DataFrame
transactions = []
for i in range(0, 7501):
    transactions.append([str(df.values[i,j]) for j in range(0, 20)])
In [14]:
transactions[0]
Out[14]:
['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']
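Because shorter baskets are padded with NaN up to 20 columns, `str()` turns each empty cell into the literal string `'nan'`, which then sits in the transaction as if it were an item. The sketch below shows one possible way to drop those cells. It uses a few hypothetical rows in place of `df.values`, and the helper name `is_missing` is an assumption, not part of the notebook.

```python
import math

# Hypothetical rows standing in for df.values: short baskets are padded with NaN.
rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
    ["turkey", "avocado", float("nan"), float("nan")],
]

def is_missing(cell):
    """Return True for the float NaN that marks an empty cell."""
    return isinstance(cell, float) and math.isnan(cell)

# Approach used in the notebook: every cell is stringified, so NaN becomes 'nan'.
padded = [[str(cell) for cell in row] for row in rows]

# Alternative: keep only the real items in each basket.
cleaned = [[str(cell) for cell in row if not is_missing(cell)] for row in rows]

print(padded[1])   # ['chutney', 'nan', 'nan', 'nan']
print(cleaned[1])  # ['chutney']
```

Filtering leaves the number of transactions unchanged, so the support of any pair of real items should stay the same; only the spurious `'nan'` item disappears.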
Training the Apriori model on the dataset¶
In [15]:
from apyori import apriori
# Keep only the rules that reach the minimum support, confidence and lift
rules = apriori(transactions = transactions, min_support = 0.003,
                min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
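To make the three thresholds concrete, here is a minimal sketch that computes support, confidence and lift by hand for a single rule. The toy transactions and the helper names `support` and `rule_metrics` are illustrative assumptions; this is not how apyori is implemented internally.

```python
# Toy transactions (hypothetical, not taken from the dataset).
transactions = [
    {"pasta", "shrimp"},
    {"pasta", "escalope"},
    {"pasta", "shrimp", "milk"},
    {"milk"},
    {"milk", "eggs"},
    {"eggs"},
    {"shrimp"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rule_metrics(lhs, rhs, transactions):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    both = support(set(lhs) | set(rhs), transactions)
    confidence = both / support(lhs, transactions)   # P(rhs | lhs)
    lift = confidence / support(rhs, transactions)   # confidence relative to P(rhs)
    return both, confidence, lift

s, c, l = rule_metrics({"pasta"}, {"shrimp"}, transactions)
print(round(s, 4), round(c, 4), round(l, 4))  # 0.25 0.6667 1.7778
```

With the thresholds used in this notebook, the toy rule would pass `min_support = 0.003` and `min_confidence = 0.2` but be rejected by `min_lift = 3`.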
Visualising the results¶
Displaying the raw results returned by the apriori function¶
In [16]:
results = list(rules)
In [17]:
results
Out[17]:
[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0.2450980392156863, lift=5.164270764485569)]),
RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}), support=0.015997866951073192, ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}), confidence=0.3234501347708895, lift=3.2919938411349285)]),
RelationRecord(items=frozenset({'ground beef', 'tomato sauce'}), support=0.005332622317024397, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomato sauce'}), items_add=frozenset({'ground beef'}), confidence=0.3773584905660377, lift=3.840659481324083)]),
RelationRecord(items=frozenset({'olive oil', 'light cream'}), support=0.003199573390214638, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'olive oil'}), confidence=0.20512820512820515, lift=3.1147098515519573)]),
RelationRecord(items=frozenset({'whole wheat pasta', 'olive oil'}), support=0.007998933475536596, ordered_statistics=[OrderedStatistic(items_base=frozenset({'whole wheat pasta'}), items_add=frozenset({'olive oil'}), confidence=0.2714932126696833, lift=4.122410097642296)]),
RelationRecord(items=frozenset({'shrimp', 'pasta'}), support=0.005065991201173177, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'shrimp'}), confidence=0.3220338983050847, lift=4.506672147735896)])]
Organising the results into a pandas DataFrame¶
In [18]:
def inspect(results):
    # Each record is (items, support, ordered_statistics); only the first rule of each record is read
    lhs = [tuple(result[2][0][0])[0] for result in results]
    rhs = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))

resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
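The positional indices in `inspect` (`result[2][0][3]` and so on) can be hard to read. Since the output in Out[17] prints named fields, a variant that reads those fields by name is sketched below. The two namedtuples are stand-ins built from the printed field names so the example runs without apyori, and `inspect_named` is a hypothetical helper; unlike `inspect`, it keeps every rule in a record rather than only the first.

```python
from collections import namedtuple

# Stand-ins mirroring the field names printed in Out[17]; with apyori installed,
# the objects in `results` would be used directly instead.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

# First record from Out[17], rebuilt by hand.
results = [
    RelationRecord(
        items=frozenset({"chicken", "light cream"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]

def inspect_named(results):
    """Flatten every rule of every record into one row, using field names."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),  # left-hand side
                ", ".join(sorted(stat.items_add)),   # right-hand side
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

rows = inspect_named(results)
print(rows[0][:2])  # ('light cream', 'chicken')
```

The returned rows have the same column order as `inspect`, so they could be passed to `pd.DataFrame` with the same `columns` list.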
Displaying the results unsorted¶
In [19]:
resultsinDataFrame
Out[19]:
| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.323450 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.003200 | 0.205128 | 3.114710 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.122410 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
Displaying the results sorted by descending lift¶
In [20]:
resultsinDataFrame.nlargest(n = 10, columns = 'Lift')
Out[20]:
| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.122410 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.323450 | 3.291994 |
| 6 | light cream | olive oil | 0.003200 | 0.205128 | 3.114710 |