Apriori

Importing the libraries

In [10]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Data Preprocessing

In [11]:
df = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
df.head(10)
Out[11]:
   0                  1             2           3                 4             5                 6     7               8             9             10              11         12     13     14             15      16                 17               18       19
0  shrimp             almonds       avocado     vegetables mix    green grapes  whole weat flour  yams  cottage cheese  energy drink  tomato juice  low fat yogurt  green tea  honey  salad  mineral water  salmon  antioxydant juice  frozen smoothie  spinach  olive oil
1  burgers            meatballs     eggs        NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
2  chutney            NaN           NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
3  turkey             avocado       NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
4  mineral water      milk          energy bar  whole wheat rice  green tea     NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
5  low fat yogurt     NaN           NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
6  whole wheat pasta  french fries  NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
7  soup               light cream   shallot     NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
8  frozen vegetables  spaghetti     green tea   NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
9  french fries       NaN           NaN         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN        NaN    NaN    NaN            NaN     NaN                NaN              NaN      NaN
In [12]:
# Number of rows and columns
df.shape
Out[12]:
(7501, 20)
In [13]:
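# Build one list of item names per transaction, skipping empty (NaN) cells
# so that the string 'nan' is never counted as a product.
transactions = []
for row in df.values:
    transactions.append([str(item) for item in row if pd.notna(item)])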
In [14]:
transactions[0]
Out[14]:
['shrimp',
 'almonds',
 'avocado',
 'vegetables mix',
 'green grapes',
 'whole weat flour',
 'yams',
 'cottage cheese',
 'energy drink',
 'tomato juice',
 'low fat yogurt',
 'green tea',
 'honey',
 'salad',
 'mineral water',
 'salmon',
 'antioxydant juice',
 'frozen smoothie',
 'spinach',
 'olive oil']

Training the Apriori model on the dataset

In [15]:
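from apyori import apriori
# Keep rules with support >= 0.3%, confidence >= 20% and lift >= 3;
# max_length = 2 limits itemsets to pairs (one item on each side of a rule).
rules = apriori(transactions = transactions, min_support = 0.003,
                min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)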

Visualising the results

Displaying the raw results returned by the apriori function
In [16]:
results = list(rules)
In [17]:
results
Out[17]:
[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0.2450980392156863, lift=5.164270764485569)]),
 RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}), support=0.015997866951073192, ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}), confidence=0.3234501347708895, lift=3.2919938411349285)]),
 RelationRecord(items=frozenset({'ground beef', 'tomato sauce'}), support=0.005332622317024397, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomato sauce'}), items_add=frozenset({'ground beef'}), confidence=0.3773584905660377, lift=3.840659481324083)]),
 RelationRecord(items=frozenset({'olive oil', 'light cream'}), support=0.003199573390214638, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'olive oil'}), confidence=0.20512820512820515, lift=3.1147098515519573)]),
 RelationRecord(items=frozenset({'whole wheat pasta', 'olive oil'}), support=0.007998933475536596, ordered_statistics=[OrderedStatistic(items_base=frozenset({'whole wheat pasta'}), items_add=frozenset({'olive oil'}), confidence=0.2714932126696833, lift=4.122410097642296)]),
 RelationRecord(items=frozenset({'shrimp', 'pasta'}), support=0.005065991201173177, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'shrimp'}), confidence=0.3220338983050847, lift=4.506672147735896)])]
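The numbers in the first record can be cross-checked by hand. The sketch below assumes the usual definitions: support is the share of transactions containing both items, confidence is the pair count divided by the count of the left-hand item, and lift is confidence divided by the support of the right-hand item. The three basket counts are not read from the dataset; they are back-calculated from the printed values and the 7501 rows, so treat them as inferred.

```python
# Cross-check of the rule light cream -> chicken from the output above.
n_transactions = 7501     # rows in the dataset (df.shape[0])
n_both = 34               # inferred: baskets containing both items
n_light_cream = 117       # inferred: baskets containing light cream
n_chicken = 450           # inferred: baskets containing chicken

support = n_both / n_transactions
confidence = n_both / n_light_cream
lift = confidence / (n_chicken / n_transactions)

print(round(support, 6), round(confidence, 6), round(lift, 6))
# prints: 0.004533 0.290598 4.843951
```

Read this way, a lift of about 4.84 says that chicken turns up in baskets with light cream roughly 4.8 times as often as it does in baskets overall.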

Organising the results into a pandas DataFrame

In [18]:
def inspect(results):
    """Flatten apyori records into (lhs, rhs, support, confidence, lift) tuples."""
    # result[2][0] is the first rule (OrderedStatistic) of each record; with
    # max_length = 2 each side holds exactly one item, so [0] extracts it.
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))

resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
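The positional indexing in inspect can be hard to read. As a possible alternative, the sketch below accesses the same fields by the names visible in the printed records (items, support, ordered_statistics, items_base, items_add, confidence, lift). The two namedtuples are stand-ins defined here so the example runs without apyori, and inspect_named is a hypothetical helper, not part of the library; it assumes the real records expose these field names as attributes.

```python
from collections import namedtuple

# Stand-ins for apyori's result types (assumption: same field names as in the printed output).
RelationRecord = namedtuple('RelationRecord', ['items', 'support', 'ordered_statistics'])
OrderedStatistic = namedtuple('OrderedStatistic', ['items_base', 'items_add', 'confidence', 'lift'])

def inspect_named(records):
    """Flatten each record's first rule into (lhs, rhs, support, confidence, lift)."""
    rows = []
    for record in records:
        stat = record.ordered_statistics[0]        # first rule of the itemset
        lhs = ', '.join(sorted(stat.items_base))   # also works if a side has several items
        rhs = ', '.join(sorted(stat.items_add))
        rows.append((lhs, rhs, record.support, stat.confidence, stat.lift))
    return rows

# First record from the output above, rebuilt with the stand-in types.
sample = [RelationRecord(
    items=frozenset({'chicken', 'light cream'}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({'light cream'}),
        items_add=frozenset({'chicken'}),
        confidence=0.29059829059829057,
        lift=4.84395061728395)])]

rows = inspect_named(sample)
print(rows)
```

The returned rows have the same shape as the output of inspect, so they could be passed to pd.DataFrame with the same column names.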

Displaying the unsorted results

In [19]:
resultsinDataFrame
Out[19]:
   Left Hand Side        Right Hand Side  Support   Confidence  Lift
0  light cream           chicken          0.004533  0.290598    4.843951
1  mushroom cream sauce  escalope         0.005733  0.300699    3.790833
2  pasta                 escalope         0.005866  0.372881    4.700812
3  fromage blanc         honey            0.003333  0.245098    5.164271
4  herb & pepper         ground beef      0.015998  0.323450    3.291994
5  tomato sauce          ground beef      0.005333  0.377358    3.840659
6  light cream           olive oil        0.003200  0.205128    3.114710
7  whole wheat pasta     olive oil        0.007999  0.271493    4.122410
8  pasta                 shrimp           0.005066  0.322034    4.506672

Displaying the results sorted by descending lift

In [20]:
resultsinDataFrame.nlargest(n = 10, columns = 'Lift')
Out[20]:
   Left Hand Side        Right Hand Side  Support   Confidence  Lift
3  fromage blanc         honey            0.003333  0.245098    5.164271
0  light cream           chicken          0.004533  0.290598    4.843951
2  pasta                 escalope         0.005866  0.372881    4.700812
8  pasta                 shrimp           0.005066  0.322034    4.506672
7  whole wheat pasta     olive oil        0.007999  0.271493    4.122410
5  tomato sauce          ground beef      0.005333  0.377358    3.840659
1  mushroom cream sauce  escalope         0.005733  0.300699    3.790833
4  herb & pepper         ground beef      0.015998  0.323450    3.291994
6  light cream           olive oil        0.003200  0.205128    3.114710