MARKET BASKET ANALYSIS

Author

Sidhika Dugar

1. Introduction

Market Basket Analysis (MBA) is a data mining technique used to identify relationships and associations between items that are frequently purchased together. It’s based on the theory that if a customer buys a certain item or group of items, they are more likely to buy another specific item or group of items.

2. Market Basket Analysis using Groceries Data from Kaggle

  Member_number       Date  itemDescription
1          1808 21-07-2015   tropical fruit
2          2552 05-01-2015       whole milk
3          2300 19-09-2015        pip fruit
4          1187 12-12-2015 other vegetables
5          3037 01-02-2015       whole milk
6          4941 14-02-2015       rolls/buns
      Member_number       Date       itemDescription
38760          3364 06-05-2014                   oil
38761          4471 08-10-2014         sliced cheese
38762          2022 23-02-2014                 candy
38763          1097 16-04-2014              cake bar
38764          1510 03-12-2014 fruit/vegetable juice
38765          1521 26-12-2014              cat food
'data.frame':   38765 obs. of  3 variables:
 $ Member_number  : int  1808 2552 2300 1187 3037 4941 4501 3803 2762 4119 ...
 $ Date           : chr  "21-07-2015" "05-01-2015" "19-09-2015" "12-12-2015" ...
 $ itemDescription: chr  "tropical fruit" "whole milk" "pip fruit" "other vegetables" ...
[1] 3898

The Groceries dataset from Kaggle has 38765 observations and 3 columns: Member_number, Date (the purchase date), and itemDescription (the item bought). The transactions span two years (2014 and 2015) and come from 3898 unique members.

2.1 Data Restructuring for Creating Association Rules Using the arules Library
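The restructuring step can be sketched as follows. This is a minimal illustration, not the exact code behind this post: it assumes a basket is everything one member bought on one date, and it uses a small hypothetical data frame in place of the Kaggle file (the file name in the comment is also an assumption).

```r
library(arules)

# Toy stand-in for the Kaggle data; the real file would be loaded with
# something like read.csv("Groceries_dataset.csv") (file name assumed).
groceries <- data.frame(
  Member_number   = c(1808, 1808, 2552, 2552, 2300),
  Date            = c("21-07-2015", "21-07-2015", "05-01-2015",
                      "05-01-2015", "19-09-2015"),
  itemDescription = c("tropical fruit", "whole milk", "whole milk",
                      "rolls/buns", "pip fruit"),
  stringsAsFactors = FALSE
)

# One basket per member per day: the key combines both columns.
basket_id <- paste(groceries$Member_number, groceries$Date, sep = "_")

# split() groups item names by basket; unique() drops repeated items,
# since a transaction should list each item at most once.
baskets <- lapply(split(groceries$itemDescription, basket_id), unique)

# Coerce the named list of character vectors to a transactions object.
trans <- as(baskets, "transactions")

summary(trans)
```

With the toy rows above, the five purchase records collapse into three baskets. Applied to the full dataset, the same grouping turns the 38765 rows into far fewer transactions.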

2.2 Creating the rules
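The log below reports a minimum support of 5e-04, a minimum confidence of 0.2, and rule lengths between 2 and 10. A call along the following lines would request those settings. It is a sketch under assumptions: the toy baskets are hypothetical, and the real analysis would pass the grocery transactions instead.

```r
library(arules)

# Hypothetical toy baskets; in the post, `trans` would hold the
# grocery transactions built in section 2.1.
baskets <- list(
  c("whole milk", "rolls/buns"),
  c("whole milk", "rolls/buns", "soda"),
  c("soda", "whole milk"),
  c("rolls/buns")
)
trans <- as(baskets, "transactions")

# Thresholds copied from the parameter specification in the log.
rules <- apriori(
  trans,
  parameter = list(support = 0.0005, confidence = 0.2,
                   minlen = 2, maxlen = 10, target = "rules")
)

# Show up to ten rules; the ordering used in the post is not stated.
inspect(head(rules, n = 10))
```

Setting minlen = 2 excludes rules with an empty left-hand side, so every rule relates at least two items.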

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.2    0.1    1 none FALSE            TRUE       5   5e-04      2
 maxlen target  ext
     10  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 7 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[167 item(s), 14963 transaction(s)] done [0.00s].
sorting and recoding items ... [158 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.04s].
writing ... [19 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
     lhs                                rhs          support      confidence
[1]  {artif. sweetener}              => {whole milk} 0.0005346521 0.2758621 
[2]  {brandy}                        => {whole milk} 0.0008688097 0.3421053 
[3]  {spices}                        => {soda}       0.0006014837 0.2250000 
[4]  {softener}                      => {whole milk} 0.0008019782 0.2926829 
[5]  {house keeping products}        => {whole milk} 0.0007351467 0.2444444 
[6]  {finished products}             => {whole milk} 0.0008688097 0.2031250 
[7]  {rolls/buns, white bread}       => {whole milk} 0.0006014837 0.2812500 
[8]  {other vegetables, white bread} => {whole milk} 0.0005346521 0.2051282 
[9]  {margarine, soda}               => {whole milk} 0.0005346521 0.2051282 
[10] {curd, rolls/buns}              => {whole milk} 0.0006014837 0.2195122 
     coverage    lift     count
[1]  0.001938114 1.746815  8   
[2]  0.002539598 2.166281 13   
[3]  0.002673261 2.317051  9   
[4]  0.002740092 1.853328 12   
[5]  0.003007418 1.547872 11   
[6]  0.004277217 1.286229 13   
[7]  0.002138609 1.780933  9   
[8]  0.002606429 1.298914  8   
[9]  0.002606429 1.298914  8   
[10] 0.002740092 1.389996  9   
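The quality columns are related by simple arithmetic: support is the rule's count divided by the number of transactions, confidence is support divided by coverage (the share of baskets containing the left-hand side), and lift is confidence divided by the share of baskets containing the right-hand side. As a check, the sketch below recomputes these for rule [2], {brandy} => {whole milk}, using only figures printed above. The variable names are illustrative.

```r
# Figures read from the log and from rule [2], {brandy} => {whole milk}.
n_transactions <- 14963
rule_count     <- 13           # baskets containing brandy and whole milk
coverage       <- 0.002539598  # share of baskets containing brandy
lift           <- 2.166281

support    <- rule_count / n_transactions  # share of baskets with both items
confidence <- support / coverage           # P(whole milk | brandy)
rhs_share  <- confidence / lift            # implied share of baskets with whole milk

round(c(support = support, confidence = confidence, rhs_share = rhs_share), 4)
```

The recomputed support and confidence agree with the printed columns. The implied whole milk share explains why so many rules point to whole milk: it is a very common item, so a rule needs a lift above 1 to show more than that baseline popularity.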

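Changing the minimum support changes the number of rules generated.

For example, moving the support threshold between 0.001 and 0.005 changed the rule count between 20 and 2. A higher minimum support is the stricter setting, so it is the higher threshold that leaves fewer rules, and the rules that remained showed higher confidence.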
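The effect of the threshold can be explored by re-running the miner over a range of support values and counting the rules each time. The sketch below does this on hypothetical toy baskets. The thresholds are chosen for the toy data and are not the values used in this post.

```r
library(arules)

# Hypothetical toy baskets standing in for the grocery transactions.
baskets <- list(
  c("whole milk", "rolls/buns"),
  c("whole milk", "rolls/buns", "soda"),
  c("soda", "whole milk"),
  c("rolls/buns"),
  c("whole milk", "yogurt")
)
trans <- as(baskets, "transactions")

# Count the rules found at each minimum support, holding confidence fixed.
supports <- c(0.2, 0.4, 0.6)
n_rules <- sapply(supports, function(s) {
  rules <- apriori(
    trans,
    parameter = list(support = s, confidence = 0.2, minlen = 2),
    control   = list(verbose = FALSE)  # suppress the progress log
  )
  length(rules)
})

data.frame(min_support = supports, rules = n_rules)
```

With confidence held fixed, the rule count can only stay the same or fall as the minimum support rises, because every rule that passes a higher threshold also passes a lower one.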

Market Basket Analysis on the Kaggle groceries dataset produced 19 rules at a minimum support of 0.0005 and a minimum confidence of 0.2. Of the ten rules shown above, nine have whole milk on the right-hand side, with lift values ranging from about 1.29 to 2.32. The number and strength of the rules varied with the support and confidence thresholds. These insights can help retailers with cross-selling, promotions, and product placement.

3. Conclusion

In conclusion, the analysis first aggregates individual product entries into distinct shopping baskets based on customer and date. It then converts these baskets into the specialized “transactions” format required by the arules library. This restructuring is the essential preparatory step for mining association rules such as “customers who buy bread also tend to buy milk.”
