MARKET BASKET ANALYSIS
1. Introduction
Market Basket Analysis (MBA) is a data mining technique used to identify relationships and associations between items that are frequently purchased together. It’s based on the theory that if a customer buys a certain item or group of items, they are more likely to buy another specific item or group of items.
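The idea can be made precise with the three standard measures used to score an association rule X => Y. The notation below (T for the set of all transactions, X and Y for itemsets) is introduced here for illustration and does not appear in the original analysis.

```latex
\begin{align*}
\operatorname{supp}(X) &= \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|} \\
\operatorname{supp}(X \Rightarrow Y) &= \operatorname{supp}(X \cup Y) \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
\end{align*}
```

A lift above 1 means that baskets containing X include Y more often than baskets in general, which is the "more likely to buy" claim stated above.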
2. Market Basket Analysis using Groceries Data from Kaggle
Member_number Date itemDescription
1 1808 21-07-2015 tropical fruit
2 2552 05-01-2015 whole milk
3 2300 19-09-2015 pip fruit
4 1187 12-12-2015 other vegetables
5 3037 01-02-2015 whole milk
6 4941 14-02-2015 rolls/buns

Member_number Date itemDescription
38760 3364 06-05-2014 oil
38761 4471 08-10-2014 sliced cheese
38762 2022 23-02-2014 candy
38763 1097 16-04-2014 cake bar
38764 1510 03-12-2014 fruit/vegetable juice
38765 1521 26-12-2014 cat food

'data.frame': 38765 obs. of 3 variables:
$ Member_number : int 1808 2552 2300 1187 3037 4941 4501 3803 2762 4119 ...
$ Date : chr "21-07-2015" "05-01-2015" "19-09-2015" "12-12-2015" ...
$ itemDescription: chr "tropical fruit" "whole milk" "pip fruit" "other vegetables" ...

[1] 3898

The Groceries dataset from Kaggle has 38,765 observations and 3 variables: Member_number, Date (the purchase date), and itemDescription (the item purchased). Each row records one item bought by one member on one date. The purchases span two years (2014 and 2015), and the history covers 3,898 unique members.
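The output above looks like the result of head(), tail(), str(), and a count of unique members. The sketch below shows calls of that kind in base R. To stay self-contained it rebuilds only the six rows printed by head(); the read.csv() line and its file name are assumptions, not taken from the original.

```r
# Six rows copied from the head() output above; the full file has 38,765 rows.
# With the real data this would instead be something like
#   groceries <- read.csv("Groceries_dataset.csv")   # file name is an assumption
groceries <- data.frame(
  Member_number   = c(1808L, 2552L, 2300L, 1187L, 3037L, 4941L),
  Date            = c("21-07-2015", "05-01-2015", "19-09-2015",
                      "12-12-2015", "01-02-2015", "14-02-2015"),
  itemDescription = c("tropical fruit", "whole milk", "pip fruit",
                      "other vegetables", "whole milk", "rolls/buns"),
  stringsAsFactors = FALSE
)

head(groceries)   # first rows
tail(groceries)   # last rows
str(groceries)    # number of observations and column types

# Number of distinct members (3898 on the full data, 6 on this sample)
n_members <- length(unique(groceries$Member_number))

# Dates are stored as day-month-year text; parse them to check the time span
purchase_date <- as.Date(groceries$Date, format = "%d-%m-%Y")
years_covered <- sort(unique(format(purchase_date, "%Y")))
```

Parsing Date explicitly matters because the column is read as character, so sorting or filtering by date on the raw text would not follow calendar order.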
2.1 Data Restructuring for creating Association Rules using the “arules” library
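The restructuring groups the item rows into one basket per member and date, then converts the baskets into an arules "transactions" object. The sketch below is one possible way to do this, not necessarily the code used for this analysis. The data frame toy and its rows are hypothetical, and the arules package is assumed to be installed.

```r
library(arules)  # assumption: the arules package is installed

# Hypothetical toy rows in the same three-column layout as the Kaggle file
toy <- data.frame(
  Member_number   = c(1L, 1L, 1L, 2L, 2L, 1L),
  Date            = c("01-01-2015", "01-01-2015", "01-01-2015",
                      "01-01-2015", "01-01-2015", "02-01-2015"),
  itemDescription = c("whole milk", "rolls/buns", "whole milk",
                      "soda", "whole milk", "soda"),
  stringsAsFactors = FALSE
)

# One basket per (member, date) pair
basket_id <- paste(toy$Member_number, toy$Date, sep = "_")

# split() collects the items of each basket; unique() drops repeated items,
# since a transaction records whether an item is present, not how many times
baskets <- lapply(split(toy$itemDescription, basket_id), unique)

# Convert the named list of item vectors into the arules transactions format
trans <- as(baskets, "transactions")
summary(trans)
```

Applied to the full dataset, a grouping of this kind would be consistent with the 14963 transactions and 167 items reported in the apriori log in the next section.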
2.2 Creating the rules
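The log below reports a minimum support of 5e-04, a minimum confidence of 0.2, and a minimum rule length of 2. A call along the following lines would produce a log with those settings. This is a sketch: the baskets are hypothetical toy data, and the arules package is assumed to be installed.

```r
library(arules)  # assumption: the arules package is installed

# Hypothetical toy baskets; the real input is the 14963-transaction object
baskets <- list(
  c("whole milk", "rolls/buns"),
  c("whole milk", "soda"),
  c("whole milk", "rolls/buns", "soda"),
  c("soda")
)
trans <- as(baskets, "transactions")

# Thresholds read off the "Parameter specification" block of the log
rules <- apriori(
  trans,
  parameter = list(support = 5e-04, confidence = 0.2, minlen = 2)
)

# Print up to ten rules with their quality measures
inspect(head(rules, 10))
```

Setting minlen = 2 excludes rules with an empty left-hand side, which would otherwise only report how common a single item is.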
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.2 0.1 1 none FALSE TRUE 5 5e-04 2
maxlen target ext
10 rules TRUE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 7
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[167 item(s), 14963 transaction(s)] done [0.00s].
sorting and recoding items ... [158 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.04s].
writing ... [19 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].

     lhs                                rhs           support      confidence coverage    lift     count
[1]  {artif. sweetener}              => {whole milk}  0.0005346521 0.2758621  0.001938114 1.746815  8
[2]  {brandy}                        => {whole milk}  0.0008688097 0.3421053  0.002539598 2.166281 13
[3]  {spices}                        => {soda}        0.0006014837 0.2250000  0.002673261 2.317051  9
[4]  {softener}                      => {whole milk}  0.0008019782 0.2926829  0.002740092 1.853328 12
[5]  {house keeping products}        => {whole milk}  0.0007351467 0.2444444  0.003007418 1.547872 11
[6]  {finished products}             => {whole milk}  0.0008688097 0.2031250  0.004277217 1.286229 13
[7]  {rolls/buns, white bread}       => {whole milk}  0.0006014837 0.2812500  0.002138609 1.780933  9
[8]  {other vegetables, white bread} => {whole milk}  0.0005346521 0.2051282  0.002606429 1.298914  8
[9]  {margarine, soda}               => {whole milk}  0.0005346521 0.2051282  0.002606429 1.298914  8
[10] {curd, rolls/buns}              => {whole milk}  0.0006014837 0.2195122  0.002740092 1.389996  9

Whenever we change the minimum support, the number of rules created is changed.
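For example, in our runs the two support thresholds 0.001 and 0.005 gave 20 and 2 rules. Raising the minimum support can only remove rules, so the 2 rules belong to the stricter threshold of 0.005, and the rules that remained showed higher confidence.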
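The quality columns in the rule table can be cross-checked by hand. The sketch below redoes the arithmetic for rule [2], {brandy} => {whole milk}, using only numbers printed in the log and the table. The variable names are illustrative.

```r
n_transactions <- 14963        # from the "set transactions" line of the log

# Rule [2] {brandy} => {whole milk}, values copied from the table
count    <- 13                 # baskets containing both brandy and whole milk
coverage <- 0.002539598        # support of the left-hand side, {brandy}
lift     <- 2.166281

# Support is the share of all baskets that contain both sides of the rule
support <- count / n_transactions      # matches the printed 0.0008688097

# Confidence is the share of brandy baskets that also contain whole milk
confidence <- support / coverage       # matches the printed 0.3421053

# Lift divides confidence by the support of the right-hand side, so the
# implied support of {whole milk} is confidence / lift, roughly 0.158
rhs_support <- confidence / lift

# The relative threshold 5e-04 expressed as a number of baskets is about 7.48;
# the log reports "Absolute minimum support count: 7"
min_count <- 5e-04 * n_transactions
```

Read this way, whole milk appears in roughly 16% of all baskets but in about 34% of baskets containing brandy, which is what a lift of about 2.17 expresses. The count of 13 also shows how few baskets back the rule.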
Market Basket Analysis on the Kaggle groceries dataset revealed frequent item associations, showing how customers tend to buy certain products together. The number and strength of rules varied with support and confidence levels. These insights can help retailers in cross-selling, promotions, and better product placement.
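The dependence of the rule count on the support threshold can be explored by rerunning apriori over several thresholds. The sketch below does this on hypothetical toy baskets, so its support values and counts say nothing about the Kaggle data. The arules package is assumed to be installed.

```r
library(arules)  # assumption: the arules package is installed

# Hypothetical toy baskets, used only to make the sketch self-contained
baskets <- list(
  c("whole milk", "rolls/buns"),
  c("whole milk", "soda"),
  c("whole milk", "rolls/buns", "soda"),
  c("soda")
)
trans <- as(baskets, "transactions")

# Candidate minimum-support thresholds, from loose to strict
supports <- c(0.25, 0.50, 0.75)

# Mine rules at each threshold and record how many are found
n_rules <- sapply(supports, function(s) {
  rules <- apriori(
    trans,
    parameter = list(support = s, confidence = 0.2, minlen = 2),
    control   = list(verbose = FALSE)   # suppress the progress log
  )
  length(rules)
})

data.frame(min_support = supports, n_rules = n_rules)
```

With confidence held fixed, the rule count can only stay the same or fall as the support threshold rises, because every rule found at a strict threshold also passes a looser one.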
3. Conclusion
In conclusion, the workflow first aggregates individual product entries into distinct shopping baskets, one per customer and date. It then converts these baskets into the specialized “transactions” format required by the arules library. This restructuring is the essential preparatory step for mining association rules like “customers who buy bread also tend to buy milk.”

