E-commerce Customer & Product Analytics

2021-07-21 | Category: Clustering, RFM Model, Association Rules Mining | GitHub: Click to View GitHub Repository

Market Basket Analysis in E-commerce incorporating Customer Segmentation

Project Summary

This is another large-scale project that I initiated during my Master’s degree studies, in which I was required to combine two unsupervised learning algorithms into a single data science project. Using the Online Retail V3 dataset from Kaggle, which I found particularly interesting, I adopted the SEMMA (Sample, Explore, Modify, Model, Assess) data mining framework and implemented the project in Python. Focusing on extracting insights about both the customers and the products involved, I defined the following objectives:

  1. To apply clustering algorithms to segment customers into distinct customer groups.
  2. To conduct market basket analysis on the products purchased by each customer segment using the A-Priori algorithm.

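The segmentation in objective 1 is based on per-customer RFM (Recency, Frequency, Monetary) features. Below is a minimal Python sketch of how such features can be derived from raw transactions, assuming each row is a hypothetical `(customer_id, invoice_no, invoice_date, quantity, unit_price)` tuple; the actual Online Retail V3 column names and the cleaning steps used in the project may differ.

```python
from collections import defaultdict
from datetime import date, timedelta


def compute_rfm(transactions, reference_date=None):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Each transaction is assumed to be a hypothetical tuple
    (customer_id, invoice_no, invoice_date, quantity, unit_price).
    """
    if reference_date is None:
        # Assumed convention: measure recency from the day after the latest purchase.
        reference_date = max(t[2] for t in transactions) + timedelta(days=1)
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, quantity, unit_price in transactions:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        invoices[customer].add(invoice)           # Frequency: distinct invoices
        spend[customer] += quantity * unit_price  # Monetary: total spend
    return {
        c: ((reference_date - last_purchase[c]).days, len(invoices[c]), round(spend[c], 2))
        for c in last_purchase
    }


# Hypothetical toy transactions for illustration only.
rows = [
    ("C1", "INV1", date(2021, 1, 1), 2, 5.0),
    ("C1", "INV2", date(2021, 1, 10), 1, 3.0),
    ("C2", "INV3", date(2021, 1, 5), 4, 2.5),
]
print(compute_rfm(rows))  # {'C1': (1, 2, 13.0), 'C2': (6, 1, 10.0)}
```

Counting distinct invoices rather than rows keeps a single large order from inflating Frequency.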
Clustering Results

The customer segmentation experiments used both K-means clustering and hierarchical clustering. Based on silhouette score analysis and the elbow method, four clusters were chosen as suitable for the project. The figure above illustrates the RFM (Recency, Frequency, Monetary) customer segmentation results for K-means clustering as a snake plot, showing the four customer segments identified: at-risk customers (cluster 0), recent customers (cluster 1), lost/churned customers (cluster 2), and champion/loyal customers (cluster 3). Each segment shows distinct behaviour, which can be analysed through its RFM features.
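The K-means side of this step can be sketched in plain Python: z-score scaling of the RFM features, Lloyd's K-means, the inertia that the elbow method plots against k, and the mean silhouette score. This is an illustrative sketch under stated assumptions (initial centroids sampled from the data, a fixed iteration cap, hypothetical toy values), not the project's actual code; hierarchical clustering is not shown.

```python
import math
import random


def standardize(points):
    """Z-score each feature column so R, F and M are on comparable scales."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(p, means, stds)] for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init from k data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia (within-cluster sum of squares) is what the elbow method plots against k.
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


def silhouette_score(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster: s(i) taken as 0
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical RFM rows (recency_days, frequency, monetary): two obvious groups.
rfm = [[5, 20, 900], [3, 25, 1100], [8, 18, 950],
       [200, 1, 40], [250, 2, 60], [180, 1, 30]]
X = standardize(rfm)
for k in range(2, 5):
    labels, _, inertia = kmeans(X, k)
    print(k, round(inertia, 3), round(silhouette_score(X, labels), 3))
```

In practice one would look for the "elbow" where inertia stops dropping sharply and favour a k with a high silhouette score, as the project did when settling on four clusters.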

A-Priori Results

For the market basket analysis, frequent itemsets and association rules were extracted using the A-Priori algorithm for both the K-means and the hierarchical customer segments. Brief graphical analyses were also carried out to compare the support, confidence and lift distributions across customer segments, along with the total number of frequent itemsets obtained for each segment. The figure above shows the support distributions of the four K-means customer segments, where the differences between segments are clearly visible.
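The A-Priori steps and the three rule metrics compared across segments can be sketched in plain Python. Here support(X) is the fraction of baskets containing X, confidence(X → Y) = support(X ∪ Y) / support(X), and lift(X → Y) = confidence(X → Y) / support(Y). This is a minimal illustration on hypothetical toy baskets, not the project's implementation; to mirror the per-segment analysis, `apriori` would be called once on each segment's baskets.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    candidates = {frozenset([item]) for b in baskets for item in b}  # 1-itemsets
    frequent, k = {}, 1
    while candidates:
        # Count step: support = fraction of baskets containing the candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples for rules X -> Y."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules


# Hypothetical baskets (one set of products per invoice) for a single segment.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=0.5)
for x, y, s, c, l in association_rules(freq, min_confidence=0.6):
    print(sorted(x), "->", sorted(y), f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Because every subset of a frequent itemset is also frequent, the antecedent and consequent supports needed for confidence and lift are always present in `frequent`.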

The results showed that both objectives were achieved: combining the SEMMA framework with two unsupervised learning algorithms successfully uncovered useful hidden insights into customer buying behaviour, from both the customer and the product perspectives.

Dataset source: Kaggle dataset link

Dataset: Online Retail V3 dataset
Source of Data: Kaggle Open Dataset
Completion Date: June 3, 2021
Main Objective: Clustering, Association Rules Mining

SO WHAT DO YOU THINK?

I’m planning to add more projects to my portfolio in the future when I have time.
If you like my work and would like to have a chat, feel free to contact me or connect with me on LinkedIn!
