Customer Segmentation with RFM Analysis

Hello everyone! I am back with another article. This post describes RFM analysis and shows how to use it for customer segmentation by analyzing an online retail shop’s data set in Python. Based on the results of the RFM analysis, I will give examples of the actions that can be taken for different kinds of customers.

What is RFM?

RFM is a method for measuring customer value. The name stands for Recency, Frequency, and Monetary. An RFM analysis can show you who the most valuable customers for your business are: the ones who bought most recently, buy most often, and spend the most. The first step is to calculate these three metrics.

Recency: How much time has passed since a customer’s last activity or transaction with the brand. The activity is usually a purchase, but variations are sometimes used, such as the last visit to a website or a mobile app. (Formula: RFM analysis date - last purchase date)

Frequency: Total number of purchases. It shows how often a customer has made a purchase in a certain period. Customers who purchase more often may be more loyal than others.

Monetary: Total spending by the customer. It shows how much a customer has spent in a certain period of time. Dividing the total spend by the frequency gives the average purchase amount.
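To make the three definitions concrete, here is a minimal sketch of how they could be computed with pandas. The transactions are made up, the TotalPrice column (quantity times unit price) is a name I am assuming for illustration, and setting the analysis date to one day after the last transaction is also just an assumption.

```python
import pandas as pd

# Toy transactions: values are invented, TotalPrice is an assumed helper column.
transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-12-05", "2011-06-15",
        "2011-10-10", "2011-11-20", "2011-12-08",
    ]),
    "TotalPrice": [50.0, 30.0, 20.0, 100.0, 80.0, 60.0],
})

# Assumed analysis date: one day after the most recent transaction.
analysis_date = transactions["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = transactions.groupby("CustomerID").agg(
    # Recency: days between the analysis date and the last purchase.
    Recency=("InvoiceDate", lambda d: (analysis_date - d.max()).days),
    # Frequency: number of distinct invoices.
    Frequency=("InvoiceNo", "nunique"),
    # Monetary: total amount spent.
    Monetary=("TotalPrice", "sum"),
)

# Average purchase amount: monetary divided by frequency.
rfm["AvgPurchase"] = rfm["Monetary"] / rfm["Frequency"]
print(rfm)
```

Each row of the result describes one customer, which is the shape the scoring step expects.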

Customer segmentation starts by converting each of these metrics into a score between 1 and 5. Depending on these scores, the customers are segmented into different groups. These groups can be laid out on a Recency and Frequency grid.
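One common way to turn the metrics into 1 to 5 scores is to split customers into five equal-sized groups (quintiles). The sketch below does this with pandas' qcut on made-up numbers. The column names R, F, M and RF_SCORE are my own choices, and ranking Frequency before binning is an assumption I make so that tied values do not produce duplicate bin edges.

```python
import pandas as pd

# Made-up RFM metrics for ten customers.
rfm = pd.DataFrame(
    {
        "Recency": [2, 10, 35, 60, 90, 120, 200, 15, 45, 300],
        "Frequency": [12, 8, 5, 3, 1, 1, 2, 7, 4, 1],
        "Monetary": [900.0, 650.0, 300.0, 150.0, 40.0, 25.0, 80.0, 500.0, 220.0, 10.0],
    },
    index=pd.Index(range(1, 11), name="CustomerID"),
)

# Recency: fewer days is better, so the labels run from 5 down to 1.
rfm["R"] = pd.qcut(rfm["Recency"], 5, labels=[5, 4, 3, 2, 1]).astype(int)

# Frequency has many ties; rank(method="first") breaks them before binning.
rfm["F"] = pd.qcut(
    rfm["Frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
).astype(int)

# Monetary: higher spend gets a higher score.
rfm["M"] = pd.qcut(rfm["Monetary"], 5, labels=[1, 2, 3, 4, 5]).astype(int)

# Two-character string combining the recency and frequency scores, e.g. "55".
rfm["RF_SCORE"] = rfm["R"].astype(str) + rfm["F"].astype(str)
print(rfm)
```

Note that the recency labels are reversed: a customer who bought yesterday should get a 5, not a 1.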

A low recency and frequency score (bottom left) marks the hibernating customers, who haven’t purchased anything recently or frequently. A high recency and frequency score (top right) marks the champions, who have been purchasing recently and frequently.
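The mapping from scores to groups can be sketched as a small lookup on the two-digit recency and frequency score. Only the hibernating and champions cells are described in the text above. The other segment names and their boundaries follow one commonly used layout of the grid, so treat them as an assumption rather than a fixed standard. The function name and the RF_SCORE column are hypothetical.

```python
import re
import pandas as pd

# (pattern, segment) pairs: first digit is the recency score, second is frequency.
# Assumed layout; only "hibernating" and "champions" come from the article text.
SEGMENT_PATTERNS = [
    (r"[1-2][1-2]", "hibernating"),
    (r"[1-2][3-4]", "at_risk"),
    (r"[1-2]5", "cant_lose"),
    (r"3[1-2]", "about_to_sleep"),
    (r"33", "need_attention"),
    (r"[3-4][4-5]", "loyal_customers"),
    (r"41", "promising"),
    (r"51", "new_customers"),
    (r"[4-5][2-3]", "potential_loyalists"),
    (r"5[4-5]", "champions"),
]


def assign_segment(rf_score: str) -> str:
    """Return the segment name for a two-digit RF score such as '55'."""
    for pattern, name in SEGMENT_PATTERNS:
        if re.fullmatch(pattern, rf_score):
            return name
    raise ValueError(f"Unexpected RF score: {rf_score!r}")


# Usage on a few example scores.
scores = pd.DataFrame({"RF_SCORE": ["55", "11", "33", "51", "25"]})
scores["Segment"] = scores["RF_SCORE"].map(assign_segment)
print(scores)
```

The ten patterns cover all 25 combinations of the two scores without overlapping, so every customer lands in exactly one segment.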

After deciding which customers belong to which group, customer-specific sales and marketing techniques are developed.

RFM is key to seeing the current position of each customer through metrics and scores, but it is also very important for a company to have a projection of what is coming next. There is no proof that a customer scored as a Champion will keep the same purchasing habits for years; there is always a risk of churn. It is also not possible to calculate the overall value of the company with RFM scores. Moreover, a new customer who has purchased only once cannot be segmented meaningfully with RFM. These problems will be tackled in the second part.

Now, I will apply RFM analysis to an online retail shop’s data set and suggest a few marketing strategies based on different customer segments. The analysis is done in Python.

Dataset and Story

An e-commerce company wants to segment its customers and determine marketing strategies according to these segments. The company believes that marketing activities specific to customer segments that exhibit common behaviors will increase revenue. For example, it wants to run one set of campaigns for new customers and a different set to retain its most profitable customers.

The dataset named Online Retail includes the sales of a UK-based online store between 01/12/2009 and 09/12/2011. This company’s product catalog includes souvenirs. The majority of the company’s customers are corporate customers.

Variables of the data set:

  • InvoiceNo: The invoice number, unique to each purchase. Refund invoice numbers contain “C”
  • StockCode: Unique code per item
  • Description: Name of the item
  • Quantity: The number of items within the invoice
  • InvoiceDate: Date and time of the purchase
  • UnitPrice: Price of a single item, in pounds sterling
  • CustomerID: Unique ID number of each customer
  • Country: The country where the customer is living
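Before calculating any metrics, data with these variables usually needs a little preparation. The sketch below works on a few made-up rows that use the same column names. Dropping rows without a CustomerID and adding a TotalPrice column are my assumptions about a reasonable cleaning step. The only rule taken from the variable list is that refund invoices contain “C”.

```python
import pandas as pd

# Made-up rows that follow the variable list above.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "536381"],
    "StockCode": ["S1", "S2", "S3", "S4", "S5"],
    "Description": ["item one", "item two", "item three", "item four", "item five"],
    "Quantity": [6, 6, -1, 24, 23],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26", "2010-12-01 08:26", "2010-12-01 09:41",
        "2010-12-01 09:41", "2010-12-01 09:45",
    ]),
    "UnitPrice": [2.55, 3.39, 27.50, 1.45, 4.25],
    "CustomerID": [17850.0, 17850.0, 14527.0, 17809.0, None],
    "Country": ["United Kingdom"] * 5,
})

# Rows without a customer cannot be attributed to anyone, so drop them (assumption).
df = raw.dropna(subset=["CustomerID"])

# Refund invoices contain "C" in the invoice number; exclude them.
df = df[~df["InvoiceNo"].astype(str).str.contains("C", na=False)].copy()

# Assumed helper column: amount paid per invoice line.
df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
print(df[["InvoiceNo", "CustomerID", "Quantity", "UnitPrice", "TotalPrice"]])
```

Whether refunds should be removed or netted against purchases depends on the business question, so this is one option rather than the only one.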

You can find my code on Kaggle:

The first part of this series was about “today.” The second part, which will cover the “future,” will discuss customer lifetime value (CLTV) and the two models that are needed to calculate it: BG/NBD and Gamma-Gamma.


