SUBSCRIBE & CONTINUE

Subscribe to get our latest content by email.

    We respect your privacy. Unsubscribe at any time.

    How to Compute Customer Attrition in 5 minutes in Pandas?
     How to Compute Customer Attrition in 5 minutes in Pandas?
    Saeed
    By Saeed Mirshekari

    February 27, 2023

    What Is Customer Attrition?

    Customer Attrition is the fancy way of saying "customer leakage". This metric is used to understand how many customers are leaving the business. It can be calculated using time series techniques in Python, using pandas. This tutorial will show you how to compute this metric on customer data given in a csv file, without any Tableau workbook.

    Why Does It Matter?

    Customers are the life of any business and therefore it's important to know retention rate of customers. This is where customer attrition comes in, which is a very popular metric used in retail businesses. In this article, you will learn how to compute customer attrition in pandas.

    This is a question I have never received, but it can be helpful for a small business understanding their customer base. If you notice high attrition over time, you can use this formula to identify what is causing that attrition.

    It is not an easy task though. There are some tricks. In this post we will learn how to calculate customer attrition in pandas. So let's get started!

    The Input Data

    We will show you the details steps in a real example of a data science project, and see how we can use the results. I am going to be using the Transaction Data dataset from Kaggle to demonstrate how to compute customer attrition.

    Here is a snapshot of the input data for you to see what we are dealing with:

    The Expected Output

    It is always nice to have a clear idea of the expected output shape of the data. For computing the Customer Attrition here is what we would expect as the output:

    Data Manipulations

    We have transactions for an e-commerce website, and we need to identify users' spending decrease (soft attrition problem).

    Before we get into the python scripts that takes the input data and returns the expected output as described above, let's review the main steps along the way in data manupulation to set things up to compute the attrition:

    1. Read the transaction table from the input file.

    2. Clean up data removing duplicate data and userid = -1

    3. Calculate the "aggregated_table" with each month's dollar value of transactions per user. (hint: what happens if a user does not have transactions in month X?)

    4. Now that things are ready for the final step, we add a new column, "soft_attrition", to aggregated_table, which is "1" if the total amount of transactions in the next three months is less than 25% of the past three months; otherwise, it is "0".

    Here is the python scripts you need to compute the Customer Attrition:

    # Loading pandas library
    import pandas as pd
    # loading csv data from a local location (after downloading from Kaggle)
    data_org = pd.read_csv('./transactions/transaction_data.csv',
    usecols = ['UserId', 'TransactionTime', 'NumberOfItemsPurchased', 'CostPerItem', 'ItemDescription'])
    # compute spent amount
    data_org['spent'] = data_org['NumberOfItemsPurchased']*data_org['CostPerItem']
    # removing bad data
    data_org = data_org.loc[data_org['UserId']!=-1]
    # extracting the strings that we need
    data_org['yearmonth'] = [x[-4:]+x[4:7] for x in data_org['TransactionTime']]
    # convering string to datetime
    data_org['yearmonth'] = pd.to_datetime(data_org['yearmonth'], format='%Y%b')
    # reset index
    data_org.set_index([pd.DatetimeIndex(data_org.yearmonth)], inplace=True)
    # filling the gaps in month-year
    data_org_mod = data_org.groupby('UserId').resample('1M').sum()
    data_org_mod = data_org_mod.drop(columns=['UserId'])
    data_org_mod = data_org_mod.reset_index(level=['UserId', 'yearmonth'])
    # compute attrition labels
    def get_attrition(grp, freq):
    prev_roll = grp.rolling(freq, on='yearmonth')['spent'].sum().shift(1)
    next_roll = grp.rolling(freq, on='yearmonth')['spent'].sum().shift(-3)
    return 1*(next_roll < 0.25*prev_roll)
    data_org_mod['soft_attrition'] = data_org_mod.groupby('UserId', as_index=False, group_keys=False).apply(get_attrition, '90D')

    Summary

    If you're just getting started with data analysis, you will quickly come to realize the importance of computing customer attrition. It's a critical aspect of any business to have an understanding of customer retention metrics, and pandas makes the process extremely easy.

    In this tutorial I showed you how to compute customer attrition in pandas using some basic pandas functions.

    I hope this was useful! To see more content like this subscribe to O'Fallon Labs or check out this website again.

    If you like our work, you will love our newsletter..💚

    About O'Fallon Labs

    In O'Fallon Labs we help recent graduates and professionals to get started and thrive in their Data Science careers via 1:1 mentoring and more.


    Saeed

    Saeed Mirshekari

    Saeed is currently a Director of Data Science in Mastercard and the Founder & Director of OFallon Labs LLC. He is a former research scholar at LIGO team (Physics Nobel Prize of 2017).

    leave a comment



    Let's Talk One-on-one!

    SCHEDULE FREE CALL

    Looking for a Data Science expert to help you score your first or the next Data Science job? Or, are you a business owner wanting to bring value and scale your business through Data Analysis? Either way, you’re in the right place. Let’s talk about your priorities!

    StatCounter - Free Web Tracker and Counter