By Davood Shahabi

Apr 15, 2023

Davood's Credit-Risk repository on GitHub contains machine learning models that predict credit risk for a given set of customers.

The project is developed in Python using Jupyter Notebook and employs several machine learning algorithms, including logistic regression, decision trees, random forests, and gradient boosting. The dataset comes from a bank and includes customer information such as age, income, loan amount, and credit history.

The repository also includes a data cleaning script to preprocess the dataset and a detailed report explaining the methodology and results of the model. The code and report are well-documented and can serve as a useful reference for anyone interested in building a credit risk prediction model.

You can find the code and full details in Davood's GitHub repository. Below is an executive summary of the completed project.

Credit-Risk Project

The goal of the project is to predict an applicant's risk of default (financial failure) based on historical data.

Banks and financial organizations use historical data on customers' behaviour to predict the probability of future defaults on credit card borrowing.

Banks can use credit risk prediction to decide whether to issue a credit card to the applicant.

Data Sets

For this project, we have two datasets:

  • Application Records: demographic information about each applicant
  • Credit Records: historical credit card status of each applicant

The applicant ID is the primary key used to join the two datasets.
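As a minimal, dependency-free sketch of this join, assuming each record is a Python dict with a hypothetical `ID` field (in a pandas notebook, `pd.merge(applications, credit, on="ID", how="inner")` plays the same role), the two datasets can be combined like this:

```python
from collections import defaultdict


def join_on_id(applications, credit_records, key="ID"):
    """Inner-join applicant rows with their credit history on the applicant ID.

    applications:   list of dicts, one per applicant (hypothetical schema)
    credit_records: list of dicts, one per applicant per month
    Returns one merged dict per (applicant, credit record) pair; applicants
    without any credit record are dropped, as in an inner join.
    """
    # Group the monthly credit records by applicant ID.
    history = defaultdict(list)
    for rec in credit_records:
        history[rec[key]].append(rec)

    joined = []
    for app in applications:
        # .get() avoids creating empty entries for applicants with no history.
        for rec in history.get(app[key], []):
            joined.append({**app, **rec})
    return joined


apps = [{"ID": 1, "income": 50000}, {"ID": 2, "income": 72000}, {"ID": 3, "income": 30000}]
credit = [
    {"ID": 1, "MONTHS_BALANCE": 0, "STATUS": 0},
    {"ID": 1, "MONTHS_BALANCE": -1, "STATUS": 1},
    {"ID": 2, "MONTHS_BALANCE": 0, "STATUS": 0},
]
rows = join_on_id(apps, credit)
```

Applicant 3 has no credit record and is dropped. In practice the monthly credit history is usually reduced to one label per applicant before the join, so the result has one row per applicant.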

Project Phases

The project was carried out in two major phases:

  • Exploratory Data Analysis: 1) data cleaning, 2) feature engineering
  • Predictive modeling

Feature Engineering

A label flagging delinquency three months in advance can be generated from the payment history and status of each individual account.
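The write-up does not spell out the exact labelling rule, so the following is only one plausible reading, with hypothetical field names: `MONTHS_BALANCE` counts months back from the most recent record (0, -1, -2, ...) and `STATUS` is the number of months an account is past due. The most recent `horizon` months act as the outcome window, and an account is flagged as high risk if any month in that window is past due. Only history older than the window would then be used as model input, so the label describes the future relative to the features.

```python
def make_labels(credit_records, horizon=3, min_past_due=1):
    """Flag each account 1 (high risk) if it is past due within the last `horizon` months.

    Assumed (hypothetical) fields per monthly record:
      ID             - account / applicant identifier
      MONTHS_BALANCE - 0 for the most recent month, -1 for the month before, ...
      STATUS         - months past due as an int (0 = paid on time)
    Accounts with no past-due month inside the window get label 0.
    """
    labels = {}
    for rec in credit_records:
        acct = rec["ID"]
        labels.setdefault(acct, 0)
        # With horizon=3 the window covers months 0, -1 and -2.
        in_window = rec["MONTHS_BALANCE"] > -horizon
        if in_window and rec["STATUS"] >= min_past_due:
            labels[acct] = 1
    return labels


records = [
    {"ID": 1, "MONTHS_BALANCE": 0, "STATUS": 0},
    {"ID": 1, "MONTHS_BALANCE": -1, "STATUS": 2},  # past due inside the window
    {"ID": 1, "MONTHS_BALANCE": -5, "STATUS": 0},
    {"ID": 2, "MONTHS_BALANCE": 0, "STATUS": 0},
    {"ID": 2, "MONTHS_BALANCE": -1, "STATUS": 0},
    {"ID": 3, "MONTHS_BALANCE": -4, "STATUS": 3},  # past due, but before the window
]
labels = make_labels(records)
```

Account 3 was delinquent only before the three-month window, so it is labelled 0 here; widening `horizon` changes that, which is why the window length is a modelling decision worth stating explicitly.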

The dataset contains an imbalanced label:

Therefore, we applied undersampling for model training.
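Random undersampling keeps every minority-class row and a random subset of the majority class of the same size. The project does not say which tool was used (imbalanced-learn's `RandomUnderSampler` is a common choice); the sketch below is a plain-Python version, assuming rows are dicts with a hypothetical `label` key:

```python
import random
from collections import defaultdict


def random_undersample(rows, label_key="label", seed=42):
    """Randomly drop rows of the larger classes until every class is as small as the smallest one."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for group in by_class.values():
        # For the minority class this keeps every row (in shuffled order).
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)  # avoid a class-sorted row order
    return balanced


rows = [{"label": 0, "i": i} for i in range(90)] + [{"label": 1, "i": i} for i in range(10)]
balanced = random_undersample(rows)
```

Undersampling should be applied to the training split only, so that the test set keeps the real class balance and the reported metrics stay meaningful.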
Finally, the dataset was ready for modeling. Here is a sample of processed data:

Depending on the model, we may use one-hot encoding or other techniques to encode categorical features for effective predictive model development.
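One-hot encoding replaces a categorical column with one 0/1 indicator column per category. Some tree-based libraries, such as CatBoost, can consume categorical features directly, which is one reason the choice depends on the model. A minimal sketch, with a hypothetical `housing` column (pandas users would typically reach for `pd.get_dummies`):

```python
def one_hot(rows, column, categories=None):
    """Replace `column` with one 0/1 indicator column per category.

    If `categories` is None they are learned from `rows` (e.g. the training set);
    pass the returned list back in to encode test rows with the same columns.
    """
    if categories is None:
        categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other field unchanged, then add the indicator columns.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded, categories


train = [{"id": 1, "housing": "rent"}, {"id": 2, "housing": "own"}, {"id": 3, "housing": "rent"}]
train_enc, cats = one_hot(train, "housing")
test_enc, _ = one_hot([{"id": 4, "housing": "parents"}], "housing", cats)
```

Reusing the training categories keeps the feature columns identical between training and test data; a category seen only at prediction time (here `parents`) is encoded as all zeros.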


We performed a correlation analysis to assess the relationship between the features and the label prior to building the model:

A correlation heatmap is a graphical representation of the correlation matrix, showing the correlation coefficient between each pair of features in the dataset. The correlation coefficient measures the strength and direction of the linear relationship between two variables. Building the heatmap before developing predictive models helps us understand how the features relate to each other and to the label, and guides decisions about feature selection, model interpretation, and improving model performance.
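The heatmap is a colour-coded view of the correlation matrix, whose entries are Pearson coefficients. In a notebook this is typically `df.corr()` followed by `seaborn.heatmap(...)`; the dependency-free sketch below computes the same matrix, using hypothetical column names:

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


cols = {"income": [1, 2, 3, 4], "loan_amount": [2, 4, 6, 8], "label": [4, 3, 2, 1]}
matrix = correlation_matrix(cols)
```

Values near +1 or -1 mark strongly linearly related pairs; two features that are almost perfectly correlated with each other are candidates for dropping one of them.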


We developed three major predictive models:

  • Random Forest
  • XGBoost
  • CatBoost


The model metrics are saved in a table for a fast and clear comparison:

No.  Model               accuracy_score  precision_score  recall_score  roc_auc_score  f1_score
1    randomforest_Final  0.864091        0.888571         0.678404      0.817851       0.769394
2    xgboost_model_fin   0.781087        0.952576         0.363041      0.676985       0.525722
3    ctboost_model_fin   0.867162        0.921760         0.658228      0.815100       0.768016
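The column names in the table match the scikit-learn functions of the same names in `sklearn.metrics`, which suggests they were used, though the write-up does not say so. As a sketch of what each column measures, assuming label 1 means high risk and `y_score` holds the model's predicted probability of that class:

```python
def classification_metrics(y_true, y_pred, y_score):
    """Accuracy, precision, recall, ROC AUC and F1 for binary labels (1 = high risk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged applicants who are high risk
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of high-risk applicants that get flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC AUC: probability that a random positive scores higher than a random negative
    # (ties count as 1/2). This pairwise loop is O(n^2) and only meant for illustration.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    roc_auc = wins / (len(pos) * len(neg))

    return {
        "accuracy_score": accuracy,
        "precision_score": precision,
        "recall_score": recall,
        "roc_auc_score": roc_auc,
        "f1_score": f1,
    }


m = classification_metrics(
    y_true=[1, 1, 0, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 0],
    y_score=[0.9, 0.4, 0.2, 0.6, 0.8, 0.1],
)
```

Collecting one such dict per model gives exactly the kind of comparison table shown above; `sklearn.metrics.roc_auc_score` computes the same AUC efficiently on real data.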

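CatBoost and Random Forest perform best at predicting high-risk applicants. CatBoost has the highest accuracy (0.867) and higher precision than Random Forest (0.922 vs. 0.889), while Random Forest has slightly higher recall (0.678 vs. 0.658), ROC AUC, and F1. Since the data is imbalanced, recall is an important metric for this project: XGBoost has the highest precision (0.953) but a recall of only 0.363, so it misses most high-risk applicants.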
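F1 is the harmonic mean of precision P and recall R, so it stays low whenever either one is low; this is why XGBoost's high precision does not rescue its F1. Plugging the table's precision and recall columns into the formula reproduces the reported f1_score values:

```latex
\begin{align*}
F_1 &= \frac{2PR}{P + R} \\
F_1^{\text{Random Forest}} &= \frac{2(0.888571)(0.678404)}{0.888571 + 0.678404} \approx 0.7694 \\
F_1^{\text{CatBoost}} &= \frac{2(0.921760)(0.658228)}{0.921760 + 0.658228} \approx 0.7680 \\
F_1^{\text{XGBoost}} &= \frac{2(0.952576)(0.363041)}{0.952576 + 0.363041} \approx 0.5257
\end{align*}
```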

Davood Shahabi

Davood is a recent graduate of the 1-on-1 Mentoring Program at O'Fallon Labs. This article is an overview of one of the Data Science projects he completed at O'Labs in early 2023.

Looking for a Data Science expert to help you score your first or the next Data Science job? Or, are you a business owner wanting to bring value and scale your business through Data Analysis? Either way, you’re in the right place. Let’s talk about your priorities!