An AI-based Pricing Model for NYC Housing Market
By Bahar Biazar

May 21, 2023

Introduction

Welcome to the "NYC Housing" project! This repository is a comprehensive exploration of New York City's housing market, offering valuable insights and analysis for those interested in understanding the dynamics of the city's real estate landscape. Whether you're a data enthusiast, a researcher, or a curious individual, this project provides a wealth of information to navigate the complexities of housing trends, pricing patterns, and neighborhood dynamics in the bustling metropolis of New York City.

Within this repository, you will find datasets, code scripts, and visualizations that shed light on various aspects of the housing market. These resources have been curated and organized for easy access and comprehension. Using established analytical tools and techniques, this project aims to uncover patterns and correlations that help users make informed decisions and follow the ever-changing New York City real estate scene.

Whether you're interested in examining historical housing trends, understanding the factors influencing pricing fluctuations, or exploring the relationship between location and property values, this project serves as a valuable resource. The data-driven approach adopted here enables users to gain a deeper understanding of the intricate dynamics that shape the New York City housing market.

Feel free to browse the repository, explore the various datasets and code scripts, and leverage the visualizations to uncover hidden patterns and trends. The "NYC Housing" project is an open invitation to join in the exploration of New York City's housing market, and we encourage you to contribute, ask questions, and engage with the community as we collectively dive into the rich world of real estate in the city that never sleeps.

This project was completed as part of a mentoring program at O'Fallon Labs.

NYC Housing Price Prediction & Streamlit App

The goal of this project is to use regression models to predict housing prices and to deploy the final model so that anyone can access it.

Click to access Streamlit App
While looking for housing myself, I tried to identify the features that matter most in determining a property's price, and to investigate whether any attributes beyond the commonly known ones drive the housing market.

Data and EDA

The data comes from Kaggle. It was originally collected on 1/20/21 using Zillow's API and consists of 75,629 housing listings from Zillow.com. Each listing has 1,507 attributes, which makes data processing and feature selection very time-consuming. The final features used in the model are: number of bedrooms, number of bathrooms, year built, property tax rate, living area, lot size, school rating, HOA, number of garage spaces, has fireplace, has basement, latitude, and longitude.
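With roughly 1,500 raw attributes per listing, a practical first step is to reduce each record to the 13 model features. Here is a minimal sketch of that step in plain Python. The key names (`bedrooms`, `hasFireplace`, and so on) are hypothetical stand-ins, since the actual column names in the Kaggle dataset are not given here.

```python
# Hypothetical column names; the real dataset's names may differ.
MODEL_FEATURES = [
    "bedrooms", "bathrooms", "yearBuilt", "propertyTaxRate",
    "livingArea", "lotSize", "schoolsRating", "hoa",
    "garageSpaces", "hasFireplace", "hasBasement",
    "latitude", "longitude",
]


def select_features(listing: dict) -> dict:
    """Keep only the model features; attributes missing from the listing become None."""
    return {name: listing.get(name) for name in MODEL_FEATURES}


# A made-up raw listing with one of the many attributes the model ignores.
raw_listing = {
    "bedrooms": 3, "bathrooms": 2, "yearBuilt": 1995,
    "livingArea": 1800, "hasFireplace": True,
    "latitude": 40.7, "longitude": -74.0,
    "listingAgentPhone": "n/a",
}
row = select_features(raw_listing)
```

Keeping the feature list in one place means the training code and the app can share the same definition of a model input.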

Heads up: This model is trained on 2021 data, so predicting current-year prices requires retraining it on more recent data.

Model and Evaluation

The best model, the one with the lowest RMSE, was CatBoost, with `RMSE: 92593.91` and `R2: 0.84`.
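For reference, the two metrics can be computed as sketched below. RMSE is expressed in the same units as the target (here, dollars), while R2 is the share of the price variance that the model explains. The prices in the example are made up for illustration and are not taken from this project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices for illustration only.
actual = [300_000, 450_000, 600_000, 750_000]
predicted = [320_000, 430_000, 610_000, 700_000]

print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"R2: {r2_score(actual, predicted):.2f}")
```

Because RMSE squares the residuals before averaging, a few badly mispriced listings can dominate it, which is one reason outlier handling matters for this metric.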

SHAP Values and Feature Importance

App

Please see app.py for more information, or follow the Streamlit link to see some predictions.
Click to access Streamlit App

Takeaways

- Knowing the data and preparing the right features is the keystone of any ML model, yet also its most time-consuming part.
- Be aware of outliers! Look at what data is available and decide which outliers to remove. Some require domain knowledge, while others are detectable simply by looking at the distribution plots. If there are only a few data points around a certain value, they can distort your predictions drastically, so it's better to remove those outliers.
- Make sure you know what you will be predicting.
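One common rule of thumb for the distribution-based outliers mentioned above is the interquartile range (IQR) fence. The sketch below shows how such a filter could look; it is an assumption, not the filtering actually used in this project. The 1.5 multiplier is the conventional default, and the prices are made up.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Made-up listing prices with one extreme value.
prices = [250_000, 310_000, 330_000, 360_000,
          400_000, 420_000, 480_000, 9_500_000]
cleaned = drop_outliers(prices)
```

A purely statistical fence like this cannot tell a data-entry error from a genuine luxury listing, so it complements domain knowledge without replacing it.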

Next Steps

1- Add more features to the training set

2- Expand the model to predict current day prices.

In reality, many factors are involved, and there is a big difference between how much a house is worth and how much it sells for on the market. As I explore more, I'll update this repository with my findings.

Conclusion

Whether you're a prospective homebuyer, a researcher studying urban development, or simply curious about the intricacies of New York City's housing scene, the "NYC Housing" project provides a valuable resource for exploration and analysis. We encourage you to delve into the repository, leverage the available resources, and join us in unraveling the complexities and uncovering the hidden gems within the dynamic world of New York City's housing market.

About O'Fallon Labs

At O'Fallon Labs, we help recent graduates and professionals get started and thrive in their Data Science careers through 1:1 mentoring and more.



Bahar Biazar

Bahar is a recent graduate of the 1-on-1 Mentoring Program at O'Fallon Labs. This article is an overview of one of his Data Science projects, completed at OLabs in early 2023.

