Equity Clustering & Valuation Model

This project clusters S&P 500 equities with a machine learning model and applies a relative valuation framework to compare them. After grouping companies into peer sets, the model runs statistical tests, including z-tests, to identify meaningful deviations in valuation ratios (such as P/E or EV/EBITDA) from their cluster averages. This helps surface potentially mispriced equities based on peer-relative metrics.

Tech Stack:

Python
Pandas
Scikit-learn
Yahoo Finance API

We scrape and clean daily financial data from Yahoo Finance for the entire S&P 500. Cleaning involves handling missing data and outliers and normalizing input features so that downstream analysis is robust.
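A minimal sketch of the cleaning step, assuming the scraped data has already been loaded into a pandas DataFrame with one row per ticker. The specific choices here (median imputation, clipping each column to its 1st and 99th percentiles, and scikit-learn's StandardScaler for normalization) are illustrative assumptions rather than necessarily what the project does; the function name `clean_features`, the feature columns, and the tickers are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clean_features(df: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.DataFrame:
    """Hypothetical helper: impute, winsorize, and standardize numeric columns."""
    numeric = df.select_dtypes(include="number")
    # Fill gaps with each column's median, which is less sensitive to extreme values
    filled = numeric.fillna(numeric.median())
    # Winsorize: clip every column to its own quantile bounds (aligned by column name)
    lo = filled.quantile(lower_q)
    hi = filled.quantile(upper_q)
    clipped = filled.clip(lower=lo, upper=hi, axis=1)
    # Standardize each column to zero mean and unit (population) variance
    scaled = StandardScaler().fit_transform(clipped)
    return pd.DataFrame(scaled, index=df.index, columns=numeric.columns)


# Usage with made-up tickers and values
raw = pd.DataFrame(
    {
        "revenue_growth": [0.05, 0.10, np.nan, 0.08, 2.50],
        "net_margin": [0.12, 0.20, 0.15, np.nan, 0.18],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE"],
)
features = clean_features(raw)
print(features.round(3))
```

Median imputation and quantile clipping are simple, common defaults; other imputation or outlier rules would slot into the same function.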

We apply unsupervised learning (e.g., K-Means clustering) to group stocks with similar financial attributes. The resulting clusters serve as peer groups for comparison and make outliers easier to identify.
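The clustering step could look roughly like this with scikit-learn's `KMeans`, assuming the features have already been standardized (K-Means uses Euclidean distance, so unscaled features with large magnitudes would otherwise dominate). The helper `cluster_stocks`, the synthetic data, and the number of clusters are hypothetical; the project's actual k is not stated here, and in practice it might be chosen with an elbow plot or silhouette score.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_stocks(features: pd.DataFrame, n_clusters: int = 3, seed: int = 0) -> pd.Series:
    """Hypothetical helper: assign each ticker (row) to a K-Means cluster."""
    # n_init restarts from several centroid seeds and keeps the best fit
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features.to_numpy())
    return pd.Series(labels, index=features.index, name="cluster")


# Synthetic, already-standardized features with two well-separated groups
rng = np.random.default_rng(0)
group_a = rng.normal(loc=-2.0, scale=0.1, size=(5, 2))
group_b = rng.normal(loc=2.0, scale=0.1, size=(5, 2))
features = pd.DataFrame(
    np.vstack([group_a, group_b]),
    index=[f"T{i}" for i in range(10)],
    columns=["f1", "f2"],
)
clusters = cluster_stocks(features, n_clusters=2)
print(clusters)
```

Cluster IDs are arbitrary integers; only membership matters, since each cluster becomes a peer group for the valuation comparison.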

Our valuation model compares each stock to its cluster peers using valuation multiples such as P/E, EV/EBITDA, and P/B. Stocks that deviate significantly from their cluster average are flagged for further analysis.
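One way to express the peer comparison is a within-cluster z-score for each multiple, z = (x - mean_c) / s_c, where mean_c and s_c are the mean and sample standard deviation of that multiple within the stock's cluster. The sketch below assumes the z-test is implemented this way, with a hypothetical cutoff of |z| > 2 (roughly a two-sided 5% level if the multiples were normally distributed). The function `flag_deviations`, the column names, and the data are illustrative assumptions. Note that with a sample standard deviation, no z-score in a cluster of n stocks can exceed (n - 1) / sqrt(n), so clusters of five or fewer can never cross a cutoff of 2.

```python
import numpy as np
import pandas as pd


def flag_deviations(df: pd.DataFrame, multiples: list[str], threshold: float = 2.0,
                    cluster_col: str = "cluster") -> pd.DataFrame:
    """Hypothetical helper: z-score each multiple against its cluster peers."""
    grouped = df.groupby(cluster_col)[multiples]
    # Broadcast each cluster's mean and sample std (ddof=1) back onto its member rows
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    z = ((df[multiples] - mean) / std).add_suffix("_z")
    out = z.copy()
    # NaN z-scores (e.g. single-member clusters) compare as False and are never flagged
    out["flagged"] = (z.abs() > threshold).any(axis=1)
    return out


# Made-up multiples: T9 has an unusually high P/E relative to cluster 0
data = pd.DataFrame(
    {
        "cluster": [0] * 10 + [1] * 6,
        "pe": [15, 16, 14, 15, 16, 14, 15, 16, 14, 60, 30, 31, 29, 30, 31, 29],
        "ev_ebitda": [10, 11, 9, 10, 11, 9, 10, 11, 9, 10, 20, 21, 19, 20, 21, 19],
    },
    index=[f"T{i}" for i in range(16)],
)
result = flag_deviations(data, ["pe", "ev_ebitda"])
print(result[result["flagged"]].index.tolist())  # ['T9']
```

A flag here is only a prompt for further analysis: a high multiple relative to peers can reflect genuinely different growth prospects rather than mispricing.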

All code and documentation are available on GitHub:

View GitHub Repository

I’m a finance enthusiast building tools to better understand and model markets. This space is a portfolio of personal work in quantitative finance.

Connect with me on LinkedIn