Equity Clustering & Valuation Model

This project clusters the S&P 500 equities using a machine learning model and applies a relative valuation framework to compare them. After grouping companies into peer sets, the model performs statistical analysis — including z-tests — to identify meaningful deviations in valuation ratios (such as P/E or EV/EBITDA) from their cluster averages. This helps surface potentially mispriced equities based on peer-relative metrics.

Tech Stack: Python, Pandas, Scikit-learn, Yahoo Finance API

1. Data Pipeline

We scrape and clean daily financial data from Yahoo Finance for the entire S&P 500. This involves handling missing data, outliers, and normalizing input features to ensure robust downstream analysis.

2. Clustering Algorithm

We apply unsupervised machine learning (e.g., K-Means clustering) to group similar stocks based on their financial attributes. This enables identification of outliers and peer comparisons.

3. Relative Valuation

Our valuation model compares each stock to its cluster peers using valuation multiples like P/E, EV/EBITDA, and P/B. Stocks significantly deviating from their cluster average are flagged for further analysis.

Explore the Code

All code and documentation are available on GitHub:

View GitHub Repository
About Me

I’m a finance enthusiast building tools to better understand and model markets. This space is a portfolio of personal work in quantitative finance.

Connect with me on LinkedIn