Predicting Success in AI Startups: A Data-Driven Investment Analysis Part 2 - 2021 Vintage

This second installment of our research applies our machine learning approach to identifying high-potential AI startups in the 2021 vintage. The results further support our investment methodology.


Project Framework

We applied our established machine learning methodology to identify promising AI ventures in the 2021 cohort, maintaining consistency with our previous analysis.

Our approach continued to incorporate:

- Six predictive variables: Industries, Company Description, Founder Biography, Founder Gender, Location, and Educational Background

- A dual-model ensemble combining Random Forest and XGBoost algorithms

- Advanced text vectorization for unstructured data
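As a rough illustration of how these pieces could fit together, the sketch below vectorizes the text fields with TF-IDF, one-hot encodes the categorical fields, and averages the predicted probabilities of a Random Forest and a gradient-boosted model. This is a minimal sketch under stated assumptions, not our production pipeline: the column names, the toy rows, the labels, the hyperparameters, and the choice of simple probability averaging are all hypothetical. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier` as a stand-in.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder

try:
    from xgboost import XGBClassifier
except ImportError:  # xgboost is optional for this sketch
    XGBClassifier = None

# Hypothetical column names for the six predictive variables.
TEXT_COLS = ["description", "founder_bio"]
CAT_COLS = ["industry", "founder_gender", "location", "education"]


def build_vectorizer():
    """TF-IDF for each free-text column, one-hot encoding for the categorical ones."""
    transformers = [(col, TfidfVectorizer(), col) for col in TEXT_COLS]
    transformers.append(
        ("categorical", OneHotEncoder(handle_unknown="ignore"), CAT_COLS)
    )
    return ColumnTransformer(transformers)


def build_models(seed=0):
    """Return the two ensemble members (hyperparameters are placeholders)."""
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    if XGBClassifier is not None:
        boosted = XGBClassifier(n_estimators=50, max_depth=3, random_state=seed)
    else:
        # Stand-in gradient-boosted model when xgboost is unavailable.
        boosted = GradientBoostingClassifier(
            n_estimators=50, max_depth=3, random_state=seed
        )
    return [forest, boosted]


def ensemble_scores(models, features):
    """Average the positive-class probability across the fitted models."""
    return np.mean([m.predict_proba(features)[:, 1] for m in models], axis=0)


# Toy placeholder rows (not real companies), only to make the sketch runnable.
toy = pd.DataFrame({
    "description": [
        "ai search engine for consumers", "data security platform",
        "healthcare language model", "warehouse robotics software",
        "restaurant booking app", "generic marketing agency",
    ],
    "founder_bio": [
        "former research scientist", "security engineer and founder",
        "physician and repeat founder", "hardware engineer",
        "first time founder", "sales consultant",
    ],
    "industry": ["search", "security", "health", "robotics", "consumer", "services"],
    "founder_gender": ["m", "f", "m", "f", "m", "f"],
    "location": ["sf", "nyc", "sf", "sf", "austin", "nyc"],
    "education": ["phd", "bs", "md", "ms", "bs", "ba"],
})
labels = np.array([1, 1, 1, 0, 0, 0])  # hypothetical success labels

vectorizer = build_vectorizer()
features = vectorizer.fit_transform(toy)

models = build_models()
for model in models:
    model.fit(features, labels)

scores = ensemble_scores(models, features)
```

Averaging probabilities is only one way to combine two models; weighted averaging or stacking would follow the same structure, with `ensemble_scores` replaced accordingly.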


Portfolio Performance

Our model selected 19 companies from the 2021 through 2024 vintages (distributed roughly evenly across those years), with the following performance to date:


- 9 companies (47.37%) have already achieved valuations exceeding $500M, including:

  - Perplexity AI: Reached $9 billion valuation in December 2024, with over $100 million in annualized revenue as of March 2025

  - Cyera: Secured $300 million in Series D funding, reaching a $3 billion valuation in November 2024

  - Hippocratic AI: Achieved unicorn status with a $1.64 billion valuation in January 2025

  - Anumana: Showcasing leadership in AI-powered cardiovascular solutions

  - World Labs, Protect AI, Mytra, DatologyAI, and Revefi


- 4 companies (21.05%) demonstrate strong growth trajectories

- 4 companies (21.05%) are too early in their development to evaluate conclusively

- 2 companies (10.53%) are still operating but appear unlikely to achieve significant success


This 47.37% high-performer rate is well above the roughly 5% unicorn success rate of the best-performing venture capital firms (Sequoia). It could reach 68.42% (13 of 19 companies) if the four companies with strong growth trajectories also cross the $500M mark. The 10.53% of companies that currently look unlikely to succeed is substantially lower than the industry average failure rate of 75%. These comparisons do not factor in the various constraints of real-world investing.
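The percentages above follow directly from the company counts. The short sketch below reproduces the arithmetic; the dictionary keys are labels chosen here for illustration, while the counts are the ones reported in this section.

```python
# Company counts per outcome bucket, as reported above.
counts = {
    "above_500m": 9,      # valuations exceeding $500M
    "strong_growth": 4,   # strong growth trajectories
    "too_early": 4,       # too early to evaluate
    "unlikely": 2,        # still operating, unlikely to succeed
}

total = sum(counts.values())


def pct(n, whole):
    """Share of `whole` as a percentage, rounded to two decimals."""
    return round(100 * n / whole, 2)


shares = {bucket: pct(n, total) for bucket, n in counts.items()}

# Upper bound if every strong-growth company also becomes a high performer.
potential = pct(counts["above_500m"] + counts["strong_growth"], total)

print(total, shares, potential)
```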

Our model continues to demonstrate strong predictive capability while serving as a decision support tool rather than a replacement for comprehensive due diligence.

We will continue to analyze additional vintages across a broader set of geographies and sectors, publishing results as they become available.


Disclaimer: This analysis is for educational purposes only. Past performance does not guarantee future outcomes.