Predicting Success in AI Startups: A Data-Driven Investment Analysis Part 3 - 2024 Vintage

Our machine learning approach to identifying high-potential AI startups has again yielded exceptional results, albeit after much fine-tuning, significantly outperforming industry benchmarks and validating our investment methodology.

Project Framework

We applied our established machine learning methodology to identify promising AI ventures, maintaining consistency with our previous analysis. Our approach incorporated:

- Six predictive variables: Industries, Company Description, Founder Biography, Founder Gender, Location, and Educational Background
- A dual-model ensemble combining Random Forest and XGBoost algorithms
- Advanced text vectorization for unstructured data
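
The list above does not say how the two models' outputs are combined. One common option is soft voting: average each model's predicted success probability per company, then keep the candidates above a confidence threshold. The sketch below assumes that scheme. The scores, the company names, and the 0.7 threshold are all hypothetical placeholders, not outputs of the model described here.

```python
from statistics import mean


def soft_vote(rf_scores, xgb_scores):
    """Average two models' success probabilities per company (soft voting)."""
    shared = rf_scores.keys() & xgb_scores.keys()
    return {name: mean([rf_scores[name], xgb_scores[name]]) for name in shared}


def select(ensemble_scores, threshold):
    """Return (name, score) pairs at or above the threshold, highest first."""
    picked = [(n, s) for n, s in ensemble_scores.items() if s >= threshold]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


# Hypothetical probabilities for illustration only.
rf_scores = {"startup_a": 0.82, "startup_b": 0.55, "startup_c": 0.74}
xgb_scores = {"startup_a": 0.90, "startup_b": 0.35, "startup_c": 0.70}

ensemble = soft_vote(rf_scores, xgb_scores)
shortlist = select(ensemble, threshold=0.7)
print(shortlist)
```

Raising the threshold shrinks the shortlist to the companies both models agree on most strongly, which is one way a confidence cutoff can act as a risk control.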

Portfolio Performance

Our model-selected portfolio of 15 companies founded in 2024-2025 demonstrates remarkable performance:

- 40% Success Rate: 6 companies have already achieved significant success
- 86.67% Projected Success Rate: Including companies currently "on track"
- 5.43x Outperformance: Compared to the industry baseline of 7.37% (note: real-world constraints have not been factored in)
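
The headline figures follow from simple ratios, which the short check below reproduces. The count of 13 companies behind the projected rate is inferred from the quoted 86.67% (13 of 15); the text does not state it directly.

```python
def rate(count, total):
    """Share of a portfolio, expressed as a percentage."""
    return 100 * count / total


portfolio_size = 15
successes = 6
# Inferred from the quoted 86.67% projected rate; not stated in the text.
successes_plus_on_track = 13
baseline_pct = 7.37  # industry baseline quoted in the list above

success_pct = rate(successes, portfolio_size)
projected_pct = rate(successes_plus_on_track, portfolio_size)
outperformance = success_pct / baseline_pct

print(f"{success_pct:.2f}% success, {projected_pct:.2f}% projected, "
      f"{outperformance:.2f}x baseline")
```

Note that the 5.43x multiple is computed from the current 40% success rate, not from the projected rate.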

Geographic Distribution and Success Patterns

Our portfolio shows strategic geographic diversity while maintaining concentration in key tech hubs:

- San Francisco Bay Area: 7 companies (4 successful, 57% success rate)
- New York: 2 companies
- Other locations: 6 companies across Texas, Delaware, and Washington
- California companies show a 50% success rate vs 20% for non-California companies


Key Successes (sample selection):

- Safe Superintelligence (Palo Alto): AI safety and systems
- World Labs (San Francisco): 3D perception and interaction
- Sapien (San Francisco): AI for finance

Significance and Implications

This experiment provides several significant insights:

Model Validation
- Demonstrates the effectiveness of ML-driven startup selection
- Shows strong predictive power for early-stage success indicators
- Validates the use of historical patterns for future success prediction, especially in a vertical context (i.e., AI)

Portfolio Strategy Validation
- Confirms the value of geographic diversity while maintaining focus on tech hubs
- Shows the importance of confidence thresholds in investment decisions
- Demonstrates successful risk management (only 1 failure in 15 investments)

Industry Implications
- Suggests potential for systematic outperformance using ML-driven selection
- Indicates high success potential in specific AI subsectors (cybersecurity, financial services)
- Demonstrates the value of data-driven decision making in venture capital

Looking Forward

With a projected success rate of 86.67% and current performance 5.43x above industry baseline, our results strongly validate the ML-driven approach to startup selection. The model's ability to identify promising companies across different locations and AI applications suggests scalability and broader applicability.

The strong correlation between model confidence scores and actual outcomes provides a compelling case for incorporating machine learning into venture capital decision-making processes. As we continue to monitor the portfolio, the early results suggest that AI-powered startup selection could significantly improve venture capital returns while reducing investment risks.

**Disclaimer**: This analysis is for educational purposes only. Past performance does not guarantee future outcomes.

Predicting Success in AI Startups: A Data-Driven Investment Analysis Part 2 - 2021 Vintage

This version 2 research extends our machine learning approach to identify high-potential AI startups from the 2021 vintage, yielding compelling results that further validate our investment methodology.


Project Framework

We applied our established machine learning methodology to identify promising AI ventures in the 2021 cohort, maintaining consistency with our previous analysis.

Our approach continued to incorporate:

- Six predictive variables: Industries, Company Description, Founder Biography, Founder Gender, Location, and Educational Background

- A dual-model ensemble combining Random Forest and XGBoost algorithms

- Advanced text vectorization for unstructured data


Portfolio Performance

Our model selected 19 companies from the 2021 through 2024 vintages (evenly distributed), with the following current performance:


- 9 companies (47.37%) have already achieved valuations exceeding $500M, including:

  - Perplexity AI: Reached $9 billion valuation in December 2024, with over $100 million in annualized revenue as of March 2025

  - Cyera: Secured $300 million in Series D funding, reaching a $3 billion valuation in November 2024

  - Hippocratic AI: Achieved unicorn status with a $1.64 billion valuation in January 2025

  - Anumana: Showcasing leadership in AI-powered cardiovascular solutions

  - World Labs, Protect AI, Mytra, DatologyAI, and Revefi


- 4 companies (21.05%) demonstrate strong growth trajectories

- 4 companies (21.05%) are too early in their development to evaluate conclusively

- 2 companies (10.53%) have not ceased operations but are unlikely to achieve significant success


This 47.37% high-performer rate significantly outperforms the best venture capital unicorn success rates of 5% (Sequoia), with potential to reach 68.42% as companies currently on track continue to develop. The 10.53% failure rate thus far remains substantially lower than industry averages of 75%, not factoring in various constraints of real investing.
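
As a consistency check, the bucket counts reported above can be turned back into the quoted percentages. The sketch below uses only the counts from the text; the bucket labels are shorthand chosen for this example.

```python
# Counts taken from the performance breakdown above; labels are shorthand.
buckets = {
    "valuation above $500M": 9,
    "strong growth trajectory": 4,
    "too early to evaluate": 4,
    "unlikely to succeed": 2,
}

total = sum(buckets.values())
shares = {label: round(100 * n / total, 2) for label, n in buckets.items()}

# Upside case: high performers plus companies on a strong growth trajectory.
upside = round(
    100 * (buckets["valuation above $500M"] + buckets["strong growth trajectory"]) / total,
    2,
)

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"potential high-performer rate: {upside}%")
```

The four buckets sum to the full 19 selections, and the 68.42% potential rate corresponds to 13 of 19 companies.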

Our model continues to demonstrate strong predictive capability while serving as a decision support tool rather than a replacement for comprehensive due diligence.

We will continue to analyze additional vintages across larger geographies and sectors, publishing results as they become available.


Disclaimer: This analysis is for educational purposes only. Past performance does not guarantee future outcomes.

New Zealand's Innovation Pathway

As a Singapore-based VC, I've witnessed how innovation ecosystems evolve naturally when properly supported. Singapore initially emphasised deep tech but allowed market forces to shape developments organically.

Critiques of New Zealand's funding imbalance miss a crucial point: successful startups need significant market power quickly, regardless of their technological depth. Creating numerous small non-deep tech ventures won't deliver the economic impact New Zealand seeks. You need to continue to focus on both.

Three focused recommendations:

  1. Establish a national coordination body with a hands-on advisory panel of experienced entrepreneurs and investors who can directly mentor founders to scale globally. This addresses both fragmentation and practical scaling challenges.

  2. Develop diverse funding mechanisms prioritizing ventures with global potential rather than simply increasing startup quantities. Government grants and fund-of-funds support should continue with momentum, but policymakers need to recognise the signs of change and adapt to them.

  3. Implement more talent development and retention programs. One example worth noting is Singapore's NUS Overseas College, which immerses students in innovation hubs like Silicon Valley, creating globally-minded entrepreneurs with valuable networks. Net migration into New Zealand needs to be positive over time, but this is likely to be the toughest challenge yet.

New Zealand should focus more on building globally competitive companies with proper ecosystem support. I know you can do it. You know you can do it. Whāia te iti kahurangi - pursue that which is precious.

Predicting Success in AI Startups: A Data-Driven Investment Analysis

This version 1 research applies machine learning to identify high-potential AI startups from 2017-2019, yielding significant insights for investment decision-making.

Project Framework

We developed a machine learning methodology to identify promising AI ventures across two cohorts: 2017-2018 (475 companies) and 2019 (329 companies). 

Our approach incorporated:

  • Six predictive variables: Industries, Company Description, Founder Biography, Founder Gender, Location, and Educational Background

  • A dual-model ensemble combining Random Forest and XGBoost algorithms

  • Advanced text vectorization for unstructured data
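
The text does not specify which vectorization technique was used. TF-IDF is one common baseline for turning free text such as company descriptions and founder biographies into numeric features, so the sketch below uses it purely as a stand-in. The toy descriptions are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy descriptions invented for illustration only.
descriptions = [
    "AI platform for enterprise security",
    "AI tools for healthcare providers",
    "AI infrastructure for enterprise teams",
]
vectors = tfidf(descriptions)
print(vectors[1])
```

Terms that appear in every description (here "ai" and "for") get a weight of zero, while terms unique to one description score highest. That is what lets a downstream classifier focus on distinguishing vocabulary.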

Portfolio Performance

Our model selected 15 companies across both time periods:

2017-2018 Selections (10): Jerry, Health Note, Cylera, Deep Cognition, Determined AI, NoTraffic, MovieBot, SupplyHive, Kami Vision, Rowzzy

2019 Selections (5): Eleos Health, Anyscale, Baseten, Anvilogic, Fairmatic

Current Performance:

  • 6 companies (40%) achieved valuations exceeding $500M

  • 3 companies (20%) demonstrate strong growth trajectories

  • 3 companies (20%) show steady growth

  • 3 companies (20%) have ceased operations

This 40% high-performer rate significantly outperforms typical venture capital success rates of 10-20%, while the 20% failure rate is substantially lower than industry averages of 75%. These figures do not factor in the various constraints of real investing.

Key Investment Domains

Four predominant themes emerged:

  1. Enterprise AI Infrastructure (Determined AI, Anyscale)

  2. Healthcare AI Applications (Eleos Health, Health Note)

  3. Security Solutions (Cylera, Anvilogic)

  4. Financial Technology (Jerry, Fairmatic)

Investment Implications

Successful AI ventures consistently demonstrate:

  • Enterprise-focused solutions with clear value propositions

  • Technical excellence within founding teams

  • Strategic presence in major technology ecosystems

While our model demonstrates strong predictive capability, it remains a decision support tool rather than a replacement for comprehensive due diligence.

We will continue to run more and larger analyses within AI, extend the work to larger geographies and sectors, and publish the results as they are completed.


Disclaimer: This analysis is for educational purposes only. Past performance does not guarantee future outcomes.