Oxford says “gut.” I say “objective + proof.”

Oxford’s The Impact of Artificial Intelligence on Venture Capital argues that AI accelerates sourcing and diligence but that investment decisions stay human, because the durable moats are socially grounded: conviction, gut feeling, and networks.

I agree with the workflow diagnosis. I disagree with the implied endgame.

Not because “gut” is fake—but because “gut” is often a label we apply when we haven’t defined success tightly enough, or when we don’t have a measurement loop that forces our beliefs to confront outcomes.

Dealflow is getting commoditized. The edge is moving.

AI expands visibility, speeds up pipelines, and pushes the industry toward shared tools and shared feeds. When everyone can scan more of the world, “who saw it first” decays.

But convergence of inputs does not imply convergence of results. The edge moves from access to learning rate.

The outlier problem isn’t mystical. It’s an evaluation problem.

Oxford’s strongest point is that the power-law outliers are indistinguishable from “just bad” in the moment, and that humans use conviction to step into ambiguity.

I accept that premise and I still think the conclusion is wrong.

Because “conviction” is not a supernatural faculty. It’s a policy under uncertainty. And policies can be evaluated.

If your decision rule can’t be backtested, it’s not conviction. It’s narrative.

Don’t try to read souls. Build signals you can audit.

Some firms try to extract psychology from language data. Sometimes it works as a cue; often it’s noisy. And founders adapt as soon as they sense the scoring system.

So the goal isn’t “measure personality with high accuracy.” The goal is to build signals that are legible, repeatable, and falsifiable, then combine them with a process that forces updates when reality disagrees.

Verification beats vibes.

If founders optimize public narratives, then naive text scoring collapses into a Goodhart trap.

The difference between toy AI and investable AI is verification: triangulate claims, anchor them in time, reject numbers that can’t be sourced, and penalize inconsistency across evidence.

That’s how you turn unstructured noise into features you can actually test.
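
To make that concrete, here is a minimal sketch of the verification step, assuming a hypothetical list of extracted founder claims with source and date fields. The source weights and scoring rules are illustrative assumptions, not a production verifier.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Claim:
    metric: str          # e.g. "arr_usd"
    value: float
    source: str          # "deck", "bank_statement", "press", ...
    as_of: date | None   # time anchor; None means unanchored

# Illustrative trust weights for sources (an assumption, not a standard).
SOURCE_WEIGHT = {"bank_statement": 1.0, "tax_filing": 0.9, "deck": 0.4, "press": 0.3}

def verification_score(claims: list[Claim]) -> float:
    """Turn raw claims into a single auditable feature in [0, 1]."""
    if not claims:
        return 0.0
    # 1. Reject numbers that can't be sourced or anchored in time.
    usable = [c for c in claims if c.source in SOURCE_WEIGHT and c.as_of is not None]
    if not usable:
        return 0.0
    # 2. Triangulate: group the same metric across sources.
    by_metric: dict[str, list[Claim]] = {}
    for c in usable:
        by_metric.setdefault(c.metric, []).append(c)
    score = 0.0
    for metric, group in by_metric.items():
        weight = max(SOURCE_WEIGHT[c.source] for c in group)
        values = [c.value for c in group]
        spread = (max(values) - min(values)) / max(abs(max(values)), 1e-9)
        # 3. Penalize inconsistency across evidence.
        consistency = max(0.0, 1.0 - spread)
        score += weight * consistency
    return score / len(by_metric)

if __name__ == "__main__":
    claims = [
        Claim("arr_usd", 1_200_000, "deck", date(2025, 6, 30)),
        Claim("arr_usd", 950_000, "bank_statement", date(2025, 6, 30)),
    ]
    print(round(verification_score(claims), 3))
```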

Status is a market feature—not a human moat.

Networks and brand matter because markets respond to them—follow-on capital, recruiting pull, distribution, acquisition gravity.

So yes: status belongs in the model.

But modeling status is not the same thing as treating a human network as the enduring edge. One is an input signal. The other is a claim about irreducible advantage.

If an effect is systematic, it’s modelable.

Objective function: I’m optimizing for fund outcomes.

A lot of debates about “AI can’t do VC” hide an objective mismatch.

If your target is “eventual truth at year 12,” you’ll privilege a certain kind of human judgment. If your target is “realizable outcomes within a fund horizon,” you’ll build a different machine.

I’m comfortable modeling hype—not because fundamentals don’t matter, but because time and liquidity are part of the label. Markets pay for narratives before they pay for final verdicts, and funds get paid on the path, not just the destination.

The punchline

Oxford is right about current practice: AI reshapes the funnel, while humans still own the final decision and accountability.

My reaction is that this is not a permanent moat. It’s a temporary equilibrium.

Define success precisely. Build signals that survive verification. Backtest honestly. Update fast.

That’s not gut.

That’s an investing operating system.

2026 is the year we stop confusing scaling with solving

I called neuro-symbolic AI a 600% growth area back when I analyzed 20,000+ NeurIPS papers. I wrote that world models would unlock the $100T bet because spatial intelligence beats text prediction. I predicted AGI would expose average VCs because LLMs struggle with complex planning and causal reasoning.

Now Ilya Sutskever—co-founder of OpenAI, the guy who built the thing everyone thought would lead to AGI—just said it out loud: "We are moving from the age of scaling to the age of research".

That's not a dip. That's a ceiling.

Here's what the math actually says:

Meta, Amazon, Microsoft, Google, and Tesla have spent $560 billion on AI capex since early 2024. They've generated $35 billion in AI revenue. That's a 16:1 spend-to-revenue ratio. AI-related spending now accounts for 50% of U.S. GDP growth. White House AI Czar David Sacks admitted that a reversal would risk recession.

The 2000 dot-com crash was contained because telecom was one sector. AI isn't. This is systemic exposure dressed up as innovation.

The paradigm that just died:

The Kaplan scaling laws promised a predictable recipe: multiply the parameters, the data, and the compute, and capability reliably improves along a smooth power-law curve. It worked from GPT-3 to GPT-4. It doesn't work anymore. Sutskever's exact words: these models "generalize dramatically worse than people".

Translation: we hit the data wall. Pre-training has consumed the internet's high-quality text. Going 100x bigger now yields marginal, not breakthrough, gains. When the icon of deep learning says that, you're not in a correction. You're at the end of an era.

The five directions I've been tracking—now validated:

The shift isn't abandoning AI. It's abandoning the lazy idea that "bigger solves everything." Here's where the research-to-market gap is closing faster than most realize:

1. Neuro-symbolic AI (the 600% growth area I flagged)

I wrote that neuro-symbolic was the highest-growth niche with massive commercial gaps. Now it's in Gartner's 2025 Hype Cycle. Why? Because LLMs hallucinate, can't explain reasoning, and break on causal logic. Neuro-symbolic systems don't. Drug discovery teams are deploying them because transparent, testable explanations matter when lives are on the line. MIT-IBM frames it as a layered architecture: neural networks as the sensory layer, symbolic systems as the cognitive layer. That separation—learning vs. reasoning—is what LLMs never had.
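
A toy version of that layering, purely to show the separation: the "neural" layer below is a stub returning facts with confidences, and the "symbolic" layer is a couple of hand-written rules. The fact names, threshold, and rule are assumptions for illustration; real systems plug in trained extractors and a proper reasoner.

```python
# Toy neuro-symbolic split: a (stubbed) neural layer extracts facts with
# confidences; a symbolic layer applies explicit, inspectable rules.

def neural_extract(document: str) -> dict[str, float]:
    """Stand-in for a neural model: returns fact -> confidence."""
    # In practice this would be an extraction model over assay reports, filings, etc.
    return {"compound_binds_target_X": 0.92, "compound_is_toxic": 0.10}

RULES = [
    # (required fact, forbidden fact, conclusion)
    ("compound_binds_target_X", "compound_is_toxic", "advance_to_screening"),
]

def symbolic_reason(facts: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Explicit rules over thresholded facts: every conclusion has a traceable why."""
    asserted = {f for f, conf in facts.items() if conf >= threshold}
    conclusions = []
    for required, forbidden, conclusion in RULES:
        if required in asserted and forbidden not in asserted:
            conclusions.append(conclusion)
    return conclusions

if __name__ == "__main__":
    facts = neural_extract("...assay report text...")
    print(symbolic_reason(facts))  # ['advance_to_screening']
```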

2. Test-time compute (the paradigm I missed, but now understand)

OpenAI's o1/o3 flipped the script: spend compute at inference, not just training. Stanford's s1 model—trained on 1,000 examples with budget forcing—beat o1-preview by up to 27% on competition math. That's evidence that intelligent compute allocation beats brute scale. But there's a limit: test-time compute works for refining existing knowledge, not for generating fundamentally new capabilities. It's a multiplier on what you already have, not a foundation for AGI.
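
To illustrate the principle (not s1's specific budget-forcing mechanism), here is a minimal sketch of the simplest test-time-compute pattern: sample the same question several times and take a majority vote. The stub model and its 60% accuracy are assumptions for the demo.

```python
import random
from collections import Counter
from typing import Callable

def solve_with_test_time_compute(
    sample_answer: Callable[[str], str],
    question: str,
    budget: int = 16,
) -> str:
    """Draw `budget` independent samples and return the majority answer.
    More inference compute -> more samples -> a better estimate of the
    model's modal answer, with no extra training."""
    votes = Counter(sample_answer(question) for _ in range(budget))
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Stub "model": right answer 60% of the time (assumed for illustration).
    def noisy_model(_q: str) -> str:
        return "42" if random.random() < 0.6 else str(random.randint(0, 9))

    random.seed(0)
    print(solve_with_test_time_compute(noisy_model, "What is 6*7?", budget=1))
    print(solve_with_test_time_compute(noisy_model, "What is 6*7?", budget=64))
```

With a budget of 1 the stub is often wrong; with 64 samples the majority answer is almost always the model's best guess. That is the whole trade: inference cost for reliability, on capabilities the model already has.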

3. Small language models (the efficiency play enterprises actually need)

Microsoft's Phi-4-Mini, Mistral-7B, and others with 1-10B parameters are matching GPT-4 in narrow domains. They run on-device, preserve privacy, cost 10x less, and don't require hyperscale infrastructure. Enterprises are deploying hybrid strategies: SLMs for routine tasks, large models for multi-domain complexity. That's not compromise—that's architecture that works at production scale.
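
A minimal sketch of that hybrid routing pattern, with stubbed model calls; the task list, length cutoff, and model functions are assumptions, not any vendor's API.

```python
from typing import Callable

# Hypothetical model endpoints: in production these would call a local small
# model and a hosted large model. Here they are stubs for illustration.
def small_model(prompt: str) -> str:
    return f"[SLM] {prompt[:40]}..."

def large_model(prompt: str) -> str:
    return f"[LLM] {prompt[:40]}..."

ROUTINE_TASKS = {"classify_ticket", "extract_fields", "summarize_email"}

def route(task: str, prompt: str,
          slm: Callable[[str], str] = small_model,
          llm: Callable[[str], str] = large_model) -> str:
    """Send routine, narrow-domain work to the small model; escalate
    multi-domain or open-ended work to the large model."""
    if task in ROUTINE_TASKS and len(prompt) < 4_000:
        return slm(prompt)
    return llm(prompt)

if __name__ == "__main__":
    print(route("classify_ticket", "Customer cannot log in after password reset"))
    print(route("draft_strategy_memo", "Compare three go-to-market options for EMEA"))
```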

4. World models (the $100T bet I wrote about)

I argued that world models—systems that build mental maps of reality, not just predict text—would define the next era. They're now pulling $2B+ in funding across robotics, autonomous vehicles, and gaming. Fei-Fei Li's World Labs hit unicorn status at $230M raised. Skild AI secured $1.5B for robotic world models. And, of course, Yann LeCun's new startup. This isn't hype—it's the shift from language to spatial intelligence I predicted.

5. Agentic AI (the microservices moment for AI)

Gartner reports a 1,445% surge in multi-agent inquiries from Q1 2024 to Q2 2025 and projects that by the end of 2026, 40% of enterprise apps will embed AI agents, up from under 5% in 2025. Anthropic's Model Context Protocol (MCP) and Google's A2A are emerging as HTTP-equivalent standards for agent orchestration. The agentic AI market: $7.8B today, projected to reach $52B by 2030. This is exactly the shift I described in AGI VCs—unbundling monolithic intelligence into specialized, composable systems.

What kills most AI deployments (and what I've been saying):

I wrote that the gap isn't technology—it's misaligned expectations, disconnected business goals, and unclear ROI measurement. Nearly 95% of AI pilots generate no return (MIT study). The ones that work have three things: clear kill-switch metrics, tight integration loops, and evidence-first culture.

Enterprise spending in 2026 is consolidating, not expanding. While 68% of CEOs plan to increase AI investment, they're concentrating budgets on fewer vendors and proven solutions. Rob Biederman of Asymmetric Capital Partners: "Budgets will increase for a narrow set of AI products that clearly deliver results and will decline sharply for everything else".

That's the bifurcation I predicted: a few winners capturing disproportionate value, and a long tail struggling to justify continued investment.

The punchline:

The scaling era gave us ChatGPT. The research era will determine whether we build systems that genuinely reason, plan, and generalize—or just burn a trillion dollars discovering the limits of gradient descent.

My bet: the teams that win are the ones who stop optimizing for benchmark leaderboards and start solving actual constraints—data scarcity, energy consumption, reasoning depth, and trust. The ones who recognized early that neuro-symbolic, world models, and agentic systems weren't academic curiosities but the actual path forward.

I've been tracking these shifts for two years. Sutskever's admission isn't news to anyone reading this blog—it's confirmation that the research-to-market timeline just accelerated.

Ego last, evidence first. The founders who internalized that are already building what comes next.

AGI Will Replace Average VCs. The Best Ones? Different Game.

The performance gap between tier-1 human VCs and current AI on startup selection isn't what you think. VCBench, a new standardized benchmark where both humans and LLMs evaluate 9,000 anonymized founder profiles, shows top VCs achieving 5.6% precision. GPT-4o hit 29.1%. DeepSeek-V3 reached 59.1% (though with a brutal 3% recall, meaning it almost never said "yes").[1]

That's not a rounding error. It's a 5-10x gap in precision, the metric that matters most in VC, where false positives (bad investments) are far costlier than false negatives (missed deals).[1]​

But here's what the paper doesn't solve: VCBench inflated the success rate from real-world 1.9% to 9% for statistical stability, and precision doesn't scale linearly when you drop the base rate back down. The benchmark also can't test sourcing, founder relationships, or board-level value-add, all critical to real fund performance. And there's a subtle time-travel problem: models might be exploiting macro trend knowledge (e.g., "crypto founder 2020-2022 = likely exit") rather than true founder quality signals.[2]​
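
The base-rate effect is easy to make concrete. For a classifier with a fixed true-positive rate (TPR) and false-positive rate (FPR), precision = TPR*p / (TPR*p + FPR*(1-p)), where p is the share of startups that actually succeed. The operating point below is an illustrative assumption, not VCBench's published numbers; the point is how far precision falls when p drops from 9% back to 1.9%.

```python
def precision(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision of a fixed classifier when the prevalence of successes changes."""
    tp = tpr * base_rate          # expected true positives per company screened
    fp = fpr * (1.0 - base_rate)  # expected false positives per company screened
    return tp / (tp + fp)

if __name__ == "__main__":
    # Illustrative operating point (assumed, not from the paper).
    tpr, fpr = 0.30, 0.05
    for p in (0.09, 0.019):  # benchmark base rate vs. real-world base rate
        print(f"base rate {p:.1%}: precision {precision(tpr, fpr, p):.1%}")
```

Run it and the same classifier drops from roughly 37% precision at a 9% base rate to roughly 10% at 1.9%: headline precision numbers do not transfer across prevalence regimes.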

Still, the directional message is clear: there is measurable, extractable signal in structured founder data that LLMs capture better than human intuition. The narrative that "AI will augment but never replace VCs" is comforting and wrong. The question isn't if AGI venture capitalists will exist—it's when they cross 15-20% unicorn hit rates in live portfolios (double the best human benchmark) and what that phase transition does to the rest of us.​

The math is brutal for average funds

Firebolt Ventures has been cited as leading the pack at a 10.1% unicorn hit rate—13 unicorns from 129 investments since 2020 (Stanford GSB VCI-backed analysis, as shared publicly). Andreessen Horowitz sits at 5.5% on that same "since 2020" hit-rate framing, albeit at far larger volume. And importantly: Sequoia fell just below the 5% cutoff on that ranking—less because of a lack of wins and more because high volume dilutes hit rate.[3]

The 2017 vintage—now mature enough to score—shows top-decile funds hitting 4.22x TVPI. Median? 1.72x. Most venture outcomes are random noise dressed up as strategy.​

Here's the punchline: PitchBook's 20-year LP study has been summarized as finding that even highly skilled manager selectors (those with 40%+ hit rates at picking top-quartile funds) generate only ~0.61% additional annual returns, and that skilled selection beats random portfolios ~98.1% of the time in VC (vs. ~99.9% in buyouts). (PitchBook analysis, as summarized).​

If the best fund pickers in the world can barely separate signal from noise, what does that say about VC selection itself?​

AGI VCs won't need warm intros

Current ML research suggests models can identify systematic misallocation even within the set of companies VCs already fund. In "Venture Capital (Mis)Allocation in the Age of AI" (Lyonnet & Stern, 2022), the median VC-backed company ranks at the 83rd percentile of model-predicted exit probability—meaning VCs are directionally good but still leave money on the table. Within the same industries and locations, the authors estimate that reallocating toward the model's top picks would increase VCs' imputed MOIC by ~50%.

That alpha exists because human VCs are bottlenecked by:

Information processing limits. Partners evaluate ~200-500 companies/year. An AGI system can scan orders of magnitude more continuously.​

Network constraints. You can't invest in founders you never meet. AGI doesn't need warm intros—it can surface weak signals from GitHub velocity, hiring patterns, or web/social-traffic deltas before the traditional network even sees the deck.​

Cognitive biases. We over-index on storytelling, pedigree, and pattern-matching to our last winner. Algorithms don't care if the founder went to Stanford or speaks confidently. They care about predictors of tail outcomes.​

Bessemer's famous Anti-Portfolio (the deals they passed on: Google, PayPal, eBay, Coinbase) is proof that even elite judgment systematically misfires. If the misses are predictable in hindsight, they're predictable in foresight given the right model.

The five gaps closing faster than expected

AGI isn't here yet because five bottlenecks remain:

Continual learning. Current models largely freeze after training. A real VC learns from every pitch, every exit, every pivot. Research directions like "Nested Learning" have been proposed as pathways toward continual learning, but it's still not a solved, production-default capability.​

Visual perception. Evaluating pitch decks, product demos, team dynamics from video requires true multimodal understanding. Progress is real, but "human-level" is not the default baseline yet.​

Hallucination reduction. For VC diligence—where one wrong fact about IP or founder background kills the deal—today's hallucination profile is still too risky. Instead of claiming a universal "96% reduction," the defensible claim is that retrieval-augmented generation plus verification/guardrails can sharply reduce hallucinations in practice, with the magnitude depending on corpus quality and evaluation method. ​

Complex planning. Apple's research suggests reasoning models can collapse beyond certain complexity thresholds; venture investing is a 7-10 year planning problem through pivots, rounds, and market shifts.​

Causal reasoning. Correlation doesn't answer "If we invest $2M vs. $1M, what happens?" Causal forests and double ML estimate treatment effects while controlling for confounders; a minimal sketch of the double-ML idea follows below. The infrastructure exists; it's not yet integrated into frontier LLMs. Give it 18 months.

Unlike the theoretical barriers to general AGI (which may require paradigm shifts), the barriers to an AGI VC are engineering problems with known solutions.​
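
For the causal-reasoning gap above, here is a minimal cross-fitted double-ML sketch on synthetic data, assuming the standard partially linear setup; production tooling (causal forests, libraries like econml) adds confidence intervals and heterogeneous effects on top.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def double_ml_ate(X, T, Y, n_splits=5, seed=0):
    """Cross-fitted double ML for a partially linear model:
    Y = theta*T + g(X) + e,  T = m(X) + v.
    Residualize Y and T on X with out-of-fold predictions, then regress
    the Y-residuals on the T-residuals to estimate theta."""
    y_res = np.zeros_like(Y, dtype=float)
    t_res = np.zeros_like(T, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        g = RandomForestRegressor(random_state=seed).fit(X[train], Y[train])
        m = RandomForestRegressor(random_state=seed).fit(X[train], T[train])
        y_res[test] = Y[test] - g.predict(X[test])
        t_res[test] = T[test] - m.predict(X[test])
    return float(np.dot(t_res, y_res) / np.dot(t_res, t_res))

if __name__ == "__main__":
    # Synthetic data with a known effect of 2.0 and a confounded treatment.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 3))
    T = X[:, 0] + rng.normal(size=2000)                       # treatment depends on X
    Y = 2.0 * T + np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=2000)
    print(round(double_ml_ate(X, T, Y), 2))                   # should recover ~2.0
```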

The phase transition nobody's pricing in

Hugo Duminil-Copin won the Fields Medal for his work on phase transitions in percolation: below a critical threshold, clusters stay small; above it, a giant component suddenly dominates. That's not a metaphor—it's a rigorous model of network effects.
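
If you want to see the threshold rather than take it on faith, here is a small simulation of the classic Erdős–Rényi percolation model (it needs networkx; the parameters are illustrative, and this is an analogy for the dynamic, not a market model): the giant component appears abruptly once the average degree crosses 1.

```python
import networkx as nx

def giant_component_fraction(n: int, avg_degree: float, seed: int = 0) -> float:
    """Fraction of nodes in the largest connected component of G(n, p) with p = c/n.
    Theory: a giant component emerges once the average degree c crosses 1."""
    g = nx.erdos_renyi_graph(n, avg_degree / n, seed=seed)
    return len(max(nx.connected_components(g), key=len)) / n

if __name__ == "__main__":
    for c in (0.5, 0.9, 1.1, 1.5, 3.0):
        share = giant_component_fraction(5000, c)
        print(f"avg degree {c:.1f}: giant component ~ {share:.0%}")
```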

Hypothesis (not settled fact): once AGI-allocated capital crosses something like 15-25% of total VC AUM, network effects could create nonlinear disadvantage for human-only VCs in deal flow access and selection quality. Why? Because:​

Algorithmic funds identify high-signal companies before they hit the traditional fundraising circuit. If you're a founder and a fund can produce a high-conviction term sheet on a dramatically shorter clock—with clear, inspectable reasoning—you take the meeting.​

Network effects compound. The AGI with the best proprietary outcome data (rejected deals, partner notes, failed pivots) trains better models. That attracts better founders. Which generates better data. Repeat.​

LPs will demand quantitative benchmarks. "Show me your out-of-sample precision vs. the AGI baseline" becomes table stakes. Funds that can't answer get cut.​

The first AGI VC to hit 15% unicorn rates and 6-8x TVPI will trigger the cascade. My estimate: 2028-2029 for narrow domains (B2B SaaS seed deals), 2030-2032 for generalist funds. That's not decades—it's one fund cycle.​

What survives: relationship alpha and judgment at the edge

The AGI VC will systematically crush humans on sourcing, diligence, and statistical selection. What it won't replace—at least initially:

Founder trust and warm intros. Reputation still opens doors. An algorithm can't build years of relationship capital overnight.​

Strategic support and crisis management. Board-level judgment calls, operational firefighting, ego management in founder conflicts—those require human nuance.​

Novel situations outside the training distribution. Unprecedented technologies, regulatory black swans, geopolitical shocks. When there's no historical pattern to learn from, you need human synthesis.​

VCs will bifurcate: algorithmic funds competing on data/modeling edge and speed, versus relationship boutiques offering founder services and accepting lower returns. The middle—firms that do neither exceptionally—will get squeezed out.​

Operating system for the transition

If you're building or managing a fund today, three moves matter:

1. Build proprietary outcome data now. The best training set isn't Crunchbase—it's your rejected deal flow with notes, your portfolio pivots, your failed companies' post-mortems. That's the moat external models can't replicate. Track every pitch, every IC decision, every update. Structure it for ML ingestion.​

2. Instrument your decision process. Precommit to hypotheses ("We think founder X will succeed because Y"). Log the reasoning. Compare predicted vs. actual outcomes quarterly. This builds the feedback loop that lets you detect when your mental model is miscalibrated—and when an algorithm beats you. A minimal logging sketch follows these three moves.​

3. Segment where you add unique value vs. where you're replaceable. If your edge is "I know this space and can move fast," you're exposed. If it's "founders trust me in a crisis and I've navigated three pivots with them," you're defensible. Be honest about which deals came from relationship alpha versus statistical pattern-matching. Double down on the former; automate the latter.​
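
Here is the minimal logging sketch referenced in move 2, assuming a hypothetical record schema; the fields and the Brier-score check are one simple way to instrument the loop, not a standard.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DecisionRecord:
    company: str
    decision: str               # "invest" or "pass"
    thesis: str                 # precommitted reasoning, logged at IC time
    p_success: float            # probability assigned before the outcome
    decided_on: date
    outcome: int | None = None  # 1 = hit, 0 = miss, None = not yet resolved

def brier_score(records: list[DecisionRecord]) -> float:
    """Mean squared error of precommitted probabilities vs. outcomes.
    Lower is better; always guessing 50% scores 0.25."""
    resolved = [r for r in records if r.outcome is not None]
    return sum((r.p_success - r.outcome) ** 2 for r in resolved) / len(resolved)

if __name__ == "__main__":
    log = [
        DecisionRecord("AcmeAI", "invest", "founder shipped 3x faster than cohort",
                       0.25, date(2022, 3, 1), outcome=1),
        DecisionRecord("FooRobotics", "pass", "no wedge vs. incumbents",
                       0.05, date(2022, 5, 9), outcome=0),
    ]
    print(round(brier_score(log), 3))  # track this quarterly, per partner
```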

The real test

In three years, when an AGI fund publishes live performance data showing 12-15% unicorn rates and 5-6x TVPI, the LP conversation changes overnight. Not because the technology is elegant—because the returns are real and the process is transparent.​

That's the moment VCs have to answer: What alpha do we generate that a model can't? For many funds, the answer will be uncomfortable. For the best ones—the ones who've always known that determination, speed, and earned insight compound faster than credentials—it'll be clarifying.​

The AGI VC era doesn't kill venture capital. It kills the pretense that average judgment plus a warm network equals outperformance. What's left is a smaller, sharper game where human edge has to be provable, not performative.​

And if you can't articulate your edge in a sentence—quantifiably, with evidence—you're not competing with other humans anymore. You're competing with an algorithm that already sees your blind spots better than you do.​

[1] https://arxiv.org/pdf/2509.14448.pdf
[2] https://www.reddit.com/r/learnmachinelearning/comments/1no8xji/vcbench_new_benchmark_shows_llms_can_predict/
[3] https://www.linkedin.com/posts/ilyavcandpe_top-unicorn-investors-by-hit-rate-since-2020-activity-7362200145880367104-7zTv


The LeCun Pivot: Why the Smartest Researcher in AI Just Changed His Mind—Publicly

Yann LeCun, the Turing Award winner whose deep learning breakthroughs underpin the GPU-fueled LLM machine, just walked away from it. He didn't retire. He didn't fade. He started a new company and said out loud: we've been optimizing the wrong problem.

That's not ego protection. That's credibility.

What changed

For three years, while Meta poured tens of billions into scaling language models, LeCun watched the returns flatten. Llama 4 was supposed to be the inflection point. Instead, its benchmark results drew accusations of gaming and its real-world performance was middling. Not because he lacked conviction—because he paid attention to what the data was actually saying.

His diagnosis: predicting the next token in language space isn't how intelligence works. A four-year-old processes more visual data in four years than all of GPT-4's training combined. Yet that child learns to navigate the physical world. Our LLMs can pass the bar exam but can't figure out if a ball will clear a fence.

The implication: we've been solving the wrong problem at massive scale.

The funder's dilemma

Here's what makes this important for founders and investors: LeCun isn't alone. Ilya Sutskever left OpenAI making the same call. Gary Marcus has been saying it for years. The question isn't whether they're right—it's how to position when the entire industry is collectively getting less wrong, but slowly.

LeCun's answer is world models—systems that learn to predict and simulate physical reality, not language. Instead of tokens, predict future world states. Instead of chatbots, build systems that understand causality, physics, consequence.

Theoretically sound. Practically? Still fuzzy.

His JEPA architecture learns correlations in representation space, not causal relationships. Marcus, his longtime critic, correctly notes this: prediction of patterns is not understanding of causes. A system trained only on balls going up would learn that "up" is the natural law. It wouldn't understand gravity. Same correlation problem, new wrapper.

What founders should actually watch

The real lesson isn't which architecture wins. It's that capital allocation is broken and about to correct.

Hundreds of billions flowed into scaling LLMs because the returns were obvious and fast—chips, cloud, closed APIs. The infrastructure calcified. Investors became trapped in the installed base. When the problem shifted from "scale faster" to "solve different," the entire system had inertia.

Now LeCun, reportedly with €500 million in backing and Meta's partnership, is betting that world models will see traction faster than skeptics expect. Maybe he's right. Maybe the robotics industry, tired of neural networks that fail on novel environments, will actually deploy these systems. Maybe autonomous vehicles finally move because prediction of physical futures beats reactive pattern-matching.

Or maybe it takes a decade and world models remain research while LLMs compound their current dominance.

For founders: this is the opening. When paradigm-level uncertainty exists, the cost of hedging drops. Build toward physical understanding, not linguistic sophistication. Robotics, manufacturing, autonomous systems—these verticals benefit immediately from world models and can't be solved by bigger LLMs. That's your wedge.

The adaptability play

What separates LeCun's move from ego-driven pivots: he didn't blame market conditions or bad luck. He said, in effect: "I was wrong about where to allocate effort, and here's why."

That transparency, that public course-correction without shame, changes how people bet on him.

The founders who win in 2026-2027 won't be the ones married to LLM scaling or world model purity. They'll be the ones who notice when reality diverges from the plan and move—fast, openly, without defensiveness.

LeCun just did that at scale.

The question isn't whether he's right about world models. It's whether his willingness to change publicly, with evidence, keeps him first-mover on whatever intelligence actually looks like next.