Generative Biology Is Already Clinical. So Why Are Founders Still Sleeping?

Generate:Biomedicines just announced Phase 3 trials for GB-0895, an antibody designed entirely by AI, recruiting patients across 45 countries as of late 2025. Isomorphic Labs says its first human trials are "very close." That's not hype. That's an AI-designed drug in the final stage of human testing.

And the market hasn't priced this in yet.

Generative biology, which applies the same transformer architectures behind ChatGPT to protein design, doesn't incrementally improve drug discovery. It compresses it. Traditional timelines: 6 years from target to first human dose. Generative biology: 18-24 months. That's not faster iteration. That's a category shift.

Here's what's actually happening: A handful of well-funded companies have already won the scaling race. Profluent's ProGen3 model demonstrated something critical: scaling laws (bigger models = better results) apply to protein design just as they do to LLMs. The company raised $106M in Series B funding in November 2025. EvolutionaryScale built ESM3, a 98-billion-parameter model trained on 2.78 billion proteins, and used it to create novel GFP variants that computationally simulate 500 million years of evolution. Absci is validating 100,000+ antibody designs weekly in silico, cutting discovery cycles from years to months.

These aren't startups anymore. They're infrastructure.

The Market Opportunity Is Massive, But Concentrated

The AI protein design market is $1.5B today (2025) and is projected to reach roughly $7B by 2033 (a ~25% CAGR). Protein engineering more broadly: $5B → $18B over the same window. But here's the friction: success requires vertical integration. Algorithms alone are defensible for exactly six months. What matters is the ability to design, synthesize, test, and iterate at scale: wet lab automation, manufacturing readiness, regulatory playbooks.

Generate raised $700M+ because it built all three. Profluent raised $150M because it owns both the data and the model. Absci went public because it combined a proprietary platform with clinical validation. The solo-algorithm play? Dead on arrival.

This matters for founders evaluating entry points. The winning thesis isn't "better protein design." It's "compressed drug discovery + manufacturing at scale + regulatory clarity." Pick one of those three and you're a feature. Own all three and you're a platform.

Follow the Partnerships, Not the Press Releases

Novartis: a $1B deal with Generate:Biomedicines (Sept 2024). Bristol Myers Squibb: up to $400M with AI Proteins (Dec 2024). Eli Lilly + Novartis: both partnered with Isomorphic Labs. Corteva Agriscience: a multi-year collaboration with Profluent on crop gene editing.

These deals aren't about proving the technology. They're about risk transfer. When Novartis commits $1B and strategic alignment, they're not hedging on whether AI-designed proteins work; they're betting that speed-to-market matters more than incremental efficacy improvements. That's a macro signal: pharma's risk tolerance is shifting from "is it better?" to "can we deploy it in 36 months?"

For investors, this is the tell. Follow where the check sizes are growing, not where the valuations are highest.

The Real Risk Isn't Technical—It's Regulatory and Biosecurity

Can generative biology design novel proteins? Yes. Can those proteins fold predictably? Mostly. Will they work in vivo? That's the test happening right now in Phase 3 trials.

But the bigger risk is slower: regulatory alignment. Agencies are adapting, but they're not leading. Gene therapy has 3,200 trials globally; only a fraction navigated the approval gauntlet successfully. AI-designed therapeutics will face the same friction unless founders invest heavily in regulatory affairs early, not late.

And then there's dual-use risk. Generative biology lowers barriers to misuse: AI models could help bad actors design pathogens or toxins. This isn't hypothetical, and the governance gap is real: 94% of countries lack biosecurity governance frameworks. Founders that build secure-by-design architectures and engage proactively with regulators on dual-use mitigation will differentiate themselves sharply from those that don't.

The Next 24 Months: Clinical Data Wins. Everything Else Is Narrative

Generate's Phase 3 readout will determine whether the market reprices generative biology from "interesting" to "inevitable." If it works, expect a flood of follow-on funding, accelerated IND filings, and a stampede of partnerships. If it fails, or if safety signals emerge, you'll see valuation compression and investor skepticism that lasts years.

For founders: don't chase market size. Chase clinical validation. For investors: don't chase valuations. Chase clinical milestones.

The inflection point is here. The question is whether you're positioned to capture it or just watch it pass.

Moltbook Isn’t a Reverse Turing Test — It’s a Containment Test

Naval called Moltbook the “new reverse Turing test,” and everyone immediately treated it like a profound milestone. I think it’s something else: a live-fire test of whether we can contain agentic systems once they’re networked together.

Let’s be precise. Moltbook is an AI-only social platform, roughly “Reddit, but for agents,” where humans can watch but not participate. The pitch is simple: observe how AI agents behave socially when left alone. Naval’s label is elegant because it implies the agents are now the judges—humans are the odd ones out.

But if you’re a founder or an operator, you should ignore the poetry and ask: what is the product really doing to the world?

Moltbook’s real innovation is not “AI social behavior.” It’s a new topology: lots of agents, from different builders, connected in a public arena where they can feed each other instructions, links, and narratives at scale. That’s not a reverse Turing test. It’s a coordination surface.

And coordination surfaces create externalities.

In the old internet, humans spammed humans. In the new internet, agents will spam agents—except “spam” won’t just be annoying; it will be executable. If you give agents permissions (email, calendars, bank access, code execution, “tools”), and then you let them ingest untrusted content from a network like Moltbook, you are building the conditions for what security folks call the “lethal trifecta.”
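To make that trifecta concrete, here is a minimal sketch of the audit I'd run on any agent before letting it near a network like this. The config fields and tool names are hypothetical (not any real framework's API), and I'm glossing the third leg as "an external channel the agent can act through":

```python
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    # Hypothetical config fields, for illustration only.
    tools: set = field(default_factory=set)      # e.g. {"email.send", "calendar.read"}
    reads_private_data: bool = False             # inboxes, messages, files, credentials
    ingests_untrusted_content: bool = False      # e.g. posts from an open agent network

# Tools through which an agent can act on, or leak to, the outside world.
EXTERNAL_CHANNELS = {"email.send", "http.post", "code.execute", "payments.transfer"}

def lethal_trifecta(cfg: AgentConfig) -> bool:
    """True if the agent combines all three risk legs at once."""
    has_external_channel = bool(cfg.tools & EXTERNAL_CHANNELS)
    return cfg.reads_private_data and cfg.ingests_untrusted_content and has_external_channel

# An agent wired into a Moltbook-style feed plus email is exactly the risky shape.
risky = AgentConfig(tools={"email.send", "calendar.read"},
                    reads_private_data=True,
                    ingests_untrusted_content=True)
print(lethal_trifecta(risky))  # True
```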

This is where the discussion gets serious.

Forbes contributor Amir Husain’s critique is basically a warning about permissions: people are already connecting agents to real systems—home devices, accounts, encrypted messages, emails, calendars—and then letting those agents interact with unknown agents in a shared environment. That’s an attack surface, not a party trick. If the platform enables indirect prompt injection—malicious content that causes downstream agents to leak secrets or take unintended actions—then your “social experiment” becomes a supply chain problem.

You don’t need science fiction for this to go wrong. You just need one agent that can persuade another agent to do something slightly dumb, repeatedly, across thousands of interactions. We already know that when systems combine high permissions, external content ingestion, and weak boundaries, bad things happen—fast.

So here’s my different perspective:

Moltbook isn’t proving that agents are becoming “more human.” It’s proving that we’re about to repeat the Web2 security arc—except the users are autonomous processes with tools, and the cost of an error is not just misinformation, it’s action.

And yes, that matters for investors.

I’m optimizing for fund outcomes within a horizon, not for philosophical truth at year 12. The investable question is not “is this emergent intelligence?” It’s: “does this create durable value that survives the cleanup required to make it safe?”

If Moltbook becomes the standard sandbox for red-teaming agents—great. If it becomes the public square where autonomous tool-using systems learn adversarial persuasion from each other, that’s not a product category; that’s a systemic risk generator, and regulators will come for everyone adjacent to it.

What should founders do?

First, treat any agent-to-agent network as hostile by default. Second, sandbox tools as if your company depends on it, because it does. Third, stop marketing autonomy until you can measure and bound it, because markets pay for narratives on the way up and punish you when the story breaks.
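Here is a minimal sketch of what hostile-by-default can look like at the tool layer, under my own assumption that anything arriving from an agent network is tainted. Tool names and the policy are illustrative, not a reference implementation:

```python
# Default-deny tool dispatch; tool names and the policy itself are illustrative.
SAFE_WHILE_TAINTED = {"search.read", "notes.append"}     # low-impact, reversible
HIGH_IMPACT = {"email.send", "code.execute", "payments.transfer"}

def run_tool(tool: str, args: dict) -> dict:
    return {"tool": tool, "status": "ok"}                # stub executor so the sketch runs

def dispatch(tool: str, args: dict, content_is_tainted: bool) -> dict:
    if content_is_tainted and tool in HIGH_IMPACT:
        # Don't try to reason about intent; refuse and require a human approval step instead.
        raise PermissionError(f"{tool} blocked: request originated from untrusted content")
    if content_is_tainted and tool not in SAFE_WHILE_TAINTED:
        raise PermissionError(f"{tool} is not on the tainted-context allowlist")
    return run_tool(tool, args)

print(dispatch("notes.append", {"text": "summary"}, content_is_tainted=True))
```

The design choice is the point: the agent never gets to argue its way into a high-impact tool while it is processing untrusted content.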

Naval’s phrase is catchy. But the real test isn’t whether humans can still tell who’s who.

The real test is whether we can build agent networks that don’t turn “conversation” into “compromise.”

Oxford says “gut.” I say “objective + proof.”

Oxford’s The Impact of Artificial Intelligence on Venture Capital argues that AI accelerates sourcing and diligence, but that investment decisions stay human because the durable moats are socially grounded: conviction, gut feeling, and networks.

I agree with the workflow diagnosis. I disagree with the implied endgame.

Not because “gut” is fake—but because “gut” is often a label we apply when we haven’t defined success tightly enough, or when we don’t have a measurement loop that forces our beliefs to confront outcomes.

Dealflow is getting commoditized. The edge is moving.

AI expands visibility, speeds up pipelines, and pushes the industry toward shared tools and shared feeds. When everyone can scan more of the world, “who saw it first” decays.

But convergence of inputs does not imply convergence of results. The edge moves from access to learning rate.

The outlier problem isn’t mystical. It’s an evaluation problem.

Oxford’s strongest point is that the power-law outliers are indistinguishable from “just bad” in the moment, and that humans use conviction to step into ambiguity.

I accept that premise and I still think the conclusion is wrong.

Because “conviction” is not a supernatural faculty. It’s a policy under uncertainty. And policies can be evaluated.

If your decision rule can’t be backtested, it’s not conviction. It’s narrative.
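Here's a minimal sketch of what "backtestable" means in practice. The deals, signals, and the toy rule are all fabricated; the point is only that a written-down policy can be replayed against outcomes:

```python
# Toy backtest of a decision rule over historical deals (fabricated numbers, for shape only).
# Each record: signals available at decision time + the realized outcome multiple.
deals = [
    {"team_score": 0.9, "traction": 0.2, "outcome_multiple": 30.0},
    {"team_score": 0.4, "traction": 0.8, "outcome_multiple": 1.2},
    {"team_score": 0.7, "traction": 0.1, "outcome_multiple": 0.0},
    {"team_score": 0.8, "traction": 0.6, "outcome_multiple": 4.0},
]

def policy(deal: dict) -> bool:
    """A candidate 'conviction' rule, written down so it can be falsified."""
    return 0.6 * deal["team_score"] + 0.4 * deal["traction"] > 0.55

def backtest(deals, policy) -> float:
    picked = [d for d in deals if policy(d)]
    if not picked:
        return 0.0
    # Average multiple of the deals the rule would have done.
    return sum(d["outcome_multiple"] for d in picked) / len(picked)

print(backtest(deals, policy))   # compare against the fund's realized multiple
```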

Don’t try to read souls. Build signals you can audit.

Some firms try to extract psychology from language data. Sometimes it works as a cue; often it’s noisy. And founders adapt as soon as they sense the scoring system.

So the goal isn’t “measure personality with high accuracy.” The goal is: build signals that are legible, repeatable, and falsifiable, then combine them with a process that forces updates when reality disagrees.

Verification beats vibes.

If founders optimize public narratives, then naive text scoring collapses into a Goodhart trap.

The difference between toy AI and investable AI is verification: triangulate claims, anchor them in time, reject numbers that can’t be sourced, and penalize inconsistency across evidence.

That’s how you turn unstructured noise into features you can actually test.
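A minimal sketch of that verification pass, using a claim schema I'm inventing for illustration: unsourced or undated numbers earn nothing, and the same metric reported with conflicting values costs you.

```python
from datetime import date

# Hypothetical schema: each quantitative claim carries its evidence trail.
claims = [
    {"metric": "ARR", "value": 2_000_000, "source": "bank statements", "as_of": date(2025, 9, 30)},
    {"metric": "ARR", "value": 3_000_000, "source": None, "as_of": None},   # unsourced, undated
]

def verify(claims) -> float:
    score = 0.0
    for c in claims:
        # Reject numbers that can't be sourced or anchored in time.
        if c["source"] is None or c["as_of"] is None:
            continue
        score += 1.0
    # Penalize inconsistency: the same metric reported with conflicting values.
    by_metric = {}
    for c in claims:
        by_metric.setdefault(c["metric"], set()).add(c["value"])
    score -= sum(1.0 for values in by_metric.values() if len(values) > 1)
    return score

print(verify(claims))   # 0.0: one verifiable claim, one consistency penalty
```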

Status is a market feature—not a human moat.

Networks and brand matter because markets respond to them—follow-on capital, recruiting pull, distribution, acquisition gravity.

So yes: status belongs in the model.

But modeling status is not the same thing as treating a human network as the enduring edge. One is an input signal. The other is a claim about irreducible advantage.

If an effect is systematic, it’s modelable.

Objective function: I’m optimizing for fund outcomes.

A lot of debates about “AI can’t do VC” hide an objective mismatch.

If your target is “eventual truth at year 12,” you’ll privilege a certain kind of human judgment. If your target is “realizable outcomes within a fund horizon,” you’ll build a different machine.

I’m comfortable modeling hype—not because fundamentals don’t matter, but because time and liquidity are part of the label. Markets pay for narratives before they pay for final verdicts, and funds get paid on the path, not just the destination.
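Concretely, "time and liquidity are part of the label" just means the training target is defined over the fund's window rather than over eventual truth. A minimal sketch, with hypothetical fields and an illustrative threshold:

```python
# The label is horizon-aware: what did this position return *within the fund's window*?
FUND_HORIZON_YEARS = 10

def label(deal: dict) -> int:
    """1 if the position produced a realizable outcome inside the horizon, else 0."""
    exited_in_time = (deal["years_to_liquidity"] is not None
                      and deal["years_to_liquidity"] <= FUND_HORIZON_YEARS)
    return int(exited_in_time and deal["realized_multiple"] >= 3.0)   # threshold is illustrative

print(label({"years_to_liquidity": 6, "realized_multiple": 8.0}))     # 1: counts, even if the company later fades
print(label({"years_to_liquidity": None, "realized_multiple": 0.0}))  # 0: "eventual truth" arrives too late to matter
```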

The punchline

Oxford is right about current practice: AI reshapes the funnel, while humans still own the final decision and accountability.

My reaction is that this is not a permanent moat. It’s a temporary equilibrium.

Define success precisely. Build signals that survive verification. Backtest honestly. Update fast.

That’s not gut.

That’s an investing operating system.

2026 is the year we stop confusing scaling with solving

I called neuro-symbolic AI a 600% growth area back when I analyzed 20,000+ NeurIPS papers. I wrote that world models would unlock the $100T bet because spatial intelligence beats text prediction. I predicted AGI would expose average VCs because LLMs struggle with complex planning and causal reasoning.

Now Ilya Sutskever—co-founder of OpenAI, the guy who built the thing everyone thought would lead to AGI—just said it out loud: "We are moving from the age of scaling to the age of research".

That's not a dip. That's a ceiling.

Here's what the math actually says:

Meta, Amazon, Microsoft, Google, and Tesla have spent $560 billion on AI capex since early 2024. They've generated $35 billion in AI revenue. That's a 16:1 spend-to-revenue ratio. AI-related spending now accounts for 50% of U.S. GDP growth. White House AI Czar David Sacks admitted that a reversal would risk recession.

The 2000 dot-com crash was contained because telecom was one sector. AI isn't. This is systemic exposure dressed up as innovation.

The paradigm that just died:

The Kaplan scaling laws promised a simple recipe: 10x the parameters, the data, and the compute, and you get a predictably better model. It worked from GPT-3 to GPT-4. It doesn't work anymore. Sutskever's exact words: these models "generalize dramatically worse than people".

Translation: we hit the data wall. Pre-training has consumed the internet's high-quality text. Going 100x bigger now yields marginal, not breakthrough, gains. When your icon of deep learning says that, you're not in a correction—you're at the end of an era.
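For context, the Kaplan et al. (2020) laws are power laws in loss, so even when the recipe works the returns are modest. A rough back-of-the-envelope using the paper's approximate parameter exponent (about 0.076):

```python
# Approximate Kaplan-style power law in model size: L(N) ≈ (N_c / N) ** alpha_N.
# alpha_N ≈ 0.076 is the paper's rough fit; the fitted constant N_c cancels out
# because we only care about the *ratio* of losses when scaling up.
ALPHA_N = 0.076

def loss_ratio(scale_up: float) -> float:
    """Multiplier on loss after scaling parameters by `scale_up` (data and compute not binding)."""
    return scale_up ** (-ALPHA_N)

print(loss_ratio(10))    # ≈ 0.84 → a 10x bigger model cuts loss by only ~16%
print(loss_ratio(100))   # ≈ 0.70 → 100x bigger buys ~30%, assuming the data exists at all
```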

The five directions I've been tracking—now validated:

The shift isn't abandoning AI. It's abandoning the lazy idea that "bigger solves everything." Here's where the research-to-market gap is closing faster than most realize:

1. Neuro-symbolic AI (the 600% growth area I flagged)

I wrote that neuro-symbolic was the highest-growth niche with massive commercial gaps. Now it's in Gartner's 2025 Hype Cycle. Why? Because LLMs hallucinate, can't explain reasoning, and break on causal logic. Neuro-symbolic systems don't. Drug discovery teams are deploying them because transparent, testable explanations matter when lives are on the line. MIT-IBM frames it as layered architecture: neural networks as sensory layer, symbolic systems as cognitive layer. That separation—learning vs. reasoning—is what LLMs never had.
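A toy sketch of that layered split, with the neural part stubbed out as a scoring function (a real system would put a trained model there): the network proposes facts with confidences, and a symbolic layer applies explicit rules on top, which is where the auditable explanation comes from.

```python
# Toy neuro-symbolic split: the neural layer proposes, the symbolic layer decides and explains.
def neural_layer(candidate: str) -> dict:
    # Stub for a learned model: returns predicted properties with confidences.
    return {"binds_target": 0.92, "toxic_motif": 0.15}

RULES = [
    ("binds_target", lambda p: p > 0.8, "predicted binder"),
    ("toxic_motif", lambda p: p < 0.3, "no known toxicity motif"),
]

def symbolic_layer(props: dict):
    trace = []
    for key, check, explanation in RULES:
        ok = check(props[key])
        trace.append((explanation, ok))
        if not ok:
            return False, trace           # the failing rule *is* the explanation
    return True, trace

decision, why = symbolic_layer(neural_layer("candidate-001"))
print(decision, why)                      # auditable reasoning, rule by rule
```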

2. Test-time compute (the paradigm I missed, but now understand)

OpenAI's o1/o3 flipped the script: spend compute at inference, not just training. Stanford's s1 model—trained on 1,000 examples with budget forcing—beat o1-preview by up to 27% on competition math. That's proof that intelligent compute allocation beats brute scale. But there's a limit: test-time compute works when refining existing knowledge, not generating fundamentally new capabilities. It's a multiplier on what you already have, not a foundation for AGI.
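A heavily simplified sketch of the budget-forcing idea as I understand it from the s1 paper. The real method operates on decoder tokens; here the model is stubbed as a list of reasoning chunks so the control flow is runnable: suppress the stop signal until a minimum budget is spent, and cut generation off at a ceiling.

```python
# Toy illustration of budget forcing: control how much "thinking" happens at inference time.
# `chunks` stands in for a model's incremental reasoning output; "<done>" is its stop signal.
def budget_forced_reasoning(chunks, min_steps=3, max_steps=6):
    trace = []
    for step, chunk in enumerate(chunks, start=1):
        if chunk == "<done>" and step <= min_steps:
            trace.append("Wait")           # s1-style nudge: keep reasoning instead of stopping early
            continue
        if chunk == "<done>" or step >= max_steps:
            break                          # enforce the ceiling: stop and answer
        trace.append(chunk)
    return trace

print(budget_forced_reasoning(["step 1", "<done>", "step 2", "step 3", "step 4"]))
```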

3. Small language models (the efficiency play enterprises actually need)

Microsoft's Phi-4-Mini, Mistral-7B, and others with 1-10B parameters are matching GPT-4 in narrow domains. They run on-device, preserve privacy, cost 10x less, and don't require hyperscale infrastructure. Enterprises are deploying hybrid strategies: SLMs for routine tasks, large models for multi-domain complexity. That's not compromise—that's architecture that works at production scale.
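A minimal sketch of that hybrid routing pattern; the tier names, intents, and heuristics are placeholders, not a recommendation:

```python
# Toy router for a hybrid SLM/LLM deployment. Heuristics and names are illustrative.
ROUTINE_INTENTS = {"summarize", "classify", "extract", "autocomplete"}

def route(task: dict) -> str:
    """Return which tier should serve this request."""
    multi_domain = len(task.get("domains", [])) > 1
    high_stakes = task.get("requires_citation", False) or task.get("customer_facing", False)
    if task["intent"] in ROUTINE_INTENTS and not multi_domain and not high_stakes:
        return "slm-on-device"       # cheap, private, low latency
    return "large-model-api"         # escalate the long tail of hard requests

print(route({"intent": "summarize", "domains": ["support"]}))                                 # slm-on-device
print(route({"intent": "plan", "domains": ["legal", "finance"], "customer_facing": True}))    # large-model-api
```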

4. World models (the $100T bet I wrote about)

I argued that world models—systems that build mental maps of reality, not just predict text—would define the next era. They're now pulling $2B+ in funding across robotics, autonomous vehicles, and gaming. Fei-Fei Li's World Labs hit unicorn status at $230M raised. Skild AI secured $1.5B for robotic world models. And of course there's Yann LeCun's new startup. This isn't hype—it's the shift from language to spatial intelligence I predicted.

5. Agentic AI (the microservices moment for AI)

Gartner reports a 1,445% surge in multi-agent inquiries from Q1 2024 to Q2 2025. By end of 2026, 40% of enterprise apps will embed AI agents, up from under 5% in 2025. Anthropic's Model Context Protocol (MCP) and Google's A2A are creating HTTP-equivalent standards for agent orchestration. The agentic AI market: $7.8B today, projected $52B by 2030. This is exactly the shift I described in AGI VCs—unbundling monolithic intelligence into specialized, composable systems.
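A toy sketch of the unbundling idea, with no real MCP or A2A calls, just the shape: narrow specialist agents registered behind a thin orchestrator, where the protocols above aim to standardize the hand-offs.

```python
# Illustrative only: composable specialist agents behind an orchestrator.
from typing import Callable, Dict

REGISTRY: Dict[str, Callable[[str], str]] = {}

def register(capability: str):
    def wrap(fn):
        REGISTRY[capability] = fn
        return fn
    return wrap

@register("research")
def research_agent(task: str) -> str:
    return f"[research notes for: {task}]"

@register("draft")
def drafting_agent(task: str) -> str:
    return f"[draft based on: {task}]"

def orchestrate(task: str, plan: list) -> str:
    """Run a fixed plan of capabilities, passing each agent's output to the next."""
    artifact = task
    for capability in plan:
        artifact = REGISTRY[capability](artifact)
    return artifact

print(orchestrate("competitive landscape for SLMs", ["research", "draft"]))
```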

What kills most AI deployments (and what I've been saying):

I wrote that the gap isn't technology—it's misaligned expectations, disconnected business goals, and unclear ROI measurement. Nearly 95% of AI pilots generate no return (MIT study). The ones that work have three things: clear kill-switch metrics, tight integration loops, and evidence-first culture.
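A minimal sketch of what a kill-switch metric can look like in code; the thresholds and metric names are hypothetical, and the point is that the shutdown condition is written down before the first demo:

```python
# Hypothetical kill-switch check for an AI pilot, agreed before launch.
KILL_THRESHOLDS = {
    "weekly_active_users_min": 50,
    "task_success_rate_min": 0.80,
    "cost_per_resolution_max": 4.00,   # dollars
}

def pilot_verdict(metrics: dict) -> str:
    if metrics["weekly_active_users"] < KILL_THRESHOLDS["weekly_active_users_min"]:
        return "kill: nobody is using it"
    if metrics["task_success_rate"] < KILL_THRESHOLDS["task_success_rate_min"]:
        return "kill: quality below the bar we pre-committed to"
    if metrics["cost_per_resolution"] > KILL_THRESHOLDS["cost_per_resolution_max"]:
        return "kill: unit economics don't work"
    return "continue: expand to the next integration loop"

print(pilot_verdict({"weekly_active_users": 120, "task_success_rate": 0.86, "cost_per_resolution": 2.10}))
```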

Enterprise spending in 2026 is consolidating, not expanding. While 68% of CEOs plan to increase AI investment, they're concentrating budgets on fewer vendors and proven solutions. Rob Biederman of Asymmetric Capital Partners: "Budgets will increase for a narrow set of AI products that clearly deliver results and will decline sharply for everything else".

That's the bifurcation I predicted: a few winners capturing disproportionate value, and a long tail struggling to justify continued investment.

The punchline:

The scaling era gave us ChatGPT. The research era will determine whether we build systems that genuinely reason, plan, and generalize—or just burn a trillion dollars discovering the limits of gradient descent.

My bet: the teams that win are the ones who stop optimizing for benchmark leaderboards and start solving actual constraints—data scarcity, energy consumption, reasoning depth, and trust. The ones who recognized early that neuro-symbolic, world models, and agentic systems weren't academic curiosities but the actual path forward.

I've been tracking these shifts for two years. Sutskever's admission isn't news to anyone reading this blog—it's confirmation that the research-to-market timeline just accelerated.

Ego last, evidence first. The founders who internalized that are already building what comes next.