Gimlet Labs Raises $80M to Break NVIDIA GPU Lock-In
Gimlet Labs just raised $80 million to do something the entire AI industry has been quietly begging for: make AI models run across every chip architecture simultaneously. NVIDIA, AMD, Intel, ARM, Cerebras, d-Matrix — all at once, on the same workload. Menlo Ventures led the Series A, bringing total funding to $92 million for a company that's barely a year old and already counting a top-three frontier lab and a top-three hyperscaler among its customers.
This isn't just another infrastructure startup chasing the AI gold rush. This is a direct assault on the most lucrative lock-in in tech history.
The Problem Everyone Knows But Nobody Solved
Here's the dirty secret of AI infrastructure in 2026: most GPU clusters are catastrophically underutilized. Companies are spending billions on NVIDIA hardware, then running workloads that leave a large fraction of that compute sitting idle. The reason is simple — CUDA lock-in. NVIDIA's software ecosystem is so dominant that switching costs are astronomical, even when alternative silicon would be faster, cheaper, or more power-efficient for specific tasks.
Gimlet Labs, founded by Zain Asgar, Michelle Nguyen, Omid Azizi, and Natalie Serrino, attacks this problem at the software layer. Asgar is a former NVIDIA GPU architect and Google AI engineering lead — he knows exactly where the bodies are buried. The team previously built Pixie Labs, which was acquired by New Relic. They understand infrastructure abstraction at a deep level.
Their core insight: AI inference workloads aren't monolithic. A single model can be disaggregated into components, and each component has different computational characteristics. Some parts are memory-bound. Some are compute-bound. Some are latency-sensitive. Forcing all of them onto the same chip architecture is, in Gimlet's framing, an engineering failure masquerading as a business model.
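The distinction Gimlet draws maps cleanly onto the classic roofline model: a component's arithmetic intensity (FLOPs per byte of memory traffic) determines whether a given chip's compute or its memory bandwidth is the bottleneck. Here's a minimal sketch of that classification — the component names, numbers, and threshold are illustrative assumptions, not Gimlet's actual method:

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    flops: float        # floating-point operations per invocation
    bytes_moved: float  # memory traffic per invocation

def arithmetic_intensity(c: Component) -> float:
    """FLOPs per byte of memory traffic (the roofline model's x-axis)."""
    return c.flops / c.bytes_moved

def classify(c: Component, machine_balance: float) -> str:
    """Label a component relative to a chip's balance point
    (peak FLOP/s divided by peak memory bandwidth)."""
    return "compute-bound" if arithmetic_intensity(c) > machine_balance else "memory-bound"

# Hypothetical numbers: reading a KV cache during decode is traffic-heavy,
# while dense matmuls in MLP layers are FLOP-heavy.
decode_attention = Component("decode_attention", flops=2e9, bytes_moved=4e9)
mlp_matmul = Component("mlp_matmul", flops=8e12, bytes_moved=2e10)

# A machine balance near 100 FLOPs/byte is in the range of modern accelerators.
print(classify(decode_attention, machine_balance=100))  # memory-bound
print(classify(mlp_matmul, machine_balance=100))        # compute-bound
```

A memory-bound component gains nothing from a chip with more FLOPs; it needs bandwidth. That asymmetry is what makes one-size-fits-all hardware wasteful.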
How It Actually Works
Gimlet's proprietary software stack does something genuinely novel. It takes an AI workload — say, an agentic reasoning chain — and automatically maps its components to the most efficient silicon available. One piece might run on an NVIDIA H100. Another on an AMD Instinct. Another on a d-Matrix Corsair accelerator. The model doesn't care. The developer doesn't have to rewrite a single line of code.
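Gimlet hasn't published how its placement works, but the simplest version of the idea is a cost-based assignment: profile each component on each available accelerator, then route it to whichever chip runs it cheapest. This greedy sketch uses made-up component names and latency numbers purely for illustration:

```python
# Hypothetical per-component latency estimates (ms) on each accelerator,
# as a profiler might measure them; names and numbers are illustrative.
profile = {
    "prefill": {"H100": 12.0, "MI300X": 11.0, "Corsair": 15.0},
    "decode":  {"H100": 4.0,  "MI300X": 3.8,  "Corsair": 1.2},
    "rerank":  {"H100": 2.0,  "MI300X": 2.2,  "Corsair": 0.9},
}

def place(profile: dict[str, dict[str, float]]) -> dict[str, str]:
    """Greedy placement: each component goes to its cheapest accelerator.
    A production scheduler would also model data-transfer costs between
    chips and capacity limits on each pool."""
    return {comp: min(costs, key=costs.get) for comp, costs in profile.items()}

assignment = place(profile)
print(assignment)
```

Even this toy version shows the shape of the win: no single chip is best at everything, so the per-component optimum beats any all-on-one-architecture deployment.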
The claimed results are striking: 3-10x faster inference for the same cost and power envelope. If those numbers hold at scale — and the fact that top-tier frontier labs are already paying eight figures suggests they do — the implications for AI infrastructure economics are enormous.
Think about what this means for total cost of ownership. If you can extract 3x more performance from the same power budget by intelligently routing across heterogeneous silicon, you've just cut your inference cost by two-thirds. In an industry that's projected to spend hundreds of billions on inference infrastructure over the next five years, that's not an optimization. That's a paradigm shift.
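The "two-thirds" claim is just unit arithmetic: if the same fleet spend produces 3x the tokens, cost per token falls to one-third. A back-of-envelope check, with made-up fleet numbers:

```python
def cost_per_token(fleet_cost_per_hour: float, tokens_per_hour: float) -> float:
    """Unit cost: dollars spent per token served."""
    return fleet_cost_per_hour / tokens_per_hour

# Illustrative figures: a $1,000/hour fleet serving 1B tokens/hour.
baseline = cost_per_token(fleet_cost_per_hour=1000.0, tokens_per_hour=1e9)

# Same spend and power budget, 3x throughput via heterogeneous routing.
routed = cost_per_token(fleet_cost_per_hour=1000.0, tokens_per_hour=3e9)

savings = 1 - routed / baseline
print(f"{savings:.0%}")  # 67% — roughly two-thirds cheaper per token
```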
The d-Matrix Partnership Signals the Future
Alongside the funding announcement, Gimlet revealed a partnership with d-Matrix to integrate d-Matrix's Corsair accelerators into the Gimlet Cloud. The target: 10x speedups for agentic AI inference workloads by the second half of 2026, with massive improvements in throughput-per-watt compared to GPU-only solutions.
This is the real play. Gimlet isn't just a compatibility layer — it's building the marketplace where every chip vendor can compete on merit for every sub-task within an AI workload. That's a fundamentally different power dynamic than today's NVIDIA-dominated monoculture.
The Investor Signal
The cap table tells a story. Menlo Ventures led the Series A, with Factory (which led the $12M seed), Eclipse Ventures, Prosperity7, and Triatomic participating. But look at the seed-round angels: Intel CEO Lip-Bu Tan, Figma CEO Dylan Field, former VMware CEO Raghu Raghuram, and Notion COO Akshay Kothari.
When Intel's sitting CEO personally invests in a startup that makes it easier to run AI on non-NVIDIA chips, that's not subtle. It's a strategic endorsement wrapped in an angel check. Every chip vendor not named NVIDIA has a vested interest in a world where hardware selection is based on workload characteristics rather than software lock-in.
Why This Matters Now
The timing is no accident. AI is shifting from training to inference. Training is a batch problem — you can tolerate high latency, you optimize for throughput, and NVIDIA's A100/H100/B200 stack is genuinely excellent at it. Inference is different. It's real-time. It's cost-sensitive. It's happening at massive scale. And it's where the actual money gets made.
As agentic AI goes mainstream, inference workloads are becoming far more complex. A single agent might need to reason, retrieve, generate, evaluate, and act — each step with different computational profiles. Running all of that on the same GPU architecture is like using a Formula 1 car for both highway driving and parallel parking. Technically possible. Economically insane.
Gimlet's bet is that the future of AI compute isn't about the best chip — it's about the best combination of chips, orchestrated by software that's smarter than any single hardware vendor's ecosystem.
The Risks
Let's be honest about the challenges. NVIDIA isn't going to watch its moat get drained without a fight. CUDA's ecosystem advantage is real and deep — frameworks, libraries, developer tooling, community knowledge. Gimlet needs its abstraction layer to be so good that developers never feel the friction of multi-silicon deployment.
There's also the execution risk of maintaining compatibility across six different chip architectures as each vendor ships new silicon on aggressive timelines. The testing matrix alone is a nightmare. And claiming 3-10x improvements is bold — those numbers need to hold across diverse workloads, not just cherry-picked benchmarks.
But with ~30 employees, eight-figure revenues, a tripled customer base, and now $92 million in the bank, Gimlet has the resources and market validation to prove it out. The company emerged from stealth just five months before this raise. That velocity is unusual.
The Bottom Line
Gimlet Labs is building the middleware layer that makes AI hardware a commodity. If they succeed, every chip maker competes on silicon merit rather than software lock-in. NVIDIA still wins plenty of workloads — their chips are genuinely great — but they win them on performance, not on switching costs.
For AI companies burning cash on inference at scale, this could be the single most important infrastructure bet of 2026. The AI industry spent the last four years building an NVIDIA dependency. Gimlet Labs just raised $80 million to make that dependency optional.
That's not a funding round. That's a declaration of war.
Follow ultrathink.ai for sharp analysis of the funding rounds, technology shifts, and infrastructure bets reshaping AI. No fluff — just the signal that matters.
This article was ultrathought.