2026-03-02

Runtime now runs end-to-end on Mistral models

Runtime clients can now ship full features, from intent capture to pull request, including PRD creation and code generation, without a single call to a proprietary model.

By Daniel

From intent capture to pull request, including PRD creation and code generation, the entire pipeline can now run on Devstral 2.

Why most AI coding systems default to large models

Most AI-assisted engineering tools compensate for missing structure with bigger models. When system context is implicit, the model has to infer architecture, reconstruct domain rules, guess at invariants, and approximate historical decisions. That drives high token consumption, increased randomness, and architectural drift. The instinct is to throw a more powerful model at the problem. I went the other way.

Context as infrastructure

Runtime is not a coding assistant. It is a governance and control plane for AI-driven software delivery.

Before any agent runs, Runtime leverages a Context File System built during the setup phase: system maps, domain boundaries, interface contracts, invariants, ADR history, coding standards, and risk constraints. It is versioned, structured, and deterministic.

Agents don't figure out the system. They operate within a defined model of it. The language model executes against engineered constraints. The architecture carries the cognitive load, not the model.
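To make the idea concrete, here is a minimal sketch of what a versioned, structured context layer could look like. All names here (ContextEntry, ContextFileSystem, slice_for) are illustrative placeholders, not Runtime's actual API; the point is that each agent receives a pinned, scoped slice of engineered context rather than a raw codebase.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a "Context File System" entry. The kinds mirror the
# categories named above: system maps, domain boundaries, interface contracts,
# invariants, ADR history, coding standards, risk constraints.

@dataclass(frozen=True)
class ContextEntry:
    kind: str     # e.g. "invariant", "adr", "coding_standard", "interface_contract"
    path: str     # location in the versioned context tree
    version: str  # pinned revision, so agent runs are deterministic and reproducible
    body: str     # the structured content an agent receives

@dataclass
class ContextFileSystem:
    entries: list[ContextEntry] = field(default_factory=list)

    def slice_for(self, kinds: set[str]) -> list[ContextEntry]:
        """Return only the context categories a given agent is allowed to see."""
        return [e for e in self.entries if e.kind in kinds]
```

A code-generation agent, for example, might be handed only coding standards and interface contracts, while a review agent also sees invariants and ADR history.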

A pipeline, not a prompt

[Screenshot: Runtime PRD]

Every feature flows through a governed pipeline: intent capture, PRD generation, architectural decision record, delivery plan, code generation, code review, test generation, and pull request creation.

Each stage is handled by a specialised agent operating on structured context. The output of one stage becomes a controlled input for the next. This reduces variance and increases traceability across the full delivery chain.
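The stage sequence above can be sketched as a simple governed loop, where each stage's output becomes the controlled input of the next and every intermediate artifact is retained for traceability. This is an assumed illustration of the shape of the pipeline, not Runtime's internals; the agent implementations are placeholders.

```python
# Stage names taken from the pipeline described above.
STAGES = [
    "intent_capture",
    "prd_generation",
    "architectural_decision_record",
    "delivery_plan",
    "code_generation",
    "code_review",
    "test_generation",
    "pull_request",
]

def run_pipeline(intent: str, agents: dict) -> dict:
    """Run each specialised agent in order. The output of one stage is the
    controlled input of the next; all artifacts are kept as an audit trail."""
    artifacts = {"input": intent}
    current = intent
    for stage in STAGES:
        current = agents[stage](current)  # one specialised agent per stage
        artifacts[stage] = current        # traceability across the delivery chain
    return artifacts
```

Keeping every intermediate artifact is what makes the variance reduction auditable: when a pull request looks wrong, you can inspect exactly which stage drifted.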

LLM-agnostic by design

Runtime's architecture is model-independent. The canonical context layer and orchestration pipeline don't assume any specific provider. We've run this stack on Opus, Codex, and now end-to-end on Mistral.

When your context is engineered and your agents are governed, the model becomes a replaceable component, not a structural dependency. You choose based on cost, latency, data residency, or capability, and you switch without rearchitecting.
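One way to picture "the model as a replaceable component": the orchestration layer codes against an interface, and concrete providers plug in behind it. A hedged sketch, with placeholder client classes standing in for real provider SDKs:

```python
from typing import Protocol

class ModelClient(Protocol):
    """The only surface the orchestration pipeline depends on."""
    def complete(self, prompt: str) -> str: ...

class DevstralClient:
    def complete(self, prompt: str) -> str:
        # In practice this would call the provider's API; stubbed here.
        return "[devstral] " + prompt

class OtherProviderClient:
    def complete(self, prompt: str) -> str:
        return "[other] " + prompt

def generate(client: ModelClient, prompt: str) -> str:
    # Swapping providers is a configuration change, not a rearchitecture:
    # cost, latency, or data-residency requirements pick the client.
    return client.complete(prompt)
```

Because the pipeline only ever sees the `ModelClient` interface, moving from one provider to another touches configuration, not architecture.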

Why Devstral 2

Devstral 2 is built for structured reasoning, tool usage, and code generation at low latency. In a multi-agent workflow, those properties compound. A single feature can trigger dozens of agent invocations. If each one requires an excessive token budget, the economics of agentic delivery collapse before you reach any useful scale.

[Screenshot: Runtime Code]

Runtime's architecture makes efficient models viable. When agents receive precise, structured context rather than raw codebases, the reasoning burden drops. The model does what it's good at: generating code within well-defined constraints.

Sovereign agentic engineering

This is not just about efficiency. It's about infrastructure independence. Devstral 2 is a European model. Runtime is a European platform. The canonical context never leaves your infrastructure. For organisations operating under data residency requirements, IP sensitivity, or regulatory constraints, this matters.

As of today, I am not aware of any other platform running fully governed agentic delivery end-to-end on Mistral models.

The ability to run fully autonomous software delivery without dependency on non-European hyperscaler APIs isn't a philosophical preference. For a growing number of organisations, it's a prerequisite.

What this means

At this stage I cannot claim this is cheaper or faster than running on flagship proprietary models; we still have features to ship in production before we can make that claim. What I can say is that it works. End to end. In production workflows. The architecture is designed to make token economy, sovereignty, and predictable costs achievable, and we measure those properties in practice.

But the thesis is already clear: the future of agentic engineering is not bigger models. It's engineered context.

If this sounds like your problem

If you run a software organisation where sovereignty, intellectual property, or token economics are real constraints, not theoretical ones, I’d like to hear from you. We’re actively benchmarking Runtime across different codebases and delivery contexts. The best way to validate this architecture is against real-world constraints, not synthetic demos.

If you want to see what end-to-end agentic delivery looks like on your stack, get in touch!