Ornith-1.0: Open-Weight Agentic Coding Model from DeepReinforce

DeepReinforce's Ornith-1.0 is a new MIT-licensed model family for agentic coding, with variants from 9B to 397B built on Gemma 4 and Qwen 3.5.

TL;DR

DeepReinforce released Ornith-1.0, a family of open-weight (MIT licensed) models purpose-built for agentic coding tasks, with early hands-on reports suggesting it handles multi-step tool use well.

What happened

DeepReinforce, a lab without much public history before a June 2025 paper on CUDA optimization, shipped Ornith-1.0 as their first model release. The family spans four variants: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. (MoE, or mixture-of-experts, is an architecture where only a fraction of the model’s parameters are active for any given token — so a 397B MoE runs cheaper than its parameter count implies.) The models are fine-tuned on top of Gemma 4 and Qwen 3.5, both Apache 2.0 licensed, which means the MIT license on Ornith-1.0 should actually hold up cleanly.

Simon Willison ran the 35B MoE variant locally via LM Studio using the Q4_K_M GGUF quantization (about 20GB on disk) and connected it to the Pi coding agent. His tests included multi-turn codebase navigation against a real Datasette checkout — asking it to trace specific UI interactions across the source — and it handled both tasks without issues. He clocked SVG generation at 103 tokens per second locally.

Code example

# Pull the 35B GGUF via Hugging Face CLI
pip install huggingface_hub

python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id='deepreinforce-ai/Ornith-1.0-35B-GGUF',
    filename='ornith-1.0-35b-Q4_K_M.gguf',
    local_dir='./models'
)
"

# Then load in llama.cpp or LM Studio and point your OpenAI-compatible client at localhost

Why it matters

The genuinely interesting thing here isn’t the benchmark claims — “state-of-the-art among open-source models of comparable size” is something you read every week and it usually means little without knowing which benchmarks, on what eval set, with what prompting strategy. What’s more interesting is the specific design goal: a model trained to self-scaffold, meaning it’s supposed to manage its own agentic loop across many tool calls rather than just respond to single prompts. Most coding models are optimized for “given a function signature, complete the body.” Ornith-1.0 is aiming at “given a goal and a codebase, figure out what to look at and in what order.” That’s a harder and more useful problem.

The clean licensing story also matters. Gemma 4 dropped the restrictive Gemma Terms of Use that made earlier Gemma models awkward to build on commercially. Combined with Qwen 3.5’s Apache 2.0 license, you actually have a clear path to shipping products on top of this. For a lab with almost no public track record, releasing under MIT with genuinely permissive base models is a credibility signal worth noting.

What to watch

Whether the agentic performance holds on standardized agent evals (SWE-bench, for instance) rather than informal demos — that’s the real test of the “self-scaffolding” claim.
Whether DeepReinforce publishes more about their training methodology; right now the lab is essentially unknown, and reproducibility matters for trusting fine-tune results at this scale.