Open source AI news: NanoLang tests LLM coding limits
“A minimal, LLM-friendly programming language with mandatory testing and unambiguous syntax,” wrote Simon Willison, zeroing in on the pitch for NanoLang, a new project released by Jordan Hubbard, co-founder of FreeBSD and alum of Apple and NVIDIA.
The language showed up this week just in time for Willison’s late-night write-up on January 19, 2026 at 11:58 pm, and it set off a tidy experiment: could a modern coding assistant generate a real program in a brand-new language if the language itself was tuned for the machine?
A language built for LLMs lands — and a bold claim
“A minimal, LLM-friendly programming language with mandatory testing and unambiguous syntax.” — Simon Willison’s Weblog
“NanoLang transpiles to C for native performance while providing a clean, modern syntax optimized for both human readability and AI code generation.” — Simon Willison’s Weblog
NanoLang’s headline features are aimed squarely at code-generation agents: strip ambiguity out of the grammar, bake tests into the workflow, and target a conservative backend by transpiling to C.
Hubbard published the project to GitHub under jordanhubbard/nanolang, and the repo includes a document called MEMORY.md explicitly meant to feed language models the essentials.
“Purpose: This file is designed specifically for Large Language Model consumption. It contains the essential knowledge needed to generate, debug, and understand NanoLang code. Pair this with spec.json for complete language coverage.” — NanoLang MEMORY.md (via Simon Willison’s Weblog)
That’s a straight-up admission that the audience here isn’t just human. Most new languages arrive with tutorials and syntax guides; NanoLang ships with a menu of tokens for an LLM to chew on. The C transpilation angle promises predictable builds, but no benchmarks accompanied the release. That lack isn’t a deal-breaker for an early language, just a reminder that “native performance” as a phrase hides a universe of trade-offs.
Willison’s one-shot to working code: what changed
Willison didn’t stop at summarizing the README.
He tried to generate a working NanoLang program two different ways, and only one of them worked.
First attempt: He used his llm CLI plus llm-anthropic, grabbed MEMORY.md directly from raw.githubusercontent.com, and asked for a one-shot program. That compile failed.
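Willison ran that first attempt from the command line, but the same idea is easy to picture in code. The sketch below is a rough reconstruction, not his actual invocation: it assumes the llm Python API with the llm-anthropic plugin installed, a placeholder model alias, a guessed branch name in the raw URL, and invented prompt wording.

```python
# Rough reconstruction of the one-shot attempt: feed MEMORY.md as the only
# context and ask for a complete program. The model alias, branch name, and
# prompt wording are assumptions, not Willison's exact invocation.
import urllib.request

import llm  # Simon Willison's llm library, with llm-anthropic installed

MEMORY_URL = (
    "https://raw.githubusercontent.com/jordanhubbard/nanolang/main/MEMORY.md"
)

def one_shot_nanolang(task: str, model_id: str = "claude-opus-4.5") -> str:
    """Ask a model for a NanoLang program using only MEMORY.md as context."""
    memory_md = urllib.request.urlopen(MEMORY_URL).read().decode("utf-8")
    model = llm.get_model(model_id)  # model alias is a placeholder
    response = model.prompt(
        f"Write a complete NanoLang program that does the following: {task}",
        system=f"You are writing NanoLang. Language reference:\n\n{memory_md}",
    )
    return response.text()

if __name__ == "__main__":
    print(one_shot_nanolang("print the first 20 Fibonacci numbers"))
```

The point of the reconstruction is the shape of the attempt: one file of context, one prompt, no project tree, and no compiler in the loop to push back.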
“The resulting code… did not compile.” — Simon Willison’s Weblog
Second attempt: He cloned the NanoLang repo and opened it in Claude Code, running Claude Opus 4.5. With the examples directory in view, he asked for a command-line Mandelbrot fractal generator. That worked.
“…And it worked! Claude happily grepped its way through the various examples/ and built me a working program.” — Simon Willison’s Weblog
Two small details matter here.
- The one-shot prompt only had a single source of truth: MEMORY.md.
- Inside Claude Code, the model could “grep” across examples/ and treat the repo more like a project than a trivia quiz.
And the output wasn’t a toy hello-world either; a Mandelbrot CLI is a decent stress test for loops, numerical types, and simple I/O (see the sketch below). It’s still a compact program, but it exercises more than a print statement.
There’s a catch: the success required a full project checkout and curated context. That’s closer to how people actually code in an IDE, but it’s miles away from the dream that a single prompt can conjure correct, compilable code in a vacuum. That dream died on the first compile error.
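Willison’s generated program was in NanoLang and isn’t reproduced here; as a plain Python stand-in, the sketch below shows why even a small command-line Mandelbrot generator exercises nested loops, numeric iteration, and terminal output rather than a single print statement.

```python
# Not Willison's NanoLang program: a small Python stand-in showing the kind of
# work a command-line Mandelbrot generator does (nested loops, numeric
# iteration over complex values, simple text output).
def mandelbrot_ascii(width: int = 80, height: int = 24, max_iter: int = 50) -> str:
    chars = " .:-=+*#%@"  # later characters mean the point escaped slowly or not at all
    rows = []
    for row in range(height):
        imag = -1.2 + 2.4 * row / (height - 1)
        line = []
        for col in range(width):
            real = -2.0 + 3.0 * col / (width - 1)
            c = complex(real, imag)
            z = 0j
            i = 0
            while abs(z) <= 2.0 and i < max_iter:
                z = z * z + c
                i += 1
            line.append(chars[min(i, max_iter - 1) * (len(chars) - 1) // (max_iter - 1)])
        rows.append("".join(line))
    return "\n".join(rows)

if __name__ == "__main__":
    print(mandelbrot_ascii())
```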
Two paths to AI coding: constrain the language vs. scale the agents
NanoLang is one bet: shrink ambiguity so a single model can hit the target. Cursor is trying the opposite: throw lots of agents at the problem and coordinate them. Willison pointed to a post by Cursor’s Wilson Lin that chronicles what happened when they leaned into orchestration patterns across a large set of autonomous coding agents.
“This post describes what we have learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching them write over a million lines of code and trillions of tokens.” — Simon Willison’s Weblog (on Cursor’s experiments)
“They ended up running planners and sub-planners to create tasks, then having workers execute on those tasks – similar to how Claude Code uses sub-agents. Each cycle ended with a judge agent deciding if the project was completed or not.” — Simon Willison’s Weblog
That’s an explicit planner/worker/judge flow—classic multi-agent playbook—with hard numbers attached: hundreds of agents, over a million lines of code, trillions of tokens processed. The scale is eye-catching, though the post (as relayed by Willison) didn’t break down error rates, rework, cost, or the human oversight needed to keep the system pointed at the right target. Those are the less glamorous metrics that decide whether the swarm approach is practical outside a demo.
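Cursor’s actual system is far more involved than the quote lets on. Purely to make the three roles concrete, here is a toy Python sketch of a planner/worker/judge cycle; the agent functions, task format, and completion rule are invented stand-ins for LLM calls, not anything described by Cursor or Willison.

```python
# Toy illustration of a planner/worker/judge cycle. The agent functions are
# stand-ins for LLM calls; the task format and judging rule are invented.
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    done: bool = False
    output: str = ""

def planner(goal: str, completed: list[Task]) -> list[Task]:
    """Stand-in for a planning agent: break the goal into tasks."""
    if completed:  # pretend one planning pass is enough for the toy example
        return []
    return [Task(f"{goal}: step {i}") for i in range(1, 4)]

def worker(task: Task) -> Task:
    """Stand-in for a worker agent: 'execute' the task."""
    task.output = f"code for {task.description!r}"
    task.done = True
    return task

def judge(goal: str, completed: list[Task]) -> bool:
    """Stand-in for a judge agent: decide whether the project is finished."""
    return len(completed) >= 3

def orchestrate(goal: str, max_cycles: int = 5) -> list[Task]:
    completed: list[Task] = []
    for _ in range(max_cycles):
        tasks = planner(goal, completed)            # planner creates tasks
        completed.extend(worker(t) for t in tasks)  # workers execute them
        if judge(goal, completed):                  # judge closes the cycle
            break
    return completed

if __name__ == "__main__":
    for task in orchestrate("build a Mandelbrot CLI"):
        print(task.description, "->", task.output)
```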
The other is a big net trying to cover a wide ocean. Why NanoLang helped the model: tests, examples, and tooling NanoLang’s design makes it friendlier to models, but the way Willison fed context to the assistant is equally important. Cause: The language has unambiguous syntax and requires tests.Effect: the model has fewer ambiguous choices at generation time, and there’s a built-in pass/fail signal.
Why NanoLang helped the model: tests, examples, and tooling
NanoLang’s design makes it friendlier to models, but the way Willison fed context to the assistant is equally important.
- Cause: The language has unambiguous syntax and requires tests. Effect: the model has fewer ambiguous choices at generation time, and there’s a built-in pass/fail signal.
- Cause: The repo ships with examples. Effect: Claude Code can search and pattern-match those examples into a new program—in this case, a Mandelbrot CLI.
- Cause: An IDE-style session in Claude Code running Opus 4.5, with access to files and the project tree. Effect: a tighter feedback loop than a single prompt to a hosted API, which matched Willison’s experience with llm/llm-anthropic (a loop sketched just below).
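To make the pass/fail idea concrete, here is a minimal sketch of the kind of feedback loop an agent or script could run against NanoLang’s mandatory tests: generate code, try to build and test it, and feed errors back. It again assumes the llm Python API with a placeholder model alias, and the `nanoc --test` command is a hypothetical stand-in for whatever build-and-test entry point the NanoLang repo actually provides.

```python
# Minimal generate/compile/test retry loop. The `nanoc --test` command is a
# hypothetical stand-in for NanoLang's real build-and-test tooling, and the
# model alias is a placeholder.
import subprocess

import llm  # Simon Willison's llm library

def generate_until_green(task: str, reference: str, attempts: int = 3) -> str | None:
    model = llm.get_model("claude-opus-4.5")  # placeholder model alias
    feedback = ""
    for _ in range(attempts):
        response = model.prompt(
            f"Write a NanoLang program (with its mandatory tests) that does: {task}\n"
            f"{feedback}",
            system=f"NanoLang reference:\n\n{reference}",
        )
        source = response.text()
        with open("attempt.nano", "w") as f:
            f.write(source)
        # Hypothetical compile-and-test step; swap in the repo's real command.
        result = subprocess.run(
            ["nanoc", "--test", "attempt.nano"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:  # pass/fail signal from the mandatory tests
            return source
        feedback = f"The previous attempt failed:\n{result.stdout}{result.stderr}"
    return None
```

The loop is the whole point: a language that insists on tests hands any assistant an objective retry signal, which is exactly what the bare one-shot prompt lacked.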
That pairing—language constraints plus concrete examples plus richer context in the tool—was enough to cross the compile line. It doesn’t prove generality. We don’t know how many NanoLang tasks would succeed under the one-shot prompt versus the IDE session, or how often a model would trip over edge cases in the type system or runtime. We also don’t have performance numbers beyond the promise of C as a target. The post offers a narrative of one failed compile and one working program, which is still useful signal.
Willison’s takeaway isn’t shy:
“I’ve suspected for a while that LLMs and coding agents might significantly reduce the friction involved in launching a new language. This result reinforces my opinion.” — Simon Willison’s Weblog
It’s a reasonable hunch, especially when the new language rolls out a welcome mat called MEMORY.md. The Mandelbrot demo hints at a workflow where early language adopters ship lots of small examples and self-tests, and assistants fill in the blanks. Whether that holds for larger projects or libraries is an open question that this single experiment doesn’t answer.
What this means for today’s coding assistants
The NanoLang episode is a tidy case study for anyone tracking open source AI news. A model that struck out in a bare prompt succeeded once it could read nearby examples and reason inside a code-aware environment. That lines up with a common developer instinct: you code better with your project tree visible than with a blank text box.
Cursor’s orchestration report from Wilson Lin points at another reality: coordinating many agents can generate a lot of code and even more tokens. That approach may demand careful guardrails, logging, and the kind of testing NanoLang makes mandatory. The two ideas are not in conflict. A constrained language is a gift to the solo agent; a rigorous test culture is oxygen for a swarm.
Missing pieces in both stories are the mundane ones: costs, rate limits, failure recovery, and human review. Willison’s post doesn’t get into those details, and Lin’s numbers, as summarized on Willison’s site, emphasize scale over postmortems. These are not criticisms so much as reminders that the interesting part of AI coding is shifting from “can it write code?” to “how do we structure the work so we trust what it wrote?”
Where to read more
- Simon Willison’s write-up and quotes: simonwillison.net
- Cursor/Wilson Lin’s orchestration post: tessl.io/blog
- Community chatter and related experiments: reddit.com/r/artificial
NanoLang’s GitHub repository is listed as jordanhubbard/nanolang, and Willison pulled MEMORY.md from the raw GitHub URL as context for his first attempt. That detail matters: he treated the file the way an assistant would—no hand-holding, just the intended snack for an LLM—and the compile failed. The win came when the model had a whole project to explore.