AutoResearch by Andrej Karpathy (2026): How Self-Improving Research Agents Run
By Paath.online • 2 April 2026 • 8 min read
AutoResearch is a new style of research agent: instead of only answering questions, it runs experiments, evaluates results, keeps improvements, and iterates—often overnight.
The Core Loop (Simple Mental Model)
- Define the goal and its constraints in plain English in a program plan.
- Generate a candidate change by editing a training script inside a controlled sandbox.
- Run the experiment for a short fixed budget on a GPU.
- Score the run with one clear metric, for example validation bits per byte (bpb), where lower is better.
- Decide: keep improvements or discard them, then repeat.
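The five steps above can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not AutoResearch's actual code: `run_experiment` and `propose_change` are hypothetical stand-ins (a toy loss formula and a random learning-rate nudge) for a real training run and a real agent. The only part that is not invented is the bits-per-byte conversion, which divides a summed cross-entropy loss in nats by ln(2) times the number of bytes.

```python
import math
import random


def bits_per_byte(total_nats: float, n_bytes: int) -> float:
    """Convert a summed cross-entropy loss (in nats) into bits per byte."""
    return total_nats / (math.log(2) * n_bytes)


def run_experiment(config: dict, n_bytes: int) -> float:
    """Hypothetical stand-in for a short, fixed-budget training run.

    Returns a summed validation loss in nats. The toy loss is lowest when
    lr is near 3e-3; a real run would train and evaluate a model instead.
    """
    per_byte_nats = 0.8 + (math.log10(config["lr"]) - math.log10(3e-3)) ** 2
    return per_byte_nats * n_bytes


def propose_change(config: dict, rng: random.Random) -> dict:
    """Hypothetical stand-in for the agent: halve or double the learning rate."""
    candidate = dict(config)
    candidate["lr"] = config["lr"] * rng.choice([0.5, 2.0])
    return candidate


def research_loop(config: dict, steps: int, seed: int = 0):
    """Try a change, measure it, and keep it only if the metric improves."""
    rng = random.Random(seed)
    n_bytes = 10_000  # assumed size of the validation set, in bytes
    best_bpb = bits_per_byte(run_experiment(config, n_bytes), n_bytes)
    history = [best_bpb]
    for _ in range(steps):
        candidate = propose_change(config, rng)
        bpb = bits_per_byte(run_experiment(candidate, n_bytes), n_bytes)
        if bpb < best_bpb:  # lower is better, so keep the change
            config, best_bpb = candidate, bpb
        history.append(best_bpb)  # discarded runs leave the best score unchanged
    return config, best_bpb, history


if __name__ == "__main__":
    final_config, final_bpb, _ = research_loop({"lr": 1e-1}, steps=20)
    print(final_config, round(final_bpb, 4))
```

Because a change is kept only when it lowers the score, the best-so-far metric can never get worse from one iteration to the next. That is the property the loop is meant to guarantee.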
Why AutoResearch Matters in 2026
Most AI progress still happens through iteration: tuning training code, trying new settings, checking metrics, and repeating. AutoResearch automates this “try → measure → keep” loop.
- Less manual engineering for researchers and ML engineers
- Faster search across hyperparameters and training recipes
- A stronger feedback culture: changes must earn their keep via metrics
How to Practice (Beginner-Friendly)
You don’t need a supercomputer to practice the pattern. Start with a tiny training setup and a strict eval rule.
- Pick one model/training script and keep everything else stable.
- Define a single metric and a “stop if it gets worse” rule.
- Let the agent change only a safe subset of hyperparameters/code.
- Log each run and compare against a golden baseline.
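The practice rules above can also be expressed as two small guard functions. This is a sketch under stated assumptions: the `ALLOWED_KEYS` whitelist, the record fields, and the function names are all hypothetical, chosen only to illustrate locking most settings and logging every run against a baseline. It assumes a metric where lower is better.

```python
import io
import json

# Hypothetical safe subset: the only settings the agent may change.
ALLOWED_KEYS = {"lr", "batch_size", "warmup_steps"}


def validate_change(baseline_cfg: dict, candidate_cfg: dict) -> set:
    """Return the changed keys, or raise if a locked setting was touched."""
    all_keys = set(baseline_cfg) | set(candidate_cfg)
    changed = {k for k in all_keys if baseline_cfg.get(k) != candidate_cfg.get(k)}
    illegal = changed - ALLOWED_KEYS
    if illegal:
        raise ValueError(f"change touches locked settings: {sorted(illegal)}")
    return changed


def log_run(log_file, run_id: int, config: dict, metric: float,
            baseline_metric: float) -> dict:
    """Append one JSON line per run, including the comparison to the baseline."""
    record = {
        "run_id": run_id,
        "config": config,
        "metric": metric,
        "delta_vs_baseline": metric - baseline_metric,
        "keep": metric < baseline_metric,  # lower is better in this sketch
    }
    log_file.write(json.dumps(record, sort_keys=True) + "\n")
    return record


if __name__ == "__main__":
    baseline = {"lr": 1e-3, "batch_size": 32, "seed": 0}
    candidate = {**baseline, "lr": 2e-3}
    validate_change(baseline, candidate)  # passes: only lr changed
    log = io.StringIO()  # a real setup would open a file in append mode
    print(log_run(log, 1, candidate, metric=0.95, baseline_metric=1.0))
```

Writing one JSON object per line keeps the log append-only and easy to filter later. Rejecting out-of-scope changes before a run starts saves GPU time and keeps every comparison against the baseline fair.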
Build agentic research the right way
At Paath.online, you learn ML workflows with evaluation-first thinking: experiments, metrics, and what to change next.