AutoResearch by Andrej Karpathy (2026): How Self-Improving Research Agents Run

By Paath.online | 2 April 2026 | 8 min read

AutoResearch is a new style of research agent: instead of only answering questions, it runs experiments, evaluates the results, keeps the changes that improve a metric, and iterates, often overnight.

The Core Loop (Simple Mental Model)

  1. Define the goal and its constraints in plain English, written down as a program plan.
  2. Generate a candidate change by updating a training script (safely, in a controlled sandbox).
  3. Run the experiment for a short fixed budget on a GPU.
  4. Score the run with a single clear metric (for example, validation bits per byte, or bpb, where lower is better).
  5. Decide: keep improvements or discard them, then repeat.
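
The five steps above can be sketched as a small keep-or-discard loop. This is only an illustration of the pattern, not AutoResearch's actual code: `run_experiment`, `propose_change`, and `research_loop` are hypothetical names, the "training run" is a toy formula with made-up numbers standing in for a real fixed-budget GPU job, and the "agent" is a random perturbation rather than a language model editing a script.

```python
import random


def run_experiment(config: dict) -> float:
    """Hypothetical stand-in for one short, fixed-budget training run.

    Returns a validation score where lower is better (like bpb).
    The formula is a toy with made-up numbers, not a real model.
    """
    lr_penalty = (config["lr"] - 0.01) ** 2 * 1e4
    warmup_penalty = abs(config["warmup"] - 100) / 1000
    return 1.0 + lr_penalty + warmup_penalty


def propose_change(config: dict, rng: random.Random) -> dict:
    """Hypothetical stand-in for the agent's edit: perturb one setting."""
    candidate = dict(config)  # copy, so the current best stays untouched
    if rng.random() < 0.5:
        candidate["lr"] = config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])
    else:
        candidate["warmup"] = max(0, config["warmup"] + rng.choice([-50, 50]))
    return candidate


def research_loop(baseline: dict, steps: int, seed: int = 0):
    """Try a change, measure it, keep it only if the score improves."""
    rng = random.Random(seed)
    best_config = dict(baseline)
    best_score = run_experiment(best_config)
    history = [("baseline", best_score, True)]
    for step in range(steps):
        candidate = propose_change(best_config, rng)
        score = run_experiment(candidate)
        kept = score < best_score  # strict improvement is required
        if kept:
            best_config, best_score = candidate, score
        history.append((f"step-{step}", score, kept))
    return best_config, best_score, history


if __name__ == "__main__":
    start = {"lr": 0.05, "warmup": 300}
    config, score, history = research_loop(start, steps=30)
    print("best config:", config)
    print("best score:", round(score, 4))
    print("kept changes:", sum(1 for _, _, kept in history[1:] if kept))
```

Because a candidate replaces the current best only when its score is strictly lower, the best score can never get worse from one iteration to the next. In a real setup, `run_experiment` would launch the sandboxed training script and read the metric from its output.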

Why AutoResearch Matters in 2026

Most AI progress still happens through iteration: tuning training code, trying new settings, checking metrics, and repeating. AutoResearch automates this “try → measure → keep” loop.

  • Less manual engineering for researchers and ML engineers
  • Faster search across hyperparameters and training recipes
  • A stronger feedback culture: changes must earn their keep via metrics

How to Practice (Beginner-Friendly)

You don’t need a supercomputer to practice the pattern. Start with a tiny training setup and a strict eval rule.

  • Pick one model/training script and keep everything else stable.
  • Define a single metric and a “stop if it gets worse” rule.
  • Let the agent change only a safe subset of hyperparameters/code.
  • Log each run and compare against a golden baseline.
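
As a rough sketch of the checklist above, the helpers below restrict changes to an allowlist, append every run to a JSON Lines log, and compare each run against a fixed baseline. All names here (`ALLOWED_KEYS`, `validate_change`, `log_run`, `should_stop`) and the example keys are assumptions made for illustration. The stop rule is also one possible reading of "stop if it gets worse": halt after a few consecutive runs that fail to beat the baseline.

```python
import json
from pathlib import Path

# Hypothetical safe subset: the only settings the agent may touch.
ALLOWED_KEYS = {"lr", "batch_size", "warmup_steps"}


def validate_change(baseline: dict, candidate: dict) -> set:
    """Return the changed keys, or raise if any is outside the allowlist."""
    all_keys = set(baseline) | set(candidate)
    changed = {k for k in all_keys if baseline.get(k) != candidate.get(k)}
    illegal = changed - ALLOWED_KEYS
    if illegal:
        raise ValueError(f"change touches protected settings: {sorted(illegal)}")
    return changed


def log_run(log_path, run_id: str, config: dict, metric: float,
            baseline_metric: float) -> dict:
    """Append one run to a JSON Lines log. Lower metric is better."""
    record = {
        "run_id": run_id,
        "config": config,
        "metric": metric,
        "delta_vs_baseline": metric - baseline_metric,
        "keep": metric < baseline_metric,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def should_stop(records: list, patience: int = 3) -> bool:
    """Stop once the last `patience` runs all failed to beat the baseline."""
    recent = records[-patience:]
    return len(recent) == patience and not any(r["keep"] for r in recent)
```

A typical session would call `validate_change` before launching a run, `log_run` after it finishes, and `should_stop` on the accumulated records to decide whether to continue. Keeping the log append-only means a discarded run still leaves a trace that can be inspected later.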

Build agentic research the right way

At Paath.online, you learn ML workflows with evaluation-first thinking: experiments, metrics, and what to change next.