autoresearch is an experimental AI research automation tool from Andrej Karpathy that shows how coding agents can run small LLM training experiments without constant human editing. The repository is intentionally narrow: it contains a simplified single-GPU version of nanochat, a training script (train.py), a preparation script (prepare.py), and a program.md file that tells the agent how to behave. The point is not to replace a full research lab but to make the research loop explicit enough that an AI coding agent can propose a change, edit the training code, run a fixed experiment, inspect the metric, and decide whether the change helped.
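The propose, run, inspect, decide cycle can be sketched as a greedy keep-or-discard search. In autoresearch the "proposal" is an LLM agent editing train.py and the "evaluation" is a fixed training run; in this minimal sketch both are replaced by hypothetical toy functions (toy_propose, toy_evaluate) over a plain config dict so it runs anywhere. It is not the repository's own code, and how the real agent reverts a rejected edit is not modeled here.

```python
from __future__ import annotations

import random
from typing import Callable, Dict

Config = Dict[str, float]  # hypothetical stand-in for "the current state of train.py"


def research_loop(
    baseline: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    iterations: int,
) -> tuple[Config, float]:
    """Greedy keep-or-discard loop; lower metric is better (like validation bits per byte)."""
    best_cfg, best_score = dict(baseline), evaluate(baseline)
    for _ in range(iterations):
        candidate = propose(best_cfg)   # agent proposes and applies one edit
        score = evaluate(candidate)     # fixed experiment returns the metric
        if score < best_score:          # keep only strict improvements
            best_cfg, best_score = candidate, score
        # otherwise the candidate is discarded, i.e. the edit is reverted
    return best_cfg, best_score


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_propose(cfg: Config) -> Config:
        # Hypothetical edit: halve or double the learning rate.
        return {**cfg, "lr": cfg["lr"] * rng.choice([0.5, 2.0])}

    def toy_evaluate(cfg: Config) -> float:
        # Hypothetical metric whose minimum sits at lr = 0.02.
        return 1.0 + (cfg["lr"] - 0.02) ** 2

    best, score = research_loop({"lr": 0.08}, toy_propose, toy_evaluate, iterations=20)
    print(best, score)
```

Accepting only strict improvements keeps the loop simple and makes every retained change traceable to a measured gain. The trade-off is that noise in a single short run can let a lucky candidate through, which is one reason human review of the kept changes still matters.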
The default workflow uses a fixed five-minute wall-clock training budget so experiments stay comparable on the same machine. The metric highlighted in the README is validation bits per byte, where lower is better. That design matters because it stops an agent from buying a lower score with extra compute: a run cannot simply be made longer, and a larger model just gets fewer optimization steps inside the same five minutes. The agent is expected to work inside the repo, modify train.py, respect prepare.py as stable setup code, and use program.md as the human-editable operating brief.
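Both ideas in this paragraph are small enough to write down. The sketch below assumes the validation loss is the per-token cross-entropy in nats, summed over every scored token, and that the byte count is the number of UTF-8 bytes those tokens decode to. Dividing by ln 2 converts nats to bits, and normalizing by bytes rather than tokens keeps the number comparable when the tokenizer or vocabulary changes. The function names bits_per_byte and train_for_budget are hypothetical, and whether autoresearch's own evaluation handles special tokens or startup time exactly this way is an assumption.

```python
from __future__ import annotations

import math
import time
from typing import Callable


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (natural log) over a validation set into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # nats -> bits via ln(2); per-byte normalization makes it tokenizer-independent
    return total_loss_nats / (math.log(2) * total_bytes)


def train_for_budget(step: Callable[[], None], budget_s: float = 300.0) -> int:
    """Call `step` until the wall-clock budget (default five minutes) is used up.

    Returns the number of steps completed, so a slower (for example, larger)
    model gets fewer steps rather than more time. The check happens before
    each step, so the final step may overrun the deadline slightly.
    """
    deadline = time.monotonic() + budget_s  # monotonic clock ignores system clock changes
    steps = 0
    while time.monotonic() < deadline:
        step()
        steps += 1
    return steps
```

For example, a summed loss of `1000 * math.log(2)` nats over 1,000 bytes is exactly one bit per byte, and `train_for_budget(lambda: time.sleep(0.01), budget_s=0.1)` completes roughly ten steps.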
For builders, autoresearch is useful as a reference pattern for agentic experimentation. It demonstrates how to wrap a real objective, a bounded runtime, and a feedback signal around a codebase so an agent can iterate. That pattern applies beyond nanochat: prompt iteration scored against an evaluation set, architecture tests, hyperparameter sweeps, data-cleaning variants, and automated ablation studies can all borrow the same structure. Humans still own the goal, constraints, and review; the agent handles repetitive edit-run-measure cycles.
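The "bounded runtime plus feedback signal" wrapper is the part that transfers most easily to other projects. A minimal sketch, assuming the experiment is any command that prints a line such as `val_bpb: 1.234` to stdout; that output convention and the helper name run_bounded are assumptions for illustration, not autoresearch's documented interface. A hard timeout at the harness level complements an in-script budget, because a run that hangs or ignores its budget still gets stopped.

```python
from __future__ import annotations

import re
import subprocess
import sys
from typing import Optional

# Assumed output convention: the experiment prints "val_bpb: <float>".
METRIC_RE = re.compile(r"val_bpb:\s*([0-9]*\.?[0-9]+)")


def run_bounded(cmd: list[str], timeout_s: float) -> Optional[float]:
    """Run one experiment under a hard wall-clock limit and return its metric.

    Returns None when the run times out, exits non-zero, or prints no metric,
    so the caller can treat all three cases as a failed experiment.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child process before re-raising
    if proc.returncode != 0:
        return None
    found = METRIC_RE.findall(proc.stdout)
    return float(found[-1]) if found else None  # last reported value wins


if __name__ == "__main__":
    ok = run_bounded([sys.executable, "-c", "print('val_bpb: 1.234')"], timeout_s=30)
    hung = run_bounded([sys.executable, "-c", "import time; time.sleep(10)"], timeout_s=1)
    print(ok, hung)  # 1.234 None
```

Collapsing timeouts, crashes, and missing output into a single None keeps the decision logic upstream simple: anything that is not a valid number is simply not an improvement.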
The tool is not a turnkey AutoML platform. The README lists Python 3.10+, uv, and a single NVIDIA GPU as requirements, and using it well takes enough comfort with model training to diagnose failures. It has no hosted dashboard, no commercial support plan, and no guarantee that overnight agent runs will produce better models. Its value is educational and practical for advanced users who want to see a compact example of autonomous research loops with modern coding agents.
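Because these prerequisites are easy to miss on a shared or freshly provisioned machine, a quick preflight check can save a failed first run. The sketch below checks the three requirements named above; the function name preflight is hypothetical, and it assumes PyTorch is importable for the GPU check (if it is not, the GPU is reported as unconfirmed). `torch.cuda.is_available()` only confirms that PyTorch can see a CUDA-style device; ROCm builds of PyTorch also answer through this API, so treat it as a sanity check rather than proof of an NVIDIA card.

```python
from __future__ import annotations  # keeps the annotations below valid on older Pythons

import shutil
import sys


def preflight() -> dict[str, bool]:
    """Check the prerequisites the README lists: Python 3.10+, uv, and a GPU."""
    checks = {
        "python>=3.10": sys.version_info >= (3, 10),
        "uv on PATH": shutil.which("uv") is not None,
    }
    try:
        import torch  # assumption: PyTorch is installed in the environment being checked

        checks["cuda gpu"] = bool(torch.cuda.is_available())
    except ImportError:
        checks["cuda gpu"] = False  # cannot confirm a GPU without PyTorch
    return checks


if __name__ == "__main__":
    for name, ok in preflight().items():
        status = "ok" if ok else "MISSING"
        print(f"{status:<8} {name}")
```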
OpenTools lists autoresearch as a developer AI tool because it gives builders a concrete template for research agents: small surface area, measurable loop, hard time budget, and source code that can be inspected or modified directly.
For evaluation, treat autoresearch as a builder-focused open-source project rather than a managed SaaS. Review the upstream README, license, install path, and issue activity before adopting it. Teams should test it in a disposable repository or development environment first, document the exact version they use, and keep production workflows behind normal code review, monitoring, and rollback practices.