Research · March 19, 2026 · 7 min read

We rebuilt the browser-agent leaderboards

A fairer, reproducible way to compare browser agents on real tasks.

Marco T.

Engineering

Benchmarks age badly. We rebuilt our browser-agent leaderboards around reproducible tasks and a fixed harness, so the numbers mean the same thing every time you read them.

What we measure

Task success rate on a fixed set of real sites
Median steps to completion
Wall-clock time per task
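To make the three metrics concrete, here is a minimal Python sketch of how per-task results could be rolled up into leaderboard numbers. The `TaskRun` record and its field names are hypothetical. Two choices are also assumptions for illustration, not the published harness: steps are counted only over successful runs, and wall-clock time is reported as a median.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskRun:
    """One agent attempt at one benchmark task (hypothetical record shape)."""
    task_id: str
    succeeded: bool
    steps: int
    wall_clock_s: float


def summarize(runs: list[TaskRun]) -> dict[str, float]:
    """Aggregate per-task runs into the three leaderboard metrics."""
    if not runs:
        raise ValueError("no runs to summarize")
    successes = [r for r in runs if r.succeeded]
    return {
        # Fraction of tasks in the fixed set that the agent completed.
        "success_rate": len(successes) / len(runs),
        # Median steps to completion, over successful runs only (assumption);
        # NaN when the agent solved nothing.
        "median_steps": (
            float(median(r.steps for r in successes)) if successes else float("nan")
        ),
        # Median wall-clock seconds per task across all runs (assumption).
        "median_wall_clock_s": float(median(r.wall_clock_s for r in runs)),
    }


runs = [
    TaskRun("checkout-flow", True, 12, 30.0),
    TaskRun("search-filter", False, 40, 90.0),
    TaskRun("login-2fa", True, 8, 20.0),
]
print(summarize(runs))
```

Counting steps only on successes keeps a fast failure from making an agent look efficient; a real harness might instead cap or penalize failed runs.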

Open by default

The harness and tasks are open so anyone can rerun them. A leaderboard you cannot reproduce is just a screenshot.

