Table of Contents
1. The $80 day that made us pause
We had been happily paying for GitHub Copilot Business. The number was comfortable - roughly $700 a month. Predictable. Easy to justify. The workflow was already the heavy one: a single developer running four agents in parallel, full eight-hour workdays. One agent on implementation, one on tests and refactoring, one turning the work into long technical articles, one on research and spikes. That was the normal operating mode.
Nothing in the process changed. We kept running the exact same four-agent parallel setup for one person. What changed was what that workflow suddenly cost.
The first time we saw a daily charge that looked like $80, we assumed it was a mistake. It wasn't. At that rate you're looking at roughly $2,000 a month just for the AI coding layer - before you pay any humans.
Quick napkin math: $80/day × ~22 working days = $1,760/month. That's 2.5× what we budgeted for "unlimited" AI assistance. And we were only getting started with the intensity.
The worst part wasn't even the absolute number. It was that the workflow we had already settled into - four agents grinding in parallel for one developer - was no longer economically viable. The same intensity of work that used to be covered by a flat subscription was now burning $80 a day.
2. What "agentic coding" actually looked like
This wasn't one engineer occasionally asking Copilot to finish a function. This was a deliberate attempt to multiply output.
But what really ate the tokens was analyzing cross-cutting concerns across many files in large codebases.
We would spin up many independent sessions (different models, different tools, sometimes different machines) and throw them at slices of the same project. One agent implemented a feature according to the spec. A second one immediately started writing tests and hunting edge cases. A third turned the whole thing into a long article - the 2,000-4,000 word technical posts about browser extensions and CRXJS that we publish.
Writing those extension guides with heavy AI help was actually one of the biggest drivers of consumption. The model wasn't just completing lines - it had to hold the full mental model of Manifest V3, Shadow DOM gotchas, CRXJS internals, real production patterns from 100+ shipped versions, and turn it all into coherent, readable, lightly clickbaity prose.
Every time an agent needed to "think", it generated tokens. Every time it ran a tool, searched context, or iterated on a bad output, more tokens. Four of these things running for eight hours is not the same usage profile as four engineers lightly tab-completing.
3. The math nobody wants to show
The uncomfortable truth is that most public discussion around AI coding tools still talks in terms of "10× developer" anecdotes and flat monthly fees. The reality of heavy agentic usage looks more like this:
Before (same workflow, flat price)
One developer. Four agents running simultaneously. Eight hours a day. Writing production code + tests + long technical articles. This exact usage pattern was covered by the ~$700/month GitHub Copilot Business subscription. It felt expensive but workable.
After (same workflow, new price)
The identical setup - one person, four parallel agents, full workday - now costs $70-85 per day. Some days higher when contexts stay large and agents do multi-step planning and verification loops. The daily burn rate became the only number that mattered.
This is per person. We are not a giant company. We are a focused team that ships browser extensions, internal tools, and the occasional piece of infrastructure (we're contributors to CRXJS, the Vite plugin that powers a lot of modern extension development). When running one developer at full agentic intensity starts looking like another full-time contractor in AI spend alone, the model needs to be re-examined.
The promise was always "pay a flat fee, get massive leverage." The emerging reality for anyone actually pushing the agentic envelope is "pay for every token the swarm consumes while it's being useful."
4. Why parallel agents destroy flat-rate assumptions
A single senior engineer using Copilot for light autocomplete generates a certain amount of traffic. The same engineer running four agents in parallel - constantly reading large chunks of the codebase, writing plans, executing steps, verifying output, and then writing the accompanying documentation and blog posts - that's a completely different load profile. And that was already our normal.
Agentic workflows love context. They love iterating. They love calling tools and then reasoning about the results. All of that is incredibly valuable when it works - and incredibly expensive when the pricing model still has one foot in the "smart autocomplete" era.
GitHub Copilot (at the time of writing) still sells itself heavily on the idea that you can use it as much as you want within the plan. For normal human usage patterns, that's mostly true. For one person running four agents grinding for eight hours every day, the definition of "normal" no longer applies - even though that was already our established workflow.
We didn't suddenly go harder. We kept doing exactly what we were already doing. The economics just broke underneath it.
5. We're not here to sell you the fix
This post is deliberately light on recommendations. We are not an AI coding tool company. We don't have a model to upsell you. We build browser extensions, we contribute to open source (CRXJS), and we run client projects. AI tooling is infrastructure for us, not a product.
We are still actively looking for a sustainable way to run this level of agentic work without the daily burn rate making the whole exercise questionable. We've tried different model mixes, different local setups, different ways of batching work, different prompting strategies that reduce back-and-forth. Some things help. Nothing has fully solved it yet.
If you're in the same spot - running heavy parallel AI coding sessions and watching the costs climb faster than the output value - we don't have a magic link for you. Just the numbers and the honest admission that we're still figuring it out too.
The only thing we're reasonably confident about is that the current generation of usage-based agentic pricing is going to create a lot more stories like this as more teams move beyond "autocomplete that sometimes chats" and into real multi-agent workflows.
6. What this means if you're shipping with AI right now
If you're a solo founder or small team treating AI agents like force multipliers, do the math early. Track actual daily spend under your real workflow, not the optimistic "we'll use it a bit more" scenario.
The leverage is real. We have shipped things faster, explored more ideas, and written more (and better) long-form technical content than we could have without the current tools. The CRXJS-related posts, the extension architecture deep dives, the case studies - a non-trivial amount of that volume and quality came from having AI partners that could hold context across thousands of lines while we drove the direction.
But leverage without sustainable unit economics is just expensive theater. At some point the question stops being "how much faster can we go?" and becomes "can we afford to go this fast?"
We'll keep experimenting. Different tool combinations, heavier use of local models for some workloads, more disciplined scoping of what actually needs the big frontier models versus what can be handled by smaller, cheaper, or self-hosted ones. If we land on something that feels stable and predictable, we might write about it. For now, this is the situation report from the middle of the experiment.
Related articles
Why We Build Products AND Services
The AI-native agency model and how we actually work
How to Build Chrome Extensions with React, Vite & CRXJS
Practical guide from a team that ships 100+ extension versions
The Real Cost of Building a Custom Browser Extension
Budgets, timelines, and complexity tiers in 2026
CRXJS vs Plasmo vs WXT
Choosing the right Chrome extension framework
Hitting similar walls with AI dev tooling?
We don't have all the answers yet, but we're deep in the same experiments. If you're building real software (especially browser extensions or automation tools) and want to compare notes, we're around.
