Deploying web apps
Deploying a website is harder than it might first appear. Even a simple HTML page with a single JavaScript or CSS file is a distributed system prone to race conditions. A "modern" Single Page Application with code splitting is worse.
Consider our simple HTML page. We have `v1` deployed to our live site. It depends on a couple of `.js` and `.css` files. A visitor requesting our page will get `v1`, which will in turn trigger requests for the additional files, which will also be `v1`. All good so far!
Now we've started to deploy `v2`. A visitor comes along a split second before our deploy and gets `v1` of the page. In the time it takes for their browser to get the HTML and send requests for the rest of the files, `v1` has been replaced with `v2`.
What happens now?
Maybe nothing! Depending on which files we changed, and how backwards and forwards compatible those changes are, there may be no user-perceivable impact.
Or, it could fail with varying levels of severity, from minor styling glitches to a JavaScript exception resulting in a dead, white page.
"But it mostly works most of the time", you say. "They just need to refresh and we'll call it a caching bug!"
It gets worse.
A Single Page Application can run for days without reloading the page. If you take no measures to force a reload (politely, I hope) when you deploy a new version, it could persist across several updates. The entire time it's active, it may expect to be able to lazy load arbitrarily old versions of your assets (not to mention hitting other API endpoints).
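As a sketch of what a polite reload could look like (the endpoint name, response shape and polling interval here are my own invention, not anything standard): the running client polls a tiny version endpoint and offers the user a reload once the server reports a newer build.

```typescript
// Baked into the bundle at build time (e.g. via your bundler's define/env support).
const BUILD_VERSION = "v1";

// Hypothetical endpoint returning the currently deployed version,
// e.g. { "version": "v2" }. It must be served uncached.
async function checkForNewVersion(): Promise<void> {
  try {
    const res = await fetch("/version.json", { cache: "no-store" });
    const { version } = (await res.json()) as { version: string };
    if (version !== BUILD_VERSION) {
      // Ask politely instead of yanking the page out from under the user.
      if (confirm("A new version is available. Reload now?")) {
        location.reload();
      }
    }
  } catch {
    // A network hiccup shouldn't break the app; try again next interval.
  }
}

// Check every 15 minutes for as long as the tab stays open.
setInterval(checkForNewVersion, 15 * 60 * 1000);
```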
It's a problem. What are our options?
Cache busting?
For starters, we can add a version to the names of our static assets. When a user gets `v1` of the page, they send a request for `main.v1.js`. Except this happened just after we deployed `v2`, so now they get a 404. Which is more likely to reliably break things than a subtle version mismatch. Have we made it worse?
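As an aside: in practice, bundlers usually derive these cache-busting names from the file's contents rather than a deploy-wide version number, so unchanged files keep a stable URL across deploys. A minimal sketch of the idea using Node's built-in crypto module (paths are hypothetical):

```typescript
import { createHash } from "node:crypto";
import { copyFileSync, readFileSync } from "node:fs";
import { basename, extname } from "node:path";

// Turn main.js into main.3f9a2c1d.js, named by its contents.
// A new version of the file gets a new name; old names keep
// pointing at old bytes.
function hashedName(path: string): string {
  const hash = createHash("sha256")
    .update(readFileSync(path))
    .digest("hex")
    .slice(0, 8);
  const ext = extname(path); // ".js"
  const stem = basename(path, ext); // "main"
  return `${stem}.${hash}${ext}`;
}

const out = hashedName("dist/main.js");
copyFileSync("dist/main.js", `dist/${out}`);
console.log(out); // e.g. "main.3f9a2c1d.js"
```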
Use a CDN?
We keep the cache buster file names, but put a CDN in front of our web server. Now if a `v1` page requests `main.v1.js` they'll get it from the CDN, right? Well, hopefully. If their edge location happens to have a copy, and you've set the caching headers properly so it doesn't try to fetch it from your freshly updated server. It may work for you personally, since you'll be constantly refreshing the page and priming the edge cache nearest you.
Great! We've managed to get back into intermittently broken territory.
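The caching-header split that makes the CDN behave looks roughly like this: hashed assets never change, so they can be cached forever; entries must be revalidated on every request so visitors pick up new versions. A minimal sketch with Node's built-in HTTP server (the asset-name pattern is an assumption; actual file serving is elided):

```typescript
import { createServer } from "node:http";

// Content-hashed names like main.3f9a2c1d.js are immutable, so the CDN
// and the browser may cache them forever. Entries (hello.html) must be
// revalidated every time.
const HASHED_ASSET = /\.[0-9a-f]{8}\.(js|css)$/;

const server = createServer((req, res) => {
  if (HASHED_ASSET.test(req.url ?? "")) {
    res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
  } else {
    res.setHeader("Cache-Control", "no-cache"); // cache, but always revalidate
  }
  // ...serve the requested file from disk here...
  res.end();
});

server.listen(8080);
```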
Consolidate assets
In You Don't Want Atomic Deploys, Kevin Cox distinguishes between "Entries" and "Assets". Entries are the stably named locations that visitors access directly. `hello.html` is an entry point, and visitors need to be able to navigate to it without knowledge of the current version. "Assets" are all of the files that "Entries" depend on, such as `.js`/`.css` files.
The key insight to solving this problem robustly is that all assets must remain available for as long as an entry point might request them.
The simplest way I've found to accomplish this goal is to upload static assets to a shared file store like S3, B2 or even a server you host yourself (also, keep the CDN!). Multiple versions of the same file must be able to coexist (the cache busted file names help), and you MUST do this before you deploy any of your entry points.
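The ordering is the whole trick, so here's a sketch of a deploy script using the AWS SDK (bucket name and directory layout are hypothetical): every asset lands in the shared store first, and only once each upload has succeeded do the entry points go live.

```typescript
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const s3 = new S3Client({});
const BUCKET = "my-static-assets"; // hypothetical bucket name

// Step 1: upload every hashed asset. Old versions already in the bucket
// are left untouched, so v1 pages can keep fetching v1 assets.
async function uploadAssets(dir: string): Promise<void> {
  for (const name of readdirSync(dir)) {
    await s3.send(
      new PutObjectCommand({
        Bucket: BUCKET,
        Key: `assets/${name}`,
        Body: readFileSync(join(dir, name)),
        CacheControl: "public, max-age=31536000, immutable",
      })
    );
  }
}

// Step 2: only after every asset is in place, deploy the entries.
async function deployEntries(): Promise<void> {
  // e.g. push hello.html to your web server or hosting provider
}

await uploadAssets("dist/assets");
await deployEntries();
```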
Thatâs it. Not actually very complicated. Just tedious.
Cleaning up
In terms of making a race-condition-filled space predictable and deterministic, this technique is pretty solid. It does have a bit of upfront setup cost, and it makes deploys a little less… discrete. Some day you may also want to garbage collect really old versions from the shared file storage.
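One way to do that garbage collection (a sketch under invented assumptions: each deploy records a manifest of the asset keys it references, and pagination of the listing is ignored for brevity): anything absent from the last N manifests is safe to delete.

```typescript
import {
  DeleteObjectCommand,
  ListObjectsV2Command,
  S3Client,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const BUCKET = "my-static-assets"; // hypothetical bucket name

// recentManifests: the asset keys referenced by each of the last N deploys.
async function collectGarbage(recentManifests: string[][]): Promise<void> {
  const keep = new Set(recentManifests.flat());
  const listed = await s3.send(
    new ListObjectsV2Command({ Bucket: BUCKET, Prefix: "assets/" })
  );
  for (const obj of listed.Contents ?? []) {
    // Delete anything no recent deploy still mentions.
    if (obj.Key && !keep.has(obj.Key)) {
      await s3.send(new DeleteObjectCommand({ Bucket: BUCKET, Key: obj.Key }));
    }
  }
}
```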
It also does not address API versioning problems. However, there are plenty of documented approaches here. None of them are silver bullets, but there are at least plenty of ideas floating around to work with.
How is everyone else handling this?
It's hard to say. I remember first reading about this at least ten years ago, though I haven't been able to figure out where. I keep looking for resources to share with others as I try to explain the problem, but there's actually very little I can find written down. I can only assume that places sophisticated enough to have solved this also don't see it as worth talking about.
Good: Fly's static asset caching
One notable exception is Fly.io's static asset caching. When you deploy a container, you can specify folders inside your container to be served up directly from their edge caching servers. Here's what they say in First look: static asset caching:
"And we keep a few versions around so you don't get blank pages mid deploy, people who request outdated static URLs will get what they expect."
I haven't thoroughly tested this. It also doesn't completely solve the problem for arbitrarily long-lived SPA clients. But still, this sounds both awesome and unique.
Mixed: Static site hosting services
Vercel, Netlify, Render and many other similar services advertise their "atomic deploys" and "instant cache invalidation". This sounds great, but at best it only partially solves the problem outlined in this post. To be clear, I haven't tested each individual service, but I haven't found any documented indication that any of them take measures to avoid it.
I did test Vercel, which (as of this writing) hosts this site. To test, I opened a page and grabbed a cache busted `.css` URL. Then I pushed a minor update to the file, waited for it to deploy, and got a 404 back when I re-requested the file. No joy.
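The test is easy to reproduce on whatever host you use; a sketch, with a made-up URL:

```typescript
// Grab a cache-busted asset URL from a loaded page, deploy a change to
// that file, then re-request the old URL. Anything other than 200 means
// v1 pages will break mid-deploy. The URL here is hypothetical.
const oldAssetUrl = "https://example.com/_assets/styles.1a2b3c4d.css";

const res = await fetch(oldAssetUrl, { cache: "no-store" });
console.log(res.status); // Vercel returned 404 in my test above
```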
So. In this specific case? My site is very static, has no JavaScript and very little CSS. It even renders completely fine without CSS. Vercel is super convenient, so for now I choose to live dangerously and accept the risk.
Further Reading
- You Donât Want Atomic Deploys (Kevin Cox)
- Static assets in an eventually consistent webapp deployment (REA Group)