Porting to Workers

2024-01-28

This website is now using Cloudflare Workers!

I went a little bit crazy over the weekend and decided to finally make the move from the three-region Fly.io deployment to Workers that run at the edge. This does mean I'm abandoning the gossip-protocol-based content cache dissemination implementation I was working on for the Fly.io deployment, but I still think the idea is valuable, and I'll try to find something else to add it to.

Choosing how to build this site on Workers was a bit tricky. My goal continues to be to overengineer what is ostensibly a static site for fun and negative profit. Thus, I completely ignored Cloudflare Pages. I wanted to keep the tiered caching strategy that I tried in the initial Fly.io version, and it would be easier given I could leverage Cloudflare's caching systems. Ideally I want to regenerate pages from source when things change, but I also have to pay attention to how long it takes to execute a function when called. With Fly.io, I wanted fast page generation, but I didn't really care how long it took since the CPU time wasn't limited in any way. Another limitation is that Workers don't really have a filesystem, so I'd have to leverage the various offerings available to store files. With Fly.io, I built the Docker image with the posts bundled in, which was great for speed but felt a little less than ideal since there was no clear separation between publishing a post and deploying changes to the backend.

So first, post storage! With Worker size limits, I decided to go with storing posts in their raw (.org or .md) format in an R2 bucket. While R2 is good for scaling storage, it does break my "live on the edge" mindset. So I needed a way to distribute the content to the edge, avoiding the Workers needing to reach into a region further away.

After some consideration, I scrapped the idea of generating and storing the result for other Workers on the fly and looked at the Queue option instead. The plan was to pre-render the content and store it somewhere (more on that later) so I can very quickly render content in the background when something is published. When a file is pushed to R2, I can fire off a webhook that queues up the new or changed files for rendering and storing on the edge. It does seem to introduce a little more latency when it comes to publishing content, but in reality it's faster because it doesn't require me to rebuild, push, and restart a container image.

Where to store the rendered content stuck with me for a bit. Initially I wanted to go with KV, since it seemed it would be faster, but I found after some experimentation it was substantially slower since there's no way to easily sort the keys based on content without reading everything into memory and then sorting during Worker execution. Thankfully, I could reach for a real database, and created a D1 instance to hold a single table with the posts. It being SQLite based, I can just use SQL for the queries and take advantage of much more optimised codepaths for sorting or fetching the data I actually need. While D1 doesn't currently replicate, it will be a huge speed boost when it does!
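To make the KV versus D1 difference concrete, here is a minimal sketch in plain Rust. The `Post` struct, the `posts` table, and its column names are all assumptions for illustration; the real schema isn't shown in this post. The first function mimics what a Worker has to do with KV (load every post, then sort), while the SQL string shows the kind of query that lets the database do the sorting and return only the rows needed.

```rust
/// Hypothetical shape of a rendered post; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Post {
    slug: String,
    /// ISO 8601 date such as "2024-01-28", which sorts correctly as a string.
    published: String,
    html: String,
}

/// KV-style approach: every post has to be read into memory first,
/// then sorted and trimmed inside the Worker.
fn latest_from_kv(mut all_posts: Vec<Post>, limit: usize) -> Vec<Post> {
    // Newest first.
    all_posts.sort_by(|a, b| b.published.cmp(&a.published));
    all_posts.truncate(limit);
    all_posts
}

/// D1-style approach: SQLite sorts and limits, so only `limit` rows
/// ever reach the Worker. Table and column names are hypothetical.
const LATEST_POSTS_SQL: &str =
    "SELECT slug, published, html FROM posts ORDER BY published DESC LIMIT ?1";
```

With KV the cost grows with the total number of posts on every request; with the SQL version it only grows with the number of rows requested.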

Note: this section originally said that D1 replicates. I was then told and discovered this is not the case at the moment. Whoops.

The workflow thus far is:

1. A post is pushed to R2
2. A webhook is sent to a Worker (not by R2)
3. The Worker fetches the list of files from R2 and queues them for "indexing"
4. Workers are executed to consume the queue, rendering the files and storing them in D1
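The steps above can be sketched with nothing but the Rust standard library. This is only a model of the flow, not the site's actual code: the `HashMap`s stand in for the R2 bucket and the D1 table, the `VecDeque` stands in for the Queue, and `render` is a placeholder rather than a real Org or Markdown renderer. All names here are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Stand-in for the R2 bucket: object key -> raw post source.
type Bucket = HashMap<String, String>;
/// Stand-in for the D1 table: object key -> rendered HTML.
type Rendered = HashMap<String, String>;

/// Webhook step: list the bucket and queue every post file for "indexing".
fn handle_webhook(bucket: &Bucket, queue: &mut VecDeque<String>) {
    let mut keys: Vec<&String> = bucket
        .keys()
        .filter(|k| k.ends_with(".org") || k.ends_with(".md"))
        .collect();
    keys.sort(); // deterministic order, purely for the sketch
    for key in keys {
        queue.push_back(key.clone());
    }
}

/// Placeholder renderer; a real one would parse Org or Markdown.
fn render(source: &str) -> String {
    format!("<article>{}</article>", source)
}

/// Consumer step: drain the queue, render each file, store the result.
fn consume_queue(bucket: &Bucket, queue: &mut VecDeque<String>, db: &mut Rendered) {
    while let Some(key) = queue.pop_front() {
        // A file may disappear between queuing and consuming; skip it if so.
        if let Some(source) = bucket.get(&key) {
            db.insert(key, render(source));
        }
    }
}
```

Keeping the webhook handler and the consumer separate mirrors the point of the Queue: the handler returns quickly, and rendering happens in the background.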

The final piece is telling the Worker it can cache all the responses in Cloudflare's cache, and we're all set! Each response is cached for 4 hours before a Worker has to be hit to fetch the content from D1 again.
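One common way to express a four-hour lifetime is a `Cache-Control` header on the response. Whether this site uses that header or Cloudflare's cache API directly isn't stated here, so treat the following as an assumption; the constant and function names are hypothetical.

```rust
/// Cache lifetime from the post: 4 hours, expressed in seconds.
const CACHE_TTL_SECS: u64 = 4 * 60 * 60;

/// Builds a `Cache-Control` header value that allows shared caches
/// to keep the response for `ttl_secs` seconds.
fn cache_control_value(ttl_secs: u64) -> String {
    format!("public, max-age={}", ttl_secs)
}
```

Calling `cache_control_value(CACHE_TTL_SECS)` yields `public, max-age=14400`.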

Of course, it wasn't all smooth sailing, which was mildly expected since D1 and Queues are both still in beta, and I'm using Rust, which, while having first-party support through the workers-rs library, has some rough edges on Workers. The biggest one for my implementation is the inability to run the Worker locally when Queues are in use. Generally speaking, my local development flow for this project needs improvement, and will be worked on, but the inability to run it locally with Queues (or even with --remote at the moment) makes iterating slow. I do believe this will improve over time, and when I hit blockers I plan to contribute back to the workers-rs library, but what I've needed to use works great.

As per usual, I'm planning to continue iterating on the site, and hopefully make it faster and smaller over time. I do of course have the source available in my Forgejo repo, although there are still a few things I need to do, like implementing the actual CI/CD for publishing and webhooking. Feedback is welcome of course! Feel free to poke me on the federated universe.