So... I built a Docker Swarm Cluster

I've been hungry lately. Thirsty, even. Salivating at the idea of High Availability for Fifthdread Services. I had two "good" servers running over 100 Docker containers between them, hosting self-hosted services such as VaultWarden, Plex, Jellyfin, Matrix Synapse (Element Chat), Mumble, and a ton of websites that I host for both myself and my "customers". Those servers? They sometimes gotta reboot. And when they reboot, they take all those services down with 'em.

👎
"Unacceptable" - Fifthdread


"Unacceptable", I thought to myself. How could I allow Fifthdread Services, the most vital of services to everyone's lives, to go down even a little? It's unthinkable. That's when the dedicated team at 5DS went to work. They received a direct order to solve the problem, and solve they did.

The team discovered something fantastic: it was possible to run containers across a "cluster" of machines. And this cluster? It could "self-heal", in a way. If one machine in the cluster went down, the containers it was running would be started back up on the other machines, so no service would be down for more than a few seconds. The users wouldn't even know anything happened. There were a few technologies that could do this, but we settled on Docker Swarm.
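To make the "self-healing" idea a bit more concrete, here is a toy Python model of what an orchestrator does when a node drops out: every task that was on the dead node gets placed again on whichever healthy node is running the fewest tasks. Treat it as a simplified sketch of spread-style scheduling under a few assumptions, not Swarm's actual scheduler (swarmkit also weighs placement constraints, resource reservations, and how many tasks of the same service each node already runs). The node names are hypothetical, and each service is modeled as a single task.

```python
from collections import Counter


def place(tasks, live_nodes, assignment=None):
    """Toy spread scheduler: put every task that isn't on a live node onto the
    live node currently running the fewest tasks (ties broken by name)."""
    assignment = dict(assignment or {})
    for task in tasks:
        if assignment.get(task) in live_nodes:
            continue  # already running on a healthy node, leave it alone
        # Count only tasks sitting on live nodes; tasks on a dead node don't count.
        load = Counter(node for node in assignment.values() if node in live_nodes)
        target = min(live_nodes, key=lambda n: (load[n], n))
        assignment[task] = target
    return assignment


# Hypothetical 4-node cluster; service names borrowed from the list above.
nodes = ["node1", "node2", "node3", "node4"]
tasks = ["vaultwarden", "plex", "jellyfin", "synapse", "mumble"]

before = place(tasks, nodes)
print("before:", before)

# node1 reboots: reschedule using only the nodes that are still up.
survivors = [n for n in nodes if n != "node1"]
after = place(tasks, survivors, before)
print("after: ", after)
```

In this run node1 starts out with two tasks (vaultwarden and mumble); once it goes away they land on node2 and node3, while everything already on a healthy node stays put. One real-world caveat worth knowing: when a failed node rejoins, Swarm does not automatically move tasks back onto it, and forcing an update (for example `docker service update --force <service>`) is the usual way to rebalance.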

Exhibit A: A Baby PC

Observe the photograph above. What does it say to you? Does it make you mutter "awwww", or perhaps question what a computer is at all? Indeed, this "Mini PC" may look small, but it'll show you that size doesn't matter. It has it where it counts: 32 GB of RAM and a 500 GB M.2 SSD out of the box, not to mention an 8-core, 16-thread Ryzen CPU. It's sort of a mini-BEAST.

With the power of buying 4 of them, suddenly you have a cluster.
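Four nodes also raises the question of how many of them should be Swarm managers. Managers keep the cluster state in a Raft store, which can only accept changes while a majority of managers is reachable. The post doesn't say how the four machines are split between managers and workers, so this quick Python calculation simply tabulates quorum size and fault tolerance for one to five managers:

```python
def raft_quorum(managers: int) -> int:
    """Managers that must be reachable for the Raft store to accept changes."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Managers you can lose while a majority is still reachable."""
    return (managers - 1) // 2


for n in range(1, 6):
    print(f"{n} manager(s): quorum {raft_quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from three managers to four raises the quorum from 2 to 3 without letting you lose any more managers (both tolerate one failure), which is why Docker's documentation recommends an odd number of managers.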

Clustered

Now we're talking. With Docker Swarm, Fifthdread Services has achieved "High Availability". After rigorous testing, I'm confident the Swarm will deliver uptime numbers the likes of which have never been seen before. At this very moment, practically all of the major services have moved over and are running within the Docker Swarm. If you're reading this right now, this article is being served to you from it.

Of course, technically there's more that could be done. I don't have a redundant WAN connection, for example, but one step at a time. This is still leagues better than what was there before. Regardless, Docker Swarm is very impressive, and I'd definitely recommend it to the home-lab and self-hosting community, provided you're as patient, stubborn, and uncompromising as I am.

Now that Fifthdread Services has achieved High Availability (or at least, higher availability), I can rest easy knowing that our loving customers can put their trust in the unwavering uptime of their most essential services. Never again will we receive tickets like "Element down?". It helps us sleep at night. Of course, we aren't immune to someone forgetting their Element password...