The blog post you’re reading is hosted on a private Kubernetes cluster that runs inside my home. Another workload running on the same cluster is Amsterdam Toilet & Urinal Finder, or ATUF (blog post), a web app I created to help keep Amsterdam’s streets clean by helping people find nearby toilets & urinals.

Since its release, it has been featured on popular websites like DutchNews.

As the availability & uptime of the ATUF app, this blog and a few other private workloads became even more crucial, I kept thinking about how to improve the reliability of these services.

What’s the problem with running apps on a private Kubernetes cluster?

I created my private cloud & Kubernetes cluster with reliability in mind. It runs on 3 Raspberry Pi (RPI) nodes, so even if one node died, it would keep running as if nothing happened. And even if all 3 nodes went down, I created the rpi-microk8s-bootstrap project, which automates provisioning & setup of Ubuntu Server & MicroK8s on an RPI node using Terraform. This allows me to re-create and recover my cluster in a matter of minutes.

Data is stored on a Synology DS920+ NAS with 32TB of raw storage (4x8TB with SHR + Btrfs). In this setup, even if one of the disks failed completely, it would keep running as if nothing happened. All data is backed up to another Synology DS415+ NAS with a 4-disk array and 1-disk redundancy.

Even in case of a total power outage at my home, a UPS unit would keep the cluster online for some time. I also set up monitoring, so I’m alerted by both email & SMS if any of my sites go down.

But even with all these measures in place, accidents still happen, and they’re a great way to ruin your weekend. If you’re not home or a laptop isn’t close by, SSH-ing to your cluster from a smartphone to troubleshoot your app or K8s isn’t the best experience. Not to mention that you might not even be able to do it at the time, for example if you’re at the park with your kids, where taking care of them is much more important than your K8s cluster and its workloads.

Hence, lately I’ve been thinking a lot about what the ideal failover and/or DR (disaster recovery) solution would be: one that would let me recover with minimal time & effort, or that I might even move to completely. One solution checked all the criteria I had set: Cloud Run.

Cloud Build & Cloud Run: A perfect solution for K8s cluster failover & DR?

Cloud Run is a serverless platform offered by GCP (Google Cloud Platform), with an execution environment based on Knative. As such, it doesn’t incur compute costs while it’s idle, i.e. when there’s no incoming traffic. It allows you to deploy and run containers without managing the underlying infrastructure, and it automatically scales your workloads to meet demand. That sounds like a perfect fit for a containerized app that relies on Kubernetes & HPA (Horizontal Pod Autoscaler) for scalability.

Cloud Build, on the other hand, is a (serverless) CI/CD service on GCP that lets you automate building, testing and deploying containerized apps. It was something I couldn’t pass on, considering how seamless it is to build code with Cloud Build and then (automatically) deploy it to Cloud Run.
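To make that build-then-deploy flow a bit more concrete, here’s a minimal Python sketch that generates a Cloud Build config with three steps: build the image, push it to Artifact Registry, and deploy it to Cloud Run. Cloud Build accepts JSON configs (cloudbuild.json) as well as YAML, so the printed output can be saved as cloudbuild.json. It isn’t taken from the reference repository; the region, repository and service names are hypothetical placeholders.

```python
"""Sketch: generate a Cloud Build config that builds, pushes and deploys
a container image to Cloud Run.

REGION, REPOSITORY and SERVICE are hypothetical placeholders.
"""
import json

REGION = "europe-west4"    # hypothetical region
REPOSITORY = "my-repo"     # hypothetical Artifact Registry Docker repository
SERVICE = "my-flask-app"   # hypothetical Cloud Run service name

# $PROJECT_ID and $SHORT_SHA are Cloud Build built-in substitutions;
# $SHORT_SHA is only populated for builds started by a trigger.
IMAGE = f"{REGION}-docker.pkg.dev/$PROJECT_ID/{REPOSITORY}/{SERVICE}:$SHORT_SHA"


def build_config(image: str = IMAGE, service: str = SERVICE,
                 region: str = REGION) -> dict:
    """Return a Cloud Build config dict: build -> push -> deploy."""
    return {
        "steps": [
            {   # Build the image from the Dockerfile in the repo root.
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image, "."],
            },
            {   # Push it before deploying, so Cloud Run can pull it.
                "name": "gcr.io/cloud-builders/docker",
                "args": ["push", image],
            },
            {   # Deploy the pushed image; --allow-unauthenticated makes the
                # service public and needs permission to set IAM policy.
                "name": "gcr.io/google.com/cloudsdktool/cloud-sdk",
                "entrypoint": "gcloud",
                "args": ["run", "deploy", service,
                         "--image", image,
                         "--region", region,
                         "--allow-unauthenticated"],
            },
        ],
        # Record the image in the build results.
        "images": [image],
        # Builds running as a user-specified service account need a logs
        # bucket or this option; logs then go to Cloud Logging.
        "options": {"logging": "CLOUD_LOGGING_ONLY"},
    }


if __name__ == "__main__":
    print(json.dumps(build_config(), indent=2))
```

Usage would be something like `python gen_cloudbuild.py > cloudbuild.json` (the script name is hypothetical). Generating the config from code is just one option; writing cloudbuild.yaml by hand, as in the video, works equally well.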

Automate app deployments (Github to Cloud Run) with Cloud Build

Since I couldn’t find these end-to-end workflows anywhere on the internet, I created this blog post & YouTube video, hoping they could serve as the ultimate guide if you’re thinking of connecting your app hosted in a GitHub repository with Cloud Build & Cloud Run.

For those who don’t want to watch the video and are only interested in the Terraform & code part, I’ve created a reference repository.

In the video above, I go through 2 different scenarios for doing this. For context, the app I’ll be using is a Python Flask web app hosted in a GitHub repo:

App architecture
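One container requirement worth keeping in mind when moving a Flask (WSGI) app to Cloud Run: Cloud Run sends requests to the port given in the PORT environment variable, which defaults to 8080. Below is a minimal, standard-library-only sketch of honoring that contract; `hello_app` and `serve` are hypothetical names, and `hello_app` merely stands in for the real Flask app, which is also a WSGI callable and could be passed to `serve` instead.

```python
"""Sketch: serve a WSGI app on the port Cloud Run provides via $PORT."""
import os
from typing import Mapping, Optional
from wsgiref.simple_server import make_server


def resolve_port(env: Optional[Mapping[str, str]] = None,
                 default: int = 8080) -> int:
    """Return the port from the PORT env var, falling back to 8080."""
    env = os.environ if env is None else env
    return int(env.get("PORT", default))


def hello_app(environ, start_response):
    """Hypothetical stand-in for the Flask app (any WSGI callable works)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from Cloud Run\n"]


def serve(app=hello_app) -> None:
    """Blocking: serve `app` on all interfaces at the resolved port."""
    # Bind to 0.0.0.0, not 127.0.0.1, so traffic from outside the
    # container reaches the app.
    with make_server("0.0.0.0", resolve_port(), app) as server:
        server.serve_forever()
```

Calling `serve()` starts the development server; for production traffic, a dedicated WSGI server is usually the better choice, as long as it binds to the same `$PORT`.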

Please note that each step has a list of requirements, which I go through in detail in the video above. While some of them may seem like a lot of work for an automated procedure (especially step 2), most of these steps need to be done only once, during the initial project setup.

1. Automatically deploy an app hosted in GitHub to Cloud Run using cloudbuild.yaml (Cloud Build)

Cloud Build API quota filter
  • While the filtered quota is selected:
    • 5. Click the “Edit Quotas” button
    • 6. Fill out the request, for example:
Cloud Build API quota request

After the quota limits are requested, you’ll get a confirmation email, and it may take up to 2 days for the quotas to be increased. In my case they were raised in less than a day, but your mileage may vary.

Example Cloud Build API quota request email
  • 6. Create “cloud-sa” service account
    • Add the roles (see “Define additional roles to assign to cloud-sa” in the Terraform code):
      • roles/iam.serviceAccountUser – needed for Cloud Build & Cloud Run
      • roles/logging.logWriter – needed for logging
      • roles/artifactregistry.admin – needed for the “Authenticate with GCP Artifacts Registry” step in cloudbuild.yaml to work
      • roles/run.developer – needed for Cloud Run to work
      • roles/run.admin – needed for the “Allow public (unauthenticated) access” step in cloudbuild.yaml to work
  • 7. Create an Artifact Registry repository
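If you’d rather script the service account, role and repository steps above than click through the console, a sketch like the one below prints the equivalent gcloud commands without running anything. The project ID, region and repository name are hypothetical placeholders, the quota increase isn’t covered, and in scenario 2 Terraform takes care of these resources instead.

```python
"""Sketch: print gcloud commands for the one-time setup listed above.

PROJECT_ID, REGION and REPOSITORY are hypothetical placeholders.
Nothing is executed; review the output before running it yourself.
"""
import shlex
from typing import List

PROJECT_ID = "my-project"   # hypothetical
REGION = "europe-west4"     # hypothetical
REPOSITORY = "my-repo"      # hypothetical
SA_NAME = "cloud-sa"

# Roles from the list above.
ROLES = [
    "roles/iam.serviceAccountUser",
    "roles/logging.logWriter",
    "roles/artifactregistry.admin",
    "roles/run.developer",
    "roles/run.admin",
]


def setup_commands(project: str = PROJECT_ID, region: str = REGION,
                   repository: str = REPOSITORY,
                   sa_name: str = SA_NAME) -> List[List[str]]:
    """Return the gcloud commands as argument lists."""
    sa_email = f"{sa_name}@{project}.iam.gserviceaccount.com"
    # 1) Create the service account.
    cmds = [["gcloud", "iam", "service-accounts", "create", sa_name,
             f"--project={project}"]]
    # 2) Grant each role to the service account at the project level.
    for role in ROLES:
        cmds.append(["gcloud", "projects", "add-iam-policy-binding", project,
                     f"--member=serviceAccount:{sa_email}", f"--role={role}"])
    # 3) Create a Docker-format Artifact Registry repository.
    cmds.append(["gcloud", "artifacts", "repositories", "create", repository,
                 "--repository-format=docker", f"--location={region}",
                 f"--project={project}"])
    return cmds


if __name__ == "__main__":
    for cmd in setup_commands():
        print(shlex.join(cmd))  # shell-quoted, ready to copy & paste
```

Keeping the commands as argument lists (rather than strings) makes them easy to inspect or pass to `subprocess.run` later, while `shlex.join` takes care of quoting for the printed version.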

2. Automatically deploy an app hosted in GitHub to Cloud Run using Cloud Build (cloudbuild.yaml) & avoid any “ClickOps” with Terraform

Terraform steps (referenced in the repository)

After the requirement steps have been completed, perform the following Terraform steps:

  • Update terraform.tfvars with the values you want to use
  • terraform init
  • terraform plan (optional)
  • terraform apply -auto-approve
  • Perform the import steps for the Cloud Run service resource created by cloudbuild.yaml, so it can be managed by Terraform and, e.g., destroyed along with the other resources.
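The Terraform steps above can be sketched as a command sequence in Python. This is an illustration under assumptions, not the reference repository’s procedure: `RESOURCE_ADDRESS` is a hypothetical address that must match a Cloud Run resource block in your own configuration, the import ID assumes the `projects/…/locations/…/services/…` format accepted by the `google_cloud_run_v2_service` resource (check the provider docs for the resource you actually use), and the import only succeeds once the service exists, i.e. after Cloud Build has deployed it.

```python
"""Sketch: the Terraform steps above as a command sequence.

RESOURCE_ADDRESS is hypothetical; it must match a Cloud Run service
resource declared in your own .tf files.
"""
import shlex
import subprocess
from typing import List

RESOURCE_ADDRESS = "google_cloud_run_v2_service.app"  # hypothetical


def terraform_commands(project: str, region: str, service: str,
                       with_plan: bool = True) -> List[List[str]]:
    """Return init -> (plan) -> apply -> import as argument lists."""
    cmds = [["terraform", "init"]]
    if with_plan:
        cmds.append(["terraform", "plan"])  # optional dry run
    cmds.append(["terraform", "apply", "-auto-approve"])
    # Bring the service deployed by cloudbuild.yaml into Terraform state,
    # so that `terraform destroy` also removes it.
    import_id = f"projects/{project}/locations/{region}/services/{service}"
    cmds.append(["terraform", "import", RESOURCE_ADDRESS, import_id])
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

With the default `dry_run=True`, `run_all(terraform_commands("my-project", "europe-west4", "my-flask-app"))` only prints the commands (all three arguments are hypothetical), which is a cheap way to double-check the import ID before touching real state.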

Happy hacking & if you found this useful, consider becoming my GitHub sponsor!