Blog

  • Scaling GitHub Actions, Docker Caching, and Smarter Security Scans — Lessons From My Home Lab

    Over the last few weeks, my development environment has been a bit of a rollercoaster—in a good way. In my previous blog, I talked about experimenting with GitHub Actions runners. That journey has evolved from running random Ubuntu runners → running runners inside VMs → running them in Kubernetes → and now testing out GitHub ARC (Actions Runner Controller) to dynamically request runner capacity on demand.

    Honestly, it’s been a blast. But along the way, I hit some real-world challenges that mirror the same ones I see in the enterprise security space.


    Centralized Security Scanning + Hitting Rate Limits

    In my home lab, I run what I call Centralized GitHub Actions (there’s a rough sketch of one job after this list):

    • pull security scanning Docker images
    • run the scans
    • send results to DefectDojo
    • repeat this across multiple repos
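
    To make that concrete, here’s a rough sketch of what one of those jobs boils down to, with Trivy standing in for the scanner and results going to DefectDojo’s import-scan endpoint. The DefectDojo URL, product, and engagement names are placeholders for whatever your instance uses:

    # Pull the scanner image, scan the checked-out repo, write JSON findings
    docker pull aquasec/trivy:latest
    docker run --rm -v "$PWD:/src" aquasec/trivy:latest fs --format json -o /src/trivy.json /src

    # Ship the findings to DefectDojo (URL and names are placeholders)
    curl -sf -X POST "https://defectdojo.lab.example/api/v2/import-scan/" \
      -H "Authorization: Token $DEFECTDOJO_API_KEY" \
      -F "scan_type=Trivy Scan" \
      -F "product_name=my-repo" \
      -F "engagement_name=ci" \
      -F "auto_create_context=true" \
      -F "file=@trivy.json"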

    Since I love experimenting with open-source security tooling, I try to run just about every scanner I can get my hands on.

    And then I hit the wall.

    Docker pull limits. Pip download limits. Everything.

    Docker Hub’s limits are designed for normal users. I am not a normal user when it comes to automated pulls:

    • Unauthenticated Docker Hub pull limit: ~100 pulls per 6 hours (per IP)
    • Authenticated (free account): ~200 pulls per 6 hours
    • GitHub-hosted runners get special higher limits (but only when the request originates from GitHub’s infra)
    • On-prem GitHub Actions runners get none of those benefits

    So I burned through my pull quota constantly.
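
    If you want to see where you stand, Docker Hub reports your quota in response headers on the manifest endpoint. Docker documents this trick using their ratelimitpreview/test image (jq assumed available):

    # Grab a token, then read the rate-limit headers off a HEAD request
    TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
    curl -s --head -H "Authorization: Bearer $TOKEN" \
      "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" | grep -i ratelimit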


    Exploring Solutions to Rate Limits

    In enterprise environments, people usually take one of three approaches:

    1. Use AWS ECR or another paid cloud registry

    You pay for it, but you get predictable throughput and higher pull limits.

    2. Use a vendor-managed registry appliance

    Good for enterprise scale. Not worth it for a home lab.

    3. Build a Docker Proxy Cache (the path I picked)

    A Docker proxy cache isn’t a full registry—it’s more like a caching reverse proxy:

    • First pull: fetch from Docker Hub and store locally
    • Subsequent pulls: instant, local, no external rate limit hit

    I deployed mine per Kubernetes cluster at first, got it mostly working, and then moved toward centralizing it so all runners call the same cache endpoint.
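
    For reference, the simplest version of this is Docker’s own registry image running in pull-through-cache mode, with every runner’s Docker daemon pointed at it as a mirror. A minimal sketch, with a made-up hostname for the cache:

    # Run the registry as a pull-through cache in front of Docker Hub
    docker run -d --name docker-cache -p 5000:5000 \
      -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
      registry:2

    # On each runner/node, point dockerd at the cache and restart it
    cat <<'EOF' > /etc/docker/daemon.json
    {
      "registry-mirrors": ["http://docker-cache.lab.example:5000"]
    }
    EOF

    One caveat: registry-mirrors only applies to Docker Hub images, so pulls from ghcr.io or quay.io still go direct unless you front those registries separately.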

    Then pip started complaining about SSL certificates…
    …so I fixed that too.
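
    I won’t pretend there’s one universal fix there, but the usual options are either pointing pip at the CA bundle your proxy presents or, more bluntly, marking the index host as trusted. Hostnames and paths below are placeholders:

    # Option 1: trust the proxy's CA properly
    pip config set global.cert /etc/ssl/certs/lab-proxy-ca.pem

    # Option 2 (blunter): pin the index and skip verification for that host
    pip config set global.index-url https://pypi-cache.lab.example/simple
    pip config set global.trusted-host pypi-cache.lab.example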

    Now it mostly works—but I still want to revisit the idea of a true self-hosted registry with caching capabilities. Not sure if it’s worth the complexity yet.


    Enterprise Problems Show Up in Home Labs

    One eye-opening lesson:
    The exact same problems I see in enterprise CI/CD show up in my personal environment.

    Why?

    Because rate limits, scanner behavior, and Docker pull patterns don’t care whether you’re a Fortune 500 or a guy in his garage. The constraints are identical.


    Rethinking How We Run Security Scans

    Running every scanner on every pull request sounded awesome… until I tried it.

    Security scans are computationally expensive. Some scanners take minutes; some take forever. Running everything on every PR is:

    • wasteful
    • slow
    • not developer-friendly
    • and honestly not necessary

    A more practical pattern is emerging:

    1. Base Scans → Run on a Cron Schedule

    This keeps a high-level view of system health.

    2. PR Scans → Run selectively

    Only run the scanners that add value during development.

    3. Adaptive Scans → Run based on the diff

    Imagine (roughly what the sketch after this list does):

    • If secrets are detected → run secret scanners
    • If Dockerfile changes → run container hardening checks
    • If a lot of files changed → bump up scan intensity
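
    A first pass at this doesn’t need anything clever; a shell step that inspects the diff and sets flags for later jobs covers it. A rough sketch, with illustrative triggers and a placeholder base branch:

    # Decide which scanners to run based on what the PR actually touched
    CHANGED=$(git diff --name-only "origin/${BASE_BRANCH:-main}"...HEAD)

    RUN_CONTAINER_CHECKS=false
    RUN_SECRET_SCAN=false
    SCAN_INTENSITY=light

    echo "$CHANGED" | grep -qE '(^|/)Dockerfile' && RUN_CONTAINER_CHECKS=true
    echo "$CHANGED" | grep -qiE '\.(env|pem|key)$' && RUN_SECRET_SCAN=true
    [ "$(echo "$CHANGED" | wc -l)" -gt 50 ] && SCAN_INTENSITY=full

    echo "container=$RUN_CONTAINER_CHECKS secrets=$RUN_SECRET_SCAN intensity=$SCAN_INTENSITY"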

    This “context-aware scanning” is something I want to explore. Developers could even toggle this:

    • “Always run full security scans for my PRs.”
    • “Run lightweight scans unless something looks suspicious.”

    That flexibility is powerful.


    Running Scans Outside CI Jobs Using Webhooks

    One of the coolest things I’ve rediscovered:
    GitHub Webhooks let you run security scans outside the CI job entirely.

    That means:

    • CI stays fast
    • scanners can run asynchronously
    • failures don’t block merges
    • logs stay out of the GitHub Actions UI clutter

    When I was first setting up ARC, I noticed that every DefectDojo upload job appeared in the GitHub Actions queue—even though they didn’t belong there.

    This made it obvious:

    CI jobs should not handle everything.

    Sometimes you want:

    • CI job finishes →
    • a webhook triggers →
    • async scanners run somewhere else →
    • results go to DefectDojo
    • developers stay unblocked

    This is something I want to build into my workflow logic.
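
    On the GitHub side this is just a repository webhook pointed at whatever service receives the event; registering one is a single API call. A sketch, with a placeholder receiver URL and workflow_run as the trigger event:

    curl -s -X POST "https://api.github.com/repos/OWNER/REPO/hooks" \
      -H "Authorization: Bearer $GITHUB_TOKEN" \
      -H "Accept: application/vnd.github+json" \
      -d '{
            "name": "web",
            "active": true,
            "events": ["workflow_run"],
            "config": {
              "url": "https://scan-runner.lab.example/github-webhook",
              "content_type": "json",
              "secret": "'"$WEBHOOK_SECRET"'"
            }
          }'

    The receiver verifies the X-Hub-Signature-256 header against that secret, kicks off the heavier scanners on its own schedule, and uploads the results to DefectDojo without ever touching the Actions queue.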


    To-Do List From This Work

    A few tasks emerged from this whole process:

    • Improve centralized Docker caching
    • Explore hybrid scanning (cron + PR-based + adaptive)
    • Build logic to run certain scans outside the CI job
    • Add a PR flag to allow developers to request extra scans
    • Clean up DefectDojo upload jobs to avoid cluttering the CI timeline
  • How I Built Automatic WordPress Failover Between On-Prem and AWS Using Coolify and Route53

    Running WordPress on-premise is great for control and performance, but if your home lab or self-hosted hardware goes down, your website shouldn’t go offline with it. In this guide, I’ll show you how I built a fully automated failover setup where traffic seamlessly moves from an on-prem server to AWS — and back — with no downtime.

    The best part?
    It uses tools you’re probably already familiar with:

    • Coolify (on-prem & AWS)
    • Route53 DNS failover
    • A custom health check endpoint
    • A lightweight PHP status script
    • Zero manual intervention once deployed

    Here’s how it works.


    🚀 Architecture Overview

    For this example, we’ll use a fake domain:

    examplefailover.com
    

    And two WordPress servers:

    Environment   Location   IP Address
    Primary       On-Prem    10.10.10.10
    Failover      AWS EC2    203.0.113.25

    DNS is hosted in Route53, and SSL certificates are issued using the Route53 DNS-01 method.

    Here’s the traffic flow:

    1. Route53 checks the on-prem server’s health using a dedicated endpoint.
    2. If healthy → traffic goes to on-prem.
    3. If unhealthy → traffic automatically fails over to AWS.
    4. When restored → traffic automatically returns to on-prem.

    You get real-world high availability without running load balancers or Kubernetes.


    🔧 Step 1: Configure SSL on Both Coolify Instances

    Both Coolify deployments (on-prem & AWS) need valid SSL certificates for:

    examplefailover.com
    www.examplefailover.com
    

    Inside each Coolify instance:

    1. Open Settings → Domains & SSL → ACME DNS Providers
    2. Add your Route53 IAM credentials
    3. Add domains to your WordPress app:
      • examplefailover.com
      • www.examplefailover.com
    4. Click Enable SSL

    Using DNS-01 validation means both servers can generate certificates no matter which one DNS currently points at.


    🔧 Step 2: Create a Reliable Health Check Endpoint

    Using your homepage for health checks is risky. WordPress crashes, plugin errors, or PHP upgrades can accidentally trigger failover.

    The fix is to build a dedicated health check endpoint that bypasses WordPress entirely:

    https://examplefailover.com/healthcheck/index.php
    

    Create the directory inside WordPress’s volume:

    mkdir -p /var/lib/docker/volumes/<YOUR_VOLUME_NAME>/_data/healthcheck
    

    Add a smart PHP health script:

    /healthcheck/index.php

    <?php
    header('Content-Type: application/json');
    
    $server_type = getenv('SERVER_TYPE') ?: 'unknown';
    
    $response = [
        "status" => "healthy",
        "server" => $server_type,
        "hostname" => gethostname(),
        "ip" => $_SERVER['SERVER_ADDR'] ?? 'unknown',
        "time" => date('Y-m-d H:i:s'),
    ];
    
    echo json_encode($response);
    

    Tag each environment in Coolify:

    On-Prem:

    SERVER_TYPE=on-prem
    

    AWS:

    SERVER_TYPE=aws-failover
    

    Now loading the endpoint returns clear JSON:

    {
      "status": "healthy",
      "server": "on-prem",
      "hostname": "coolify-primary",
      "ip": "10.10.10.10",
      "time": "2025-12-07 14:33:12"
    }
    

    This helps you debug and verify which server is responding.
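
    Since both servers answer for the same hostname, you can check each one directly by pinning the IP with curl instead of trusting whatever DNS currently returns:

    # Hit the on-prem box, then the AWS box, regardless of what DNS resolves to
    curl -s --resolve examplefailover.com:443:10.10.10.10 https://examplefailover.com/healthcheck/index.php
    curl -s --resolve examplefailover.com:443:203.0.113.25 https://examplefailover.com/healthcheck/index.php

    The first should report "server": "on-prem" and the second "server": "aws-failover".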


    🔧 Step 3: Create a Route53 Health Check

    Go to:

    Route53 → Health checks → Create health check

    Use:

    • Protocol: HTTPS
    • Domain: examplefailover.com
    • Path: /healthcheck/index.php
    • Port: 443
    • Request interval: 30 seconds
    • Failure threshold: 3
    • Optional string matching: healthy

    If the endpoint fails, Route53 marks the server as UNHEALTHY.
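
    If you prefer the CLI over the console, the same health check can be created with aws route53 create-health-check; the config below mirrors the settings above (the caller reference just needs to be unique):

    aws route53 create-health-check \
      --caller-reference "examplefailover-onprem-$(date +%s)" \
      --health-check-config '{
        "Type": "HTTPS_STR_MATCH",
        "FullyQualifiedDomainName": "examplefailover.com",
        "IPAddress": "10.10.10.10",
        "Port": 443,
        "ResourcePath": "/healthcheck/index.php",
        "SearchString": "healthy",
        "RequestInterval": 30,
        "FailureThreshold": 3,
        "EnableSNI": true
      }'

    With IPAddress set, Route53 probes the on-prem box directly while still sending examplefailover.com as the Host/SNI name, which matches the certificate.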


    🔧 Step 4: Set Up DNS Failover in Route53

    You will create two A records for each domain name: a primary and a secondary.

    Root domain (examplefailover.com)

    Primary (on-prem)

    • Type: A
    • Value: 10.10.10.10
    • Routing policy: Failover → Primary
    • Health check: Use the one created above

    Secondary (AWS)

    • Type: A
    • Value: 203.0.113.25
    • Routing policy: Failover → Secondary
    • Health check: None

    Repeat the same for www.examplefailover.com.

    This ensures both the root and www records fail over correctly.
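
    For reference, here’s roughly what those two records look like via the CLI (zone ID and health check ID are placeholders; run the same change batch again for www):

    aws route53 change-resource-record-sets \
      --hosted-zone-id ZXXXXXXXXXXXXXX \
      --change-batch '{
        "Changes": [
          { "Action": "UPSERT", "ResourceRecordSet": {
              "Name": "examplefailover.com", "Type": "A", "TTL": 60,
              "SetIdentifier": "primary-onprem", "Failover": "PRIMARY",
              "HealthCheckId": "<HEALTH_CHECK_ID>",
              "ResourceRecords": [ { "Value": "10.10.10.10" } ] } },
          { "Action": "UPSERT", "ResourceRecordSet": {
              "Name": "examplefailover.com", "Type": "A", "TTL": 60,
              "SetIdentifier": "secondary-aws", "Failover": "SECONDARY",
              "ResourceRecords": [ { "Value": "203.0.113.25" } ] } }
        ]
      }'

    Keeping the TTL low (60 seconds here) is what makes the switch feel fast to visitors once the health check flips.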


    ✔️ Failover Behavior Explained

    Normal Operation

    Healthcheck OK → Route53 routes traffic to on-prem (10.10.10.10)
    

    On-Prem Fails

    Healthcheck FAIL → Route53 routes traffic to AWS (203.0.113.25)
    

    On-Prem Recovers

    Healthcheck returns OK → Route53 routes traffic back to on-prem
    

    Visitors experience zero downtime — it’s seamless.
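
    An easy way to watch a failover test play out is to poll DNS and the health endpoint side by side (dig and watch assumed available):

    # Watch which IP the record resolves to and which server actually answers
    watch -n 10 'dig +short examplefailover.com; curl -s https://examplefailover.com/healthcheck/index.php'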


    💡 Why This Setup Works So Well

    • Zero cloud load balancers required
    • No need for highly available networking gear
    • Coolify deploys identical apps in both environments
    • DNS-01 SSL validation avoids certificate conflicts
    • Dedicated health endpoint avoids WordPress false positives
    • Route53’s global health check network ensures accuracy
    • Failover is fast and automatic

    This approach gives you “cloud-level high availability” with simple, inexpensive tools.


    🎉 Conclusion

    Pairing Coolify with Route53 failover lets you build a robust, self-healing WordPress environment without complex infrastructure. Whether you’re self-hosting for fun or running a real production site, combining:

    • on-prem hardware
    • AWS failover
    • automated SSL
    • a dedicated health check
    • and smart DNS logic

    allows your site to stay online under almost any circumstance.