From Chaos to Production

Timothy J. Hoag
IAM & IT Operations Specialist
When Homelabs Get Messy

Let me paint you a picture. It's Tuesday night at 11 PM. My Home Assistant automation has stopped working. Again. I know the problem is somewhere in my Docker containers, but here's the thing: I have 31 containers spread across who knows how many compose files, and I genuinely can't remember which one runs Home Assistant.

So I'm grepping through my entire Docker directory like some kind of digital archaeologist, finding secrets hardcoded in random places (past me was not smart), containers on different networks that can't talk to each other, and port conflicts that make zero sense.

Sound familiar?

That was my reality three weeks ago. Now? Same 31 containers, but organized into 7 clean stacks with proper secrets management, unified networking, and security practices I'm actually proud of.

Let me show you how I pulled this off without losing my mind (mostly).

The Mess I Created

Here's what my Docker setup looked like before I fixed it:

  • 31 containers doing their thing across a sprawl of compose files that made sense at the time (they didn't)
  • Secrets everywhere - API keys in compose files, passwords in plain text, the works
  • Networking chaos - some containers on host networking, others on random bridges, none of them playing nice together
  • Zero documentation except whatever sleepy me thought was helpful at 2 AM
  • Troubleshooting? Good luck. "I know Pi-hole is in here somewhere..."

The wake-up call came after I migrated to fresh compose files. Half of my services wouldn't start because they couldn't find each other on the network. That's when I decided to stop slapping band-aids on everything and actually fix it.

The Four Critical Failures

After the migration, I faced four major issues that needed immediate attention:

  1. Network Subnet Conflicts - My Docker networks were fighting with my home network's subnet range. Containers couldn't communicate, and some services thought they were on the wrong network entirely. Solution: Moved all Docker networks to the 172.18.0.0/16 range to avoid conflicts with my 192.168.x.x home network.
  2. Firewall Blocking Docker Traffic - My host firewall was blocking container-to-container communication. Even though containers were on the same network, the firewall rules weren't allowing the traffic through. This one took me way too long to figure out because everything looked right in the Docker configs.
  3. Container Authentication Failures - Several services (particularly Authelia and Bitwarden) were failing to authenticate after migration because their session secrets and encryption keys didn't carry over. I had to properly regenerate and configure these in environment files instead of hoping Docker would remember them.
  4. Broken Service Dependencies - Services that depended on each other (like Home Assistant depending on Mosquitto MQTT) were starting in the wrong order or couldn't find each other. Fixed this with proper Docker Compose dependency chains and unified networking so containers could resolve each other by name - see the sketch after this list.
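
Here's roughly what the dependency fix looks like in compose terms (a sketch - the service names match my setup, but startup details vary per image):

home-assistant:
  depends_on:
    - mosquitto          # Compose starts Mosquitto first
  networks:
    homelab:             # Same network, so "mosquitto" resolves by name

mosquitto:
  networks:
    homelab: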

These failures forced me to implement proper solutions instead of quick fixes. Which brings me to...


How I Fixed It

I broke everything into 7 purpose-built stacks. Each one has a specific job, and they all play nice together. Here's the breakdown:

1. Core Stack - The Foundation (5 services)

This is the stuff that manages everything else. If this stack is down, I'm having a bad day:

  • Portainer - Because clicking buttons is easier than memorizing Docker commands
  • Nginx Proxy Manager - Handles all the reverse proxy magic and SSL certificates
  • Homepage - My personal dashboard for everything
  • Dozzle - Real-time logs, because docker logs -f gets old fast
  • Homelab Docs - Where I document things I'll definitely remember later (I won't)

This stack also creates the main network that everything else connects to:

networks:
  homelab:
    driver: bridge
    ipam:
      config:
        - subnet: 172.18.0.0/16

Why this matters: Instead of each stack doing its own networking thing, they all share one network. Services can find each other by name, troubleshooting is way easier, and I'm not constantly fighting subnet conflicts at midnight.

2. Security Stack (3 services)

This is where all my passwords and authentication live:

  • Bitwarden Lite - My password manager (official lightweight version, not the heavier Vaultwarden)
  • Vaultwarden Nginx - SSL wrapper for Bitwarden with Cloudflare certificates
  • Authelia - Single sign-on so I'm not typing passwords all day

Here's the security setup:

vaultwarden-nginx:
  networks:
    homelab:
      ipv4_address: 172.18.0.100  # Static IP for SSL cert validation
  volumes:
    - ./cloudflare-origin.crt:/etc/nginx/ssl/cert.crt:ro
    - ./cloudflare-origin.key:/etc/nginx/ssl/cert.key:ro

All sensitive data moved to bitwarden-settings.env:

env_file:
  - ./bitwarden-settings.env

Real talk: I used to have SSL certs committed to git. Don't be like old me. Use .env files and keep them out of version control.

3. Network Stack (3 services)

The behind-the-scenes networking magic:

  • Pi-hole - Blocks ads network-wide (because ads are the worst)
  • Cloudflared - Secure tunnel to access my stuff remotely without opening ports
  • Smokeping - Shows me when my internet is being garbage

Network setup that actually works:

pihole:
  ports:
    - "53:53/tcp"
    - "53:53/udp"
  cap_add:
    - NET_ADMIN
  dns:
    - 127.0.0.1  # Uses itself as DNS
    - 8.8.8.8    # Fallback

The Cloudflare tunnel token is stored in network.env:

env_file:
  - network.env
environment:
  - TUNNEL_TOKEN=${CLOUDFLARE_TUNNEL_TOKEN}

4. Monitoring Stack (7 services)

Because you can't fix what you can't see:

  • Prometheus - Enterprise-grade metrics collection and time-series database
  • Grafana - Beautiful dashboards for visualizing all those metrics
  • Uptime Kuma - Tells me when services go down (usually at 3 AM)
  • Dockpeek - Pretty graphs of what my containers are doing
  • Portracker - Shows me what ports everything is using
  • Dockmon - Resource monitoring that doesn't make my eyes bleed
  • Tugtainer - Another Docker dashboard because apparently I collect them

Why monitoring gets its own stack: When everything else is on fire, I need my monitoring to still work so I can figure out what's burning.

Prometheus + Grafana setup:

prometheus:
  image: prom/prometheus:latest
  volumes:
    - ./prometheus/config:/etc/prometheus
    - prometheus-data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.retention.time=30d'
  networks:
    homelab:

grafana:
  image: grafana/grafana:latest
  volumes:
    - grafana-data:/var/lib/grafana
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
  networks:
    homelab:

Now I have real-time metrics on CPU, memory, disk I/O, network traffic, and container health across all 31 services. Custom dashboards show me exactly what's happening at a glance, and alerts notify me before small issues become big problems.
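
For reference, the scrape targets live in ./prometheus/config/prometheus.yml. A minimal sketch (the cadvisor job here is a hypothetical example of a container-metrics exporter, not one of the services listed above):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'            # Prometheus scraping itself
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'              # hypothetical container-metrics exporter
    static_configs:
      - targets: ['cadvisor:8080']    # resolved by container name on the homelab network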

All containers have read-only Docker socket access:

volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro

5. Automation Stack (7 services)

Where the magic happens:

  • Home Assistant - Automates my entire house (lights, climate, you name it)
  • Mosquitto - MQTT broker that everything talks to
  • n8n (production) - My automation workflows that actually matter
  • n8n-beta - Where I break things before they hit production
  • Ollama - Local AI models running on my hardware
  • Open WebUI - ChatGPT-style interface for Ollama
  • Corrade - Second Life Bot

Secret sauce:

n8n:
  env_file:
    - automation.env
  environment:
    - N8N_HOST=${N8N_HOST}
    - N8N_PROTOCOL=${N8N_PROTOCOL}
    - WEBHOOK_URL=${N8N_WEBHOOK_URL}

The automation.env file contains:

N8N_HOST=n8n.yourdomain.com
N8N_PROTOCOL=https
N8N_WEBHOOK_URL=https://n8n.yourdomain.com/webhook
OPENWEBUI_SECRET_KEY=<generated-secret>

Life lesson: Having separate production and beta instances saved me so many times. Test your crazy automation ideas in beta first. Future you will be grateful.
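
Keeping the two instances apart is mostly a matter of separate volumes and ports. A rough sketch (the host port and data path here are illustrative, not my exact config):

n8n-beta:
  image: n8nio/n8n:latest
  ports:
    - "100.100.40.56:5679:5678"    # different host port, still Tailscale-only
  volumes:
    - ./n8n-beta:/home/node/.n8n   # separate data dir so experiments can't touch prod
  networks:
    homelab: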

6. Backup Stack (2 services)

The insurance policy for when I inevitably break something:

  • Backvault - Orchestrates all my backups
  • Duplicati - Encrypted backups to the cloud

Here's how I do volumes:

duplicati:
  volumes:
    - ../duplicati/config:/config
    - ../duplicati/backups:/backups
    - /home/tjthetech/docker:/source/docker:ro  # Read-only source
    - /mnt/storage:/source/storage:ro
    - /mnt/storage/duplicati-backups:/backups-storage

Important: Those :ro flags mean read-only. Learned this the hard way after accidentally deleting source data during a backup. Don't be like me.

Backup encryption keys in backup.env:

env_file:
  - backup.env
environment:
  - SETTINGS_ENCRYPTION_KEY=${DUPLICATI_SETTINGS_ENCRYPTION_KEY}
  - CLI_ARGS=--webservice-password=${DUPLICATI_WEBSERVICE_PASSWORD}

7. Utilities Stack (2 services)

The stuff that doesn't fit anywhere else but I still need:

  • Ntfy - Push notifications to my phone
  • Composetoolbox - Visual editor for Docker Compose files

Getting Secrets Out of My Compose

This was probably the biggest "wow I was doing this wrong" moment of the whole project.

What I was doing (embarrassing):

services:
  myapp:
    environment:
      - API_KEY=sk_live_abc123def456  # Yep, API keys right in the file
      - DATABASE_PASSWORD=mypassword123  # Super secure, I know

What I'm doing now (better):

services:
  myapp:
    env_file:
      - myapp.env
    environment:
      - API_KEY=${API_KEY}
      - DATABASE_PASSWORD=${DATABASE_PASSWORD}

My .env file strategy:

  1. Each stack has its own .env file (e.g., automation.env, backup.env)
  2. All .env files are in .gitignore (snippet below)
  3. I created .env.template files with placeholder values for documentation
  4. Actual secrets are stored in Bitwarden
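
The .gitignore that backs item 2 is short:

# Real secrets never touch git
*.env

# (*.env.template files don't match the pattern above, so they stay tracked)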

Example .env.template:

# N8N Configuration
N8N_HOST=your-domain.com
N8N_PROTOCOL=https
N8N_WEBHOOK_URL=https://your-domain.com/webhook
OPENWEBUI_SECRET_KEY=generate-with-openssl-rand-hex-32

# Duplicati Backup
DUPLICATI_SETTINGS_ENCRYPTION_KEY=generate-strong-key
DUPLICATI_WEBSERVICE_PASSWORD=strong-password-here
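
Generating those placeholder values is a one-liner each:

# 32 random hex bytes for session/encryption keys
openssl rand -hex 32

# Random base64 password for web UIs
openssl rand -base64 24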

One Network to Rule Them All

Instead of every stack creating its own network (which was causing me endless headaches), I simplified everything:

  1. Core stack creates the network
  2. Everything else just joins it
  3. Static IPs only when absolutely necessary
  4. Services talk to each other using container names

# In docker-compose.core.yml
networks:
  homelab:
    driver: bridge
    ipam:
      config:
        - subnet: 172.18.0.0/16

# In all other docker-compose files
networks:
  homelab:
    external: true

Why this is great:

  • Services just work together without fighting
  • No more port mapping hell for internal stuff
  • Troubleshooting is actually possible now
  • Everything stays separated from my host network
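
You can see the name resolution working from inside any container on the network (container names are from my stacks; this assumes the image ships ping/wget, which most busybox/alpine-based ones do):

# Exec into one container and reach another by name
docker exec -it homepage ping -c 1 pihole

# Or hit a service's internal HTTP port - no published port needed
docker exec -it homepage wget -qO- http://uptime-kuma:3001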

Making Firewall Rules Stick

Remember that firewall issue I mentioned? The one that was blocking Docker container traffic? Yeah, that was a pain. Even after I figured out the right iptables rules, they'd disappear after a reboot. Not ideal.

The problem: Docker creates its own iptables rules, but my host firewall was interfering with container-to-container communication. I needed rules that would:

  • Allow Docker containers to communicate on the homelab network (172.18.0.0/16)
  • Persist across reboots
  • Not conflict with Docker's automatic rule management

The solution - persistent iptables rules:

# Allow Docker network traffic
sudo iptables -I INPUT -s 172.18.0.0/16 -j ACCEPT
sudo iptables -I FORWARD -s 172.18.0.0/16 -j ACCEPT
sudo iptables -I FORWARD -d 172.18.0.0/16 -j ACCEPT

# On Ubuntu/Debian, install iptables-persistent first
# (it creates /etc/iptables and reloads saved rules at boot)
sudo apt install iptables-persistent

# Then save the current rules so they survive reboots
sudo iptables-save | sudo tee /etc/iptables/rules.v4

# Sanity check: the ACCEPT rules should show up here
sudo iptables -L -n | grep 172.18

Why this matters: Without persistent rules, every reboot meant manually re-adding firewall exceptions or debugging why containers suddenly couldn't talk to each other. Now the rules stick, and my containers can communicate reliably.

How I Handle Ports Now

I used to just throw ports at the wall and see what stuck. Now I actually have a system:

Stuff anyone can access:

ports:
  - "8123:8123"  # Home Assistant - family uses this
  - "3000:3000"  # Homepage - my dashboard

Admin stuff (Tailscale VPN only):

ports:
  - "100.100.40.56:5678:5678"  # n8n - just me
  - "100.100.40.56:9000:9000"  # Portainer - definitely just me
  - "100.100.40.56:8181:81"    # NPM Admin - also just me

Why this matters: Those admin interfaces are only accessible through my Tailscale VPN, even if my firewall has a bad day. Defense in depth, baby.


How I Organized Everything

My file structure now actually makes sense:

~/docker/
├── docker-compose.automation.yml
├── docker-compose.backup.yml
├── docker-compose.core.yml
├── docker-compose.monitoring.yml
├── docker-compose.network.yml
├── docker-compose.security.yml
├── docker-compose.utilities.yml
├── automation.env
├── backup.env
├── bitwarden-settings.env
├── monitoring.env
├── network.env
├── .env.template (one for each stack)
├── .gitignore
├── authelia/
│   ├── config/
│   └── secrets/
├── bitwarden-lite/
├── duplicati/
│   ├── config/
│   └── backups/
├── home-assistant/
├── n8n/
├── n8n-beta/
├── nginx-proxy-manager/
│   ├── data/
│   └── letsencrypt/
├── ollama_data/
├── pihole/
│   ├── etc-pihole/
│   └── etc-dnsmasq.d/
├── portainer/
└── [everything else...]

The rules I follow:

  1. All compose files at the top level so they're easy to find
  2. Each service gets its own directory for data
  3. Secrets in .env files that never touch git
  4. Template files so I remember what goes where

How I Actually Start Everything

Starting the whole setup is pretty straightforward now:

# Core stack first (makes the network everyone else needs)
docker compose -f docker-compose.core.yml up -d

# Then everything else - doesn't matter what order
docker compose -f docker-compose.security.yml up -d
docker compose -f docker-compose.network.yml up -d
docker compose -f docker-compose.monitoring.yml up -d
docker compose -f docker-compose.automation.yml up -d
docker compose -f docker-compose.backup.yml up -d
docker compose -f docker-compose.utilities.yml up -d

Or if you're lazy like me, use a script:

#!/bin/bash
set -e  # Stop if a stack fails - core must be up before the rest

STACKS=(core security network monitoring automation backup utilities)

for stack in "${STACKS[@]}"; do
  echo "Starting $stack stack..."
  docker compose -f docker-compose.$stack.yml up -d
done

Takes about 2 minutes to bring up all 31 containers. Not bad.


Troubleshooting: Before and After

Before (pain):

  • "Where is that service?" - Grep through 5 different files, give up, search my chat history with Claude
  • "Why can't containers talk?" - Mixed networking, subnet conflicts, random bridge networks
  • "What's using port 8080?" - Port conflicts everywhere because I have no system

After (actually manageable):

  • Know exactly which stack has what - "It's in automation stack, duh"
  • Containers just communicate - Same network, no drama
  • Port conflicts don't exist - Planned it out like an adult
  • Dozzle shows all logs - One place instead of 31 terminal windows

What This Demonstrates for IT

This project showcases skills directly applicable to Systems Admin and IT Operations positions:

Infrastructure as Code

  • Docker Compose for declarative infrastructure
  • Version-controlled configuration
  • Reproducible deployments

Security Best Practices

  • No hardcoded secrets
  • Environment-based configuration
  • Network segmentation
  • Principle of least privilege (read-only mounts, limited capabilities)

Systems Thinking

  • Logical service grouping
  • Dependency management
  • Network architecture design
  • Scalability considerations

Documentation

  • Clear file organization
  • Template files for onboarding
  • Inline comments for complex configurations

Problem Solving

  • Identified root cause (poor organization)
  • Designed solution (stack architecture)
  • Implemented systematically
  • Validated results

What I Actually Learned

  1. Fix your networking first - Seriously. Get this right and everything else becomes so much easier. I wasted hours troubleshooting container issues that were just network problems.
  2. Secrets management isn't optional - Even in a homelab. Practice good habits now so you don't accidentally commit your API keys to GitHub later (ask me how I know).
  3. Organization saves your sanity - Future you at 2 AM trying to fix something will thank present you for being organized.
  4. Document everything - Those .env.template files? Worth their weight in gold when I need to remember what environment variables something needs.
  5. One thing at a time - I migrated one stack at a time, made sure it worked, then moved on. Trying to do everything at once is how you end up with nothing working.

The Results

  • 31 containers running across 7 logical stacks
  • Enterprise-grade monitoring - Prometheus + Grafana with custom dashboards and real-time metrics
  • Zero hardcoded secrets - all in environment files
  • Unified networking - single bridge network, container name resolution
  • Persistent firewall rules - iptables configuration that survives reboots
  • Clear security boundaries - admin services behind VPN, public services exposed
  • Two-minute deployments - bring up the entire infrastructure with a script
  • Easy troubleshooting - know exactly where to look for any service
  • Production-ready patterns - following enterprise best practices

Why This Actually Matters

Look, I get it. "Homelab projects" can sound like hobby stuff that doesn't translate to real work. But here's the thing - everything I did here is the same stuff I'd do managing production infrastructure. The only difference is the scale.

Every decision made here mirrors real production work:

  • Docker Compose scales to Kubernetes manifests
  • Environment files scale to HashiCorp Vault or AWS Secrets Manager
  • Bridge networking scales to service meshes like Istio
  • Stack organization scales to microservices architecture
  • Documentation practices scale to team knowledge bases

The fundamentals are identical. The tools change, but the principles of infrastructure organization, security, and observability remain constant.


What I'm Doing Next

Because I can't leave well enough alone:

  • Automated health checks with smart restart policies (rough sketch after this list)
  • Automated backup testing (backing up is great, but can I actually restore?)
  • Advanced Grafana alerting rules that send notifications to Ntfy for critical issues
  • Prometheus exporters for hardware metrics (temperature, disk health, UPS status)
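
For the health checks, this is the rough shape I have in mind - nothing deployed yet, and the service name and endpoint are placeholders:

some-service:
  restart: unless-stopped
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
    interval: 30s       # Probe every 30 seconds
    timeout: 5s
    retries: 3          # Mark unhealthy after 3 consecutive failures
    start_period: 30s   # Grace period while the service boots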

The Stack:

Docker & Docker Compose | Prometheus & Grafana | Linux System Administration | Network Architecture | Secrets Management | Infrastructure as Code | Security Hardening | Service Orchestration | System Design | Infrastructure Monitoring

Bottom Line: Good infrastructure isn't about using the fanciest tools - it's about thinking through organization, security, and making sure future you doesn't hate past you. These same principles apply whether you're managing a homelab or running infrastructure at a Fortune 500.

And if you're a hiring manager reading this thinking "does this person actually know what they're doing?" - the answer is yes, but I also learned a lot by breaking things first. That's kind of the point of a homelab.

Want More Content Like This?

Drop me a note if you'd like to be notified when I post new content about home lab setups, automation workflows, and lessons from the help desk.
