$cat ~/projects/monitoring-homelab

Monitoring Homelab

Production-style Prometheus + Grafana + Alertmanager stack on a 3-node Proxmox cluster. PagerDuty for paging.

A full monitoring stack on a 3-node Proxmox VE cluster running on consumer hardware. Every config file written by hand so I actually understand what each setting does.

What’s running

  • Prometheus scraping 15s metrics from all nodes and services
  • Grafana dashboards (node health, networking, container)
  • Alertmanager routing to PagerDuty
  • node_exporter on every host
  • blackbox_exporter for HTTP/TCP endpoint checks
  • Tailscale mesh for remote access

Alert rules

Seven alert rules in production:

  • NodeDown — any host unreachable for > 2 min
  • DiskSpaceLow — < 15% free
  • HighCPU — > 90% sustained 10 min
  • HighRAM — > 90% sustained 5 min
  • EndpointDown — any blackbox target failing
  • ServiceCrashed — systemd unit in failed state
  • ProxmoxStorageWarn — storage pool > 85%

Paging

Alertmanager → PagerDuty. SMS on alert, voice escalation if not acknowledged within 5 minutes. Tested by triggering each rule deliberately.

Repo

github.com/E-mma9/Monitoring-Homelab

← back to projects