Building High-Availability DHCP and a Web Subnet Editor for Home Networks

Introduction

Every homelabber’s worst nightmare is the day their network goes dark because of a single point of failure. For me, that day arrived when my UDM Pro’s SFP+ port failed silently, taking down Plex, HDHomeRun, Home Assistant, and every other service dependent on DHCP. One device had become the single point of failure for my entire local network.

That failure sparked two parallel projects: replacing the built-in DHCP server with a high-availability ISC Kea pair running in LXC containers across two unRAID servers, and later building a web-based subnet editor to replace the Mac terminal scripts I’d been using for day-to-day management. Together, these projects transformed my network from fragile to resilient — and made managing it actually enjoyable.

This post walks through both projects: why they were built, how they work, key decisions along the way, and lessons learned that might save you time if you’re tackling something similar.


Part 1: High-Availability DHCP with ISC Kea

Why Replace the UDM Pro’s DHCP?

The UDM Pro was serving DHCP across five VLANs (Core, Network, IoT, Backend, and Home Assistant), but its single SFP+ port failure proved that it wasn’t truly redundant. When that port died, DHCP stopped working entirely — devices couldn’t get addresses, and the network effectively went dark.

The goal was clear: build a DHCP solution that survives the loss of either server independently, while providing better visibility and management capabilities than the UDM Pro’s built-in interface.

Architecture Overview

The solution uses two LXC containers running ISC Kea 2.6.5 in hot-standby mode across separate unRAID hosts:

DockOfTheBay (unRAID)                    SpaceDock (unRAID)
┌─────────────────────┐                  ┌─────────────────────┐
│ kea-primary         │◄──HA heartbeat──►│ kea-secondary       │
│ 10.10.10.10         │   :8080          │ 10.10.10.11         │
│ (active/primary)    │                  │ (standby)           │
└─────────────────────┘                  └─────────────────────┘
        │                                        │
        ▼                                        ▼
   kea-ctrl-agent                          kea-ctrl-agent
   :8000                                    :8000
        │                                        │
        ▼                                        ▼
   isc-stork-agent                           isc-stork-agent
   :8082                                      :8082

Both nodes serve the same five VLANs:

VLAN  Name     Subnet          Pool Range
40    Core     10.10.0.0/16    10.10.100.0–255.254
1     Network  172.16.1.0/24   172.16.1.100–254
30    IoT      10.30.30.0/24   10.30.30.100–254
42    Backend  10.42.42.0/24   10.42.42.100–254
55    HASS     10.55.55.0/24   10.55.55.100–199

Why LXC Instead of Docker?

This was a critical architectural decision. Kea binds directly to network interfaces — one per VLAN. LXC containers get real network interfaces via the ich777 plugin, giving them direct access to the host’s networking stack. Docker would require either --net=host or macvlan, both adding complexity and fragility for this use case.

Hot-Standby vs Load-Balancing Mode

Kea supports two HA modes: load-balancing (splitting address pools between nodes) and hot-standby (full pool on primary, secondary takes over only if the primary fails). For a homelab with infrequent new-device events, hot-standby proved simpler and avoided pool-split complexity. The tradeoff is that the standby node’s resources sit idle until needed — but for this scale, that’s an acceptable cost for simplicity.
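To make the hot-standby setup concrete, here is a minimal sketch that builds the high-availability hook parameters as a Python dict. The peer names, addresses, and port come from the diagram above; the hook library path and the timing values are assumptions for illustration, not the settings from my actual config.

```python
import json

# Peers taken from the architecture diagram; 8080 is the HA heartbeat port.
PEERS = [
    {"name": "kea-primary", "url": "http://10.10.10.10:8080/",
     "role": "primary", "auto-failover": True},
    {"name": "kea-secondary", "url": "http://10.10.10.11:8080/",
     "role": "standby", "auto-failover": True},
]


def build_ha_hook(this_server_name, library_path="/usr/lib/kea/hooks/libdhcp_ha.so"):
    """Return one hooks-libraries entry for a hot-standby node.

    library_path and the timing values below are illustrative assumptions.
    """
    if this_server_name not in {peer["name"] for peer in PEERS}:
        raise ValueError(f"unknown peer: {this_server_name}")
    return {
        "library": library_path,
        "parameters": {
            "high-availability": [{
                "this-server-name": this_server_name,
                "mode": "hot-standby",
                "heartbeat-delay": 10000,     # ms between heartbeats (assumed)
                "max-response-delay": 60000,  # ms of silence before the partner is treated as unreachable (assumed)
                "peers": PEERS,
            }]
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_ha_hook("kea-primary"), indent=2))
```

The only value that differs between the two nodes is this-server-name, which is why that one string matters so much when configs are synced between them.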

Monitoring with ISC Stork

ISC Stork 2.4.0 provides a monitoring dashboard for both Kea nodes. The architecture runs:
– The Stork agent natively inside each LXC container (not in Docker), because the Docker agent can’t discover Kea running in a different process namespace via Unix socket
– The Stork server as a Docker Compose stack on SpaceDock with a PostgreSQL backend

The Stork UI at `http://10.10.20.79:8080` shows both nodes’ health, lease counts, and HA state in real time.

Day-to-Day Management via Mac Scripts

Before the web editor, all DHCP management happened through a suite of shell scripts on my Mac:

Script                      Purpose
kea-leases.sh               List active leases with VLAN selection menu
kea-reservations.sh         List all fixed reservations across VLANs
kea-add-reservation.sh      Add reservation, sync both nodes, reload
kea-remove-reservation.sh   Remove reservation, sync both nodes, reload
kea-set-hostname.sh         Update hostname on active lease or reservation
kea-subnet-info.sh          Show pool, gateway, DNS per VLAN
kea-reload.sh               Reload config on both nodes without restart

The sync workflow was: the Mac pulls the config from the primary → runs a sed substitution to swap in the secondary’s server name → pushes the result to the secondary. Routing everything through the Mac avoided the cross-LXC SSH host key issues that plagued early attempts at direct primary-to-secondary sync.

Key Lessons Learned

Port conflicts are silent killers. Kea’s HA hook binds port 8080, and Stork agent defaults to the same port. They collide silently — no error, just broken communication. Fix: set STORK_AGENT_PORT=8082 in /etc/stork/agent.env.

Certificate permissions matter. After running stork-agent register, always fix ownership:

chown -R stork-agent:stork-agent /var/lib/stork-agent/certs/ \
                                  /var/lib/stork-agent/tokens/

The registration runs as root, but the service runs as the stork-agent user, so anything registration creates under those directories is owned by root until you fix it.

Kea’s API returns arrays. The REST API always wraps responses in a JSON array [{}]. Access arguments with data[0]['arguments'], not data['arguments'].
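A small helper makes the array wrapping hard to forget. This is a sketch rather than the code from my scripts: the function name kea_arguments and the sample payload are made up for illustration, and any result code other than 0 is treated as a failure for simplicity.

```python
import json


def kea_arguments(raw_body):
    """Extract 'arguments' from a Kea Control Agent response body.

    The agent answers with a JSON array, one element per targeted service.
    """
    data = json.loads(raw_body)
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array from the Control Agent")
    first = data[0]
    # result 0 means success; every other code is treated as a failure here.
    if first.get("result", 1) != 0:
        raise RuntimeError(f"Kea command failed: {first.get('text', 'no text')}")
    return first.get("arguments", {})


# Hypothetical response body, shaped like a config-get reply.
sample = '[{"result": 0, "text": "ok", "arguments": {"Dhcp4": {"valid-lifetime": 4000}}}]'
print(kea_arguments(sample)["Dhcp4"]["valid-lifetime"])  # prints 4000
```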

LXC systemctl is unreliable via lxc-attach. Use pgrep -x kea-dhcp4 instead of systemctl is-active when checking service status through LXC containers.
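The same check can be wrapped for use from a script on the unRAID host. A minimal sketch, assuming the LXC container names match the node names in the diagram (kea-primary, kea-secondary); the runner parameter is only there so the function can be exercised without LXC installed.

```python
import subprocess


def kea_running(container, runner=subprocess.run):
    """Return True if a kea-dhcp4 process exists inside the LXC container.

    runner is injectable so the logic can be tested without lxc-attach.
    """
    cmd = ["lxc-attach", "-n", container, "--", "pgrep", "-x", "kea-dhcp4"]
    result = runner(cmd, capture_output=True, text=True)
    # pgrep exits 0 when at least one process matches, 1 when none do.
    return result.returncode == 0
```

Calling kea_running("kea-primary") on the host runs pgrep inside the container and looks only at the exit code, which sidesteps systemctl entirely.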


Part 2: Web-Based Subnet Editor for Kea DHCP

The Problem with Terminal Scripts

The Mac operation scripts worked, but they had limitations:
– Required SSH access and terminal familiarity
– No visual feedback on current values before editing
– Manual sync between nodes prone to human error
– Harder for team members or family to use

Building the Web Interface

The solution was a Flask-based web application (kea-web) that provides an in-browser editor for Kea subnet settings. Previously, changing a VLAN’s gateway, DNS servers, NTP, IP pool, or lease time required running shell scripts from the Mac terminal. Now it’s a form in the Subnets tab of the dashboard.

Architecture

Browser → kea-web (SpaceDock:8085) → SSH/SFTP → kea-primary & kea-secondary
                                    → HTTP Control Agent → Config reload

The application is built as a Docker container deployed on SpaceDock, with the following flow for subnet edits:

  1. Browser submits POST /subnets/edit with subnet_id and changed fields
  2. Backend connects to kea-primary via SFTP, reads kea-dhcp4.conf as JSON, patches the target subnet’s pools/valid-lifetime/option-data entries in-place, writes back
  3. Same function repeats independently on kea-secondary — each node’s config is read and patched separately, so this-server-name is never corrupted
  4. Backend hits the Control Agent on each node via HTTP to hot-reload the config without a service restart
  5. User is redirected back with a flash message confirming success
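The patch-and-reload steps above can be sketched as follows. This is not the kea-web source: the function names are hypothetical, the SFTP and HTTP transport is left out, and the sample config assumes that subnet IDs match VLAN IDs and that the lease time was 3600 seconds.

```python
import copy
import json


def patch_subnet(config, subnet_id, pool=None, valid_lifetime=None):
    """Return a copy of a Kea DHCPv4 config with one subnet patched.

    Only the target subnet is touched, so node-specific values such as
    this-server-name pass through unchanged.
    """
    patched = copy.deepcopy(config)
    for subnet in patched["Dhcp4"]["subnet4"]:
        if subnet.get("id") == subnet_id:
            if pool is not None:
                subnet["pools"] = [{"pool": pool}]
            if valid_lifetime is not None:
                subnet["valid-lifetime"] = int(valid_lifetime)
            return patched
    raise KeyError(f"subnet id {subnet_id} not found")


def reload_command():
    """JSON body asking the Control Agent to have kea-dhcp4 re-read its config file."""
    return json.dumps({"command": "config-reload", "service": ["dhcp4"]})


def sample_node_config(server_name):
    """Tiny stand-in for one node's kea-dhcp4.conf (values are assumptions)."""
    return {"Dhcp4": {
        "hooks-libraries": [{"parameters": {"high-availability": [
            {"this-server-name": server_name}]}}],
        "subnet4": [{"id": 55, "subnet": "10.55.55.0/24",
                     "pools": [{"pool": "10.55.55.100 - 10.55.55.199"}],
                     "valid-lifetime": 3600}],
    }}


if __name__ == "__main__":
    # Each node's config is patched on its own, never copied from the other.
    for name in ("kea-primary", "kea-secondary"):
        new_cfg = patch_subnet(sample_node_config(name), 55, valid_lifetime=7200)
        print(name, new_cfg["Dhcp4"]["subnet4"][0]["valid-lifetime"])
    print(reload_command())
```

Working on a deep copy means a failed patch never leaves a half-edited config in memory, and the original can be kept as a backup before writing.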

Key Design Decisions

Independent node patching vs scp + sed. The earlier shell scripts synced by pulling from primary, running sed 's/kea-primary/kea-secondary/', and pushing to secondary. The web app takes a safer approach: each node’s config is read and patched locally. This preserves any node-specific fields beyond this-server-name and avoids the assumption that the only difference between configs is that one string.

Empty field behavior. The modal pre-populates all fields with live running values from the Kea Control Agent. Submitting with any field unchanged preserves the current value. An empty field removes the option-data entry at the subnet level (the global default then takes over). This matches the mental model of the shell scripts — you only change what you intend to change.
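Those rules for a single form field can be expressed in a few lines. A minimal sketch, assuming option-data entries are stored as name/data pairs; apply_option is a hypothetical helper name, not one from the kea-web code.

```python
def apply_option(option_data, name, submitted):
    """Apply one form field to a subnet's option-data list, returning a new list.

    Empty string  -> drop the subnet-level entry so the global default applies.
    Anything else -> set (or add) the entry; an unchanged value is a no-op.
    """
    value = submitted.strip()
    if value == "":
        return [opt for opt in option_data if opt.get("name") != name]
    updated, found = [], False
    for opt in option_data:
        if opt.get("name") == name:
            updated.append({**opt, "data": value})  # keep any other keys on the entry
            found = True
        else:
            updated.append(opt)
    if not found:
        updated.append({"name": name, "data": value})
    return updated
```

Because the form arrives pre-populated, an untouched field submits its current value and lands in the "set" branch with no effective change.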

Single pool range support. The form supports one pool range per VLAN (e.g., 10.10.100.0 - 10.10.199.255). All production VLANs have exactly one pool, so multi-pool support was deferred to keep the UI simple.
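With only one range per VLAN, validating the field is straightforward. The sketch below uses Python’s standard ipaddress module to check that a submitted range is well-formed and sits inside the subnet; parse_pool is an illustrative name, and the real form may validate differently.

```python
import ipaddress


def parse_pool(pool_text, subnet_cidr):
    """Parse 'start - end' and check that it lies inside subnet_cidr.

    Returns the range normalized to 'start - end' with single spaces.
    """
    start_text, sep, end_text = pool_text.partition("-")
    if not sep:
        raise ValueError("pool must look like 'start - end'")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    network = ipaddress.ip_network(subnet_cidr)
    if start > end:
        raise ValueError("pool start is above pool end")
    if start not in network or end not in network:
        raise ValueError(f"pool is outside {network}")
    return f"{start} - {end}"


print(parse_pool("10.10.100.0 - 10.10.199.255", "10.10.0.0/16"))
```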

Deployment with Komodo

The application is managed through a custom Komodo stack:
– Source code in Gitea repository (homelab-containers)
– Build job creates Docker image gitea.cossaboon.net/kcossabo/kea-web:latest
– ResourceSync pushes config to SpaceDock
– Deploy runs on SpaceDock at `http://10.10.20.84:8085`

Lessons from the Web Editor Build

Watch for invisible bugs in TOML. A clone_path = " " (two spaces, not empty) in resources.toml was silently breaking deployments: every time ResourceSync ran, it overwrote the valid path with the whitespace-only one. Spaces-only values are invisible in many editors, so check for them first if a deploy fails at Stage 1.

Jinja2 template limitations. Templates are a poor place for mutations such as dict.update(); Jinja2 has no clean statement form for them unless the do extension is enabled. Register custom filters on the Flask app for list-to-dict conversions instead of trying inline mutations.
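A filter for this kind of conversion can be very small. The sketch below is an assumption about the shape of such a filter, not the one in kea-web: options_to_dict is a hypothetical name, and the Flask registration is wrapped in a try block so the helper still works where Flask is not installed.

```python
def options_to_dict(option_data):
    """Turn Kea's option-data list into {name: data} for easy template lookups."""
    return {opt["name"]: opt.get("data", "") for opt in option_data}


try:
    from flask import Flask  # third-party; only needed for the registration step

    app = Flask(__name__)
    app.add_template_filter(options_to_dict, "options_to_dict")
    # In a template:
    #   {{ (subnet["option-data"] | options_to_dict).get("routers", "") }}
except ImportError:
    app = None  # Flask not installed; the helper above still works on its own
```

Doing the conversion in Python keeps the template to a plain lookup and gives the logic a place where it can be unit-tested.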


Conclusion

Building high-availability DHCP and a web management interface transformed my homelab from fragile to resilient, and more importantly, made network management actually enjoyable rather than a chore.

Key Takeaways

  1. Don’t trust single points of failure. The UDM Pro’s SFP+ port failure was the catalyst for everything. Even “redundant” consumer gear often isn’t truly redundant in practice.

  2. LXC over Docker for network-heavy services. When a service needs direct interface access, LXC with the ich777 plugin is simpler and more reliable than wrestling with macvlan or host networking.

  3. Hot-standby beats load-balancing for homelabs. Simpler to configure, simpler to debug, and your standby resources are available when you actually need them.

  4. Web interfaces beat terminal scripts for shared environments. Even if you’re the only operator, a browser-based editor with pre-populated values and visual feedback reduces errors and makes changes more intuitive.

  5. Monitor everything. ISC Stork’s real-time visibility into both Kea nodes’ health and lease counts was invaluable during troubleshooting and gave confidence that the HA setup actually worked as intended.

The result is a DHCP infrastructure that survives server failures, provides clear monitoring, and can be managed from any browser — not just my Mac terminal. If you’re running a homelab with multiple VLANs and devices depending on DHCP, these patterns are worth considering for your own setup.

About the Author

Kevin Cossaboon

A networking professional located in Northern Virginia, USA. My hobbies are technology and photography. I love playing with the latest technology and will try to post reviews of it. I also love my lifelong journey of learning to capture light, to trigger emotions, through photography.
