m2dev-client/docs/runbook-caddy-updates.md
Jan Nedbal fdb9e98075 docs: add caddy updates vhost bring-up runbook
Step-by-step operator runbook for turning on updates.jakubkadlec.dev:
create the webroot, append the site block, validate the Caddyfile
before reload, watch for Let's Encrypt cert issuance, verify from an
external client, plus explicit rollback for every mutating step and a
catastrophic-recovery section in case Caddy drops all sites. Targeted
at Jakub (VPS operator) so Claude does not touch the running service.
2026-04-14 11:35:39 +02:00


Runbook — bring up updates.jakubkadlec.dev

Operator runbook for turning the update channel on. It performs the following on the production VPS mt2.jakubkadlec.dev:

  1. Creates the directory layout the update manager expects
  2. Adds the Caddy site block for updates.jakubkadlec.dev
  3. Validates the Caddy config before reloading
  4. Reloads Caddy so the new vhost serves HTTPS with a fresh Let's Encrypt cert
  5. Verifies the vhost is up from an external client

Pre-requisites:

  • Root or sudo access on the VPS.
  • DNS: updates.jakubkadlec.dev already resolves to the VPS IP (verified 2026-04-14: 194.163.138.177). If it stops resolving, fix DNS first.
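  • Port 80 open from the public internet (Caddy uses it for the ACME HTTP-01 challenge). It is expected to be open already, since Caddy is serving other sites on this host, but a working port 443 does not by itself prove that port 80 is reachable.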
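
The DNS prerequisite can be checked mechanically before starting. This is a minimal sketch, assuming `dig` is installed on the machine you run it from; the function name `dns_points_at` is hypothetical, and the IP in the usage line is the one recorded above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if HOST has an A record equal to EXPECTED_IP.
dns_points_at() {
  local host="$1" expected_ip="$2"
  # `dig +short` prints one record per line.
  # grep -x matches the whole line, -F treats the IP as a fixed string, -q is silent.
  dig +short "$host" A | grep -qxF "$expected_ip"
}

# Usage:
#   dns_points_at updates.jakubkadlec.dev 194.163.138.177 && echo "DNS ok"
```

The function only reports a mismatch; fixing DNS remains a manual step.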

Estimated time: 5 minutes, most of it waiting for LE cert issuance.

Rollback: every mutating step has an explicit rollback below. The safest rollback is to restore the backup Caddyfile and reload — Caddy will drop the new vhost and keep everything else running exactly as before.

Step 1 — SSH to the VPS

ssh -i ~/.ssh/metin mt2.jakubkadlec.dev@mt2.jakubkadlec.dev

All following commands run as the mt2.jakubkadlec.dev user unless marked sudo.

Step 2 — Create the directory layout

sudo mkdir -p /var/www/updates.jakubkadlec.dev/{files,manifests,launcher}
sudo chown -R mt2.jakubkadlec.dev:mt2.jakubkadlec.dev /var/www/updates.jakubkadlec.dev
sudo chmod -R 755 /var/www/updates.jakubkadlec.dev

# Drop a placeholder manifest so the vhost has something to serve during validation.
# This file will be overwritten by the first real release.
cat > /tmp/placeholder-manifest.json <<'EOF'
{
  "version": "0.0.0-placeholder",
  "created_at": "2026-04-14T00:00:00Z",
  "notes": "placeholder — replace with the first real signed release",
  "launcher": {
    "path": "Metin2Launcher.exe",
    "sha256": "0000000000000000000000000000000000000000000000000000000000000000",
    "size": 0,
    "platform": "windows"
  },
  "files": []
}
EOF
sudo mv /tmp/placeholder-manifest.json /var/www/updates.jakubkadlec.dev/manifest.json
sudo chown mt2.jakubkadlec.dev:mt2.jakubkadlec.dev /var/www/updates.jakubkadlec.dev/manifest.json

Rollback for step 2:

sudo rm -rf /var/www/updates.jakubkadlec.dev

Note that the launcher will refuse to launch against this placeholder: it carries no valid signature (the hash is all zeros), so verification fails. That's by design: the launcher treats signature failure as "server is lying" and blocks the game. The placeholder is only there to prove HTTPS works; the first real release will overwrite it with a properly signed manifest.
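
Before moving on, it can help to confirm that step 2 left the expected layout behind. The sketch below is only an illustration: `check_layout` is a hypothetical name, and it checks nothing beyond the three directories and the manifest file created above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the step 2 layout exists under ROOT.
check_layout() {
  local root="$1" missing=0 d
  for d in files manifests launcher; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d" >&2
      missing=1
    fi
  done
  # -s is true only if the file exists and is not empty.
  if [ ! -s "$root/manifest.json" ]; then
    echo "missing or empty: $root/manifest.json" >&2
    missing=1
  fi
  return "$missing"
}

# Usage:
#   check_layout /var/www/updates.jakubkadlec.dev && echo "layout ok"
```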

Step 3 — Back up the current Caddyfile

sudo cp /etc/caddy/Caddyfile /etc/caddy/Caddyfile.bak.$(date +%Y%m%d-%H%M%S)
ls -la /etc/caddy/Caddyfile.bak.*

Rollback for step 3: there's nothing to roll back — backup files are harmless.
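
Every later rollback needs the exact backup filename, so it may be convenient to capture it in a variable when the backup is made. A minimal sketch, assuming bash; `backup_file` and the `BACKUP` variable are hypothetical names, not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy SRC to SRC.bak.<timestamp> and print the backup path.
backup_file() {
  local src="$1" dest
  dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
  # -p preserves the mode and timestamps of the original (and ownership when permitted).
  cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage (run as root, otherwise the cp into /etc/caddy will fail):
#   BACKUP=$(backup_file /etc/caddy/Caddyfile)
#   echo "rollback with: cp $BACKUP /etc/caddy/Caddyfile"
```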

Step 4 — Append the new vhost block to Caddyfile

The block lives in the repo at docs/caddy-updates.conf. Copy its contents and append them to /etc/caddy/Caddyfile:

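# Run this on the VPS, after copying caddy-updates.conf there (e.g. with scp).
# tee -a appends as root; stdout is discarded so the block is not echoed back.
sudo tee -a /etc/caddy/Caddyfile < /path/to/docs/caddy-updates.conf > /dev/null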

Or, if you'd rather pull it from Gitea directly on the VPS:

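# -f makes curl print nothing and exit non-zero on an HTTP error (e.g. 401),
# so an error body is never appended to the Caddyfile.
curl -fsS -H "Authorization: token $(cat ~/.config/metin/gitea-token)" \
  "https://gitea.jakubkadlec.dev/api/v1/repos/metin-server/m2dev-client/raw/main/docs/caddy-updates.conf" \
  | sudo tee -a /etc/caddy/Caddyfile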

(Replace main with the PR branch if you want to test before merge.)

Open the Caddyfile and confirm the block is at the end with no mangled whitespace:

sudo tail -80 /etc/caddy/Caddyfile
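
Because both append commands use `tee -a`, running one twice would add the block twice. The sketch below counts site blocks for a host so a duplicate is easy to spot. It assumes the block in docs/caddy-updates.conf opens with the bare hostname as the first token on a line; that file is not shown here, so treat the matching rule, and the name `count_site_blocks`, as assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines in CADDYFILE whose first token is HOST
# (optionally with the opening brace attached, as in "host{").
count_site_blocks() {
  local caddyfile="$1" host="$2"
  # awk splits on whitespace, so indentation is ignored, and comment lines
  # are skipped because their first token starts with "#".
  awk -v host="$host" '$1 == host || $1 == host "{" { n++ } END { print n + 0 }' "$caddyfile"
}

# Usage (add sudo if the Caddyfile is not world-readable):
#   count_site_blocks /etc/caddy/Caddyfile updates.jakubkadlec.dev
```

A count of exactly 1 is the expected result after step 4; anything higher suggests the block was appended more than once.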

Rollback for step 4:

sudo cp /etc/caddy/Caddyfile.bak.<timestamp> /etc/caddy/Caddyfile

Step 5 — Validate the new Caddyfile

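Do not skip this. A broken Caddyfile puts every Caddy-served site at risk: a failed reload keeps the old config running, but any later restart or reboot with the broken file would take all sites down together.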

sudo caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile

Expected output ends with Valid configuration. Any line starting with error means stop and roll back step 4:

sudo cp /etc/caddy/Caddyfile.bak.<timestamp> /etc/caddy/Caddyfile
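
The validate-then-roll-back decision can be wrapped so a failed validation never leaves the broken file in place. This is a sketch under stated assumptions: `validate_or_restore` is a hypothetical name, and the backup path is whatever step 3 produced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate CONFIG; on failure, restore BACKUP over it.
validate_or_restore() {
  local config="$1" backup="$2"
  if caddy validate --config "$config" --adapter caddyfile; then
    return 0
  fi
  echo "validation failed, restoring $backup" >&2
  cp "$backup" "$config"
  # Still report failure so callers do not go on to reload.
  return 1
}

# Usage (as root):
#   validate_or_restore /etc/caddy/Caddyfile /etc/caddy/Caddyfile.bak.<timestamp> \
#     && systemctl reload caddy
```

Returning non-zero after the restore is deliberate: the appended block is gone at that point, so there is nothing new to reload.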

Step 6 — Reload Caddy

sudo systemctl reload caddy
sudo systemctl status caddy --no-pager

reload is not restart — running connections are preserved and Caddy loads the new config in place. If something goes wrong Caddy keeps the old config active.

If reload fails (systemctl returns non-zero), run the validate step again and read journalctl -u caddy -n 50 to see the exact error, then roll back step 4 and reload again.

Rollback for step 6: restoring the backup Caddyfile and reloading takes you back to the previous state:

sudo cp /etc/caddy/Caddyfile.bak.<timestamp> /etc/caddy/Caddyfile
sudo systemctl reload caddy

Step 7 — Wait for Let's Encrypt cert issuance

Caddy issues a cert for the new subdomain automatically via the HTTP-01 challenge. Usually takes under 30 seconds.

Watch Caddy's logs for the issuance event:

sudo journalctl -u caddy -f

Look for lines mentioning updates.jakubkadlec.dev, specifically certificate obtained successfully. Ctrl-C out once you see it.

If the cert isn't issued within 2 minutes, the cause is usually one of:

  • Port 80 is blocked or not bound: check that sudo ss -tlnp | grep ':80' shows Caddy listening, and that no firewall drops inbound traffic to port 80.
  • DNS hasn't propagated: check that dig updates.jakubkadlec.dev +short matches the VPS IP.
  • Let's Encrypt rate limit: check journalctl -u caddy for too many certificates. Wait an hour and retry; don't hammer.
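
Instead of watching the log by eye, the wait can be scripted with a timeout. The sketch below polls the journal for a line that mentions both the hostname and the success message quoted above. The function name `wait_for_cert`, the 10-minute look-back window, and the 5-second poll interval are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the Caddy journal until a success line for HOST
# appears, or give up after TIMEOUT seconds.
wait_for_cert() {
  local host="$1" timeout="${2:-120}" interval="${3:-5}" waited=0
  while [ "$waited" -le "$timeout" ]; do
    # Both strings must appear on the same log line.
    if journalctl -u caddy --since "10 min ago" --no-pager \
        | grep -F "$host" | grep -qF "certificate obtained successfully"; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# Usage (as root, or a user allowed to read the journal):
#   wait_for_cert updates.jakubkadlec.dev 120 || echo "no cert yet, see the checklist above"
```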

Step 8 — Verify from an external client

From any machine that isn't the VPS:

# The cert's Subject Alternative Name (SAN) should list updates.jakubkadlec.dev
echo | openssl s_client -connect updates.jakubkadlec.dev:443 \
  -servername updates.jakubkadlec.dev 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

# Manifest should return 200 with a short Cache-Control
curl -I https://updates.jakubkadlec.dev/manifest.json

# Placeholder manifest body, pretty-printed
curl -sS https://updates.jakubkadlec.dev/manifest.json | jq .

Expected: SAN contains updates.jakubkadlec.dev, HTTP 200, Cache-Control: public, max-age=60, must-revalidate, body is the placeholder JSON from step 2.
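
The header expectations can also be asserted by a script rather than read off the curl output. A minimal sketch, assuming bash and curl on the external client; `check_manifest_headers` is a hypothetical name, and it checks only the status code and the max-age=60 directive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if URL answers 200 and its Cache-Control
# header contains max-age=60.
check_manifest_headers() {
  local url="$1" headers
  # -I sends a HEAD request; tr strips the CRs that terminate HTTP header lines.
  headers="$(curl -sSI "$url" | tr -d '\r')"
  # The status line looks like "HTTP/2 200" or "HTTP/1.1 200 OK".
  printf '%s\n' "$headers" | head -n1 | grep -qE '^HTTP/[0-9.]+ 200( |$)' || return 1
  # Header names are case-insensitive (HTTP/2 sends them in lowercase).
  printf '%s\n' "$headers" | grep -i '^cache-control:' | grep -q 'max-age=60'
}

# Usage:
#   check_manifest_headers https://updates.jakubkadlec.dev/manifest.json && echo "headers ok"
```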

If all three pass, the update channel is live and the launcher will accept fetches (though it will still refuse to apply the placeholder manifest because its signature is not valid — see step 2 note).

Step 9 — Clean up old backups (optional, later)

Once the vhost has been live for a week without incident:

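# List backups older than 7 days. Quoting the pattern lets find do the matching,
# so the command does not error out when no backup exists.
sudo find /etc/caddy -maxdepth 1 -name 'Caddyfile.bak.*' -mtime +7

# Remove them
sudo find /etc/caddy -maxdepth 1 -name 'Caddyfile.bak.*' -mtime +7 -delete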

Post-runbook — What's next

  • The first real release uses scripts/make-manifest.py + scripts/sign-manifest.py (both in this repo) to produce manifest.json + manifest.json.sig, then rsync them onto /var/www/updates.jakubkadlec.dev/ along with the content-addressed blobs under files/<hash[0:2]>/<hash>.
  • The launcher binary's own self-update path (Velopack) needs a separate publish step (vpk pack) that populates /var/www/updates.jakubkadlec.dev/launcher/. That's its own runbook and not part of this one.
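
To make the files/<hash[0:2]>/<hash> layout concrete, the sketch below derives the target path for one file. It assumes the hash is the lowercase hex SHA-256 of the file contents, which matches the sha256 field in the placeholder manifest but is not confirmed here for blobs. `blob_path` is a hypothetical helper and is not one of the scripts in the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the content-addressed path for FILE,
# i.e. files/<first two hex chars of the hash>/<full hash>.
blob_path() {
  local file="$1" hash
  # sha256sum prints "<hash>  <filename>"; keep only the hash.
  hash="$(sha256sum "$file" | cut -d' ' -f1)"
  printf 'files/%s/%s\n' "${hash:0:2}" "$hash"
}

# Usage:
#   blob_path ./some-client-file.bin
```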

If something goes catastrophically wrong

If Caddy goes down entirely, Gitea (gitea.jakubkadlec.dev) and every other site it serves are offline. System SSH on port 22 is independent of Caddy, so you can still reach the box.

Recovery:

ssh -i ~/.ssh/metin mt2.jakubkadlec.dev@mt2.jakubkadlec.dev

# Restore the last known-good Caddyfile
sudo ls -lt /etc/caddy/Caddyfile.bak.* | head -1
sudo cp /etc/caddy/Caddyfile.bak.<most-recent> /etc/caddy/Caddyfile

sudo caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile
sudo systemctl reload caddy || sudo systemctl restart caddy
sudo systemctl status caddy --no-pager

Gitea SSH remains on port 2222 the whole time; it's a separate process and does not share fate with Caddy.
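
Under pressure, picking the `<most-recent>` backup by eye is error-prone. Since the step 3 timestamp suffix sorts lexicographically in time order, the newest backup can be selected by name. This is a sketch; `latest_backup` is a hypothetical helper, and it assumes all backups follow the step 3 naming scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest Caddyfile backup in DIR, judged by
# the %Y%m%d-%H%M%S suffix in its name. Fails if there is none.
latest_backup() {
  local dir="${1:-/etc/caddy}" newest
  newest="$(find "$dir" -maxdepth 1 -name 'Caddyfile.bak.*' | sort | tail -n 1)"
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage (as root):
#   cp "$(latest_backup /etc/caddy)" /etc/caddy/Caddyfile
```

Sorting by name rather than by modification time means the result does not change if a backup file is touched or copied later.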