diff --git a/docs/runbook-caddy-updates.md b/docs/runbook-caddy-updates.md
new file mode 100644
index 00000000..31c44a1e
--- /dev/null
+++ b/docs/runbook-caddy-updates.md
@@ -0,0 +1,210 @@
+# Runbook — bring up updates.jakubkadlec.dev
+
+Operator runbook for turning the update channel on. Does the following on the production VPS `mt2.jakubkadlec.dev`:
+
+1. Creates the directory layout the update manager expects
+2. Adds the Caddy site block for `updates.jakubkadlec.dev`
+3. Validates the Caddy config before reloading
+4. Reloads Caddy so the new vhost serves HTTPS with a fresh Let's Encrypt cert
+5. Verifies the vhost is up from an external client
+
+**Pre-requisites:**
+
+- Root or `sudo` access on the VPS.
+- DNS: `updates.jakubkadlec.dev` already resolves to the VPS IP (verified 2026-04-14: `194.163.138.177`). If it stops resolving, fix DNS first.
+- Port 80 open from the public internet (Caddy uses it for the ACME HTTP-01 challenge). Already open because Caddy is serving other sites on 443.
+
+**Estimated time:** 5 minutes, most of it waiting for LE cert issuance.
+
+**Rollback:** every mutating step has an explicit rollback below. The safest rollback is to restore the backup Caddyfile and reload — Caddy will drop the new vhost and keep everything else running exactly as before.
+
+## Step 1 — SSH to the VPS
+
+```bash
+ssh -i ~/.ssh/metin mt2.jakubkadlec.dev@mt2.jakubkadlec.dev
+```
+
+All following commands run as the `mt2.jakubkadlec.dev` user unless marked `sudo`.
+
+## Step 2 — Create the directory layout
+
+```bash
+sudo mkdir -p /var/www/updates.jakubkadlec.dev/{files,manifests,launcher}
+sudo chown -R mt2.jakubkadlec.dev:mt2.jakubkadlec.dev /var/www/updates.jakubkadlec.dev
+sudo chmod -R 755 /var/www/updates.jakubkadlec.dev
+
+# Drop a placeholder manifest so the vhost has something to serve during validation.
+# This file will be overwritten by the first real release.
+cat > /tmp/placeholder-manifest.json <<'EOF'
+{
+  "version": "0.0.0-placeholder",
+  "created_at": "2026-04-14T00:00:00Z",
+  "notes": "placeholder — replace with the first real signed release",
+  "launcher": {
+    "path": "Metin2Launcher.exe",
+    "sha256": "0000000000000000000000000000000000000000000000000000000000000000",
+    "size": 0,
+    "platform": "windows"
+  },
+  "files": []
+}
+EOF
+sudo mv /tmp/placeholder-manifest.json /var/www/updates.jakubkadlec.dev/manifest.json
+sudo chown mt2.jakubkadlec.dev:mt2.jakubkadlec.dev /var/www/updates.jakubkadlec.dev/manifest.json
+```
+
+**Rollback for step 2:**
+
+```bash
+sudo rm -rf /var/www/updates.jakubkadlec.dev
+```
+
+Note that without a signed placeholder the launcher will refuse to launch, because the zero-hash signature won't verify. That's **by design** — the launcher treats signature failure as "server is lying" and blocks the game. The placeholder is only there to prove HTTPS works; the first real release will overwrite it with a properly signed manifest.
+
+## Step 3 — Back up the current Caddyfile
+
+```bash
+sudo cp /etc/caddy/Caddyfile /etc/caddy/Caddyfile.bak.$(date +%Y%m%d-%H%M%S)
+ls -la /etc/caddy/Caddyfile.bak.*
+```
+
+**Rollback for step 3:** there's nothing to roll back — backup files are harmless.
+
+## Step 4 — Append the new vhost block to Caddyfile
+
+The block lives in the repo at `docs/caddy-updates.conf`. Copy its contents and append them to `/etc/caddy/Caddyfile`:
+
+```bash
+# From your local machine, or by pulling the file onto the VPS:
+sudo tee -a /etc/caddy/Caddyfile < /path/to/docs/caddy-updates.conf
+```
+
+Or, if you'd rather pull it from Gitea directly on the VPS:
+
+```bash
+curl -sS -H "Authorization: token $(cat ~/.config/metin/gitea-token)" \
+  "https://gitea.jakubkadlec.dev/api/v1/repos/metin-server/m2dev-client/raw/main/docs/caddy-updates.conf" \
+  | sudo tee -a /etc/caddy/Caddyfile
+```
+
+(Replace `main` with the PR branch if you want to test before merge.)
+
+Open the Caddyfile and confirm the block is at the end with no mangled whitespace:
+
+```bash
+sudo tail -80 /etc/caddy/Caddyfile
+```
+
+**Rollback for step 4:**
+
+```bash
+sudo cp /etc/caddy/Caddyfile.bak. /etc/caddy/Caddyfile
+```
+
+## Step 5 — Validate the new Caddyfile
+
+**Do not skip this.** A broken Caddyfile + reload would take every Caddy-served site down together.
+
+```bash
+sudo caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile
+```
+
+Expected output ends with `Valid configuration`. Any line starting with `error` means stop and roll back step 4:
+
+```bash
+sudo cp /etc/caddy/Caddyfile.bak. /etc/caddy/Caddyfile
+```
+
+## Step 6 — Reload Caddy
+
+```bash
+sudo systemctl reload caddy
+sudo systemctl status caddy --no-pager
+```
+
+`reload` is not `restart` — running connections are preserved and Caddy loads the new config in place. If something goes wrong Caddy keeps the old config active.
+
+**If reload fails** (systemctl returns non-zero), run the validate step again and read `journalctl -u caddy -n 50` to see the exact error, then roll back step 4 and reload again.
+
+**Rollback for step 6:** restoring the backup Caddyfile and reloading takes you back to the previous state:
+
+```bash
+sudo cp /etc/caddy/Caddyfile.bak. /etc/caddy/Caddyfile
+sudo systemctl reload caddy
+```
+
+## Step 7 — Wait for Let's Encrypt cert issuance
+
+Caddy issues a cert for the new subdomain automatically via the HTTP-01 challenge. Usually takes under 30 seconds.
+
+Watch Caddy's logs for the issuance event:
+
+```bash
+sudo journalctl -u caddy -f
+```
+
+Look for lines mentioning `updates.jakubkadlec.dev`, specifically `certificate obtained successfully`. Ctrl-C out once you see it.
+
+**If it doesn't issue within 2 minutes**, one of:
+
+- Port 80 is blocked — check `sudo ss -tlnp | grep ':80'` shows Caddy listening.
+- DNS hasn't propagated — check `dig updates.jakubkadlec.dev +short` matches the VPS IP.
+- Let's Encrypt rate limit — check `journalctl -u caddy` for `too many certificates`. Wait an hour and retry; don't hammer.
+
+## Step 8 — Verify from an external client
+
+From any machine that isn't the VPS:
+
+```bash
+# Cert subject should contain updates.jakubkadlec.dev in SAN
+echo | openssl s_client -connect updates.jakubkadlec.dev:443 \
+  -servername updates.jakubkadlec.dev 2>/dev/null \
+  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
+
+# Manifest should return 200 with a short Cache-Control
+curl -I https://updates.jakubkadlec.dev/manifest.json
+
+# Placeholder manifest body, pretty-printed
+curl -sS https://updates.jakubkadlec.dev/manifest.json | jq .
+```
+
+Expected: SAN contains `updates.jakubkadlec.dev`, HTTP 200, `Cache-Control: public, max-age=60, must-revalidate`, body is the placeholder JSON from step 2.
+
+If all three pass, the update channel is live and the launcher will accept fetches (though it will still refuse to apply the placeholder manifest because its signature is not valid — see step 2 note).
+
+## Step 9 — Clean up old backups (optional, later)
+
+Once the vhost has been live for a week without incident:
+
+```bash
+# List backups older than 7 days
+sudo find /etc/caddy/Caddyfile.bak.* -mtime +7
+
+# Remove them
+sudo find /etc/caddy/Caddyfile.bak.* -mtime +7 -delete
+```
+
+## Post-runbook — What's next
+
+- The first real release uses `scripts/make-manifest.py` + `scripts/sign-manifest.py` (both in this repo) to produce `manifest.json` + `manifest.json.sig`, then rsync them onto `/var/www/updates.jakubkadlec.dev/` along with the content-addressed blobs under `files//`.
+- The launcher binary's own self-update path (Velopack) needs a separate publish step (`vpk pack`) that populates `/var/www/updates.jakubkadlec.dev/launcher/`. That's its own runbook and not part of this one.
+
+## If something goes catastrophically wrong
+
+Caddy dies across the board → Gitea (`gitea.jakubkadlec.dev`) and any other served site are offline. System SSH on port 22 is independent of Caddy, so you can always reach the box.
+
+Recovery:
+
+```bash
+ssh -i ~/.ssh/metin mt2.jakubkadlec.dev@mt2.jakubkadlec.dev
+
+# Restore the last known-good Caddyfile
+sudo ls -lt /etc/caddy/Caddyfile.bak.* | head -1
+sudo cp /etc/caddy/Caddyfile.bak. /etc/caddy/Caddyfile
+
+sudo caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile
+sudo systemctl reload caddy || sudo systemctl restart caddy
+sudo systemctl status caddy --no-pager
+```
+
+Gitea SSH remains on port 2222 the whole time; it's a separate process and does not share fate with Caddy.
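
Review note on the rollback commands: they copy from `Caddyfile.bak.` without naming a specific backup, so the operator has to pick one by hand under pressure. One possible way to resolve the newest backup is sketched below. It assumes every backup follows the step 3 naming scheme (`Caddyfile.bak.` followed by a `date +%Y%m%d-%H%M%S` stamp), so that name order equals chronological order. `newest_backup` is a hypothetical helper for illustration and is not part of the repo.

```bash
# Hypothetical helper (not in the repo): print the path of the newest
# Caddyfile backup in a directory, defaulting to /etc/caddy.
# Assumption: backups are named Caddyfile.bak.YYYYMMDD-HHMMSS as in step 3.
# The shell expands globs in sorted order, so the last match is the newest.
newest_backup() {
  local dir
  local latest
  local f
  dir="${1:-/etc/caddy}"
  latest=""
  for f in "$dir"/Caddyfile.bak.*; do
    # When nothing matches, the loop sees the literal pattern; skip it.
    [ -e "$f" ] || continue
    latest="$f"
  done
  # Fail (non-zero) instead of printing an empty path when no backup exists.
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

With this in the shell session, a rollback could read `sudo cp "$(newest_backup)" /etc/caddy/Caddyfile`, followed by the same `caddy validate` and `systemctl reload caddy` as in the runbook. The helper returns non-zero when no backup exists, and it would not see backups stored elsewhere or named differently.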