# Server Management
This document describes the current Debian-side control plane for the production Metin runtime.
## Inventory
The channel topology now lives in one versioned file:
- `deploy/channel-inventory.json`
It defines:
- auth and DB listener ports
- channel ids
- per-core public ports and P2P ports
- whether a channel is public/client-visible
- whether a special channel should always be included by management tooling
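As an illustration, a single-channel inventory covering those fields might look like the sketch below. The field names here are assumptions for illustration only; the real schema is whatever `deploy/channel-inventory.json` actually contains.

```json
{
  "auth_port": 11000,
  "db_port": 15000,
  "channels": [
    {
      "id": 1,
      "public": true,
      "always_include": false,
      "cores": [
        { "port": 13001, "p2p_port": 14001 },
        { "port": 13002, "p2p_port": 14002 }
      ]
    }
  ]
}
```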
This inventory is now the single source consumed by:
- `channel_inventory.py`
- `channels.py` compatibility exports
- `install.py`
- `deploy/systemd/install_systemd.py`
- `metinctl`
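A consumer such as `channel_inventory.py` presumably parses this JSON and filters on the visibility flags. The following is a minimal sketch of that idea, not the real module's API; the embedded schema (`channels`, `public`, `cores`, `port`) is an assumption for illustration.

```python
import json

# Hypothetical inventory snippet; the real schema lives in
# deploy/channel-inventory.json and may differ.
INVENTORY = json.loads("""
{
  "channels": [
    {"id": 1, "public": true,  "cores": [{"port": 13001}, {"port": 13002}]},
    {"id": 9, "public": false, "cores": [{"port": 13901}]}
  ]
}
""")

def public_ports(inventory):
    """Return every client-visible port declared in the inventory."""
    return [
        core["port"]
        for channel in inventory["channels"]
        if channel["public"]          # skip internal-only channels
        for core in channel["cores"]
    ]

print(public_ports(INVENTORY))  # [13001, 13002]
```

Keeping the filter logic in one place is what lets `metinctl ports` and `metinctl public-ready` agree on which listeners matter.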
## metinctl
The Debian deployment installs:
- `/usr/local/bin/metinctl`
`metinctl` is a lightweight operational CLI for:
- showing an operational summary
- showing recent auth success/failure activity
- showing auth activity grouped by source IP
- showing recent `syserr.log` entries
- summarizing recurring `syserr.log` entries
- viewing inventory
- listing managed units
- checking service status
- listing declared ports
- verifying that enabled public client-facing channels are actually up
- listing recent auth failures
- listing recent login sessions
- listing stale open sessions without logout
- restarting the whole stack or specific channels/instances
- viewing logs
- listing core files in the runtime tree
- generating a backtrace for the newest or selected core file
- collecting incident bundles
- running the root-only headless healthcheck
- waiting for login-ready state after restart
## Examples
Show inventory:
```bash
metinctl inventory
```
Show current unit state:
```bash
metinctl status
```
Show a quick operational summary:
```bash
metinctl summary
```
Show declared ports and whether they are currently listening:
```bash
metinctl ports --live
```
Verify that enabled client-visible public channels are active and listening:
```bash
metinctl public-ready
```
Show recent real auth failures, excluding smoke-test logins:
```bash
metinctl auth-failures
```
Show recent auth success/failure flow:
```bash
metinctl auth-activity
```
Show only recent auth failures, including smoke-test logins:
```bash
metinctl auth-activity --status failure --include-smoke
```
Show auth activity grouped by IP:
```bash
metinctl auth-ips
```
Show the latest runtime errors collected from all `syserr.log` files:
```bash
metinctl recent-errors
```
Show the most repeated runtime errors in the last 24 hours:
```bash
metinctl error-summary
```
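For hosts where `metinctl` is unavailable, the same counting can be approximated by hand. This is a rough sketch, not what `error-summary` actually runs, and the `syserr.log` layout under the given directory is an assumption:

```shell
# summarize_syserr DIR: print the most repeated syserr lines under DIR.
# Coarse manual fallback for `metinctl error-summary`; it counts whole
# files and does not apply the 24-hour window.
summarize_syserr() {
  find "$1" -name 'syserr.log' -exec cat {} + 2>/dev/null \
    | sort | uniq -c | sort -rn | head
}
```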
Include smoke-test failures in the auth-failure listing too:
```bash
metinctl auth-failures --include-smoke
```
Show recent login sessions from `log.loginlog2`:
```bash
metinctl sessions
```
Show only sessions that still have no recorded logout:
```bash
metinctl sessions --active-only
```
Show stale open sessions older than 30 minutes:
```bash
metinctl session-audit
```
Use a different stale threshold:
```bash
metinctl session-audit --stale-minutes 10
```
Restart only channel 1 cores:
```bash
metinctl restart channel:1
```
Restart one specific game instance:
```bash
metinctl restart instance:channel1_core2
```
Tail auth logs:
```bash
metinctl logs auth -n 200 -f
```
Run the deeper end-to-end healthcheck:
```bash
metinctl healthcheck --mode full
```
Run the lighter readiness probe:
```bash
metinctl healthcheck --mode ready
```
Wait until a restarted stack is login-ready:
```bash
metinctl wait-ready
```
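These readiness commands compose naturally. A hypothetical wrapper that only declares a restart done once the stack is login-ready and publicly reachable, built from the documented subcommands above, could look like:

```shell
# safe_restart [TARGET]: restart (optionally scoped to "channel:N" or
# "instance:NAME"), block until login-ready, then verify that public
# channels are listening. Hypothetical helper; every subcommand used
# here is documented above.
safe_restart() {
  if [ -n "${1:-}" ]; then
    metinctl restart "$1"
  else
    metinctl restart
  fi
  metinctl wait-ready && metinctl public-ready
}
```

Chaining `wait-ready` before `public-ready` avoids the false negative of probing ports while cores are still booting.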
List core files currently present in the runtime tree:
```bash
metinctl cores
```
Generate a backtrace for the newest core file:
```bash
metinctl backtrace
```
Generate a backtrace for one specific core file:
```bash
metinctl backtrace --core channels/channel1/core1/core.2255450
```
Collect an incident bundle with logs, unit status, port state and repository revisions:
```bash
metinctl incident-collect --tag auth-timeout --since "-20 minutes"
```
List the most recent incident bundles:
```bash
metinctl incidents
```
## systemd installer behavior
`deploy/systemd/install_systemd.py` now uses the same inventory and installs `metinctl`.
It also reconciles enabled game instance units against the selected channels:
- selected game units are enabled
- stale game units are disabled
- if `--restart` is passed, stale game units are disabled with `--now`
- installs now refuse an auth/internal-only channel selection unless you pass `--allow-internal-only`
This makes channel enablement declarative instead of depending on whatever happened to be enabled previously.
## Crash / Incident Pipeline
The Debian deployment now also installs:
- `/usr/local/sbin/metin-collect-incident`
- `/usr/local/sbin/metin-core-backtrace`
The collector creates a timestamped bundle under:
- `/var/lib/metin/incidents`
Each bundle contains:
- repo revisions for `m2dev-server` and `m2dev-server-src`
- `systemctl status` for the whole stack
- recent `journalctl` output per unit
- listener state from `ss -ltnp`
- tailed runtime `syslog.log` and `syserr.log` files
- metadata for any `core*` files found under `runtime/server/channels`
- metadata for the executable inferred for each core file
If you call it with `--include-cores`, matching core files are copied into the bundle as well. In the same mode the inferred executables are copied too, so a later redeploy does not destroy your ability to symbolicate the crash against the original binary snapshot.
The runtime units now also declare `LimitCORE=infinity`, so after the next service restart the processes are allowed to emit core dumps whenever the host kernel's core-dump policy permits it.
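`LimitCORE=infinity` only helps if the kernel is also configured to write cores somewhere useful. A quick way to inspect both knobs on the host:

```shell
# Where the kernel writes core dumps: either a literal path pattern, or
# a pipe to a handler such as systemd-coredump when it starts with '|'.
cat /proc/sys/kernel/core_pattern
# Soft core-size limit for the current shell; the service units get
# their own limit from LimitCORE=, not from this value.
ulimit -c
```

If `core_pattern` pipes to a handler, look for the dumps in that handler's storage (for `systemd-coredump`, via `coredumpctl`) rather than in the runtime tree.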
For quick manual crash triage outside the incident bundle flow, use:
```bash
metinctl backtrace
```
It defaults to the newest core file under the runtime tree, infers the executable path, and uses `gdb` or `lldb` when one is present on the host. If no supported debugger is installed, it still prints `file`/`readelf` metadata for the core and executable. If the current executable is newer than the core file, the helper prints an explicit warning, because the backtrace may no longer match the crashed binary.