m2dev-server/docs/server-management.md

Server Management

This document describes the current Debian-side control plane for the production Metin runtime.

Inventory

The channel topology now lives in one versioned file:

  • deploy/channel-inventory.json

It defines:

  • auth and DB listener ports
  • channel ids
  • per-core public ports and P2P ports
  • whether a channel is public/client-visible
  • whether a special channel should always be included by management tooling
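A minimal sketch of what such an inventory might look like (the field names and values here are illustrative assumptions, not the actual schema):

```json
{
  "auth_port": 11002,
  "db_port": 15000,
  "channels": [
    {
      "id": 1,
      "public": true,
      "always_include": false,
      "cores": [
        { "public_port": 13001, "p2p_port": 14001 },
        { "public_port": 13002, "p2p_port": 14002 }
      ]
    }
  ]
}
```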

This inventory is now the single source of truth consumed by:

  • channel_inventory.py
  • channels.py compatibility exports
  • install.py
  • deploy/systemd/install_systemd.py
  • metinctl

metinctl

The Debian deployment installs:

  • /usr/local/bin/metinctl

metinctl is a lightweight operational CLI for:

  • showing an operational summary
  • showing recent auth success/failure activity
  • showing auth activity grouped by source IP
  • showing recent syserr.log entries
  • summarizing recurring syserr.log entries
  • viewing inventory
  • listing managed units
  • checking service status
  • listing declared ports
  • verifying that enabled public client-facing channels are actually up
  • listing recent auth failures
  • listing recent login sessions
  • listing stale open sessions without logout
  • restarting the whole stack or specific channels/instances
  • viewing logs
  • listing core files in the runtime tree
  • generating a backtrace for the newest or selected core file
  • collecting incident bundles
  • running the root-only headless healthcheck
  • waiting for login-ready state after restart

Examples

Show inventory:

metinctl inventory

Show current unit state:

metinctl status

Show a quick operational summary:

metinctl summary

Show declared ports and whether they are currently listening:

metinctl ports --live

Verify that enabled client-visible public channels are active and listening:

metinctl public-ready

Show recent real auth failures (smoke-test logins are skipped by default):

metinctl auth-failures

Show recent auth success/failure flow:

metinctl auth-activity

Show only recent auth failures, including smoke-test logins:

metinctl auth-activity --status failure --include-smoke

Show auth activity grouped by IP:

metinctl auth-ips

Show the latest runtime errors collected from all syserr.log files:

metinctl recent-errors

Show the most repeated runtime errors in the last 24 hours:

metinctl error-summary

Include smoke-test failures in the auth-failures view too:

metinctl auth-failures --include-smoke

Show recent login sessions from log.loginlog2:

metinctl sessions

Show only sessions that still have no recorded logout:

metinctl sessions --active-only

Show stale open sessions older than 30 minutes:

metinctl session-audit

Use a different stale threshold:

metinctl session-audit --stale-minutes 10

Restart only channel 1 cores:

metinctl restart channel:1

Restart one specific game instance:

metinctl restart instance:channel1_core2

Tail auth logs:

metinctl logs auth -n 200 -f

Run the deeper end-to-end healthcheck:

metinctl healthcheck --mode full

Run the lighter readiness probe:

metinctl healthcheck --mode ready

Wait until a restarted stack is login-ready:

metinctl wait-ready

List core files currently present in the runtime tree:

metinctl cores

Generate a backtrace for the newest core file:

metinctl backtrace

Generate a backtrace for one specific core file:

metinctl backtrace --core channels/channel1/core1/core.2255450

Collect an incident bundle with logs, unit status, port state and repository revisions:

metinctl incident-collect --tag auth-timeout --since "-20 minutes"

List the most recent incident bundles:

metinctl incidents

systemd installer behavior

deploy/systemd/install_systemd.py now uses the same inventory and installs metinctl.

It also reconciles enabled game instance units against the selected channels:

  • selected game units are enabled
  • stale game units are disabled
  • if --restart is passed, stale game units are disabled with --now
  • installs now refuse an auth/internal-only channel selection unless you pass --allow-internal-only

This makes channel enablement declarative instead of depending on whatever happened to be enabled previously.
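The reconciliation step can be sketched roughly like this (unit names and data shapes are hypothetical; the real installer drives systemctl rather than returning lists):

```python
# Sketch of declarative unit reconciliation: enable exactly the selected
# game units and disable anything stale. Unit names are hypothetical.

def reconcile_game_units(selected_channels, enabled_units):
    """Return (to_enable, to_disable) given desired channels and current units."""
    desired = {
        f"metin-game@channel{ch}_core{core}.service"
        for ch, cores in selected_channels.items()
        for core in cores
    }
    current = set(enabled_units)
    to_enable = sorted(desired - current)   # selected but not yet enabled
    to_disable = sorted(current - desired)  # stale: enabled but no longer selected
    return to_enable, to_disable


to_enable, to_disable = reconcile_game_units(
    {1: [1, 2]},
    ["metin-game@channel1_core1.service", "metin-game@channel9_core1.service"],
)
```

The point of computing both sets from the same desired state is that nothing survives by accident: a unit is enabled because the inventory selects it, or it is disabled.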

Crash / Incident Pipeline

The Debian deployment now also installs:

  • /usr/local/sbin/metin-collect-incident
  • /usr/local/sbin/metin-core-backtrace

The collector creates a timestamped bundle under:

  • /var/lib/metin/incidents

Each bundle contains:

  • repo revisions for m2dev-server and m2dev-server-src
  • systemctl status for the whole stack
  • recent journalctl output per unit
  • listener state from ss -ltnp
  • tailed runtime syslog.log and syserr.log files
  • metadata for any core* files found under runtime/server/channels
  • metadata for the executable inferred for each core file

If you call it with --include-cores, matching core files are copied into the bundle as well. In the same mode, the inferred executable files are copied too, so a later redeploy does not destroy your ability to symbolicate the crash with the original binary snapshot.
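The --include-cores behaviour amounts to copying each core file and its inferred executable into the bundle. A simplified sketch of that idea (the function name and the pairing of core to executable are assumptions, not the collector's real interface):

```python
import shutil
from pathlib import Path

def copy_core_artifacts(core_files, bundle_dir):
    """Copy core files and their inferred executables into an incident bundle.

    core_files: iterable of (core_path, executable_path) pairs; the real
    collector infers the executable from the core's metadata.
    """
    dest = Path(bundle_dir) / "cores"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for core, exe in core_files:
        copied.append(shutil.copy2(core, dest))       # copy2 preserves timestamps
        if exe and Path(exe).exists():
            copied.append(shutil.copy2(exe, dest))    # snapshot the binary too
    return copied
```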

The runtime units now also declare LimitCORE=infinity, so after the next service restart the processes are allowed to emit core dumps when the host kernel/core policy permits it.
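In unit-file terms this is a single directive in the [Service] section (the unit name below is illustrative):

```ini
# /etc/systemd/system/metin-game@.service (illustrative name)
[Service]
LimitCORE=infinity
```

Whether a dump is actually written still depends on the kernel's core_pattern and core-dump policy on the host, as noted above.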

For quick manual crash triage outside the incident bundle flow, use:

metinctl backtrace

It defaults to the newest core file under the runtime tree, infers the executable path, and uses gdb or lldb when present on the host. If no supported debugger is installed, it still prints file/readelf metadata for the core and executable. If the current executable is newer than the core file, the helper prints an explicit warning because the backtrace may no longer match the crashed binary.
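The staleness warning boils down to comparing modification times; a minimal sketch of that check (the function name is ours, not the helper's):

```python
import os

def executable_newer_than_core(executable_path, core_path):
    """True when the executable was rebuilt after the core was written,
    meaning a backtrace against it may not match the crashed binary."""
    return os.path.getmtime(executable_path) > os.path.getmtime(core_path)
```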