# Server runtime audit

Engineer-to-engineer writeup of what the VPS `mt2.jakubkadlec.dev` is actually running as of 2026-04-14. Existing docs under `docs/` describe the intended layout (`debian-runtime.md`, `database-bootstrap.md`, `config-and-secrets.md`); this document is a ground-truth snapshot from a live recon session, with PIDs, paths, versions and surprises. Companion: `docs/server-topology.md` for the ASCII diagram and port table.

## TL;DR

- Only one metin binary is alive right now: the **`db`** helper on port `9000` (PID `1788997` at audit time, cwd `/home/mt2.jakubkadlec.dev/metin/runtime/server/channels/db`).
- **`game_auth` and all `channel*_core*` processes are NOT running.** The familiar port listing (auth `:11000/12000`, channel1 cores `:11011/12011` etc.) reflects *intended* state from the systemd units, not the current live process table. `ss -tlnp` only shows `0.0.0.0:9000` for m2.
- The game/auth binaries are **not present on disk either**. Only `share/bin/db` exists; there is no `share/bin/game_auth` and no `share/bin/channel*_core*`. Those channels cannot start even if requested.
- The `db` unit is currently **flapping / crash-looping**. `systemctl` reports `deactivating (stop-sigterm)`; syserr.log shows repeated `Connection reset by peer` from client peers (auth/game trying to reconnect is the usual culprit, but here nobody is connecting — cause needs verification). Two fresh `core.<pid>` files (97 MB each) sit in the db channel dir from 13:24 and 13:25 today.
- Orchestration is **pure systemd**, not the upstream `start.py` / tmux setup. The README still documents `start.py`, so the README is stale for the Debian VPS; `deploy/systemd/` + `docs/debian-runtime.md` are authoritative.
- MariaDB 11.8.6 is the backing store on `127.0.0.1:3306`. The DB user the stack is configured to use is `bootstrap` (from `share/conf/db.txt` / `game.txt`). The actual password is injected via `/etc/metin/metin.env`, which is `root:root 600` and intentionally unreadable by the unprivileged runtime/inspector account.

## Host

- Hostname: `vmi3229987` (Contabo), public name `mt2.jakubkadlec.dev`.
- OS: Debian 13 (trixie).
- MariaDB: `mariadbd` 11.8.6, PID `103624`, listening on `127.0.0.1:3306`.
- All metin services run as the unprivileged user `mt2.jakubkadlec.dev:mt2.jakubkadlec.dev`.
- Runtime root: `/home/mt2.jakubkadlec.dev/metin/runtime/server` (755 MB across `channels/`, 123 MB across `share/`, total metin workspace on the box ~1.7 GB).

## Processes currently alive

From `ps auxf` + `ss -tlnp` at audit time:

```
mysql    103624   /usr/sbin/mariadbd          — 127.0.0.1:3306
mt2.j+   1788997  /home/.../channels/db/db    — 0.0.0.0:9000
```

No other m2 binaries show up. `ps` has **zero** matches for `game_auth`, `channel1_core1`, `channel1_core2`, `channel1_core3`, `channel99_core1`.

Per-process inspection:

| PID | cwd | exe (resolved) | fds of interest |
| ------- | ------------------------------- | --------------------------------------- | --------------- |
| 1788997 | `.../runtime/server/channels/db` | `.../share/bin/db` (via `./db` symlink) | fd 3 → syslog.log, fd 4 → syserr.log, fd 11 TCP `*:9000`, fd 17 `[eventpoll]` (epoll fdwatch) |

The `db` symlink inside the channel dir resolves to `../../share/bin/db`, which is an `ELF 64-bit LSB pie executable, x86-64, dynamically linked, BuildID fc049d0f..., not stripped`. Build identifier from `channels/db/VERSION.txt`: **`db revision: b2b037f-dirty`** — the dirty tag is a red flag; the build wasn't from a clean checkout of `m2dev-server-src`.

The `usage.txt` in the same directory shows hourly heartbeat rows with `| 0 | 0 |` since 2026-04-13 21:00 (the "sessions / active" columns are stuck at zero — consistent with no game channels being connected).
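The per-process rows above came from a plain `/proc` walk; a minimal sketch of that inspection (the function name is mine; on the VPS you'd point it at PID `1788997`, and you need a user allowed to read the target's `/proc` entries):

```bash
#!/usr/bin/env bash
# Sketch: reproduce the cwd/exe/fd columns of the per-process table
# by reading /proc/<pid>/{cwd,exe,fd}. Demoed on the current shell.
inspect_pid() {
    local pid="$1"
    echo "cwd: $(readlink "/proc/${pid}/cwd")"
    echo "exe: $(readlink "/proc/${pid}/exe")"
    # Each fd entry is a symlink to a file, socket, or anon inode
    # (this is where the syslog.log / syserr.log / TCP fds show up).
    local fd
    for fd in "/proc/${pid}/fd"/*; do
        echo "fd ${fd##*/}: $(readlink "$fd")"
    done
}

inspect_pid "$$"   # demo on this shell; use 1788997 on the VPS
```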
## Binaries actually present on disk

```
/home/mt2.jakubkadlec.dev/metin/runtime/server/share/bin/
├── db    ← present, used
└── game  ← present (shared game binary, but not launched under any
            instance name that the systemd generator expects)
```

What is NOT present:

- `share/bin/game_auth`
- `share/bin/channel1_core1`, `channel1_core2`, `channel1_core3`
- `share/bin/channel99_core1`

The `metin-game-instance-start` helper (`/usr/local/libexec/...`) is a bash wrapper that `cd`s into `channels/<channel>/<core>/` and execs `./<instance>`, e.g. `./channel1_core1`. Those per-instance binaries don't exist yet.

The channel dirs themselves (`channel1/core1/`, etc.) already contain the scaffolding (`CONFIG`, `conf`, `data`, `log`, `mark`, `package`, `p2p_packet_info.txt`, `packet_info.txt`, `syserr.log`, `syslog.log`, `version.txt`), but `version.txt` says `game revision: unknown` and the per-instance executable file is missing. The log directory has a single stale `syslog_2026-04-13.log`.

Interpretation: the deploy pipeline that builds `m2dev-server-src` and drops instance binaries into `share/bin/` has not yet been run (or has not been re-run since the tree was laid out on 2026-04-13). Once Jakub's `debian-foundation` build produces per-instance symlinked/hardlinked binaries, the `metin-game@*` units should come up automatically on the next `systemctl restart metin-server`.

## How things are started

All orchestration goes through systemd units under `/etc/systemd/system/`, installed from `deploy/systemd/` via `deploy/systemd/install_systemd.py`. Unit list and roles:

| Unit | Type | Role |
| ----------------------------------------- | -------- | -------------------------------------------- |
| `metin-server.service` | oneshot | top-level grouping, `Requires=mariadb.service`. `ExecStart=/bin/true`, `RemainAfterExit=yes`. All sub-units are `PartOf=metin-server.service`, so restarting `metin-server` cycles everything. |
| `metin-db.service` | simple | launches `.../channels/db/db` as the runtime user, `Restart=on-failure`, `LimitCORE=infinity`, env file `/etc/metin/metin.env`. |
| `metin-db-ready.service` | oneshot | runs `/usr/local/libexec/metin-wait-port 127.0.0.1 9000 30` — gate that blocks auth+game until the DB socket is listening. |
| `metin-auth.service` | simple | launches `.../channels/auth/game_auth`. Requires db-ready. |
| `metin-game@channel1_core1..3.service` | template | each runs `/usr/local/libexec/metin-game-instance-start <instance>`, which execs `./<instance>` in that channel dir. |
| `metin-game@channel99_core1.service` | template | same, for channel 99. |

Dependency chain:

```
mariadb.service
    │
    ▼
metin-db.service ──► metin-db-ready.service ──► metin-auth.service
                                            └──► metin-game@*.service
                                                      │
                                                      ▼
                                    metin-server.service (oneshot umbrella)
```

All units have `PartOf=metin-server.service`, `Restart=on-failure`, `LimitNOFILE=65535`, `LimitCORE=infinity`. None run in Docker. None use tmux, screen or the upstream `start.py`. **The upstream `start.py` / `stop.py` in the repo are NOT wired up on this host** and should be treated as FreeBSD-era legacy.

The per-instance launcher `/usr/local/libexec/metin-game-instance-start` (installed by `install_systemd.py`) is:

```bash
#!/usr/bin/env bash
set -euo pipefail

instance="${1:?missing instance name}"
root_dir="/home/mt2.jakubkadlec.dev/metin/runtime/server/channels"

channel_dir="${instance%_*}"   # e.g. channel1 from channel1_core2
core_dir="${instance##*_}"     # e.g. core2

workdir="${root_dir}/${channel_dir}/${core_dir}"
cd "$workdir"
exec "./${instance}"
```

Notes:

- the `%_*` / `##*_` parse is brittle — an instance name with more than one underscore would misbehave. For current naming (`channelN_coreM`) it works.
- the helper does not redirect stdout/stderr; both go to the journal via systemd.
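If the naming scheme ever grows a third underscore, a validate-then-split parse is safer than bare parameter expansion. A sketch (this is *not* the installed helper, just an illustration of the stricter variant):

```bash
#!/usr/bin/env bash
# Sketch: reject anything that is not channel<digits>_core<digits>
# before splitting, instead of trusting ${instance%_*} / ${instance##*_}.
split_instance() {
    local instance="$1"
    if [[ ! "$instance" =~ ^(channel[0-9]+)_(core[0-9]+)$ ]]; then
        echo "unexpected instance name: ${instance}" >&2
        return 1
    fi
    # Print the workdir suffix, e.g. channel1/core2.
    echo "${BASH_REMATCH[1]}/${BASH_REMATCH[2]}"
}

split_instance channel1_core2          # → channel1/core2
split_instance channel99_core1         # → channel99/core1
split_instance weird_extra_core1 || true   # rejected with an error
```

The same regex could be dropped into the existing launcher as a guard before the `cd`, without changing its happy path.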
## Config files the binaries actually read

All m2 config files referenced by the running/installed stack, resolved to their real path on disk:

| Config file | Read by | Purpose |
| ------------------------------------------------------------------------ | ------------- | --------------------------------------------------- |
| `share/conf/db.txt` | `db` | SQL hosts, BIND_PORT=9000, item id range, hotbackup |
| `share/conf/game.txt` | game cores | DB_ADDR=127.0.0.1, DB_PORT=9000, SQL creds, flags |
| `share/conf/CMD` | game cores | in-game command ACL (notice, warp, item, …) |
| `share/conf/item_proto.txt`, `mob_proto.txt`, `item_names*.txt`, `mob_names*.txt` | both db and game | static content tables |
| `channels/db/conf` (symlink → `share/conf`) | `db` | every db channel looks into this flat conf tree |
| `channels/db/data` (symlink → `share/data`) | `db`/`game` | mob/pc/dungeon/spawn data |
| `channels/db/locale` (symlink → `share/locale`) | all | locale assets |
| `channels/auth/CONFIG` | `game_auth` | `HOSTNAME: auth`, `CHANNEL: 1`, `PORT: 11000`, `P2P_PORT: 12000`, `AUTH_SERVER: master` |
| `channels/channel1/core1/CONFIG` | core1 | `HOSTNAME: channel1_1`, `CHANNEL: 1`, `PORT: 11011`, `P2P_PORT: 12011`, `MAP_ALLOW: 1 4 5 6 3 23 43 112 107 67 68 72 208 302 304` |
| `channels/channel1/core2/CONFIG` | core2 | `PORT: 11012`, `P2P_PORT: 12012` |
| `channels/channel1/core3/CONFIG` | core3 | `PORT: 11013`, `P2P_PORT: 12013` |
| `channels/channel99/core1/CONFIG` | ch99 core1 | `HOSTNAME: channel99_1`, `CHANNEL: 99`, `PORT: 11991`, `P2P_PORT: 12991`, `MAP_ALLOW: 113 81 100 101 103 105 110 111 114 118 119 120 121 122 123 124 125 126 127 128 181 182 183 200` |
| `/etc/metin/metin.env` | all systemd units via `EnvironmentFile=-` | host-local secrets/overrides, root:root mode 600. Contents not readable during this audit. |

Flat `share/conf/db.txt` (verbatim, with the bootstrap placeholder credentials):

```
WELCOME_MSG = "Database connector is running..."
SQL_ACCOUNT = "127.0.0.1 account bootstrap change-me 0"
SQL_PLAYER = "127.0.0.1 player bootstrap change-me 0"
SQL_COMMON = "127.0.0.1 common bootstrap change-me 0"
SQL_HOTBACKUP= "127.0.0.1 hotbackup bootstrap change-me 0"
TABLE_POSTFIX = ""
BIND_PORT = 9000
CLIENT_HEART_FPS = 60
HASH_PLAYER_LIFE_SEC = 600
BACKUP_LIMIT_SEC = 3600
PLAYER_ID_START = 100
PLAYER_DELETE_LEVEL_LIMIT = 70
PLAYER_DELETE_CHECK_SIMPLE = 1
ITEM_ID_RANGE = 2000000000 2100000000
MIN_LENGTH_OF_SOCIAL_ID = 6
SIMPLE_SOCIALID = 1
```

The `bootstrap` / `change-me` values are git-tracked placeholders. `config-and-secrets.md` explicitly says these are templates, and real values are expected to come from `/etc/metin/metin.env`. This only works if the server source re-reads credentials from the environment when injected; verify by grepping `m2dev-server-src` for the SQL env var names used by `db`/`game`. (**Open question**: confirm which env var names override the in-file creds; the audit session couldn't read `metin.env` directly.)

## Database

- Engine: **MariaDB 11.8.6** (`mariadb --version`).
- PID: 103624, listening on `127.0.0.1:3306` only. No external TCP exposure; the unix socket was not checked (likely `/run/mysqld/mysqld.sock`).
- Expected databases from `docs/database-bootstrap.md`: `account`, `player`, `common`, `log`, `hotbackup`.
- Stack-side DB user: `bootstrap` (placeholder in git, real password in `/etc/metin/metin.env`).
- Could not enumerate actual tables during the audit — both `mysql -uroot` and `sudo -u mt2.jakubkadlec.dev mariadb` failed (Access denied), since root uses unix-socket auth for `root@localhost` and the runtime user has no CLI credentials outside the systemd environment.
- **To inspect the DB read-only:** either run as root with `sudo mariadb` (unix-socket auth — needs confirmation it's enabled), or open `/etc/metin/metin.env` as root, grab the `bootstrap` password, then `mariadb -ubootstrap -p account` etc. Do not attempt writes.
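Back on the config side: until the override contract is confirmed, a cheap pre-flight check is to flag any `SQL_*` line still carrying the `change-me` placeholder. A sketch (the function name is mine; it assumes the `"host db user password flag"` layout shown in `db.txt` above):

```bash
#!/usr/bin/env bash
# Sketch: scan a db.txt-style file for SQL_* lines whose 4th field
# inside the quoted value is still the git-tracked "change-me" literal.
check_placeholders() {
    local conf="$1"
    awk -F'"' '/^SQL_/ {
        key = $1; sub(/[ =]+$/, "", key)   # SQL_ACCOUNT from "SQL_ACCOUNT = "
        split($2, f, " ")                  # host db user password flag
        if (f[4] == "change-me")
            printf "placeholder password in %s (user %s)\n", key, f[3]
    }' "$conf"
}
```

Usage: `check_placeholders share/conf/db.txt` — any output means the env-file override had better be in place before `metin-db.service` starts, or the `change-me` literal is what reaches MariaDB.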
## Logging

Every m2 process writes two files in its channel dir, via fd 3 / fd 4:

- `syslog.log` — verbose info stream (rotated by date in some dirs: `channel1/core1/log/syslog_2026-04-13.log`).
- `syserr.log` — error stream. Look here first on crash.

The `db` channel's `syslog.log` grows fast (36 MB today; rotation appears to be manual — there is a `log/` dir with a daily file, but the current `syslog.log` sits at the top level), and the channel dir collects `core.<pid>` ELF cores on SIGSEGV/SIGABRT because `LimitCORE=infinity` is set.

The systemd journal captures stdout/stderr as well, so `journalctl -u metin-db --since '1 hour ago'` is the fastest way to see startup banners and `systemd`-observed restarts. Example from this audit:

```
Apr 14 13:26:40 vmi3229987 db[1788997]: Real Server
Apr 14 13:26:40 vmi3229987 db[1788997]: Success ACCOUNT
Apr 14 13:26:40 vmi3229987 db[1788997]: Success COMMON
Apr 14 13:26:40 vmi3229987 db[1788997]: Success HOTBACKUP
Apr 14 13:26:40 vmi3229987 db[1788997]: mysql_real_connect: Lost connection to server at 'sending authentication information', system error: 104
```

On every start, `db` opens *more than a dozen* AsyncSQL pools ("AsyncSQL: connected to 127.0.0.1 (reconnect 1)" repeated ~12 times), suggesting a large per-instance pool size. Worth checking whether that needs tuning.

The current `syserr.log` in `channels/db/` is dominated by:

```
[error] [int CPeerBase::Recv()()] socket_read failed Connection reset by peer
[error] [int CClientManager::Process()()] Recv failed
```

which is the peer-disconnect path. Since no auth/game peers should be connecting right now, this is either a leftover from an earlier start, or something else (maybe a healthcheck probe) is touching 9000 and aborting. See open questions.

## Ports

Live `ss -tlnp` on the VPS (m2-relevant lines only):

| Addr:port | Who | Exposure |
| ---------------- | ------------ | -------------- |
| `0.0.0.0:9000` | `db` | **INADDR_ANY** — listens on all interfaces. Look at this. |
| `127.0.0.1:3306` | `mariadbd` | localhost only |

Not currently listening (would be if auth/game were up):

- `11000` / `12000` — auth client + p2p
- `11011..11013` / `12011..12013` — channel1 cores + p2p
- `11991` / `12991` — channel99 core1 + p2p

Other listeners on the host (not m2): `:22`, `:2222` (gitea ssh), `:25` (postfix loopback), `:80/:443` (Caddy), `:3000` (Gitea), `:2019` (Caddy admin), `:33891` (unknown loopback), `:5355` / `:53` (resolver).

**Firewalling note:** `db` binding to `0.0.0.0:9000` is a concern. In the normal m2 architecture, `db` only talks to auth/game cores on the same host and should bind to `127.0.0.1` only. The current binding comes from the `BIND_PORT = 9000` line in `share/conf/db.txt`, which in this server fork apparently defaults to `INADDR_ANY`. If the Contabo firewall or iptables/nft rules don't block 9000 from the outside, this is exposed. **Open question: verify iptables/nftables on the host, or bind `db` to `127.0.0.1` explicitly in source / config.**

## Data directory layout

All under `/home/mt2.jakubkadlec.dev/metin/runtime/server/share/`:

```
share/
├── bin/     ← compiled binaries (only db + game present today)
├── conf/    ← db.txt, game.txt, CMD, item_proto.txt, mob_proto.txt,
│              item_names_*.txt, mob_names_*.txt (17 locales each)
├── data/    ← DTA/, dungeon/, easterevent/, mob_spawn/, monster/,
│              pc/, pc2/ (27 MB total)
├── locale/  ← 86 MB, per-locale strings + binary quest outputs
├── mark/
└── package/
```

Per-channel scaffolding under `channels/` symlinks `conf`, `data`, `locale` back into `share/`, so each channel reads from a single canonical content tree.

## Disk usage footprint

```
/home/mt2.jakubkadlec.dev/metin/        1.7 G   (total metin workspace)
runtime/server/share/                   123 M
runtime/server/share/data/               27 M
runtime/server/share/locale/             86 M
runtime/server/channels/                755 M
channels/db/core.178508{2,8}           ~194 M   (two 97 MB coredumps)
channels/db/syslog.log                   36 M   (grows fast)
```

Core dumps dominate the channel dir footprint right now.
Cleaning up old `core.*` files is safe when the db is not actively crashing (and only after Jakub has looked at them).

## How to restart channel1_core2 cleanly

Pre-flight checklist:

1. Confirm `share/bin/channel1_core2` actually exists on disk — right now it does **not**, so the instance cannot start. Skip straight to the "rebuild / redeploy" section in Jakub's `docs/deploy-workflow.md` before trying.
2. Confirm `metin-db.service` and `metin-auth.service` are `active (running)` (`systemctl is-active metin-db metin-auth`). If not, fix upstream first — a clean restart of core2 requires a healthy auth + db.
3. Check that no player is currently online on that core. With `usage.txt` at 0/0 this is trivially true today, but in prod do `cat channels/channel1/core2/usage.txt` first.
4. Look at recent logs so you have a baseline: `journalctl -u metin-game@channel1_core2 -n 50 --no-pager`

Clean restart:

```bash
# on the VPS as root or with sudo
systemctl restart metin-game@channel1_core2.service
systemctl status metin-game@channel1_core2.service --no-pager
journalctl -u metin-game@channel1_core2.service -n 100 --no-pager -f
```

Because the unit is `Type=simple` with `Restart=on-failure`, `systemctl restart` sends SIGTERM, waits up to `TimeoutStopSec=60`, then brings the process back up. The binary's own `hupsig()` handler logs the SIGTERM into `syserr.log` and shuts down gracefully.

Post-restart verification:

```bash
ss -tlnp | grep -E ':(11012|12012)\b'   # expect both ports listening
tail -n 30 /home/mt2.jakubkadlec.dev/metin/runtime/server/channels/channel1/core2/syserr.log
```

If the process refuses to stay up (`Restart=on-failure` loops it), **do not** just bump `RestartSec`; grab the last 200 journal lines and the last 200 syserr lines and open an issue in `metin-server/m2dev-server-src` against Jakub. Do not edit the unit file ad-hoc on the host.

## Open questions

These are things the audit could not determine without making changes or getting more access. They need a human operator to resolve.

1. **Who produces the per-instance binaries** (`channel1_core1`, `channel1_core2`, `channel1_core3`, `channel99_core1`, `game_auth`)? The deploy flow expects them in `share/bin/` and channel dirs, but they are missing. Is this still hand-built, or is there a make target that hardlinks `share/bin/game` into each `channel*/core*/` name?
2. **Why is `db` currently flapping** (`deactivating (stop-sigterm)` in systemctl, plus two fresh core dumps on 2026-04-14 13:24/13:25 and dozens of `CPeerBase::Recv()` errors)? Nothing should be connecting to port 9000 right now.
3. **What the real `metin.env` contains** — specifically, the actual `bootstrap` DB password, and whether there is a separate admin-page password override. The audit did not touch `/etc/metin/metin.env`.
4. **Exact override-variable contract** between `share/conf/db.txt` placeholders and the env file. We need to verify which env var names the `db`/`game` source actually reads, so we know whether the `change-me` literal is ever used at runtime.
5. **Is `db` intended to bind `0.0.0.0:9000`?** From a defense-in-depth standpoint it should be `127.0.0.1`. Needs either a source fix or a host firewall rule. Check current nftables state.
6. **`VERSION.txt` says `db revision: b2b037f-dirty`.** Which tree was this built from, and why "dirty"? Point back at the `m2dev-server-src` commit and confirm the build artefact is reproducible.
7. **Log rotation**: `channels/db/syslog.log` is already 36 MB today with nothing connected. The `channels/channel1/core1/log/` dated-file convention suggests daily rotation, but `db`'s own syslog is not rotating. Confirm whether `logrotate` or an in-process rotator is expected to own this.
8. **Hourly heartbeat in `usage.txt`** comes from where? Every ~1 h a row is appended — this is probably the `db` backup tick, but confirm it's not some cron job.
9. **`mariadbd`'s live databases**: could not enumerate table names without credentials. `docs/database-bootstrap.md` lists the expected set; someone with `metin.env` access should confirm `account`, `player`, `common`, `log`, `hotbackup` are all present and populated.
10. **Stale README**: the top-level `README.md` still documents FreeBSD + `start.py`. Not urgent, but worth a `docs:` sweep to point readers at `docs/debian-runtime.md` as the canonical layout.
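For question 5, the exposure half can be answered with a minimal TCP probe — run it from a host *outside* Contabo against the public IP, since a local check only proves the bind, not the firewall. A sketch using bash's `/dev/tcp` (the function name is mine):

```bash
#!/usr/bin/env bash
# Sketch: is host:port accepting TCP connections?
# From an external host, "open" on 9000 means the firewall is NOT
# blocking the db listener; "closed/filtered" means it is (or db is down).
probe_port() {
    local host="$1" port="$2"
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open ${host}:${port}"
    else
        echo "closed/filtered ${host}:${port}"
    fi
}

probe_port 127.0.0.1 9000   # on the VPS itself: open while db is up
```

This complements, rather than replaces, reading the actual `nft list ruleset` / Contabo panel rules.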