2025-10-12 14:53:27 +02:00
parent 2581435b10
commit 2239badc8a
16 changed files with 1116 additions and 1414 deletions

@@ -3,30 +3,29 @@
 CM Dashboard is a Rust-powered terminal UI for real-time monitoring of CMTEC infrastructure hosts. It subscribes to the CMTEC ZMQ gossip network where lightweight agents publish SMART, service, and backup metrics, and presents them in an efficient, keyboard-driven interface built with `ratatui`.
 ```
-┌──────────────────────────────────────────────────────────────────────────────
-│ CM Dashboard
-├────────────────────────────────────────────────────────┬────────────────────┤
-NVMe Health │ Services │ CPU / Memory
-Host: srv01 │ Host: srv01 │ Host: srv01
-Status: Healthy │ Service memory: 1.2G/4.0G │ RAM: 6.9 / 7.8 GiB
-Healthy/Warning/Critical: │ Disk usage: 45 / 500 GiB │ CPU load (1/5/15):
-4 / 0 / 0 │ Services tracked: 8 │ 1.2 0.9 0.7
-Capacity used: 512 / 2048G │ │ CPU temp: 68°C
-Issue: — │ nginx running 320M │ GPU temp: —
-│ │ immich running 1.2G │ Status • ok │
-backup-api running 40M │
-├────────────────────────────┴────────────┬───────────────┴────────────────────┤
-Backups │ Alerts │
-Host: srv01 │ srv01: ok
-Overall: Healthy │ labbox: warning: RAM 82%
-│ Last success: 2024-02-01 03:12:45 │ cmbox: critical: CPU temp 92°C │
-Snapshots: 17 • Size: 512.0 GiB │ Update: 2024-02-01 10:15:32
-Pending jobs: 0 (enabled: true) │
-└──────────────────────────────┬───────────────────────────────────────────────┘
-Status │
-│ Active host: srv01 (1/3) │ History retention ≈ 3600s │
-│ Config: config/dashboard.toml│ Default host: labbox │
-└──────────────────────────────┴───────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────────┐
+│ CM Dashboard • cmbox
+├─────────────────────────────────────────────────────────────────────┤
+Storage • ok:1 warn:0 crit:0 │ Services • ok:1 warn:0 fail:0
+┌─────────────────────────────────┐ │ ┌─────────────────────────────── │
+│Drive Temp Wear Spare Hours │Service memory: 7.1/23899.7 MiB│
+│nvme0n1 28°C 1% 100% 14489 │ │ │Disk usage: —
+│ Capacity Usage │ │ Service Memory Disk │
+│ 954G 77G (8%) │ │ │✔ sshd 7.1 MiB —
+└─────────────────────────────────┘ │ └─────────────────────────────── │
+├─────────────────────────────────────────────────────────────────────┤
+CPU / Memory • warn │ Backups
+│ System memory: 5251.7/23899.7 MiB │ Host cmbox awaiting backup │ │
+CPU load (1/5/15): 2.18 2.66 2.56 │ metrics
+CPU freq: 1100.1 MHz
+CPU temp: 47.0°C
+├─────────────────────────────────────────────────────────────────────┤
+Alerts • ok:0 warn:3 fail:0 │ Status • ZMQ connected
+cmbox: warning: CPU load 2.18 │ Monitoring • hosts: 3
+│ srv01: pending: awaiting metrics │ Data source: ZMQ connected │ │
+labbox: pending: awaiting metrics │ Active host: cmbox (1/3)
+└─────────────────────────────────────────────────────────────────────┘
+Keys: [←→] hosts [r]efresh [q]uit
 ```
 ## Requirements
@@ -100,12 +99,15 @@ Adjust the host list and `data_source.zmq.endpoints` to match your CMTEC gossip
 ## Features
-- Rotating host selection with left/right arrows (`←`, `→`, `h`, `l`, `Tab`)
-- Live NVMe, service, CPU/memory, backup, and alert panels per host
-- Health scoring that rolls CPU/RAM/GPU pressure into alerts automatically
-- Structured logging with `tracing` (`-v`/`-vv` to increase verbosity)
-- Help overlay (`?`) outlining keyboard shortcuts
-- Config-driven host discovery via `config/dashboard.toml`
+- **Real-time monitoring** with ZMQ gossip network architecture
+- **Storage health** with drive capacity, usage, temperature, and wear tracking
+- **Per-service resource tracking** including memory and disk usage by service
+- **CPU/Memory monitoring** with load averages, temperature, and GPU metrics
+- **Alert system** with color-coded highlighting and threshold-based warnings
+- **Multi-host support** with seamless host switching (`←`, `→`, `h`, `l`, `Tab`)
+- **Backup status** monitoring with restic integration
+- **Keyboard-driven interface** with help overlay (`?`)
+- **Configuration management** via TOML files for hosts and dashboard settings
 ## Getting Started
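The threshold-based warnings mentioned in the feature list can be illustrated with a minimal sketch. The `Severity`, `Threshold`, and `worst` names and any limits passed in are assumptions made for illustration; the diff does not show how the dashboard actually scores health.

```rust
/// Severity levels mirroring the ok / warning / critical states shown in the alerts panel.
/// Variant order matters: the derived `Ord` makes Ok < Warning < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical warning/critical limits for a single metric (not taken from the project).
#[derive(Debug, Clone, Copy)]
pub struct Threshold {
    pub warning: f64,
    pub critical: f64,
}

impl Threshold {
    /// Classify a reading: at or above `critical` wins, then `warning`, otherwise ok.
    pub fn classify(&self, value: f64) -> Severity {
        if value >= self.critical {
            Severity::Critical
        } else if value >= self.warning {
            Severity::Warning
        } else {
            Severity::Ok
        }
    }
}

/// Roll several per-metric severities into one host-level severity (the worst one).
/// An empty slice is treated as ok.
pub fn worst(severities: &[Severity]) -> Severity {
    severities.iter().copied().max().unwrap_or(Severity::Ok)
}
```

With assumed limits of 80/95 for RAM percentage and 75/90 for CPU temperature, a reading of 82% RAM classifies as `Warning` and 92°C as `Critical`. Taking the maximum keeps the roll-up simple: one critical metric is enough to mark the host critical.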
@@ -131,13 +133,30 @@ cargo run -p cm-dashboard -- -v
 ## Agent
-The metrics agent publishes SMART/service/backup data to the gossip network. Run it on each host (or under systemd/NixOS) and point the dashboard at its endpoint. Example:
+The metrics agent runs on each host and publishes SMART, service, and backup data to the ZMQ gossip network. The agent auto-detects system configuration and requires root privileges for hardware monitoring.
 ```bash
-cargo run -p cm-dashboard-agent -- --hostname srv01 --bind tcp://*:6130 --interval-ms 5000
+# Run agent with auto-detection
+sudo cargo run -p cm-dashboard-agent
+# Run with specific configuration
+sudo cargo run -p cm-dashboard-agent -- --config config/agent.toml
+# Manual configuration
+sudo cargo run -p cm-dashboard-agent -- \
+--hostname srv01 \
+--bind tcp://*:6130 \
+--smart-devices nvme0n1,sda \
+--services nginx,postgres
 ```
-Use `--disable-*` flags to skip collectors when a host doesn't expose those metrics.
+The agent automatically:
+- Detects available storage devices for SMART monitoring
+- Discovers running systemd services for resource tracking
+- Configures appropriate collection intervals per host type
+- Requires root access for `smartctl` and system metrics
+Use `--disable-smart`, `--disable-service`, or `--disable-backup` to skip specific collectors.
 ## Development
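As a rough illustration of how the three `--disable-*` flags could map to the set of active collectors, here is a minimal sketch. The `Collector` enum and `enabled_collectors` helper are hypothetical, and the real agent's argument handling (which this diff does not show) may work differently.

```rust
/// The three collectors named in the README: SMART, service, and backup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Collector {
    Smart,
    Service,
    Backup,
}

/// Return the collectors that stay enabled after applying any `--disable-*` flags.
/// Hypothetical helper: every collector is on unless its flag appears in `args`.
pub fn enabled_collectors<S: AsRef<str>>(args: &[S]) -> Vec<Collector> {
    // True when the exact flag string is present anywhere in the argument list.
    let disabled = |flag: &str| args.iter().any(|a| a.as_ref() == flag);

    let mut enabled = Vec::new();
    if !disabled("--disable-smart") {
        enabled.push(Collector::Smart);
    }
    if !disabled("--disable-service") {
        enabled.push(Collector::Service);
    }
    if !disabled("--disable-backup") {
        enabled.push(Collector::Backup);
    }
    enabled
}
```

In a binary, the arguments could come from `std::env::args().skip(1).collect::<Vec<String>>()`. Passing `["--disable-smart"]` leaves the service and backup collectors enabled.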