Testing

2025-10-12 14:53:27 +02:00
parent 2581435b10
commit 2239badc8a
16 changed files with 1116 additions and 1414 deletions
--- a/README.md
+++ b/README.md
@@ -3,30 +3,29 @@
 CM Dashboard is a Rust-powered terminal UI for real-time monitoring of CMTEC infrastructure hosts. It subscribes to the CMTEC ZMQ gossip network where lightweight agents publish SMART, service, and backup metrics, and presents them in an efficient, keyboard-driven interface built with `ratatui`.

 ```
-┌──────────────────────────────────────────────────────────────────────────────┐
-│ CM Dashboard                                                                │
-├────────────────────────────┬────────────────────────────┬────────────────────┤
-│ NVMe Health                │ Services                   │ CPU / Memory        │
-│ Host: srv01                │ Host: srv01                │ Host: srv01         │
-│ Status: Healthy            │ Service memory: 1.2G/4.0G  │ RAM: 6.9 / 7.8 GiB   │
-│ Healthy/Warning/Critical:  │ Disk usage: 45 / 500 GiB   │ CPU load (1/5/15):   │
-│ 4 / 0 / 0                  │ Services tracked: 8        │ 1.2  0.9  0.7        │
-│ Capacity used: 512 / 2048G │                            │ CPU temp: 68°C       │
-│ Issue: —                   │ nginx      running 320M    │ GPU temp: —          │
-│                            │ immich     running 1.2G    │ Status • ok          │
-│                            │ backup-api running  40M    │                      │
-├────────────────────────────┴────────────┬───────────────┴────────────────────┤
-│ Backups                                  │ Alerts                          │
-│ Host: srv01                              │ srv01: ok                       │
-│ Overall: Healthy                         │ labbox: warning: RAM 82%        │
-│ Last success: 2024-02-01 03:12:45        │ cmbox: critical: CPU temp 92°C  │
-│ Snapshots: 17 • Size: 512.0 GiB          │ Update: 2024-02-01 10:15:32     │
-│ Pending jobs: 0 (enabled: true)          │                                  │
-└──────────────────────────────┬───────────────────────────────────────────────┘
-│ Status                       │                                                │
-│ Active host: srv01 (1/3)     │ History retention ≈ 3600s                      │
-│ Config: config/dashboard.toml│ Default host: labbox                           │
-└──────────────────────────────┴───────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────────┐
+│ CM Dashboard • cmbox                                                 │
+├─────────────────────────────────────────────────────────────────────┤
+│ Storage • ok:1 warn:0 crit:0       │ Services • ok:1 warn:0 fail:0   │
+│ ┌─────────────────────────────────┐ │ ┌─────────────────────────────── │ │
+│ │Drive    Temp  Wear Spare Hours │ │ │Service memory: 7.1/23899.7 MiB│ │
+│ │nvme0n1  28°C  1%   100%  14489 │ │ │Disk usage: —                  │ │
+│ │         Capacity Usage          │ │ │  Service  Memory     Disk      │ │
+│ │         954G     77G (8%)       │ │ │✔ sshd     7.1 MiB   —          │ │
+│ └─────────────────────────────────┘ │ └─────────────────────────────── │ │
+├─────────────────────────────────────────────────────────────────────┤
+│ CPU / Memory • warn                 │ Backups                         │
+│ System memory: 5251.7/23899.7 MiB  │ Host cmbox awaiting backup      │ │
+│ CPU load (1/5/15): 2.18 2.66 2.56  │ metrics                         │ │
+│ CPU freq: 1100.1 MHz               │                                 │ │
+│ CPU temp: 47.0°C                    │                                 │ │
+├─────────────────────────────────────────────────────────────────────┤
+│ Alerts • ok:0 warn:3 fail:0        │ Status • ZMQ connected          │
+│ cmbox: warning: CPU load 2.18      │ Monitoring • hosts: 3           │ │
+│ srv01: pending: awaiting metrics    │ Data source: ZMQ – connected    │ │
+│ labbox: pending: awaiting metrics   │ Active host: cmbox (1/3)        │ │
+└─────────────────────────────────────────────────────────────────────┘
+Keys: [←→] hosts [r]efresh [q]uit
 ```

 ## Requirements
@@ -100,12 +99,15 @@ Adjust the host list and `data_source.zmq.endpoints` to match your CMTEC gossip

 ## Features

- Rotating host selection with left/right arrows (`←`, `→`, `h`, `l`, `Tab`)
- Live NVMe, service, CPU/memory, backup, and alert panels per host
- Health scoring that rolls CPU/RAM/GPU pressure into alerts automatically
- Structured logging with `tracing` (`-v`/`-vv` to increase verbosity)
- Help overlay (`?`) outlining keyboard shortcuts
- Config-driven host discovery via `config/dashboard.toml`
+- **Real-time monitoring** with ZMQ gossip network architecture
+- **Storage health** with drive capacity, usage, temperature, and wear tracking
+- **Per-service resource tracking** including memory and disk usage by service
+- **CPU/Memory monitoring** with load averages, temperature, and GPU metrics
+- **Alert system** with color-coded highlighting and threshold-based warnings
+- **Multi-host support** with seamless host switching (`←`, `→`, `h`, `l`, `Tab`)
+- **Backup status** monitoring with restic integration
+- **Keyboard-driven interface** with help overlay (`?`)
+- **Configuration management** via TOML files for hosts and dashboard settings

 ## Getting Started

@@ -131,13 +133,30 @@ cargo run -p cm-dashboard -- -v

 ## Agent

-The metrics agent publishes SMART/service/backup data to the gossip network. Run it on each host (or under systemd/NixOS) and point the dashboard at its endpoint. Example:
+The metrics agent runs on each host and publishes SMART, service, and backup data to the ZMQ gossip network. The agent auto-detects system configuration and requires root privileges for hardware monitoring.

 ```bash
-cargo run -p cm-dashboard-agent -- --hostname srv01 --bind tcp://*:6130 --interval-ms 5000
+# Run agent with auto-detection
+sudo cargo run -p cm-dashboard-agent
+
+# Run with specific configuration
+sudo cargo run -p cm-dashboard-agent -- --config config/agent.toml
+
+# Manual configuration
+sudo cargo run -p cm-dashboard-agent -- \
+    --hostname srv01 \
+    --bind tcp://*:6130 \
+    --smart-devices nvme0n1,sda \
+    --services nginx,postgres
 ```

-Use `--disable-*` flags to skip collectors when a host doesn’t expose those metrics.
+The agent automatically:
+- Detects available storage devices for SMART monitoring
+- Discovers running systemd services for resource tracking
+- Configures appropriate collection intervals per host type
+- Runs `smartctl` and collects system metrics, both of which require root access
+
+Use `--disable-smart`, `--disable-service`, or `--disable-backup` to skip specific collectors.

 ## Development