Testing
85
README.md
@@ -3,30 +3,29 @@
CM Dashboard is a Rust-powered terminal UI for real-time monitoring of CMTEC infrastructure hosts. It subscribes to the CMTEC ZMQ gossip network where lightweight agents publish SMART, service, and backup metrics, and presents them in an efficient, keyboard-driven interface built with `ratatui`.
```
┌──────────────────────────────────────────────────────────────────────────────┐
│ CM Dashboard │
├────────────────────────────┬────────────────────────────┬────────────────────┤
│ NVMe Health │ Services │ CPU / Memory │
│ Host: srv01 │ Host: srv01 │ Host: srv01 │
│ Status: Healthy │ Service memory: 1.2G/4.0G │ RAM: 6.9 / 7.8 GiB │
│ Healthy/Warning/Critical: │ Disk usage: 45 / 500 GiB │ CPU load (1/5/15): │
│ 4 / 0 / 0 │ Services tracked: 8 │ 1.2 0.9 0.7 │
│ Capacity used: 512 / 2048G │ │ CPU temp: 68°C │
│ Issue: — │ nginx running 320M │ GPU temp: — │
│ │ immich running 1.2G │ Status • ok │
│ │ backup-api running 40M │ │
├────────────────────────────┴────────────┬───────────────┴────────────────────┤
│ Backups │ Alerts │
│ Host: srv01 │ srv01: ok │
│ Overall: Healthy │ labbox: warning: RAM 82% │
│ Last success: 2024-02-01 03:12:45 │ cmbox: critical: CPU temp 92°C │
│ Snapshots: 17 • Size: 512.0 GiB │ Update: 2024-02-01 10:15:32 │
│ Pending jobs: 0 (enabled: true) │ │
└──────────────────────────────┬───────────────────────────────────────────────┘
│ Status │ │
│ Active host: srv01 (1/3) │ History retention ≈ 3600s │
│ Config: config/dashboard.toml│ Default host: labbox │
└──────────────────────────────┴───────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ CM Dashboard • cmbox │
├─────────────────────────────────────────────────────────────────────┤
│ Storage • ok:1 warn:0 crit:0 │ Services • ok:1 warn:0 fail:0 │
│ ┌─────────────────────────────────┐ │ ┌─────────────────────────────── │ │
│ │Drive Temp Wear Spare Hours │ │ │Service memory: 7.1/23899.7 MiB│ │
│ │nvme0n1 28°C 1% 100% 14489 │ │ │Disk usage: — │ │
│ │ Capacity Usage │ │ │ Service Memory Disk │ │
│ │ 954G 77G (8%) │ │ │✔ sshd 7.1 MiB — │ │
│ └─────────────────────────────────┘ │ └─────────────────────────────── │ │
├─────────────────────────────────────────────────────────────────────┤
│ CPU / Memory • warn │ Backups │
│ System memory: 5251.7/23899.7 MiB │ Host cmbox awaiting backup │ │
│ CPU load (1/5/15): 2.18 2.66 2.56 │ metrics │ │
│ CPU freq: 1100.1 MHz │ │ │
│ CPU temp: 47.0°C │ │ │
├─────────────────────────────────────────────────────────────────────┤
│ Alerts • ok:0 warn:3 fail:0 │ Status • ZMQ connected │
│ cmbox: warning: CPU load 2.18 │ Monitoring • hosts: 3 │ │
│ srv01: pending: awaiting metrics │ Data source: ZMQ – connected │ │
│ labbox: pending: awaiting metrics │ Active host: cmbox (1/3) │ │
└─────────────────────────────────────────────────────────────────────┘
Keys: [←→] hosts [r]efresh [q]uit
```
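
The host switching reflected in the status line (`Active host: cmbox (1/3)`) can be sketched as a wrap-around index. This is a minimal illustration rather than the dashboard's actual code: `Step`, `rotate`, and `status_line` are hypothetical names, and which key maps to which direction is an assumption.

```rust
/// Direction of a host-switch key press (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Step {
    /// Assumed to be bound to `←` or `h`.
    Previous,
    /// Assumed to be bound to `→`, `l`, or `Tab`.
    Next,
}

/// Returns the index of the host selected after one key press,
/// wrapping around at both ends of the host list.
/// Returns `None` when no hosts are configured.
pub fn rotate(active: usize, host_count: usize, step: Step) -> Option<usize> {
    if host_count == 0 {
        return None;
    }
    // Tolerate a stale index, e.g. after the host list shrank.
    let active = active % host_count;
    Some(match step {
        Step::Next => (active + 1) % host_count,
        // Adding `host_count` before subtracting avoids unsigned underflow.
        Step::Previous => (active + host_count - 1) % host_count,
    })
}

/// Formats the status-line text in the "name (position/total)" form
/// used in the mock-up above. Returns `None` for an out-of-range index.
pub fn status_line(hosts: &[&str], active: usize) -> Option<String> {
    let name = hosts.get(active)?;
    Some(format!("Active host: {} ({}/{})", name, active + 1, hosts.len()))
}
```

With three hosts, `rotate(2, 3, Step::Next)` yields `Some(0)`, so stepping forward from the last host wraps back to the first.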
## Requirements
@@ -100,12 +99,15 @@ Adjust the host list and `data_source.zmq.endpoints` to match your CMTEC gossip
## Features
- Rotating host selection with `←`, `→`, `h`, `l`, or `Tab`
- Live NVMe, service, CPU/memory, backup, and alert panels per host
- Health scoring that rolls CPU/RAM/GPU pressure into alerts automatically
- Structured logging with `tracing` (`-v`/`-vv` to increase verbosity)
- Help overlay (`?`) outlining keyboard shortcuts
- Config-driven host discovery via `config/dashboard.toml`
- **Real-time monitoring** with ZMQ gossip network architecture
- **Storage health** with drive capacity, usage, temperature, and wear tracking
- **Per-service resource tracking** including memory and disk usage by service
- **CPU/Memory monitoring** with load averages, temperature, and GPU metrics
- **Alert system** with color-coded highlighting and threshold-based warnings
- **Multi-host support** with seamless host switching (`←`, `→`, `h`, `l`, `Tab`)
- **Backup status** monitoring with restic integration
- **Keyboard-driven interface** with help overlay (`?`)
- **Configuration management** via TOML files for hosts and dashboard settings
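
The threshold-based alerting listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the type and function names (`Level`, `Threshold`, `HostSample`, `host_alert`, `summary`) are hypothetical, the README does not state the real threshold values, and only RAM and CPU temperature are considered here.

```rust
/// Alert severity, ordered so that the worse level compares greater.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Ok,
    Warning,
    Critical,
}

/// Inclusive lower bounds for the warning and critical levels (assumed model).
pub struct Threshold {
    pub warning: f64,
    pub critical: f64,
}

impl Threshold {
    /// Maps a metric value to a severity level.
    pub fn classify(&self, value: f64) -> Level {
        if value >= self.critical {
            Level::Critical
        } else if value >= self.warning {
            Level::Warning
        } else {
            Level::Ok
        }
    }
}

/// One reading for a host (hypothetical shape, two metrics only).
pub struct HostSample<'a> {
    pub host: &'a str,
    pub ram_used_pct: f64,
    pub cpu_temp_c: f64,
}

/// Rolls both metrics into one alert line such as "labbox: warning: RAM 82%".
/// The more severe metric is reported; RAM wins ties.
pub fn host_alert(s: &HostSample, ram: &Threshold, temp: &Threshold) -> (Level, String) {
    let ram_level = ram.classify(s.ram_used_pct);
    let temp_level = temp.classify(s.cpu_temp_c);
    let (level, detail) = if temp_level > ram_level {
        (temp_level, format!("CPU temp {:.0}°C", s.cpu_temp_c))
    } else {
        (ram_level, format!("RAM {:.0}%", s.ram_used_pct))
    };
    let line = match level {
        Level::Ok => format!("{}: ok", s.host),
        Level::Warning => format!("{}: warning: {}", s.host, detail),
        Level::Critical => format!("{}: critical: {}", s.host, detail),
    };
    (level, line)
}

/// Builds a panel-title tally in the "ok:1 warn:0 crit:0" form.
pub fn summary(levels: &[Level]) -> String {
    let count = |wanted: Level| levels.iter().filter(|l| **l == wanted).count();
    format!(
        "ok:{} warn:{} crit:{}",
        count(Level::Ok),
        count(Level::Warning),
        count(Level::Critical)
    )
}
```

Deriving `Ord` on `Level` keeps the "worst metric wins" rule a plain comparison, which makes it straightforward to fold in further metrics such as GPU temperature.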
## Getting Started
@@ -131,13 +133,30 @@ cargo run -p cm-dashboard -- -v
## Agent
The metrics agent publishes SMART/service/backup data to the gossip network. Run it on each host (or under systemd/NixOS) and point the dashboard at its endpoint. Example:
The metrics agent runs on each host and publishes SMART, service, and backup data to the ZMQ gossip network. The agent auto-detects system configuration and requires root privileges for hardware monitoring.
```bash
cargo run -p cm-dashboard-agent -- --hostname srv01 --bind tcp://*:6130 --interval-ms 5000
# Run agent with auto-detection
sudo cargo run -p cm-dashboard-agent
# Run with specific configuration
sudo cargo run -p cm-dashboard-agent -- --config config/agent.toml
# Manual configuration
sudo cargo run -p cm-dashboard-agent -- \
--hostname srv01 \
--bind tcp://*:6130 \
--smart-devices nvme0n1,sda \
--services nginx,postgres
```
Use `--disable-*` flags to skip collectors when a host doesn’t expose those metrics.
The agent automatically:
- Detects available storage devices for SMART monitoring
- Discovers running systemd services for resource tracking
- Configures appropriate collection intervals per host type

Root access is required for `smartctl` and system metrics.
Use `--disable-smart`, `--disable-service`, or `--disable-backup` to skip specific collectors.
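
How the `--disable-*` flags and comma-separated values such as `--smart-devices nvme0n1,sda` might translate into collector settings can be sketched as below. This is a std-only illustration, not the agent's actual argument handling (which may well use a CLI crate): `Collectors`, `collectors_from_args`, and `split_list` are hypothetical names. Only the flag spellings come from the text above.

```rust
/// Which collectors are enabled (hypothetical struct; all default to on).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Collectors {
    pub smart: bool,
    pub service: bool,
    pub backup: bool,
}

/// Scans command-line arguments for `--disable-*` flags.
/// Unrelated arguments are ignored here; a real parser would handle them.
pub fn collectors_from_args<I, S>(args: I) -> Collectors
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut enabled = Collectors { smart: true, service: true, backup: true };
    for arg in args {
        match arg.as_ref() {
            "--disable-smart" => enabled.smart = false,
            "--disable-service" => enabled.service = false,
            "--disable-backup" => enabled.backup = false,
            _ => {}
        }
    }
    enabled
}

/// Splits a comma-separated value such as "nvme0n1,sda" into its items,
/// trimming whitespace and dropping empty entries.
pub fn split_list(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(str::trim)
        .filter(|item| !item.is_empty())
        .map(String::from)
        .collect()
}
```

Defaulting every collector to enabled matches the opt-out wording of the flags: a host that exposes all metrics needs no extra arguments.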
## Development