# CM Dashboard - Infrastructure Monitoring TUI

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for specific monitoring needs and API integrations. Features real-time monitoring of all infrastructure components with intelligent email notifications and automatic status calculation.

### System Widget

```
┌System───────────────────────────────────────────────────────┐
│  Memory usage                                               │
│✔ 3.0 / 7.8 GB                                               │
│  CPU load               CPU temp                            │
│✔ 1.05 • 0.96 • 0.58     64.0°C                              │
│  C1E    C3     C6     C8     C9     C10                     │
│✔ 0.5%   0.5%   10.4%  10.2%  0.4%   77.9%                   │
│  GPU load               GPU temp                            │
│✔ —                      —                                   │
└─────────────────────────────────────────────────────────────┘
```

### Services Widget (Enhanced)

```
┌Services─────────────────────────────────────────────────────┐
│  Service               Memory (GB)     CPU       Disk       │
│✔ Service Memory        7.1/23899.7 MiB —                    │
│✔ Disk Usage            —               —         45/100 GB  │
│⚠ CPU Load              —               2.18      —          │
│✔ CPU Temperature       —               47.0°C    —          │
│✔ docker-registry       0.0 GB          0.0%      <1 MB      │
│✔ gitea                 0.4/4.1 GB      0.2%      970 MB     │
│    1 active connections                                     │
│✔ nginx                 0.0/1.0 GB      0.0%      <1 MB      │
│✔ ├─ docker.cmtec.se                                         │
│✔ ├─ git.cmtec.se                                            │
│✔ ├─ gitea.cmtec.se                                          │
│✔ ├─ haasp.cmtec.se                                          │
│✔ ├─ pages.cmtec.se                                          │
│✔ └─ www.kryddorten.se                                       │
│✔ postgresql            0.1 GB          0.0%      378 MB     │
│    1 active connections                                     │
│✔ redis-immich          0.0 GB          0.4%      <1 MB      │
│✔ sshd                  0.0 GB          0.0%      <1 MB      │
│    1 SSH connection                                         │
│✔ unifi                 0.9/2.0 GB      0.4%      391 MB     │
└─────────────────────────────────────────────────────────────┘
```

### Storage Widget

```
┌Storage──────────────────────────────────────────────────────┐
│  Drive     Temp   Wear   Spare   Hours   Capacity  Usage    │
│✔ nvme0n1   57°C   4%     100%    11463   932G      23G (2%) │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### Backups Widget

```
┌Backups──────────────────────────────────────────────────────┐
│  Backup      Status      Details                            │
│✔ Latest      3h ago      1.4 GiB                            │
│                          8 archives, 2.4 GiB total          │
│✔ Disk        ok          2.4/468 GB (1%)                    │
└─────────────────────────────────────────────────────────────┘
```

### Hosts Widget

```
┌Hosts────────────────────────────────────────────────────────┐
│  Host      Status              Timestamp                    │
│✔ cmbox     ok                  2025-10-13 05:45:28          │
│✔ srv01     ok                  2025-10-13 05:45:28          │
│? labbox    No data received    —                            │
└─────────────────────────────────────────────────────────────┘
```

**Navigation**: `←→` hosts, `r` refresh, `q` quit

## Key Features

### Real-time Monitoring

- **Multi-host support** for cmbox, labbox, simonbox, steambox, srv01
- **Performance-focused** with minimal resource usage
- **Keyboard-driven interface** for power users
- **ZMQ gossip network** for efficient data distribution

### Infrastructure Monitoring

- **NVMe health monitoring** with wear prediction and temperature tracking
- **CPU/Memory/GPU telemetry** with automatic thresholding
- **Service resource monitoring** with per-service CPU and RAM usage
- **Disk usage overview** for root filesystems
- **Backup status** with detailed metrics and history
- **C-state monitoring** for CPU power management analysis

### Intelligent Alerting

- **Agent-calculated status** with predefined thresholds
- **Email notifications** via SMTP with rate limiting
- **Recovery notifications** with context about original issues
- **Stockholm timezone** support for email timestamps
- **Unified alert pipeline** summarizing host health

## Architecture

### Agent-Dashboard Separation

The system follows a strict separation of concerns:

- **Agent**: Single source of truth for all status calculations using defined thresholds
- **Dashboard**: Display-only interface that shows agent-provided status
- **Data Flow**: Agent (calculations) → Status → Dashboard (display) → Colors

### Agent Thresholds (Production)

- **CPU Load**: Warning ≥ 5.0, Critical ≥ 8.0
- **Memory Usage**: Warning ≥ 80%, Critical ≥ 95%
- **CPU Temperature**: Warning ≥ 100°C, Critical ≥ 100°C (effectively disabled)

### Email Notification System

- **From**: `{hostname}@cmtec.se` (e.g., cmbox@cmtec.se)
- **To**: `cm@cmtec.se`
- **SMTP**: localhost:25 (postfix)
- **Rate
Limiting**: 30 minutes (configurable)
- **Triggers**: Status degradation and recovery with detailed context

## Installation

### Requirements

- Rust toolchain 1.75+ (install via [`rustup`](https://rustup.rs))
- Root privileges for agent (hardware monitoring access)
- Network access for ZMQ communication (default port 6130)
- SMTP server for notifications (postfix recommended)

### Build from Source

```bash
git clone https://github.com/cmtec/cm-dashboard.git
cd cm-dashboard
cargo build --release
```

Optimized binaries are available at:

- Dashboard: `target/release/cm-dashboard`
- Agent: `target/release/cm-dashboard-agent`

### Installation

```bash
# Install dashboard
cargo install --path dashboard

# Install agent (requires root for hardware access)
sudo cargo install --path agent
```

## Quick Start

### Dashboard

```bash
# Run with default configuration
cm-dashboard

# Specify host to monitor
cm-dashboard --host cmbox

# Override ZMQ endpoints
cm-dashboard --zmq-endpoint tcp://srv01:6130,tcp://labbox:6130

# Increase logging verbosity
cm-dashboard -v
```

### Agent (Pure Auto-Discovery)

The agent requires **no configuration files** and auto-discovers all system components:

```bash
# Basic agent startup (auto-detects everything)
sudo cm-dashboard-agent

# With verbose logging for troubleshooting
sudo cm-dashboard-agent -v
```

The agent automatically:

- **Discovers storage devices** for SMART monitoring
- **Detects running systemd services** for resource tracking
- **Configures collection intervals** based on system capabilities
- **Sets up email notifications** using hostname@cmtec.se

## Configuration

### Dashboard Configuration

The dashboard creates `config/dashboard.toml` on first run:

```toml
[hosts]
default_host = "srv01"

[[hosts.hosts]]
name = "srv01"
enabled = true

[[hosts.hosts]]
name = "cmbox"
enabled = true

[dashboard]
tick_rate_ms = 250
history_duration_minutes = 60

[data_source]
kind = "zmq"

[data_source.zmq]
endpoints = ["tcp://127.0.0.1:6130"]
```

### Agent Configuration (Optional)

The agent works without configuration but supports optional settings:

```bash
# Show available options
cm-dashboard-agent --help

# Override specific settings
sudo cm-dashboard-agent \
  --hostname cmbox \
  --bind tcp://*:6130 \
  --interval 5000
```

## Widget Layout

### Services Widget Structure

The Services widget now displays both system metrics and services in a unified table:

```
┌Services─────────────────────────────────────────────────────┐
│  Service               Memory (GB)     CPU       Disk       │
│✔ Service Memory        7.1/23899.7 MiB —                    │ ← System metric as service row
│✔ Disk Usage            —               —         45/100 GB  │ ← System metric as service row
│⚠ CPU Load              —               2.18      —          │ ← System metric as service row
│✔ CPU Temperature       —               47.0°C    —          │ ← System metric as service row
│✔ docker-registry       0.0 GB          0.0%      <1 MB      │ ← Regular service
│✔ nginx                 0.0/1.0 GB      0.0%      <1 MB      │ ← Regular service
│✔ ├─ docker.cmtec.se                                         │ ← Nginx site (sub-service)
│✔ ├─ git.cmtec.se                                            │ ← Nginx site (sub-service)
│✔ └─ gitea.cmtec.se                                          │ ← Nginx site (sub-service)
│✔ sshd                  0.0 GB          0.0%      <1 MB      │ ← Regular service
│    1 SSH connection                                         │ ← Service description
└─────────────────────────────────────────────────────────────┘
```

**Row Types:**

- **System Metrics**: CPU Load, Service Memory, Disk Usage, CPU Temperature with status indicators
- **Regular Services**: Full resource data (memory, CPU, disk) with optional description lines
- **Sub-services**: Nginx sites with tree structure, status indicators only (no resource columns)
- **Description Lines**: Connection counts and service-specific info without status indicators

### Hosts Widget (formerly Alerts)

The Hosts widget provides a summary view of all monitored hosts:

```
┌Hosts────────────────────────────────────────────────────────┐
│  Host      Status              Timestamp                    │
│✔ cmbox     ok                  2025-10-13 05:45:28          │
│✔ srv01     ok                  2025-10-13 05:45:28          │
│? labbox    No data received    —                            │
└─────────────────────────────────────────────────────────────┘
```

## Monitoring Components

### System Collector

- **CPU Load**: 1/5/15 minute averages with warning/critical thresholds
- **Memory Usage**: Used/total with percentage calculation
- **CPU Temperature**: x86_pkg_temp prioritized for accuracy
- **C-States**: Power management state distribution (C0-C10)

### Service Collector

- **System Metrics as Services**: CPU Load, Service Memory, Disk Usage, CPU Temperature displayed as individual service rows
- **Systemd Services**: Auto-discovery of interesting services with resource monitoring
- **Nginx Site Monitoring**: Individual rows for each nginx virtual host with tree structure (`├─` and `└─`)
- **Resource Usage**: Per-service memory, CPU, and disk consumption
- **Service Health**: Running/stopped/degraded status with detailed failure info
- **Connection Tracking**: SSH connections, database connections as description lines

### SMART Collector

- **NVMe Health**: Temperature, wear leveling, spare blocks
- **Drive Capacity**: Total/used space with percentage
- **SMART Attributes**: Critical health indicators

### Backup Collector

- **Restic Integration**: Backup status and history
- **Health Monitoring**: Success/failure tracking
- **Storage Metrics**: Backup size and retention

## Keyboard Controls

| Key | Action |
|-----|--------|
| `←` / `h` | Previous host |
| `→` / `l` / `Tab` | Next host |
| `?` | Toggle help overlay |
| `r` | Force refresh |
| `q` / `Esc` | Quit |

## Email Notifications

### Notification Triggers

- **Status Degradation**: Any status change to warning/critical
- **Recovery**: Warning/critical status returning to ok
- **Service Failures**: Individual service stop/start events

### Example Recovery Email

```
✅ RESOLVED: system cpu on cmbox

Status Change Alert

Host: cmbox
Component: system
Metric: cpu
Status Change: warning → ok
Time: 2025-10-12 22:15:30 CET

Details:
Recovered from: CPU load (1/5/15min): 6.20 / 5.80 / 4.50
Current status: CPU load (1/5/15min): 3.30 / 3.17 / 2.84

--
CM Dashboard Agent
Generated at 2025-10-12 22:15:30 CET
```

### Rate Limiting

- **Default**: 30 minutes between notifications per component
- **Testing**: Set to 0 for immediate notifications
- **Configurable**: Adjustable per deployment needs

## Development

### Project Structure

```
cm-dashboard/
├── agent/                    # Monitoring agent
│   ├── src/
│   │   ├── collectors/       # Data collection modules
│   │   ├── notifications.rs  # Email notification system
│   │   └── simple_agent.rs   # Main agent logic
├── dashboard/                # TUI dashboard
│   ├── src/
│   │   ├── ui/               # Widget implementations
│   │   ├── data/             # Data structures
│   │   └── app.rs            # Application state
├── shared/                   # Common data structures
└── config/                   # Configuration files
```

### Development Commands

```bash
# Format code
cargo fmt

# Check all packages
cargo check

# Run tests
cargo test

# Build release
cargo build --release

# Run with logging
RUST_LOG=debug cargo run -p cm-dashboard-agent
```

### Architecture Principles

#### Status Calculation Rules

- **Agent calculates all status** using predefined thresholds
- **Dashboard never calculates status** - only displays agent data
- **No hardcoded thresholds in dashboard** widgets
- **Use "unknown" when agent status missing** (never default to "ok")

#### Data Flow

```
System Metrics → Agent Collectors → Status Calculation → ZMQ → Dashboard → Display
                                            ↓
                                   Email Notifications
```

#### Pure Auto-Discovery

- **No config files required** for basic operation
- **Runtime discovery** of system capabilities
- **Service auto-detection** via systemd patterns
- **Storage device enumeration** via /sys filesystem

## Troubleshooting

### Common Issues

#### Agent Won't Start

```bash
# Check permissions (agent requires root)
sudo cm-dashboard-agent -v

# Verify ZMQ binding
sudo netstat -tulpn | grep 6130

# Check system access
sudo smartctl --scan
```

#### Dashboard Connection Issues

```bash
# Test ZMQ connectivity
cm-dashboard --zmq-endpoint tcp://target-host:6130 -v

# Check network connectivity
telnet target-host 6130
```

#### Email Notifications Not Working

```bash
# Check postfix status
sudo systemctl status postfix

# Test SMTP manually
telnet localhost 25

# Verify notification settings (2>&1 also captures log lines written to stderr)
sudo cm-dashboard-agent -v 2>&1 | grep notification
```

### Logging

Set `RUST_LOG=debug` for detailed logging. For the agent, pass the variable on the `sudo` command line, because `sudo` resets the environment by default and would otherwise drop it:

```bash
sudo RUST_LOG=debug cm-dashboard-agent
RUST_LOG=debug cm-dashboard
```

## License

MIT License - see LICENSE file for details.

## Contributing

1. Fork the repository
2. Create feature branch (`git checkout -b feature/amazing-feature`)
3. Commit changes (`git commit -m 'Add amazing feature'`)
4. Push to branch (`git push origin feature/amazing-feature`)
5. Open Pull Request

For bugs and feature requests, please use GitHub Issues.

## NixOS Integration

### Updating cm-dashboard in NixOS Configuration

When new code is pushed to the cm-dashboard repository, follow these steps to update the NixOS configuration:

#### 1. Get the Latest Commit Hash

```bash
# Get the latest commit from the API
curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | head -20

# Or use git
git log --oneline -1
```

#### 2. Update the NixOS Configuration

Edit `hosts/common/cm-dashboard.nix` and update the `rev` field:

```nix
src = pkgs.fetchFromGitea {
  domain = "gitea.cmtec.se";
  owner = "cm";
  repo = "cm-dashboard";
  rev = "f786d054f2ece80823f85e46933857af96e241b2";  # Update this
  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";  # Reset temporarily
};
```

#### 3. Get the Correct Hash

Build with the placeholder hash to get the actual hash:

```bash
nix-build --no-out-link -E 'with import <nixpkgs> {};
fetchFromGitea {
  domain = "gitea.cmtec.se";
  owner = "cm";
  repo = "cm-dashboard";
  rev = "YOUR_COMMIT_HASH";
  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
```

Example output:

```
error: hash mismatch in fixed-output derivation '/nix/store/...':
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-x8crxNusOUYRrkP9mYEOG+Ga3JCPIdJLkEAc5P1ZxdQ=
```

#### 4. Update the Hash

Replace the placeholder with the correct hash from the error message (the "got:" line):

```nix
hash = "sha256-vjy+j91iDCHUf0RE43anK4WZ+rKcyohP/3SykwZGof8=";  # Use actual hash
```

#### 5. Update Cargo Dependencies (if needed)

If Cargo.lock has changed, you may need to update `cargoHash`:

```bash
# Build to get cargo hash error
nix-build --no-out-link --expr 'with import <nixpkgs> {};
rustPlatform.buildRustPackage rec {
  pname = "cm-dashboard";
  version = "0.1.0";
  src = fetchFromGitea {
    domain = "gitea.cmtec.se";
    owner = "cm";
    repo = "cm-dashboard";
    rev = "YOUR_COMMIT_HASH";
    hash = "YOUR_SOURCE_HASH";
  };
  cargoHash = "";
  nativeBuildInputs = [ pkg-config ];
  buildInputs = [ openssl ];
  buildAndTestSubdir = ".";
  cargoBuildFlags = [ "--workspace" ];
}' 2>&1 | grep "got:"
```

Then update `cargoHash` in the configuration.

#### 6. Commit the Changes

```bash
git add hosts/common/cm-dashboard.nix
git commit -m "Update cm-dashboard to latest version"
git push
```

### Example Update Process

```bash
# 1. Get latest commit
LATEST_COMMIT=$(curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | grep '"sha"' | head -1 | cut -d'"' -f4)

# 2. Get source hash
SOURCE_HASH=$(nix-build --no-out-link -E "with import <nixpkgs> {};
fetchFromGitea {
  domain = \"gitea.cmtec.se\";
  owner = \"cm\";
  repo = \"cm-dashboard\";
  rev = \"$LATEST_COMMIT\";
  hash = \"sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=\";
}" 2>&1 | grep "got:" | cut -d' ' -f12)

# 3. Update configuration and commit
echo "Latest commit: $LATEST_COMMIT"
echo "Source hash: $SOURCE_HASH"
```
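
The `cut -d' ' -f12` step in the script above depends on the exact number of spaces Nix prints before the hash, which may differ between Nix versions. The following is a minimal sketch of a whitespace-independent alternative, assuming only that the error output still contains a line whose first word is `got:` followed by the hash. The function name `extract_got_hash` is hypothetical and not part of cm-dashboard or Nix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read nix-build error output on stdin and print the
# hash that follows "got:", however the line is indented.
extract_got_hash() {
    awk '$1 == "got:" { print $2; exit }'
}

# Demonstration using the sample error text from step 3 of this README.
sample_output="error: hash mismatch in fixed-output derivation '/nix/store/...':
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-x8crxNusOUYRrkP9mYEOG+Ga3JCPIdJLkEAc5P1ZxdQ="

printf '%s\n' "$sample_output" | extract_got_hash
```

With such a helper defined, the `| grep "got:" | cut -d' ' -f12` tail of the `SOURCE_HASH` pipeline could be replaced by `| extract_got_hash`. If no `got:` line is present, the function prints nothing, so an empty `SOURCE_HASH` can be treated as a failed lookup.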