# CM Dashboard - Infrastructure Monitoring TUI

## Overview

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and an individual-metrics architecture.

## Current Features

### Core Functionality

- **Real-time Monitoring**: CPU, RAM, Storage, and Service status
- **Service Management**: Start/stop services with user-stopped tracking
- **Multi-host Support**: Monitor multiple servers from a single dashboard
- **NixOS Integration**: System rebuild via SSH + tmux popup
- **Backup Monitoring**: Borgbackup status and scheduling

### User-Stopped Service Tracking

- Services stopped via the dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
- Persistent storage survives agent restarts
- Flags are cleared automatically when services are restarted via the dashboard

### Custom Service Logs

- Configure service-specific log file paths per host in the dashboard config
- Press `L` on any service to view custom log files via `tail -f`
- Configuration format in the dashboard config:

```toml
[service_logs]
hostname1 = [
    { service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
    { service_name = "app", log_file_path = "/var/log/myapp/app.log" }
]
hostname2 = [
    { service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
]
```

### Service Management

- **Direct Control**: Arrow keys (↑↓) or vim keys (j/k) navigate services
- **Service Actions**:
  - `s` - Start service (sends UserStart command)
  - `S` - Stop service (sends UserStop command)
  - `J` - Show service logs (journalctl in tmux popup)
  - `L` - Show custom log files (tail -f custom paths in tmux popup)
  - `R` - Rebuild current host
- **Visual Status**: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
- **Transitional Icons**: Blue arrows during operations

### Navigation

- **Tab**: Switch between hosts
- **↑↓ or j/k**: Select services
- **s**: Start selected service (UserStart)
- **S**: Stop selected service (UserStop)
- **J**: Show service logs (journalctl)
- **L**: Show custom log files
- **R**: Rebuild current host
- **B**: Run backup on current host
- **q**: Quit dashboard

## Core Architecture Principles

### Structured Data Architecture (✅ IMPLEMENTED v0.1.131)

Complete migration from string-based metrics to structured JSON data. This eliminates all string-parsing bugs and provides type-safe data access.

**Previous (string metrics):**

- ❌ Agent sent individual metrics with string names like `disk_nvme0n1_temperature`
- ❌ Dashboard parsed metric names with underscore counting and string splitting
- ❌ Complex and error-prone metric filtering and extraction logic

**Current (structured data):**

```json
{
  "hostname": "cmbox",
  "agent_version": "v0.1.131",
  "timestamp": 1763926877,
  "system": {
    "cpu": {
      "load_1min": 3.5,
      "load_5min": 3.57,
      "load_15min": 3.58,
      "frequency_mhz": 1500,
      "temperature_celsius": 45.2
    },
    "memory": {
      "usage_percent": 25.0,
      "total_gb": 23.3,
      "used_gb": 5.9,
      "swap_total_gb": 10.7,
      "swap_used_gb": 0.99,
      "tmpfs": [
        { "mount": "/tmp", "usage_percent": 15.0, "used_gb": 0.3, "total_gb": 2.0 }
      ]
    },
    "storage": {
      "drives": [
        {
          "name": "nvme0n1",
          "health": "PASSED",
          "temperature_celsius": 29.0,
          "wear_percent": 1.0,
          "filesystems": [
            { "mount": "/", "usage_percent": 24.0, "used_gb": 224.9, "total_gb": 928.2 }
          ]
        }
      ],
      "pools": [
        {
          "name": "srv_media",
          "mount": "/srv/media",
          "type": "mergerfs",
          "health": "healthy",
          "usage_percent": 63.0,
          "used_gb": 2355.2,
          "total_gb": 3686.4,
          "data_drives": [{ "name": "sdb", "temperature_celsius": 24.0 }],
          "parity_drives": [{ "name": "sdc", "temperature_celsius": 24.0 }]
        }
      ]
    }
  },
  "services": [
    { "name": "sshd", "status": "active", "memory_mb": 4.5, "disk_gb": 0.0 }
  ],
  "backup": {
    "status": "completed",
    "last_run": 1763920000,
    "next_scheduled": 1764006400,
    "total_size_gb": 150.5,
    "repository_health": "ok"
  }
}
```

- ✅ Agent sends structured JSON over ZMQ (no legacy support)
- ✅ Type-safe data access: `data.system.storage.drives[0].temperature_celsius`
- ✅ Complete metric coverage: CPU, memory, storage, services, backup
- ✅ Backward compatibility via bridge conversion to existing UI widgets
- ✅ All string-parsing bugs eliminated

### Maintenance Mode

- The agent checks for a `/tmp/cm-maintenance` file before sending notifications
- Presence of the file suppresses all email notifications while monitoring continues
- The dashboard continues to show real status; only notifications are blocked

Usage:

```bash
# Enable maintenance mode
touch /tmp/cm-maintenance

# Run maintenance tasks
systemctl stop service
# ... maintenance work ...
systemctl start service

# Disable maintenance mode
rm /tmp/cm-maintenance
```

## Development and Deployment Architecture

### Development Path

- **Location:** `~/projects/cm-dashboard`
- **Purpose:** Development workflow only - for committing new code
- **Access:** Only for developers to commit changes

### Deployment Path

- **Location:** `/var/lib/cm-dashboard/nixos-config`
- **Purpose:** Production deployment only - the agent clones/pulls from git
- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild

### Git Flow

```
Development: ~/projects/cm-dashboard → git commit → git push
Deployment:  git pull → /var/lib/cm-dashboard/nixos-config → rebuild
```

## Automated Binary Release System

CM Dashboard uses automated binary releases instead of source builds.
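As an illustration of the structured-data model described under Core Architecture Principles, the agent payload maps naturally onto plain Rust types. The sketch below covers only a storage subset, uses no external crates, and takes its field names from the JSON example; the real implementation presumably derives serde's `Serialize`/`Deserialize` on equivalent types.

```rust
/// Dependency-free sketch of part of the structured agent payload.
/// Field names follow the JSON example above; type and function names
/// here are illustrative, not the actual implementation.
#[derive(Debug, Clone, PartialEq)]
pub struct Drive {
    pub name: String,
    pub health: String,
    pub temperature_celsius: f64,
    pub wear_percent: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Storage {
    pub drives: Vec<Drive>,
}

/// Type-safe access replaces string parsing: the hottest drive is found
/// by iterating typed fields instead of splitting metric-name strings.
pub fn hottest_drive(storage: &Storage) -> Option<&Drive> {
    storage
        .drives
        .iter()
        .max_by(|a, b| a.temperature_celsius.total_cmp(&b.temperature_celsius))
}
```

With typed data like this, the dashboard-side underscore counting and name splitting from the legacy design have no equivalent left to go wrong.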
### Creating New Releases

```bash
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
```

This automatically:

- Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
- Creates a GitHub-style release with a tarball
- Uploads binaries via the Gitea API

### NixOS Configuration Updates

Edit `~/projects/nixosbox/hosts/services/cm-dashboard.nix`:

```nix
version = "v0.1.X";
src = pkgs.fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-NEW_HASH_HERE";
};
```

### Get Release Hash

```bash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
```

### Building

**Testing & building:**

- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
- **Clean compilation**: Remove `target/` between major changes

## Enhanced Storage Pool Visualization

### Auto-Discovery Architecture

The dashboard uses automatic storage discovery to eliminate manual configuration complexity while providing intelligent storage-pool grouping.

### Discovery Process

**At agent startup:**

1. Parse `/proc/mounts` to identify all mounted filesystems
2. Detect MergerFS pools by analyzing `fuse.mergerfs` mount sources
3. Identify member disks and potential parity relationships via heuristics
4. Store the discovered storage topology for continuous monitoring
5. Generate pool-aware metrics with hierarchical relationships

**Continuous monitoring:**

- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, and wear
- Calculate pool-level health based on member drive status
- Generate enhanced metrics for dashboard visualization

### Supported Storage Types

**Single disks:**

- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.

**MergerFS pools:**

- Auto-detected from `fuse.mergerfs` entries in `/proc/mounts`
- Source paths are parsed to identify member disks (e.g. `/mnt/disk1:/mnt/disk2`)
- Heuristic parity-disk detection (sequential device names, "parity" in path)
- Pool health calculation (healthy/degraded/critical)
- Hierarchical tree display with data/parity disk grouping

**Future extensions ready:**

- RAID arrays via `/proc/mdstat` parsing
- ZFS pools via `zpool status` integration
- LVM logical volumes via `lvs` discovery

### Configuration

```toml
[collectors.disk]
enabled = true
auto_discover = true  # Default: true

# Optional exclusions for special filesystems
exclude_mount_points = ["/tmp", "/proc", "/sys", "/dev"]
exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
```

### Display Format

```
Network:
● eno1:
├─ ip: 192.168.30.105
└─ tailscale0: 100.125.108.16
● eno2:
└─ ip: 192.168.32.105

CPU:
● Load: 0.23 0.21 0.13
└─ Freq: 1048 MHz

RAM:
● Usage: 25% 5.8GB/23.3GB
├─ ● /tmp: 2% 0.5GB/2GB
└─ ● /var/tmp: 0% 0GB/1.0GB

Storage:
● 844B9A25 T: 25C W: 4%
├─ ● /: 55% 250.5GB/456.4GB
└─ ● /boot: 26% 0.3GB/1.0GB
● mergerfs /srv/media:
├─ ● 63% 2355.2GB/3686.4GB
├─ ● Data_1: WDZQ8H8D T: 28°C
├─ ● Data_2: GGA04461 T: 28°C
└─ ● Parity: WDZS8RY0 T: 29°C

Backup:
● WD-WCC7K1234567 T: 32°C W: 12%
├─ Last: 2h ago (12.3GB)
├─ Next: in 22h
└─ ● Usage: 45% 678GB/1.5TB
```

## Important Communication Guidelines

Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
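The member-disk parsing and parity heuristic described under Discovery Process can be sketched as follows. This is a simplified illustration using only the "parity appears in the path" heuristic on a `fuse.mergerfs` source field from `/proc/mounts`; the function name is illustrative and the actual discovery code may apply additional heuristics such as sequential device names.

```rust
/// Split a `fuse.mergerfs` source field (e.g. "/mnt/disk1:/mnt/disk2:/mnt/parity1")
/// into data and parity member paths. Members whose path contains "parity"
/// are classified as parity drives; everything else is a data drive.
fn classify_members(source: &str) -> (Vec<String>, Vec<String>) {
    let mut data = Vec::new();
    let mut parity = Vec::new();
    for path in source.split(':').filter(|p| !p.is_empty()) {
        if path.contains("parity") {
            parity.push(path.to_string());
        } else {
            data.push(path.to_string());
        }
    }
    (data, parity)
}
```

The resulting grouping is what drives the hierarchical Data/Parity rows in the storage tree shown under Display Format.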
## Commit Message Guidelines

**NEVER mention:**

- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation

**ALWAYS:**

- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer

**Examples:**

- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"

## Implementation Rules

1. **Agent Status Authority**: The agent calculates the status of each metric using thresholds
2. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
3. **Status Aggregation**: The dashboard aggregates individual metric statuses into a widget status

**NEVER:**

- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
- Create files unless absolutely necessary for achieving goals
- Create documentation files unless explicitly requested

**ALWAYS:**

- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices
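The status-aggregation rule above (widgets never compute status themselves; they take the worst of the agent-supplied per-metric statuses) can be sketched as follows. Variant and function names are illustrative, not the project's actual identifiers:

```rust
/// Per-metric status as calculated by the agent. Declaration order
/// matters: later variants are "worse", so `Ord`'s `max` picks the
/// most severe status.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// A widget's status is the worst status among the metrics it
/// subscribes to; with no metrics present it defaults to Ok.
pub fn aggregate(statuses: &[Status]) -> Status {
    statuses.iter().copied().max().unwrap_or(Status::Ok)
}
```

Deriving `Ord` on the enum keeps the aggregation a one-liner and makes the severity ordering explicit in one place instead of scattered comparisons.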