# CM Dashboard - Infrastructure Monitoring TUI

## Overview

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and an individual-metrics architecture.

## Current Features

### Core Functionality

- **Real-time Monitoring**: CPU, RAM, Storage, and Service status
- **Service Management**: Start/stop services with user-stopped tracking
- **Multi-host Support**: Monitor multiple servers from a single dashboard
- **NixOS Integration**: System rebuild via SSH + tmux popup
- **Backup Monitoring**: Borgbackup status and scheduling

### User-Stopped Service Tracking

- Services stopped via the dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
- Persistent storage survives agent restarts
- Flag is cleared automatically when services are restarted via the dashboard

### Custom Service Logs

- Configure service-specific log file paths per host in the dashboard config
- Press `L` on any service to view custom log files via `tail -f`
- Configuration format in the dashboard config:

```toml
[service_logs]
hostname1 = [
    { service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
    { service_name = "app", log_file_path = "/var/log/myapp/app.log" }
]
hostname2 = [
    { service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
]
```

### Service Management

- **Direct Control**: Arrow keys (↑↓) or vim keys (j/k) navigate services
- **Service Actions**:
  - `s` - Start service (sends UserStart command)
  - `S` - Stop service (sends UserStop command)
  - `J` - Show service logs (journalctl in tmux popup)
  - `L` - Show custom log files (tail -f custom paths in tmux popup)
  - `R` - Rebuild current host
- **Visual Status**: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
- **Transitional Icons**: Blue arrows during operations

### Navigation

- **Tab**: Switch between hosts
- **↑↓ or j/k**: Select services
- **s**: Start selected service (UserStart)
- **S**: Stop selected service (UserStop)
- **J**: Show service logs (journalctl)
- **L**: Show custom log files
- **R**: Rebuild current host
- **B**: Run backup on current host
- **q**: Quit dashboard

## Core Architecture Principles

### Individual Metrics Philosophy

- Agent collects individual metrics; dashboard composes widgets
- Each metric is collected, transmitted, and stored individually
- Agent calculates status for each metric using thresholds
- Dashboard aggregates individual metric statuses into widget status

### Maintenance Mode

- Agent checks for the `/tmp/cm-maintenance` file before sending notifications
- File presence suppresses all email notifications while monitoring continues
- Dashboard continues to show real status; only notifications are blocked

Usage:

```bash
# Enable maintenance mode
touch /tmp/cm-maintenance

# Run maintenance tasks
systemctl stop service
# ... maintenance work ...
systemctl start service

# Disable maintenance mode
rm /tmp/cm-maintenance
```
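The gating logic itself is a simple file-existence check in front of the notification path. A minimal sketch (function names are illustrative, not the agent's actual API):

```rust
use std::path::Path;

/// Flag file checked by the agent; path taken from the section above.
const MAINTENANCE_FLAG: &str = "/tmp/cm-maintenance";

/// Monitoring keeps running; only the notification path is gated on this.
fn maintenance_mode_active() -> bool {
    Path::new(MAINTENANCE_FLAG).exists()
}

fn notify(subject: &str, body: &str) {
    if maintenance_mode_active() {
        // Suppress emails during maintenance; metrics and status still flow.
        return;
    }
    send_email(subject, body);
}

/// Placeholder for the real email delivery (illustrative only).
fn send_email(subject: &str, body: &str) {
    println!("EMAIL: {subject} - {body}");
}
```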
## Development and Deployment Architecture

### Development Path

- **Location:** `~/projects/cm-dashboard`
- **Purpose:** Development workflow only - for committing new code
- **Access:** Only for developers to commit changes

### Deployment Path

- **Location:** `/var/lib/cm-dashboard/nixos-config`
- **Purpose:** Production deployment only - the agent clones/pulls from git
- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild

### Git Flow

```
Development: ~/projects/cm-dashboard → git commit → git push
Deployment:  git pull → /var/lib/cm-dashboard/nixos-config → rebuild
```

## Automated Binary Release System

CM Dashboard uses automated binary releases instead of source builds.

### Creating New Releases

```bash
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
```

This automatically:

- Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
- Creates a GitHub-style release with a tarball
- Uploads binaries via the Gitea API

### NixOS Configuration Updates

Edit `~/projects/nixosbox/hosts/services/cm-dashboard.nix`:

```nix
version = "v0.1.X";
src = pkgs.fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-NEW_HASH_HERE";
};
```

### Get Release Hash

```bash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
```

### Building

**Testing & Building:**

- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
- **Clean compilation**: Remove `target/` between major changes

## Enhanced Storage Pool Visualization

### Auto-Discovery Architecture

The dashboard uses automatic storage discovery to eliminate manual configuration complexity while providing intelligent storage pool grouping.

### Discovery Process

**At Agent Startup:**

1. Parse `/proc/mounts` to identify all mounted filesystems
2. Detect MergerFS pools by analyzing `fuse.mergerfs` mount sources (see the sketch below)
3. Identify member disks and potential parity relationships via heuristics
4. Store the discovered storage topology for continuous monitoring
5. Generate pool-aware metrics with hierarchical relationships

**Continuous Monitoring:**

- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, and wear
- Calculate pool-level health based on member drive status
- Generate enhanced metrics for dashboard visualization
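The MergerFS part of startup discovery amounts to scanning `/proc/mounts` for `fuse.mergerfs` entries and splitting the colon-separated source into member paths. A minimal sketch (type and function names are assumptions, not the agent's real code):

```rust
use std::fs;

/// Discovered mergerfs pool (illustrative struct; the agent's types may differ).
#[derive(Debug)]
struct MergerfsPool {
    mount_point: String,
    member_paths: Vec<String>,
}

/// Scan /proc/mounts and return every fuse.mergerfs mount with its member paths.
fn discover_mergerfs_pools() -> std::io::Result<Vec<MergerfsPool>> {
    let mounts = fs::read_to_string("/proc/mounts")?;
    let pools = mounts
        .lines()
        .filter_map(|line| {
            // /proc/mounts fields: <source> <mount point> <fstype> <options> ...
            let mut fields = line.split_whitespace();
            let source = fields.next()?;
            let mount_point = fields.next()?;
            let fstype = fields.next()?;
            (fstype == "fuse.mergerfs").then(|| MergerfsPool {
                mount_point: mount_point.to_string(),
                // mergerfs encodes its branches as "/mnt/disk1:/mnt/disk2"
                member_paths: source.split(':').map(str::to_string).collect(),
            })
        })
        .collect();
    Ok(pools)
}
```

The parity heuristics described in step 3 would then run over `member_paths` (device name ordering, "parity" in the path).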
### Supported Storage Types

**Single Disks:**

- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.

**MergerFS Pools:**

- Auto-detect from `/proc/mounts` fuse.mergerfs entries
- Parse source paths to identify member disks (e.g., "/mnt/disk1:/mnt/disk2")
- Heuristic parity disk detection (sequential device names, "parity" in path)
- Pool health calculation (healthy/degraded/critical)
- Hierarchical tree display with data/parity disk grouping

**Future Extensions Ready:**

- RAID arrays via `/proc/mdstat` parsing
- ZFS pools via `zpool status` integration
- LVM logical volumes via `lvs` discovery

### Configuration

```toml
[collectors.disk]
enabled = true
auto_discover = true  # Default: true

# Optional exclusions for special filesystems
exclude_mount_points = ["/tmp", "/proc", "/sys", "/dev"]
exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
```

### Display Format

```
Storage:
● /srv/media (mergerfs (2+1)):
├─ Pool Status: ● Healthy (3 drives)
├─ Total: ● 63% 2355.2GB/3686.4GB
├─ Data Disks:
│  ├─ ● sdb T: 24°C
│  └─ ● sdd T: 27°C
└─ Parity: ● sdc T: 24°C
● /:
├─ ● nvme0n1 W: 13%
└─ ● 7% 14.5GB/218.5GB
```

### Implementation Benefits

- **Zero Configuration**: No manual pool definitions required
- **Always Accurate**: Reflects actual system state automatically
- **Scales Automatically**: Handles any number of pools without config changes
- **Backwards Compatible**: Single disks continue working unchanged
- **Future Ready**: Easy extension for additional storage technologies

### Current Status (v0.1.100)

**✅ Completed:**

- Auto-discovery system implemented and deployed
- `/proc/mounts` parsing with smart heuristics for parity detection
- Storage topology stored at agent startup for efficient monitoring
- Universal zero-configuration for all hosts (cmbox, steambox, simonbox, srv01, srv02, srv03)
- Enhanced pool health calculation (healthy/degraded/critical)
- Hierarchical tree visualization with data/parity disk separation

**🔄 In Progress - Complete Disk Collector Rewrite:**

The current disk collector has grown complex with mixed legacy/auto-discovery approaches. A complete rewrite is planned, with a clean, simple workflow supporting both physical drives and mergerfs pools.

**New Clean Architecture:**

**Discovery Workflow:**

1. **`lsblk`** to detect all mount points and backing devices
2. **`df`** to get filesystem usage for each mount point
3. **Group by physical drive** (nvme0n1, sda, etc.)
4. **Parse `/proc/mounts`** for mergerfs pools
5. **Generate unified metrics** for both storage types

**Physical Drive Display:**

```
● nvme0n1:
├─ ● Drive: T: 35°C W: 1%
├─ ● Total: 23% 218.0GB/928.2GB
├─ ● /boot: 11% 0.1GB/1.0GB
└─ ● /: 23% 214.9GB/928.2GB
```

**MergerFS Pool Display:**

```
● /srv/media (mergerfs):
├─ ● Pool: 63% 2355.2GB/3686.4GB
├─ Data Disks:
│  ├─ ● sdb T: 24°C
│  └─ ● sdd T: 27°C
└─ ● sdc T: 24°C (parity)
```

**Implementation Benefits:**

- **Pure auto-discovery**: No configuration needed
- **Clean code paths**: Single workflow for all storage types
- **Consistent display**: Status icons on every line, no redundant text
- **Simple pipeline**: lsblk → df → group → metrics
- **Support for both**: Physical drives and mergerfs pools
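The pool health levels used above (healthy/degraded/critical) can be derived from member drive status alone. A minimal sketch of one possible policy, assuming a single-parity pool and illustrative type names:

```rust
/// Status of a single member drive (names are assumptions for this sketch).
#[derive(Clone, Copy, PartialEq)]
enum DriveStatus {
    Ok,
    Warning,
    Failed,
}

/// Pool-level health as shown in the tree view.
#[derive(Debug, PartialEq)]
enum PoolHealth {
    Healthy,
    Degraded,
    Critical,
}

/// Derive pool health from member drive status, assuming single-parity protection.
fn pool_health(members: &[DriveStatus]) -> PoolHealth {
    let failed = members.iter().filter(|s| **s == DriveStatus::Failed).count();
    let warnings = members.iter().filter(|s| **s == DriveStatus::Warning).count();
    match (failed, warnings) {
        (0, 0) => PoolHealth::Healthy,
        // One failed drive (covered by parity) or any warning: still serving data.
        (0, _) | (1, _) => PoolHealth::Degraded,
        // More than one failed drive exceeds single-parity protection.
        _ => PoolHealth::Critical,
    }
}
```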
## Important Communication Guidelines

Keep responses concise and focused. Avoid extensive implementation summaries unless requested.

## Commit Message Guidelines

**NEVER mention:**

- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation

**ALWAYS:**

- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer

**Examples:**

- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"

## Implementation Rules

1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status (see the sketch after these lists)

**NEVER:**

- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
- Create files unless absolutely necessary for achieving goals
- Create documentation files unless explicitly requested

**ALWAYS:**

- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices
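Rules 2 and 4 together mean thresholds live only in the agent, and widgets only aggregate. A minimal sketch of that split, with assumed type names and example threshold values:

```rust
/// Per-metric status as assigned by the agent (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Ok,
    Warning,
    Critical,
}

/// Agent side: status is decided when the metric is collected, using
/// thresholds owned by the agent (the values here are examples).
fn cpu_usage_status(usage_percent: f64) -> Status {
    if usage_percent >= 95.0 {
        Status::Critical
    } else if usage_percent >= 80.0 {
        Status::Warning
    } else {
        Status::Ok
    }
}

/// Dashboard side: a widget never recomputes thresholds; it only takes the
/// worst status among the metrics it subscribes to.
fn widget_status(metric_statuses: &[Status]) -> Status {
    metric_statuses.iter().copied().max().unwrap_or(Status::Ok)
}
```

Because `Status` derives `Ord` with variants declared from best to worst, the widget status is simply the maximum of the statuses it receives.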