# CM Dashboard - Infrastructure Monitoring TUI

## Overview

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. It is built to replace Glance with a custom solution tailored to our monitoring needs and API integrations.

## Project Goals

### Core Objectives

- **Real-time monitoring** of all infrastructure components
- **Multi-host support** for cmbox, labbox, simonbox, steambox, srv01
- **Performance-focused** with minimal resource usage
- **Keyboard-driven interface** for power users
- **Integration** with existing monitoring APIs (ports 6127, 6128, 6129)

### Key Features

- **NVMe health monitoring** with wear prediction
- **CPU / memory / GPU telemetry** with automatic thresholding
- **Service resource monitoring** with per-service CPU and RAM usage
- **Disk usage overview** for root filesystems
- **Backup status** with detailed metrics and history
- **Unified alert pipeline** summarising host health
- **Historical data tracking** and trend analysis

## Technical Architecture

### Technology Stack

- **Language**: Rust 🦀
- **TUI Framework**: ratatui (modern tui-rs fork)
- **Async Runtime**: tokio
- **HTTP Client**: reqwest
- **Serialization**: serde
- **CLI**: clap
- **Error Handling**: anyhow
- **Time**: chrono

### Dependencies

```toml
[dependencies]
ratatui = "0.24"                                    # Modern TUI framework
crossterm = "0.27"                                  # Cross-platform terminal handling
tokio = { version = "1.0", features = ["full"] }    # Async runtime
reqwest = { version = "0.11", features = ["json"] } # HTTP client
serde = { version = "1.0", features = ["derive"] }  # Serialization / deserialization
clap = { version = "4.0", features = ["derive"] }   # CLI args
anyhow = "1.0"                                      # Error handling
chrono = "0.4"                                      # Time handling
```

## Project Structure

```
cm-dashboard/
├── Cargo.toml
├── README.md
├── CLAUDE.md                 # This file
├── src/
│   ├── main.rs               # Entry point & CLI
│   ├── app.rs                # Main application state
│   ├── ui/
│   │   ├── mod.rs
│   │   ├── dashboard.rs      # Main dashboard layout
│   │   ├── nvme.rs           # NVMe health widget
│   │   ├── services.rs       # Services status widget
│   │   ├── memory.rs         # RAM optimization widget
│   │   ├── backup.rs         # Backup status widget
│   │   └── alerts.rs         # Alerts/notifications widget
│   ├── api/
│   │   ├── mod.rs
│   │   ├── client.rs         # HTTP client wrapper
│   │   ├── smart.rs          # SMART metrics API (port 6127)
│   │   ├── service.rs        # Service metrics API (port 6128)
│   │   └── backup.rs         # Backup metrics API (port 6129)
│   ├── data/
│   │   ├── mod.rs
│   │   ├── metrics.rs        # Data structures
│   │   ├── history.rs        # Historical data storage
│   │   └── config.rs         # Host configuration
│   └── config.rs             # Application configuration
├── config/
│   ├── hosts.toml            # Host definitions
│   └── dashboard.toml        # Dashboard layout config
└── docs/
    ├── API.md                # API integration documentation
    └── WIDGETS.md            # Widget development guide
```

### Data Structures

```rust
use serde::Deserialize;

// NOTE: generic parameters were lost in this copy of the file. The element
// types below (`DriveInfo`, `String`, `ServiceInfo`, `f32`) are assumed
// reconstructions; `DriveSummary`, `BackupInfo`, and `BackupServiceInfo`
// are defined elsewhere and not shown here.

#[derive(Deserialize, Debug)]
pub struct SmartMetrics {
    pub status: String,
    pub drives: Vec<DriveInfo>, // assumed element type
    pub summary: DriveSummary,
    pub issues: Vec<String>, // assumed element type
    pub timestamp: u64,
}

#[derive(Deserialize, Debug)]
pub struct ServiceMetrics {
    pub summary: ServiceSummary,
    pub services: Vec<ServiceInfo>, // assumed element type
    pub timestamp: u64,
}

#[derive(Deserialize, Debug)]
pub struct ServiceSummary {
    pub healthy: usize,
    pub degraded: usize,
    pub failed: usize,
    pub memory_used_mb: f32,
    pub memory_quota_mb: f32,
    pub system_memory_used_mb: f32,
    pub system_memory_total_mb: f32,
    pub disk_used_gb: f32,
    pub disk_total_gb: f32,
    pub cpu_load_1: f32,
    pub cpu_load_5: f32,
    pub cpu_load_15: f32,
    // Optional sensors; `f32` is assumed to match the fields above.
    pub cpu_freq_mhz: Option<f32>,
    pub cpu_temp_c: Option<f32>,
    pub gpu_load_percent: Option<f32>,
    pub gpu_temp_c: Option<f32>,
}

#[derive(Deserialize, Debug)]
pub struct BackupMetrics {
    pub overall_status: String,
    pub backup: BackupInfo,
    pub service: BackupServiceInfo,
    pub timestamp: u64,
}
```

## Dashboard Layout Design

### Main Dashboard View

```
┌─────────────────────────────────────────────────────────────────────────┐
│ CM Dashboard • cmbox                                                    │
├─────────────────────────────────────────────────────────────────────────┤
│ Storage • ok:1 warn:0 crit:0        │ Services • ok:1 warn:0 fail:0     │
│ ┌─────────────────────────────────┐ │ ┌───────────────────────────────┐ │
│ │Drive    Temp  Wear  Spare  Hours│ │ │Service memory: 7.1/23899.7 MiB│ │
│ │nvme0n1  28°C  1%    100%   14489│ │ │Disk usage: —                  │ │
│ │         Capacity  Usage         │ │ │  Service   Memory   Disk      │ │
│ │         954G      77G (8%)      │ │ │✔ sshd      7.1 MiB  —         │ │
│ └─────────────────────────────────┘ │ └───────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────┤
│ CPU / Memory • warn                 │ Backups                           │
│ System memory: 5251.7/23899.7 MiB   │ Host cmbox awaiting backup        │
│ CPU load (1/5/15): 2.18 2.66 2.56   │ metrics                           │
│ CPU freq: 1100.1 MHz                │                                   │
│ CPU temp: 47.0°C                    │                                   │
├─────────────────────────────────────────────────────────────────────────┤
│ Alerts • ok:0 warn:3 fail:0         │ Status • ZMQ connected            │
│ cmbox: warning: CPU load 2.18       │ Monitoring • hosts: 3             │
│ srv01: pending: awaiting metrics    │ Data source: ZMQ – connected      │
│ labbox: pending: awaiting metrics   │ Active host: cmbox (1/3)          │
└─────────────────────────────────────────────────────────────────────────┘
Keys: [←→] hosts [r]efresh [q]uit
```

### Multi-Host View

```
┌─────────────────────────────────────────────────────────────────────┐
│ 🖥️ CMTEC Host Overview                                              │
├─────────────────────────────────────────────────────────────────────┤
│ Host     │ NVMe Wear │ RAM Usage │ Services │ Last Alert            │
├─────────────────────────────────────────────────────────────────────┤
│ srv01    │ 4% ✅     │ 32% ✅    │ 8/8 ✅   │ 04:00 Backup OK       │
│ cmbox    │ 12% ✅    │ 45% ✅    │ 3/3 ✅   │ Yesterday Email test  │
│ labbox   │ 8% ✅     │ 28% ✅    │ 2/2 ✅   │ 2h ago NVMe temp OK   │
│ simonbox │ 15% ✅    │ 67% ⚠️    │ 4/4 ✅   │ Gaming session active │
│ steambox │ 23% ✅    │ 78% ⚠️    │ 2/2 ✅   │ High RAM usage        │
└─────────────────────────────────────────────────────────────────────┘
Keys: [Enter] details [r]efresh [s]ort [f]ilter [q]uit
```

## Development Status

### Immediate TODOs

- Refactor all dashboard widgets to use a shared table/layout helper so icons, padding, and titles remain consistent across panels
- Investigate why the backup metrics agent is not publishing data to the dashboard
- Resize the services widget so it can display more services without truncation
- Remove the dedicated status widget and redistribute the layout space
- Add responsive scaling within each widget so columns and content adapt dynamically

### Phase 3: Advanced Features 🚧 IN PROGRESS

- [x] ZMQ gossip network implementation
- [x] Comprehensive error handling
- [x] Performance optimizations
- [ ] Predictive analytics for wear levels
- [ ] Custom alert rules engine
- [ ] Historical data export capabilities

# Important Communication Guidelines

NEVER write that you have "successfully implemented" something or generate extensive summary text without first verifying with the user that the implementation is correct. This wastes tokens. Keep responses concise.

NEVER implement code without first getting explicit user agreement on the approach. Always ask for confirmation before proceeding with implementation.

NEVER mention Claude or automation in commit messages. Keep commit messages focused on the technical changes only.
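
# Appendix: Threshold Classification Sketch

The Key Features list mentions "automatic thresholding", and the panel titles in the Main Dashboard View show per-level counts such as `ok:1 warn:0 crit:0`. The following is a minimal sketch of how such a classification might look, not the project's actual implementation. The names `Level`, `Threshold`, and `panel_title` are hypothetical, and no real threshold values are given in this document.

```rust
/// Health level used in panel titles such as "Storage • ok:1 warn:0 crit:0".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Ok,
    Warn,
    Crit,
}

/// Hypothetical warn/crit limits for one metric; the real limits are not
/// specified in this document.
#[derive(Debug, Clone, Copy)]
pub struct Threshold {
    pub warn: f32,
    pub crit: f32,
}

impl Threshold {
    /// Map a reading to a level: at or above `crit` is `Crit`,
    /// at or above `warn` is `Warn`, anything lower is `Ok`.
    pub fn classify(&self, value: f32) -> Level {
        if value >= self.crit {
            Level::Crit
        } else if value >= self.warn {
            Level::Warn
        } else {
            Level::Ok
        }
    }
}

/// Build a panel title with per-level counts from already classified readings.
pub fn panel_title(name: &str, levels: &[Level]) -> String {
    let count = |wanted: Level| levels.iter().filter(|l| **l == wanted).count();
    format!(
        "{} • ok:{} warn:{} crit:{}",
        name,
        count(Level::Ok),
        count(Level::Warn),
        count(Level::Crit)
    )
}
```

With an assumed `Threshold { warn: 2.0, crit: 4.0 }`, the 1-minute load of 2.18 from the sample dashboard would classify as `Level::Warn`, which matches the `warn` state shown in the CPU / Memory panel title. Keeping the limits in a small struct is one option that would let them be loaded per host from configuration later.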