Implement per-service disk usage monitoring

Replaced system-wide disk usage with accurate per-service tracking by scanning service-specific directories. Services like sshd now correctly show minimal disk usage instead of misleading system totals.

- Rename storage widget and add drive capacity/usage columns
- Move host display to main dashboard title for cleaner layout
- Replace separate alert displays with color-coded row highlighting
- Add per-service disk usage collection using du command
- Update services widget formatting to handle small disk values
- Restructure into workspace with dedicated agent and dashboard packages
2025-10-11 22:59:16 +02:00
parent 82afe3d4f1
commit 2581435b10
30 changed files with 4801 additions and 446 deletions
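
The commit message mentions collecting per-service disk usage with the `du` command by scanning service-specific directories. The agent code itself is not shown in this excerpt, so the following is only a minimal sketch of how such a collector might look. The function names `parse_du_output` and `service_disk_usage_kib` are hypothetical, and the sketch assumes `du -s -k` prints one `<size> <path>` line per directory, with the size in KiB.

```rust
use std::path::Path;
use std::process::Command;

/// Parse `du -s -k` output: one "<kibibytes><whitespace><path>" line per directory.
/// Lines that do not match this shape are skipped.
/// (Hypothetical helper, not taken from the commit.)
pub fn parse_du_output(output: &str) -> Vec<(String, u64)> {
    output
        .lines()
        .filter_map(|line| {
            // Split at the first whitespace: size on the left, path on the right.
            let (size, path) = line.split_once(char::is_whitespace)?;
            let kib = size.parse::<u64>().ok()?;
            Some((path.trim_start().to_string(), kib))
        })
        .collect()
}

/// Sum the disk usage (in KiB) of a service's directories by running `du -s -k`.
/// Which directories belong to which service is an assumption left to the caller.
pub fn service_disk_usage_kib<P: AsRef<Path>>(dirs: &[P]) -> std::io::Result<u64> {
    if dirs.is_empty() {
        return Ok(0);
    }
    let output = Command::new("du")
        .arg("-s")
        .arg("-k")
        .args(dirs.iter().map(|d| d.as_ref().as_os_str()))
        .output()?;
    // du may exit non-zero when some paths are unreadable while still
    // reporting the others, so the exit status is not checked here.
    let stdout = String::from_utf8_lossy(&output.stdout);
    Ok(parse_du_output(&stdout).iter().map(|(_, kib)| kib).sum())
}
```

A caller could pass, for example, a service's state and log directories and convert the returned KiB total to whatever unit the dashboard displays. Keeping the parsing separate from the process invocation makes the parsing testable without touching the filesystem.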
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -14,10 +14,11 @@ A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure.

 ### Key Features
 - **NVMe health monitoring** with wear prediction
-- **RAM optimization tracking** (tmpfs, zram, kernel metrics)
-- **Service resource monitoring** with sandboxed limits
+- **CPU / memory / GPU telemetry** with automatic thresholding
+- **Service resource monitoring** with per-service CPU and RAM usage
+- **Disk usage overview** for root filesystems
 - **Backup status** with detailed metrics and history
-- **Email notification integration**
+- **Unified alert pipeline** summarising host health
 - **Historical data tracking** and trend analysis

 ## Technical Architecture
@@ -93,8 +94,10 @@ cm-dashboard/

 2. **Service Metrics API** (port 6128)
   - Service status and resource usage
-   - Memory consumption vs limits
-   - Disk usage per service
+   - Service memory consumption vs limits
+   - Host CPU load / frequency / temperature
+   - Root disk utilisation snapshot
+   - GPU utilisation and temperature (if available)

 3. **Backup Metrics API** (port 6129)
   - Backup status and history
@@ -119,6 +122,26 @@ pub struct ServiceMetrics {
    pub timestamp: u64,
 }

+#[derive(Deserialize, Debug)]
+pub struct ServiceSummary {
+    pub healthy: usize,
+    pub degraded: usize,
+    pub failed: usize,
+    pub memory_used_mb: f32,
+    pub memory_quota_mb: f32,
+    pub system_memory_used_mb: f32,
+    pub system_memory_total_mb: f32,
+    pub disk_used_gb: f32,
+    pub disk_total_gb: f32,
+    pub cpu_load_1: f32,
+    pub cpu_load_5: f32,
+    pub cpu_load_15: f32,
+    pub cpu_freq_mhz: Option<f32>,
+    pub cpu_temp_c: Option<f32>,
+    pub gpu_load_percent: Option<f32>,
+    pub gpu_temp_c: Option<f32>,
+}
+
 #[derive(Deserialize, Debug)]
 pub struct BackupMetrics {
    pub overall_status: String,
@@ -617,4 +640,4 @@ smartmontools-rs = "0.1"  # Or direct smartctl bindings
 **Performance Targets**:
 - **Agent footprint**: < 2MB RAM, < 1% CPU
 - **Metric latency**: < 100ms propagation across network
-- **Network efficiency**: < 1KB/s per host steady state
+- **Network efficiency**: < 1KB/s per host steady state
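
The `ServiceSummary` struct added in this diff carries `disk_used_gb` and `disk_total_gb`, and the commit message describes color-coded row highlighting. How the dashboard maps values to colors is not shown here, so the following is a hedged sketch of one possible mapping. The `Severity` enum, the function names, and the 80% / 90% thresholds are all assumptions made for illustration.

```rust
/// Hypothetical severity levels a dashboard row could be colored by.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Ok,
    Warning,
    Critical,
}

/// Assumed thresholds; the real values are not given in the diff.
pub const WARN_PERCENT: f32 = 80.0;
pub const CRIT_PERCENT: f32 = 90.0;

/// Disk utilisation in percent, or None when the inputs are unusable
/// (zero, negative, or NaN total; negative or NaN usage).
pub fn disk_used_percent(used_gb: f32, total_gb: f32) -> Option<f32> {
    if total_gb > 0.0 && used_gb >= 0.0 {
        Some(used_gb / total_gb * 100.0)
    } else {
        None
    }
}

/// Classify disk utilisation against the assumed thresholds.
pub fn disk_severity(used_gb: f32, total_gb: f32) -> Option<Severity> {
    let pct = disk_used_percent(used_gb, total_gb)?;
    Some(if pct >= CRIT_PERCENT {
        Severity::Critical
    } else if pct >= WARN_PERCENT {
        Severity::Warning
    } else {
        Severity::Ok
    })
}
```

Returning `Option` lets the UI distinguish "no data" from a healthy reading, which matters for hosts that report a zero or missing disk total.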