Implement per-service disk usage monitoring

Replaced system-wide disk usage with accurate per-service tracking by scanning service-specific directories. Services like sshd now correctly show minimal disk usage instead of misleading system totals.

- Rename storage widget and add drive capacity/usage columns
- Move host display to main dashboard title for cleaner layout
- Replace separate alert displays with color-coded row highlighting
- Add per-service disk usage collection using du command
- Update services widget formatting to handle small disk values
- Restructure into workspace with dedicated agent and dashboard packages
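
As a rough illustration of the per-service collection described above, the sketch below shells out to `du -sb` (GNU coreutils flags) for one directory and formats the result so that small services do not round down to zero. The function names `parse_du_output`, `service_disk_usage_bytes`, and `format_disk`, the unit cut-offs, and the `/etc/ssh` path are assumptions made for this example; they are not taken from the agent's actual code.

```rust
use std::path::Path;
use std::process::Command;

/// Parse the first field of `du -s` style output ("<size>\t<path>").
/// Returns None if the output is empty or does not start with a number.
fn parse_du_output(stdout: &str) -> Option<u64> {
    stdout.lines().next()?.split_whitespace().next()?.parse().ok()
}

/// Run `du -sb <dir>` and return the directory size in bytes.
/// Returns None if `du` cannot be started, exits with a non-zero status
/// (for example on permission errors), or prints unexpected output.
fn service_disk_usage_bytes(dir: &Path) -> Option<u64> {
    let output = Command::new("du").arg("-sb").arg(dir).output().ok()?;
    if !output.status.success() {
        return None;
    }
    parse_du_output(&String::from_utf8_lossy(&output.stdout))
}

/// Format a byte count with a unit that keeps small values visible.
/// The cut-offs here are illustrative, not the dashboard's real ones.
fn format_disk(bytes: u64) -> String {
    const MB: f64 = 1024.0 * 1024.0;
    const GB: f64 = MB * 1024.0;
    let b = bytes as f64;
    if b >= GB {
        format!("{:.1} GB", b / GB)
    } else if b >= MB {
        format!("{:.0} MB", b / MB)
    } else {
        format!("{:.0} KB", b / 1024.0)
    }
}

fn main() {
    // Hypothetical service directory, chosen only for illustration.
    let dir = Path::new("/etc/ssh");
    match service_disk_usage_bytes(dir) {
        Some(bytes) => println!("{}: {}", dir.display(), format_disk(bytes)),
        None => println!("{}: unavailable", dir.display()),
    }
}
```

Keeping the parsing separate from the process call means the parsing logic can be checked without a real filesystem. A real agent would likely also want a timeout around `du` for large directories.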
35
CLAUDE.md
@@ -14,10 +14,11 @@ A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure.

 ### Key Features
-- **NVMe health monitoring** with wear prediction
-- **RAM optimization tracking** (tmpfs, zram, kernel metrics)
-- **Service resource monitoring** with sandboxed limits
+- **CPU / memory / GPU telemetry** with automatic thresholding
+- **Service resource monitoring** with per-service CPU and RAM usage
+- **Disk usage overview** for root filesystems
 - **Backup status** with detailed metrics and history
 - **Email notification integration**
+- **Unified alert pipeline** summarising host health
 - **Historical data tracking** and trend analysis

 ## Technical Architecture
@@ -93,8 +94,10 @@ cm-dashboard/

 2. **Service Metrics API** (port 6128)
    - Service status and resource usage
-   - Memory consumption vs limits
-   - Disk usage per service
+   - Service memory consumption vs limits
+   - Host CPU load / frequency / temperature
+   - Root disk utilisation snapshot
+   - GPU utilisation and temperature (if available)

 3. **Backup Metrics API** (port 6129)
    - Backup status and history
@@ -119,6 +122,26 @@ pub struct ServiceMetrics {
     pub timestamp: u64,
 }

+#[derive(Deserialize, Debug)]
+pub struct ServiceSummary {
+    pub healthy: usize,
+    pub degraded: usize,
+    pub failed: usize,
+    pub memory_used_mb: f32,
+    pub memory_quota_mb: f32,
+    pub system_memory_used_mb: f32,
+    pub system_memory_total_mb: f32,
+    pub disk_used_gb: f32,
+    pub disk_total_gb: f32,
+    pub cpu_load_1: f32,
+    pub cpu_load_5: f32,
+    pub cpu_load_15: f32,
+    pub cpu_freq_mhz: Option<f32>,
+    pub cpu_temp_c: Option<f32>,
+    pub gpu_load_percent: Option<f32>,
+    pub gpu_temp_c: Option<f32>,
+}
+
 #[derive(Deserialize, Debug)]
 pub struct BackupMetrics {
     pub overall_status: String,
@@ -617,4 +640,4 @@ smartmontools-rs = "0.1" # Or direct smartctl bindings
 **Performance Targets**:
 - **Agent footprint**: < 2MB RAM, < 1% CPU
 - **Metric latency**: < 100ms propagation across network
-- **Network efficiency**: < 1KB/s per host steady state
+- **Network efficiency**: < 1KB/s per host steady state
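
The `ServiceSummary` struct in the diff above carries paired used/total values for memory and disk. The sketch below shows one way a dashboard might turn those pairs into percentages and a severity level for color-coded rows. It uses a trimmed copy of the struct without the serde derive so that it builds with no dependencies. The `percent` and `classify` helpers, the `Level` enum, the 75% / 90% thresholds, and the sample values are all assumptions for illustration, not values from the project.

```rust
/// Trimmed copy of the `ServiceSummary` fields used in this sketch.
/// The real struct derives `Deserialize` and has more fields.
#[derive(Debug)]
struct ServiceSummary {
    memory_used_mb: f32,
    memory_quota_mb: f32,
    disk_used_gb: f32,
    disk_total_gb: f32,
    cpu_temp_c: Option<f32>,
}

/// Severity level a row could be colored by (hypothetical).
#[derive(Debug, PartialEq)]
enum Level {
    Ok,
    Warning,
    Critical,
}

/// Percentage of `total` that `used` represents.
/// Returns None when `total` is not positive, so callers never divide by zero.
fn percent(used: f32, total: f32) -> Option<f32> {
    if total > 0.0 {
        Some(used / total * 100.0)
    } else {
        None
    }
}

/// Map a usage percentage to a level. Thresholds are assumed, not sourced.
fn classify(pct: f32) -> Level {
    if pct >= 90.0 {
        Level::Critical
    } else if pct >= 75.0 {
        Level::Warning
    } else {
        Level::Ok
    }
}

fn main() {
    // Sample values invented for the example.
    let summary = ServiceSummary {
        memory_used_mb: 1536.0,
        memory_quota_mb: 2048.0,
        disk_used_gb: 12.0,
        disk_total_gb: 100.0,
        cpu_temp_c: None,
    };

    let rows = [
        ("memory", percent(summary.memory_used_mb, summary.memory_quota_mb)),
        ("disk", percent(summary.disk_used_gb, summary.disk_total_gb)),
    ];
    for (name, pct) in rows {
        match pct {
            Some(p) => println!("{name}: {p:.1}% ({:?})", classify(p)),
            None => println!("{name}: n/a"),
        }
    }

    // Optional sensors stay optional instead of defaulting to zero.
    match summary.cpu_temp_c {
        Some(t) => println!("cpu temp: {t:.1} C"),
        None => println!("cpu temp: n/a"),
    }
}
```

Returning `Option` from `percent` mirrors the `Option<f32>` fields in the struct: a missing quota or sensor is reported as unavailable rather than as a misleading 0%.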