Update README with actual dashboard interface and implementation details

2025-10-21 20:36:03 +02:00 · 2025-10-21 20:36:03 +02:00 · 0417e2c1f1
commit 0417e2c1f1
parent a08670071c
1 changed file with 312 additions and 451 deletions
--- a/README.md
+++ b/README.md
@@ -1,544 +1,405 @@
-# CM Dashboard - Infrastructure Monitoring TUI
+# CM Dashboard

-A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for specific monitoring needs and API integrations. Features real-time monitoring of all infrastructure components with intelligent email notifications and automatic status calculation.
+A real-time infrastructure monitoring system with intelligent status aggregation and email notifications, built with Rust and ZMQ.
+
+## Current Implementation
+
+This is a complete rewrite implementing an **individual metrics architecture** where:
+- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
+- **Dashboard** subscribes to specific metrics and composes widgets
+- **Status Aggregation** provides intelligent email notifications with batching
+- **Persistent Cache** prevents false notifications on restart
+
+## Dashboard Interface

-### System Widget
 ```
-┌System───────────────────────────────────────────────────────┐
-│  Memory usage                                               │
-│✔ 3.0 / 7.8 GB                                               │
-│  CPU load            CPU temp                               │
-│✔ 1.05 • 0.96 • 0.58  64.0°C                                 │
-│  C1E    C3     C6     C8     C9     C10                     │
-│✔ 0.5%   0.5%   10.4%  10.2%  0.4%   77.9%                   │
-│  GPU load  GPU temp                                         │
-│✔ —         —                                                │
-└─────────────────────────────────────────────────────────────┘
+cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
+┌system───────────────────────────────────────────┐┌services────────────────────────────────────────────────────┐
+│CPU:                                             ││Service:                  Status:    RAM:     Disk:         │
+│● Load: 0.10 0.52 0.88 • 400.0 MHz               ││● docker                  active     27M      496MB         │
+│RAM:                                             ││● docker-registry         active     19M      496MB         │
+│● Used: 30% 2.3GB/7.6GB                          ││● gitea                   active     579M     2.6GB         │
+│● tmp: 0.0% 0B/2.0GB                             ││● gitea-runner-default    active     11M      2.6GB         │
+│Disk nvme0n1:                                    ││● haasp-core              active     9M       1MB           │
+│● Health: PASSED                                 ││● haasp-mqtt              active     3M       1MB           │
+│● Usage @root: 8.3% • 75.4/906.2 GB              ││● haasp-webgrid           active     10M      1MB           │
+│● Usage @boot: 5.9% • 0.1/1.0 GB                 ││● immich-server           active     240M     45.1GB        │
+│                                                 ││● mosquitto               active     1M       1MB           │
+│                                                 ││● mysql                   active     38M      225MB         │
+│                                                 ││● nginx                   active     28M      24MB          │
+│                                                 ││  ├─ ● gitea.cmtec.se     51ms                              │
+│                                                 ││  ├─ ● haasp.cmtec.se     43ms                              │
+│                                                 ││  ├─ ● haasp.net          43ms                              │
+│                                                 ││  ├─ ● pages.cmtec.se     45ms                              │
+└─────────────────────────────────────────────────┘│  ├─ ● photos.cmtec.se    41ms                              │
+┌backup───────────────────────────────────────────┐│  ├─ ● unifi.cmtec.se     46ms                              │
+│Latest backup:                                   ││  ├─ ● vault.cmtec.se     47ms                              │
+│● Status: OK                                     ││  ├─ ● www.kryddorten.se  81ms                              │
+│Duration: 54s • Last: 4h ago                     ││  ├─ ● www.mariehall2.se  86ms                              │
+│Disk usage: 48.2GB/915.8GB                       ││● postgresql              active     112M     357MB         │
+│P/N: Samsung SSD 870 QVO 1TB                     ││● redis-immich            active     8M       45.1GB        │
+│S/N: S5RRNF0W800639Y                             ││● sshd                    active     2M       0             │
+│● gitea 2 archives 2.7GB                         ││● unifi                   active     594M     495MB         │
+│● immich 2 archives 45.0GB                       ││● vaultwarden             active     12M      1MB           │
+│● kryddorten 2 archives 67.6MB                   ││                                                            │
+│● mariehall2 2 archives 321.8MB                  ││                                                            │
+│● nixosbox 2 archives 4.5MB                      ││                                                            │
+│● unifi 2 archives 2.9MB                         ││                                                            │
+│● vaultwarden 2 archives 305kB                   ││                                                            │
+└─────────────────────────────────────────────────┘└────────────────────────────────────────────────────────────┘
 ```

-### Services Widget (Enhanced)
-```
-┌Services────────────────────────────────────────────────────┐
-│  Service          Memory (GB)  CPU    Disk                 │
-│✔ Service Memory   7.1/23899.7 MiB     —                   │
-│✔ Disk Usage       —           —       45/100 GB           │
-│⚠ CPU Load         —           2.18    —                   │
-│✔ CPU Temperature  —           47.0°C  —                   │
-│✔ docker-registry  0.0 GB       0.0%   <1 MB               │
-│✔ gitea            0.4/4.1 GB   0.2%   970 MB               │
-│  1 active connections                                      │
-│✔ nginx            0.0/1.0 GB   0.0%   <1 MB                │
-│✔  ├─ docker.cmtec.se                                      │
-│✔  ├─ git.cmtec.se                                         │
-│✔  ├─ gitea.cmtec.se                                       │
-│✔  ├─ haasp.cmtec.se                                       │
-│✔  ├─ pages.cmtec.se                                       │
-│✔  └─ www.kryddorten.se                                    │
-│✔ postgresql       0.1 GB       0.0%   378 MB               │
-│  1 active connections                                      │
-│✔ redis-immich     0.0 GB       0.4%   <1 MB                │
-│✔ sshd             0.0 GB       0.0%   <1 MB                │
-│  1 SSH connection                                          │
-│✔ unifi            0.9/2.0 GB   0.4%   391 MB               │
-└────────────────────────────────────────────────────────────┘
-```
+**Navigation**: `←→` switch hosts, `r` refresh, `q` quit

-### Storage Widget
-```
-┌Storage──────────────────────────────────────────────────────┐
-│  Drive    Temp   Wear   Spare  Hours  Capacity  Usage       │
-│✔ nvme0n1  57°C   4%     100%   11463  932G      23G (2%)    │
-│                                                             │
-└─────────────────────────────────────────────────────────────┘
-```
+## Features

-### Backups Widget
-```
-┌Backups──────────────────────────────────────────────────────┐
-│  Backup  Status  Details                                    │
-│✔ Latest  3h ago  1.4 GiB                                    │
-│  8 archives, 2.4 GiB total                                  │
-│✔ Disk    ok      2.4/468 GB (1%)                            │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### Hosts Widget
-```
-┌Hosts────────────────────────────────────────────────────────┐
-│  Host    Status            Timestamp                        │
-│✔ cmbox   ok                2025-10-13 05:45:28              │
-│✔ srv01   ok                2025-10-13 05:45:28              │
-│? labbox  No data received  —                                │
-└─────────────────────────────────────────────────────────────┘
-```
-
-**Navigation**: `←→` hosts, `r` refresh, `q` quit
-
-## Key Features
-
-### Real-time Monitoring
- **Multi-host support** for cmbox, labbox, simonbox, steambox, srv01
- **Performance-focused** with minimal resource usage
- **Keyboard-driven interface** for power users
- **ZMQ gossip network** for efficient data distribution
-
-### Infrastructure Monitoring
- **NVMe health monitoring** with wear prediction and temperature tracking
- **CPU/Memory/GPU telemetry** with automatic thresholding
- **Service resource monitoring** with per-service CPU and RAM usage
- **Disk usage overview** for root filesystems
- **Backup status** with detailed metrics and history
- **C-state monitoring** for CPU power management analysis
-
-### Intelligent Alerting
- **Agent-calculated status** with predefined thresholds
- **Email notifications** via SMTP with rate limiting
- **Recovery notifications** with context about original issues
- **Stockholm timezone** support for email timestamps
- **Unified alert pipeline** summarizing host health
+- **Real-time monitoring** - Dashboard updates every 1-2 seconds
+- **Individual metric collection** - Granular data for flexible dashboard composition  
+- **Intelligent status aggregation** - Host-level status calculated from all services
+- **Smart email notifications** - Batched, detailed alerts with service groupings
+- **Persistent state** - Prevents false notifications on restarts
+- **ZMQ communication** - Efficient agent-to-dashboard messaging
+- **Clean TUI** - Terminal-based dashboard with color-coded status indicators

 ## Architecture

-### Agent-Dashboard Separation
-The system follows a strict separation of concerns:
+### Core Components

- **Agent**: Single source of truth for all status calculations using defined thresholds
- **Dashboard**: Display-only interface that shows agent-provided status
- **Data Flow**: Agent (calculations) → Status → Dashboard (display) → Colors
+- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
+- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics  
+- **Shared** (`cm-dashboard-shared`) - Common types and protocol
+- **Status Aggregation** - Intelligent batching and notification management
+- **Persistent Cache** - Maintains state across restarts

-### Agent Thresholds (Production)
- **CPU Load**: Warning ≥ 5.0, Critical ≥ 8.0
- **Memory Usage**: Warning ≥ 80%, Critical ≥ 95%
- **CPU Temperature**: Warning ≥ 100°C, Critical ≥ 100°C (effectively disabled)
+### Status Levels

-### Email Notification System
- **From**: `{hostname}@cmtec.se` (e.g., cmbox@cmtec.se)
- **To**: `cm@cmtec.se`
- **SMTP**: localhost:25 (postfix)
- **Rate Limiting**: 30 minutes (configurable)
- **Triggers**: Status degradation and recovery with detailed context
-
-## Installation
-
-### Requirements
- Rust toolchain 1.75+ (install via [`rustup`](https://rustup.rs))
- Root privileges for agent (hardware monitoring access)
- Network access for ZMQ communication (default port 6130)
- SMTP server for notifications (postfix recommended)
-
-### Build from Source
-```bash
-git clone https://github.com/cmtec/cm-dashboard.git
-cd cm-dashboard
-cargo build --release
-```
-
-Optimized binaries available at:
- Dashboard: `target/release/cm-dashboard`
- Agent: `target/release/cm-dashboard-agent`
-
-### Installation
-```bash
-# Install dashboard
-cargo install --path dashboard
-
-# Install agent (requires root for hardware access)
-sudo cargo install --path agent
-```
+- **🟢 Ok** - Service running normally
+- **🔵 Pending** - Service starting/stopping/reloading  
+- **🟡 Warning** - Service issues (high load, memory, disk usage)
+- **🔴 Critical** - Service failed or critical thresholds exceeded
+- **❓ Unknown** - Service state cannot be determined
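Review note: the status levels above, combined with the `aggregation_method = "worst_case"` setting in the agent configuration, suggest a simple ordering-based aggregation. The following is a minimal sketch, not the agent's actual code. The severity ordering (in particular where `Pending` and `Unknown` rank) and the function name `aggregate_worst_case` are assumptions made for illustration.

```rust
/// Status levels from the README, declared from least to most severe.
/// The derived `Ord` follows declaration order; this ranking is an
/// assumption of the sketch, not taken from the agent source.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Pending,
    Unknown,
    Warning,
    Critical,
}

/// Worst-case aggregation: the host status is the most severe status
/// among its services. An empty slice yields `Unknown` rather than `Ok`,
/// so a host without data is never reported as healthy.
pub fn aggregate_worst_case(services: &[Status]) -> Status {
    services.iter().copied().max().unwrap_or(Status::Unknown)
}
```

With this ordering, a host with one `Critical` service is `Critical` regardless of how many other services are `Ok`.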

 ## Quick Start

-### Dashboard
-```bash
-# Run with default configuration
-cm-dashboard
-
-# Specify host to monitor
-cm-dashboard --host cmbox
-
-# Override ZMQ endpoints
-cm-dashboard --zmq-endpoint tcp://srv01:6130,tcp://labbox:6130
-
-# Increase logging verbosity
-cm-dashboard -v
-```
-
-### Agent (Pure Auto-Discovery)
-The agent requires **no configuration files** and auto-discovers all system components:
+### Build

 ```bash
-# Basic agent startup (auto-detects everything)
-sudo cm-dashboard-agent
+# With Nix (recommended)
+nix-shell -p openssl pkg-config --run "cargo build --workspace"

-# With verbose logging for troubleshooting
-sudo cm-dashboard-agent -v
+# Or with system dependencies
+sudo apt install libssl-dev pkg-config  # Ubuntu/Debian
+cargo build --workspace
 ```

-The agent automatically:
- **Discovers storage devices** for SMART monitoring
- **Detects running systemd services** for resource tracking
- **Configures collection intervals** based on system capabilities
- **Sets up email notifications** using hostname@cmtec.se
+### Run
+
+```bash
+# Start agent (requires configuration file)
+./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml
+
+# Start dashboard 
+./target/debug/cm-dashboard --config /path/to/dashboard.toml
+```

 ## Configuration

-### Dashboard Configuration
-The dashboard creates `config/dashboard.toml` on first run:
+### Agent Configuration (`agent.toml`)
+
+The agent requires a comprehensive TOML configuration file:

 ```toml
-[hosts]
-default_host = "srv01"
+collection_interval_seconds = 2

-[[hosts.hosts]]
-name = "srv01"
+[zmq]
+publisher_port = 6130
+command_port = 6131
+bind_address = "0.0.0.0"
+timeout_ms = 5000
+heartbeat_interval_ms = 30000
+
+[collectors.cpu]
 enabled = true
+interval_seconds = 2
+load_warning_threshold = 9.0
+load_critical_threshold = 10.0
+temperature_warning_threshold = 100.0
+temperature_critical_threshold = 110.0

-[[hosts.hosts]]
-name = "cmbox"
+[collectors.memory]
 enabled = true
+interval_seconds = 2
+usage_warning_percent = 80.0
+usage_critical_percent = 95.0

-[dashboard]
-tick_rate_ms = 250
-history_duration_minutes = 60
+[collectors.disk]
+enabled = true
+interval_seconds = 300
+usage_warning_percent = 80.0
+usage_critical_percent = 90.0

-[data_source]
-kind = "zmq"
+[[collectors.disk.filesystems]]
+name = "root"
+uuid = "4cade5ce-85a5-4a03-83c8-dfd1d3888d79"
+mount_point = "/"
+fs_type = "ext4"
+monitor = true

-[data_source.zmq]
-endpoints = ["tcp://127.0.0.1:6130"]
+[collectors.systemd]
+enabled = true
+interval_seconds = 10
+memory_warning_mb = 1000.0
+memory_critical_mb = 2000.0
+service_name_filters = [
+  "nginx", "postgresql", "redis", "docker", "sshd"
+]
+excluded_services = [
+  "nginx-config-reload", "sshd-keygen"
+]
+
+[notifications]
+enabled = true
+smtp_host = "localhost"
+smtp_port = 25
+from_email = "{hostname}@example.com"
+to_email = "admin@example.com"
+rate_limit_minutes = 0
+trigger_on_warnings = true
+trigger_on_failures = true
+recovery_requires_all_ok = true
+suppress_individual_recoveries = true
+
+[status_aggregation]
+enabled = true
+aggregation_method = "worst_case"
+notification_interval_seconds = 30
+
+[cache]
+persist_path = "/var/lib/cm-dashboard/cache.json"
 ```
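Review note: each collector section pairs a warning threshold with a critical one (for example `load_warning_threshold = 9.0` and `load_critical_threshold = 10.0`). A minimal sketch of how such a pair could be turned into a status follows. The `Thresholds` type and the inclusive `>=` comparison are assumptions for illustration; the agent may compare differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// A warning/critical pair, e.g. the two `load_*_threshold` values
/// from `[collectors.cpu]`. Hypothetical helper type.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub warning: f32,
    pub critical: f32,
}

impl Thresholds {
    /// Classify a reading. The critical check runs first so that a value
    /// above both thresholds is reported as `Critical`, not `Warning`.
    pub fn classify(&self, value: f32) -> Status {
        if value >= self.critical {
            Status::Critical
        } else if value >= self.warning {
            Status::Warning
        } else {
            Status::Ok
        }
    }
}
```

Keeping the comparison in one place means every collector interprets its thresholds the same way.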

-### Agent Configuration (Optional)
-The agent works without configuration but supports optional settings:
+### Dashboard Configuration (`dashboard.toml`)

-```bash
-# Generate example configuration
-cm-dashboard-agent --help
+```toml
+[zmq]
+hosts = [
+  { name = "server1", address = "192.168.1.100", port = 6130 },
+  { name = "server2", address = "192.168.1.101", port = 6130 }
+]
+connection_timeout_ms = 5000
+reconnect_interval_ms = 10000

-# Override specific settings
-sudo cm-dashboard-agent \
-    --hostname cmbox \
-    --bind tcp://*:6130 \
-    --interval 5000
+[ui]
+refresh_interval_ms = 1000
+theme = "dark"
 ```
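Review note: each entry in `hosts` carries an address and a port, which together form a ZMQ TCP endpoint of the form `tcp://address:port`. The sketch below shows that mapping. The `HostConfig` struct mirrors the table fields shown above, but its name and the `endpoint` method are assumptions, not the dashboard's actual types.

```rust
/// One entry of the `hosts` array in `dashboard.toml` (hypothetical type).
#[derive(Debug, Clone)]
pub struct HostConfig {
    pub name: String,
    pub address: String,
    pub port: u16,
}

impl HostConfig {
    /// Endpoint string in the form ZMQ expects for TCP transports.
    pub fn endpoint(&self) -> String {
        format!("tcp://{}:{}", self.address, self.port)
    }
}
```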

-## Widget Layout
+## Collectors

-### Services Widget Structure
-The Services widget now displays both system metrics and services in a unified table:
+The agent implements several specialized collectors:

-```
-┌Services────────────────────────────────────────────────────┐
-│  Service          Memory (GB)  CPU    Disk                 │
-│✔ Service Memory   7.1/23899.7 MiB     —                   │ ← System metric as service row
-│✔ Disk Usage       —           —       45/100 GB           │ ← System metric as service row  
-│⚠ CPU Load         —           2.18    —                   │ ← System metric as service row
-│✔ CPU Temperature  —           47.0°C  —                   │ ← System metric as service row
-│✔ docker-registry  0.0 GB      0.0%    <1 MB               │ ← Regular service
-│✔ nginx            0.0/1.0 GB  0.0%    <1 MB               │ ← Regular service
-│✔  ├─ docker.cmtec.se                                      │ ← Nginx site (sub-service)
-│✔  ├─ git.cmtec.se                                         │ ← Nginx site (sub-service)  
-│✔  └─ gitea.cmtec.se                                       │ ← Nginx site (sub-service)
-│✔ sshd             0.0 GB      0.0%    <1 MB               │ ← Regular service
-│  1 SSH connection                                          │ ← Service description
-└────────────────────────────────────────────────────────────┘
-```
+### CPU Collector (`cpu.rs`)
+- Load average (1, 5, 15 minute)

+- CPU temperature monitoring
+- Real-time process monitoring (top CPU consumers)
+- Status calculation with configurable thresholds

-**Row Types:**
- **System Metrics**: CPU Load, Service Memory, Disk Usage, CPU Temperature with status indicators
- **Regular Services**: Full resource data (memory, CPU, disk) with optional description lines  
- **Sub-services**: Nginx sites with tree structure, status indicators only (no resource columns)
- **Description Lines**: Connection counts and service-specific info without status indicators
+### Memory Collector (`memory.rs`)  
+- RAM usage (total, used, available)
+- Swap monitoring
+- Real-time process monitoring (top RAM consumers)
+- Memory pressure detection

-### Hosts Widget (formerly Alerts)
-The Hosts widget provides a summary view of all monitored hosts:
+### Disk Collector (`disk.rs`)
+- Filesystem usage per mount point
+- SMART health monitoring
+- Temperature and wear tracking
+- Configurable filesystem monitoring

-```
-┌Hosts────────────────────────────────────────────────────────┐
-│  Host    Status            Timestamp                        │
-│✔ cmbox   ok                2025-10-13 05:45:28              │
-│✔ srv01   ok                2025-10-13 05:45:28              │
-│? labbox  No data received  —                                │
-└─────────────────────────────────────────────────────────────┘
-```
+### Systemd Collector (`systemd.rs`)
+- Service status monitoring (`active`, `inactive`, `failed`)
+- Memory usage per service
+- Service filtering and exclusions
+- Handles transitional states (`Status::Pending`)

-## Monitoring Components
-
-### System Collector
- **CPU Load**: 1/5/15 minute averages with warning/critical thresholds
- **Memory Usage**: Used/total with percentage calculation
- **CPU Temperature**: x86_pkg_temp prioritized for accuracy
- **C-States**: Power management state distribution (C0-C10)
-
-### Service Collector
- **System Metrics as Services**: CPU Load, Service Memory, Disk Usage, CPU Temperature displayed as individual service rows
- **Systemd Services**: Auto-discovery of interesting services with resource monitoring
- **Nginx Site Monitoring**: Individual rows for each nginx virtual host with tree structure (`├─` and `└─`)
- **Resource Usage**: Per-service memory, CPU, and disk consumption
- **Service Health**: Running/stopped/degraded status with detailed failure info
- **Connection Tracking**: SSH connections, database connections as description lines
-
-### SMART Collector
- **NVMe Health**: Temperature, wear leveling, spare blocks
- **Drive Capacity**: Total/used space with percentage
- **SMART Attributes**: Critical health indicators
-
-### Backup Collector
- **Restic Integration**: Backup status and history
- **Health Monitoring**: Success/failure tracking
- **Storage Metrics**: Backup size and retention
-
-## Keyboard Controls
-
-| Key | Action |
-|-----|--------|
-| `←` / `h` | Previous host |
-| `→` / `l` / `Tab` | Next host |
-| `?` | Toggle help overlay |
-| `r` | Force refresh |
-| `q` / `Esc` | Quit |
+### Backup Collector (`backup.rs`)
+- Reads TOML status files from backup systems
+- Archive age verification
+- Disk usage tracking
+- Repository health monitoring

 ## Email Notifications

-### Notification Triggers
- **Status Degradation**: Any status change to warning/critical
- **Recovery**: Warning/critical status returning to ok
- **Service Failures**: Individual service stop/start events
+### Intelligent Batching
+
+The system implements smart notification batching to prevent email spam:
+
+- **Real-time dashboard updates** - Status changes appear immediately
+- **Batched email notifications** - Aggregated every 30 seconds
+- **Detailed groupings** - Services organized by severity
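Review note: the batching described above can be pictured as collecting status transitions during a window and summarizing them once when the window closes. The following is a minimal sketch under stated assumptions: the `Batch` and `Transition` types are hypothetical, the caller is assumed to invoke `flush_subject` on a 30-second timer, and the rule that `Unknown → Ok` counts as a startup while any other transition to `Ok` counts as a recovery is inferred from the example email, not from the agent source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

/// One service status transition observed during a batching window.
#[derive(Debug, Clone)]
pub struct Transition {
    pub service: String,
    pub from: Status,
    pub to: Status,
}

/// Collects transitions between flushes (hypothetical helper type).
#[derive(Default)]
pub struct Batch {
    transitions: Vec<Transition>,
}

impl Batch {
    /// Remember a transition; unchanged statuses are ignored.
    pub fn record(&mut self, service: &str, from: Status, to: Status) {
        if from != to {
            self.transitions.push(Transition { service: service.to_string(), from, to });
        }
    }

    /// Build a subject line and empty the batch. Returns `None` when
    /// nothing reportable happened, so a quiet window sends no email.
    pub fn flush_subject(&mut self) -> Option<String> {
        let (mut critical, mut warning, mut recovered, mut started) = (0u32, 0u32, 0u32, 0u32);
        for t in self.transitions.drain(..) {
            match (t.from, t.to) {
                (_, Status::Critical) => critical += 1,
                (_, Status::Warning) => warning += 1,
                (Status::Unknown, Status::Ok) => started += 1,
                (_, Status::Ok) => recovered += 1,
                _ => {} // transitions into Pending or Unknown are not counted here
            }
        }
        // Only mention categories that actually occurred.
        let parts: Vec<String> = vec![
            (critical, "critical"),
            (warning, "warning"),
            (recovered, "recovered"),
            (started, "started"),
        ]
        .into_iter()
        .filter(|(count, _)| *count > 0)
        .map(|(count, label)| format!("{} {}", count, label))
        .collect();
        if parts.is_empty() {
            None
        } else {
            Some(format!("Status Alert: {}", parts.join(", ")))
        }
    }
}
```

Because the dashboard reads metrics directly, delaying the email by one window does not delay what the TUI shows.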
+
+### Example Alert Email

-### Example Recovery Email
 ```
-✅ RESOLVED: system cpu on cmbox
+Subject: Status Alert: 2 critical, 1 warning, 15 started

-Status Change Alert
+Status Summary (30s duration)
+Host Status: Ok → Critical

-Host: cmbox
-Component: system
-Metric: cpu
-Status Change: warning → ok
-Time: 2025-10-12 22:15:30 CET
+🔴 CRITICAL ISSUES (2):
+  postgresql: Ok → Critical
+  nginx: Warning → Critical

-Details:
-Recovered from: CPU load (1/5/15min): 6.20 / 5.80 / 4.50
-Current status: CPU load (1/5/15min): 3.30 / 3.17 / 2.84
+🟡 WARNINGS (1):
+  redis: Ok → Warning (memory usage 85%)
+
+✅ RECOVERIES (0):
+
+🟢 SERVICE STARTUPS (15):
+  docker: Unknown → Ok
+  sshd: Unknown → Ok
+  ...

 --
 CM Dashboard Agent
-Generated at 2025-10-12 22:15:30 CET
+Generated at 2025-10-21 19:42:42 CEST
 ```

-### Rate Limiting
- **Default**: 30 minutes between notifications per component
- **Testing**: Set to 0 for immediate notifications
- **Configurable**: Adjustable per deployment needs
+## Individual Metrics Architecture
+
+The system follows a **metrics-first architecture**:
+
+### Agent Side
+```rust
+// Agent collects individual metrics
+vec![
+    Metric::new("cpu_load_1min".to_string(), MetricValue::Float(2.5), Status::Ok),
+    Metric::new("memory_usage_percent".to_string(), MetricValue::Float(78.5), Status::Warning),
+    Metric::new("service_nginx_status".to_string(), MetricValue::String("active".to_string()), Status::Ok),
+]
+```
+
+### Dashboard Side
+```rust
+// Widgets subscribe to specific metrics
+impl Widget for CpuWidget {
+    fn update_from_metrics(&mut self, metrics: &[&Metric]) {
+        for metric in metrics {
+            match metric.name.as_str() {
+                "cpu_load_1min" => self.load_1min = metric.value.as_f32(),
+                "cpu_load_5min" => self.load_5min = metric.value.as_f32(),
+                "cpu_temperature_celsius" => self.temperature = metric.value.as_f32(),
+                _ => {}
+            }
+        }
+    }
+}
+```
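Review note: the two snippets above rely on `Metric`, `MetricValue`, and `Status` from the shared crate, which the README does not show. The sketch below is one possible shape for those types that makes the snippets read coherently. The field names, the `Integer` variant, the `Option<f32>` return type of `as_f32`, and the `filter_by_prefix` helper are all assumptions; the real definitions in `shared/src/metrics.rs` may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

#[derive(Debug, Clone, PartialEq)]
pub enum MetricValue {
    Float(f32),
    Integer(i64),
    String(String),
}

impl MetricValue {
    /// Numeric view of the value; `None` for string metrics.
    pub fn as_f32(&self) -> Option<f32> {
        match self {
            MetricValue::Float(v) => Some(*v),
            MetricValue::Integer(v) => Some(*v as f32),
            MetricValue::String(_) => None,
        }
    }
}

/// A single named measurement with the status the agent assigned to it.
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: MetricValue,
    pub status: Status,
}

impl Metric {
    pub fn new(name: String, value: MetricValue, status: Status) -> Self {
        Self { name, value, status }
    }
}

/// Select the metrics a widget cares about by name prefix
/// (hypothetical helper; e.g. "cpu_" for a CPU widget).
pub fn filter_by_prefix<'a>(metrics: &'a [Metric], prefix: &str) -> Vec<&'a Metric> {
    metrics.iter().filter(|m| m.name.starts_with(prefix)).collect()
}
```

The `Vec<&Metric>` returned by the helper can be passed as `&[&Metric]` to a method shaped like `update_from_metrics`.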
+
+## Persistent Cache
+
+The cache persists the last known status of each service, so an agent restart does not trigger false notifications:
+
+- **Automatic saving** - Saves when service status changes
+- **Persistent storage** - Maintains state across agent restarts
+- **Simple design** - No complex TTL or cleanup logic
+- **Status preservation** - Prevents duplicate notifications
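Review note: the core idea behind the cache can be sketched as a map from service name to last known status, where a notification is only produced when the status actually changes. This is a simplified illustration, not the agent's implementation. The `StatusCache` type is hypothetical, and loading from and saving to the configured `persist_path` is deliberately left out.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

/// Last known status per service. A real agent would restore this map
/// from disk at startup and write it back whenever it changes.
#[derive(Default)]
pub struct StatusCache {
    last: HashMap<String, Status>,
}

impl StatusCache {
    /// Record `status` for `service` and return `(previous, current)`
    /// only if the status changed. A service that was never seen before
    /// is treated as previously `Unknown`.
    pub fn update(&mut self, service: &str, status: Status) -> Option<(Status, Status)> {
        let previous = self
            .last
            .insert(service.to_string(), status)
            .unwrap_or(Status::Unknown);
        if previous == status {
            None
        } else {
            Some((previous, status))
        }
    }
}
```

If the map survives a restart, a service that was `Ok` before and is still `Ok` afterwards yields `None`, which is what suppresses the spurious `Unknown → Ok` startup notifications.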

 ## Development

 ### Project Structure
+
 ```
 cm-dashboard/
-├── agent/                 # Monitoring agent
+├── agent/                  # Metrics collection agent
 │   ├── src/
-│   │   ├── collectors/    # Data collection modules
-│   │   ├── notifications.rs # Email notification system
-│   │   └── simple_agent.rs # Main agent logic
-├── dashboard/             # TUI dashboard
+│   │   ├── collectors/     # CPU, memory, disk, systemd, backup
+│   │   ├── status/         # Status aggregation and notifications
+│   │   ├── cache/          # Persistent metric caching
+│   │   ├── config/         # TOML configuration loading
+│   │   └── notifications/  # Email notification system
+├── dashboard/              # TUI dashboard application
 │   ├── src/
-│   │   ├── ui/           # Widget implementations
-│   │   ├── data/         # Data structures
-│   │   └── app.rs        # Application state
-├── shared/               # Common data structures
-└── config/              # Configuration files
+│   │   ├── ui/widgets/     # CPU, memory, services, backup widgets
+│   │   ├── metrics/        # Metric storage and filtering
+│   │   └── communication/  # ZMQ metric consumption
+├── shared/                 # Shared types and utilities
+│   └── src/
+│       ├── metrics.rs      # Metric, Status, and Value types
+│       ├── protocol.rs     # ZMQ message format
+│       └── cache.rs        # Cache configuration
+└── README.md              # This file
 ```

-### Development Commands
-```bash
-# Format code
-cargo fmt
+### Building

-# Check all packages
-cargo check
+```bash
+# Debug build
+cargo build --workspace
+
+# Release build  
+cargo build --workspace --release

 # Run tests
-cargo test
+cargo test --workspace

-# Build release
-cargo build --release
+# Check code formatting
+cargo fmt --all -- --check

-# Run with logging
-RUST_LOG=debug cargo run -p cm-dashboard-agent
+# Run clippy linter
+cargo clippy --workspace -- -D warnings
 ```

-### Architecture Principles
+### Dependencies

-#### Status Calculation Rules
- **Agent calculates all status** using predefined thresholds
- **Dashboard never calculates status** - only displays agent data
- **No hardcoded thresholds in dashboard** widgets
- **Use "unknown" when agent status missing** (never default to "ok")
-
-#### Data Flow
-```
-System Metrics → Agent Collectors → Status Calculation → ZMQ → Dashboard → Display
-                                         ↓
-                                 Email Notifications
-```
-
-#### Pure Auto-Discovery
- **No config files required** for basic operation
- **Runtime discovery** of system capabilities
- **Service auto-detection** via systemd patterns
- **Storage device enumeration** via /sys filesystem
-
-## Troubleshooting
-
-### Common Issues
-
-#### Agent Won't Start
-```bash
-# Check permissions (agent requires root)
-sudo cm-dashboard-agent -v
-
-# Verify ZMQ binding
-sudo netstat -tulpn | grep 6130
-
-# Check system access
-sudo smartctl --scan
-```
-
-#### Dashboard Connection Issues
-```bash
-# Test ZMQ connectivity
-cm-dashboard --zmq-endpoint tcp://target-host:6130 -v
-
-# Check network connectivity
-telnet target-host 6130
-```
-
-#### Email Notifications Not Working
-```bash
-# Check postfix status
-sudo systemctl status postfix
-
-# Test SMTP manually
-telnet localhost 25
-
-# Verify notification settings
-sudo cm-dashboard-agent -v | grep notification
-```
-
-### Logging
-Set `RUST_LOG=debug` for detailed logging:
-```bash
-RUST_LOG=debug sudo cm-dashboard-agent
-RUST_LOG=debug cm-dashboard
-```
-
-## License
-
-MIT License - see LICENSE file for details.
-
-## Contributing
-
-1. Fork the repository
-2. Create feature branch (`git checkout -b feature/amazing-feature`)
-3. Commit changes (`git commit -m 'Add amazing feature'`)
-4. Push to branch (`git push origin feature/amazing-feature`)
-5. Open Pull Request
-
-For bugs and feature requests, please use GitHub Issues.
+- **tokio** - Async runtime
+- **zmq** - Message passing between agent and dashboard
+- **ratatui** - Terminal user interface
+- **serde** - Serialization for metrics and config
+- **anyhow/thiserror** - Error handling
+- **tracing** - Structured logging
+- **lettre** - SMTP email notifications
+- **clap** - Command-line argument parsing
+- **toml** - Configuration file parsing

 ## NixOS Integration

-### Updating cm-dashboard in NixOS Configuration
+This project is designed for declarative deployment via NixOS.

-When new code is pushed to the cm-dashboard repository, follow these steps to update the NixOS configuration:
+### Configuration Generation

-#### 1. Get the Latest Commit Hash
-```bash
-# Get the latest commit from the API
-curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | head -20
+The NixOS module automatically generates the agent configuration:

-# Or use git
-git log --oneline -1
-```
-
-#### 2. Update the NixOS Configuration
-Edit `hosts/common/cm-dashboard.nix` and update the `rev` field:
 ```nix
-src = pkgs.fetchFromGitea {
-  domain = "gitea.cmtec.se";
-  owner = "cm";
-  repo = "cm-dashboard";
-  rev = "f786d054f2ece80823f85e46933857af96e241b2";  # Update this
-  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";  # Reset temporarily
+# hosts/common/cm-dashboard.nix
+services.cm-dashboard-agent = {
+  enable = true;
+  port = 6130;
 };
 ```

-#### 3. Get the Correct Hash
-Build with placeholder hash to get the actual hash:
-```bash
-nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchFromGitea { 
-  domain = "gitea.cmtec.se"; 
-  owner = "cm"; 
-  repo = "cm-dashboard"; 
-  rev = "YOUR_COMMIT_HASH"; 
-  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; 
-}' 2>&1 | grep "got:"
-```
-
-Example output:
-```
-error: hash mismatch in fixed-output derivation '/nix/store/...':
-         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
-            got:    sha256-x8crxNusOUYRrkP9mYEOG+Ga3JCPIdJLkEAc5P1ZxdQ=
-```
-
-#### 4. Update the Hash
-Replace the placeholder with the correct hash from the error message (the "got:" line):
-```nix
-hash = "sha256-vjy+j91iDCHUf0RE43anK4WZ+rKcyohP/3SykwZGof8=";  # Use actual hash
-```
-
-#### 5. Update Cargo Dependencies (if needed)
-If Cargo.lock has changed, you may need to update `cargoHash`:
-```bash
-# Build to get cargo hash error
-nix-build --no-out-link --expr 'with import <nixpkgs> {}; rustPlatform.buildRustPackage rec { 
-  pname = "cm-dashboard"; 
-  version = "0.1.0"; 
-  src = fetchFromGitea { 
-    domain = "gitea.cmtec.se"; 
-    owner = "cm"; 
-    repo = "cm-dashboard"; 
-    rev = "YOUR_COMMIT_HASH"; 
-    hash = "YOUR_SOURCE_HASH"; 
-  }; 
-  cargoHash = ""; 
-  nativeBuildInputs = [ pkg-config ]; 
-  buildInputs = [ openssl ]; 
-  buildAndTestSubdir = "."; 
-  cargoBuildFlags = [ "--workspace" ]; 
-}' 2>&1 | grep "got:"
-```
-
-Then update `cargoHash` in the configuration.
-
-#### 6. Commit the Changes
+### Deployment
+
 ```bash
+# Update NixOS configuration
 git add hosts/common/cm-dashboard.nix
-git commit -m "Update cm-dashboard to latest version"
+git commit -m "Update cm-dashboard configuration"
 git push
+
+# Rebuild system (run manually by the user)
+sudo nixos-rebuild switch --flake .
 ```

-### Example Update Process
-```bash
-# 1. Get latest commit
-LATEST_COMMIT=$(curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | grep '"sha"' | head -1 | cut -d'"' -f4)
+## Monitoring Intervals

-# 2. Get source hash
-SOURCE_HASH=$(nix-build --no-out-link -E "with import <nixpkgs> {}; fetchFromGitea { domain = \"gitea.cmtec.se\"; owner = \"cm\"; repo = \"cm-dashboard\"; rev = \"$LATEST_COMMIT\"; hash = \"sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=\"; }" 2>&1 | grep "got:" | cut -d' ' -f12)
+- **CPU/Memory**: 2 seconds (real-time monitoring)
+- **Disk usage**: 300 seconds (5 minutes)
+- **Systemd services**: 10 seconds
+- **SMART health**: 600 seconds (10 minutes)  
+- **Backup status**: 60 seconds (1 minute)
+- **Email notifications**: 30 seconds (batched)
+- **Dashboard updates**: 1 second (real-time display)
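Review note: with collectors running at different intervals, the agent loop needs some way to decide which collector is due on each tick. The following is one possible sketch using only the standard library. The `Schedule` type and its methods are hypothetical and are not taken from the agent source.

```rust
use std::time::{Duration, Instant};

/// Tracks when a collector last ran and whether it is due again.
pub struct Schedule {
    interval: Duration,
    last_run: Option<Instant>,
}

impl Schedule {
    pub fn new(interval_seconds: u64) -> Self {
        Self { interval: Duration::from_secs(interval_seconds), last_run: None }
    }

    /// A collector that has never run is always due.
    pub fn is_due(&self, now: Instant) -> bool {
        match self.last_run {
            None => true,
            Some(last) => now.saturating_duration_since(last) >= self.interval,
        }
    }

    /// Call after the collector has finished a run.
    pub fn mark_run(&mut self, now: Instant) {
        self.last_run = Some(now);
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the logic easy to test without sleeping.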

-# 3. Update configuration and commit
-echo "Latest commit: $LATEST_COMMIT"
-echo "Source hash: $SOURCE_HASH"
-```
+## License
+
+MIT License - see LICENSE file for details.