From 0417e2c1f1165d8793cb4b7e38633b63578845f1 Mon Sep 17 00:00:00 2001
From: Christoffer Martinsson
Date: Tue, 21 Oct 2025 20:36:03 +0200
Subject: [PATCH] Update README with actual dashboard interface and implementation details

---
 README.md | 763 ++++++++++++++++++++++-------------------------------
 1 file changed, 312 insertions(+), 451 deletions(-)

diff --git a/README.md b/README.md
index 23516c8..fde4abe 100644
--- a/README.md
+++ b/README.md
@@ -1,544 +1,405 @@
-# CM Dashboard - Infrastructure Monitoring TUI
+# CM Dashboard

-A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for specific monitoring needs and API integrations. Features real-time monitoring of all infrastructure components with intelligent email notifications and automatic status calculation.
+A real-time infrastructure monitoring system with intelligent status aggregation and email notifications, built with Rust and ZMQ.
+
+## Current Implementation
+
+This is a complete rewrite implementing an **individual metrics architecture** where:
+- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
+- **Dashboard** subscribes to specific metrics and composes widgets
+- **Status Aggregation** provides intelligent email notifications with batching
+- **Persistent Cache** prevents false notifications on restart
+
+## Dashboard Interface

-### System Widget
 ```
-┌System───────────────────────────────────────────────────────┐
-│ Memory usage │
-│✔ 3.0 / 7.8 GB │
-│ CPU load CPU temp │
-│✔ 1.05 • 0.96 • 0.58 64.0°C │
-│ C1E C3 C6 C8 C9 C10 │
-│✔ 0.5% 0.5% 10.4% 10.2% 0.4% 77.9% │
-│ GPU load GPU temp │
-│✔ — — │
-└─────────────────────────────────────────────────────────────┘
+cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
+┌system───────────────────────────────────────────┐┌services────────────────────────────────────────────────────┐
+│CPU: ││Service: Status: RAM: Disk: │
+│● Load: 0.10 0.52 0.88 • 400.0 MHz ││● docker active 27M 496MB │
+│RAM: ││● docker-registry active 19M 496MB │
+│● Used: 30% 2.3GB/7.6GB ││● gitea active 579M 2.6GB │
+│● tmp: 0.0% 0B/2.0GB ││● gitea-runner-default active 11M 2.6GB │
+│Disk nvme0n1: ││● haasp-core active 9M 1MB │
+│● Health: PASSED ││● haasp-mqtt active 3M 1MB │
+│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid active 10M 1MB │
+│● Usage @boot: 5.9% • 0.1/1.0 GB ││● immich-server active 240M 45.1GB │
+│ ││● mosquitto active 1M 1MB │
+│ ││● mysql active 38M 225MB │
+│ ││● nginx active 28M 24MB │
+│ ││ ├─ ● gitea.cmtec.se 51ms │
+│ ││ ├─ ● haasp.cmtec.se 43ms │
+│ ││ ├─ ● haasp.net 43ms │
+│ ││ ├─ ● pages.cmtec.se 45ms │
+└─────────────────────────────────────────────────┘│ ├─ ● photos.cmtec.se 41ms │
+┌backup───────────────────────────────────────────┐│ ├─ ● unifi.cmtec.se 46ms │
+│Latest backup: ││ ├─ ● vault.cmtec.se 47ms │
+│● Status: OK ││ ├─ ● www.kryddorten.se 81ms │
+│Duration: 54s • Last: 4h ago ││ ├─ ● www.mariehall2.se 86ms │
+│Disk usage: 48.2GB/915.8GB ││● postgresql active 112M 357MB │
+│P/N: Samsung SSD 870 QVO 1TB ││● redis-immich active 8M 45.1GB │
+│S/N: S5RRNF0W800639Y ││● sshd active 2M 0 │
+│● gitea 2 archives 2.7GB ││● unifi active 594M 495MB │
+│● immich 2 archives 45.0GB ││● vaultwarden active 12M 1MB │
+│● kryddorten 2 archives 67.6MB ││ │
+│● mariehall2 2 archives 321.8MB ││ │
+│● nixosbox 2 archives 4.5MB ││ │
+│● unifi 2 archives 2.9MB ││ │
+│● vaultwarden 2 archives 305kB ││ │
+└─────────────────────────────────────────────────┘└────────────────────────────────────────────────────────────┘
 ```

-### Services Widget (Enhanced)
-```
-┌Services────────────────────────────────────────────────────┐
-│ Service Memory (GB) CPU Disk │
-│✔ Service Memory 7.1/23899.7 MiB — │
-│✔ Disk Usage — — 45/100 GB │
-│⚠ CPU Load — 2.18 — │
-│✔ CPU Temperature — 47.0°C — │
-│✔ docker-registry 0.0 GB 0.0% <1 MB │
-│✔ gitea 0.4/4.1 GB 0.2% 970 MB │
-│ 1 active connections │
-│✔ nginx 0.0/1.0 GB 0.0% <1 MB │
-│✔ ├─ docker.cmtec.se │
-│✔ ├─ git.cmtec.se │
-│✔ ├─ gitea.cmtec.se │
-│✔ ├─ haasp.cmtec.se │
-│✔ ├─ pages.cmtec.se │
-│✔ └─ www.kryddorten.se │
-│✔ postgresql 0.1 GB 0.0% 378 MB │
-│ 1 active connections │
-│✔ redis-immich 0.0 GB 0.4% <1 MB │
-│✔ sshd 0.0 GB 0.0% <1 MB │
-│ 1 SSH connection │
-│✔ unifi 0.9/2.0 GB 0.4% 391 MB │
-└────────────────────────────────────────────────────────────┘
-```
+**Navigation**: `←→` switch hosts, `r` refresh, `q` quit

-### Storage Widget
-```
-┌Storage──────────────────────────────────────────────────────┐
-│ Drive Temp Wear Spare Hours Capacity Usage │
-│✔ nvme0n1 57°C 4% 100% 11463 932G 23G (2%) │
-│ │
-└─────────────────────────────────────────────────────────────┘
-```
+## Features

-### Backups Widget
-```
-┌Backups──────────────────────────────────────────────────────┐
-│ Backup Status Details │
-│✔ Latest 3h ago 1.4 GiB │
-│ 8 archives, 2.4 GiB total │
-│✔ Disk ok 2.4/468 GB (1%) │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### Hosts Widget
-```
-┌Hosts────────────────────────────────────────────────────────┐
-│ Host Status Timestamp │
-│✔ cmbox ok 2025-10-13 05:45:28 │
-│✔ srv01 ok 2025-10-13 05:45:28 │
-│? labbox No data received — │
-└─────────────────────────────────────────────────────────────┘
-```
-
-**Navigation**: `←→` hosts, `r` refresh, `q` quit
-
-## Key Features
-
-### Real-time Monitoring
-- **Multi-host support** for cmbox, labbox, simonbox, steambox, srv01
-- **Performance-focused** with minimal resource usage
-- **Keyboard-driven interface** for power users
-- **ZMQ gossip network** for efficient data distribution
-
-### Infrastructure Monitoring
-- **NVMe health monitoring** with wear prediction and temperature tracking
-- **CPU/Memory/GPU telemetry** with automatic thresholding
-- **Service resource monitoring** with per-service CPU and RAM usage
-- **Disk usage overview** for root filesystems
-- **Backup status** with detailed metrics and history
-- **C-state monitoring** for CPU power management analysis
-
-### Intelligent Alerting
-- **Agent-calculated status** with predefined thresholds
-- **Email notifications** via SMTP with rate limiting
-- **Recovery notifications** with context about original issues
-- **Stockholm timezone** support for email timestamps
-- **Unified alert pipeline** summarizing host health
+- **Real-time monitoring** - Dashboard updates every 1-2 seconds
+- **Individual metric collection** - Granular data for flexible dashboard composition
+- **Intelligent status aggregation** - Host-level status calculated from all services
+- **Smart email notifications** - Batched, detailed alerts with service groupings
+- **Persistent state** - Prevents false notifications on restarts
+- **ZMQ communication** - Efficient agent-to-dashboard messaging
+- **Clean TUI** - Terminal-based dashboard with color-coded status indicators

 ## Architecture

-### Agent-Dashboard Separation
-The system follows a strict separation of concerns:
+### Core Components

-- **Agent**: Single source of truth for all status calculations using defined thresholds
-- **Dashboard**: Display-only interface that shows agent-provided status
-- **Data Flow**: Agent (calculations) → Status → Dashboard (display) → Colors
+- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
+- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
+- **Shared** (`cm-dashboard-shared`) - Common types and protocol
+- **Status Aggregation** - Intelligent batching and notification management
+- **Persistent Cache** - Maintains state across restarts

-### Agent Thresholds (Production)
-- **CPU Load**: Warning ≥ 5.0, Critical ≥ 8.0
-- **Memory Usage**: Warning ≥ 80%, Critical ≥ 95%
-- **CPU Temperature**: Warning ≥ 100°C, Critical ≥ 100°C (effectively disabled)
+### Status Levels

-### Email Notification System
-- **From**: `{hostname}@cmtec.se` (e.g., cmbox@cmtec.se)
-- **To**: `cm@cmtec.se`
-- **SMTP**: localhost:25 (postfix)
-- **Rate Limiting**: 30 minutes (configurable)
-- **Triggers**: Status degradation and recovery with detailed context
-
-## Installation
-
-### Requirements
-- Rust toolchain 1.75+ (install via [`rustup`](https://rustup.rs))
-- Root privileges for agent (hardware monitoring access)
-- Network access for ZMQ communication (default port 6130)
-- SMTP server for notifications (postfix recommended)
-
-### Build from Source
-```bash
-git clone https://github.com/cmtec/cm-dashboard.git
-cd cm-dashboard
-cargo build --release
-```
-
-Optimized binaries available at:
-- Dashboard: `target/release/cm-dashboard`
-- Agent: `target/release/cm-dashboard-agent`
-
-### Installation
-```bash
-# Install dashboard
-cargo install --path dashboard
-
-# Install agent (requires root for hardware access)
-sudo cargo install --path agent
-```
+- **🟢 Ok** - Service running normally
+- **🔵 Pending** - Service starting/stopping/reloading
+- **🟡 Warning** - Service issues (high load, memory, disk usage)
+- **🔴 Critical** - Service failed or critical thresholds exceeded
+- **❓ Unknown** - Service state cannot be determined

 ## Quick Start

-### Dashboard
-```bash
-# Run with default configuration
-cm-dashboard
-
-# Specify host to monitor
-cm-dashboard --host cmbox
-
-# Override ZMQ endpoints
-cm-dashboard --zmq-endpoint tcp://srv01:6130,tcp://labbox:6130
-
-# Increase logging verbosity
-cm-dashboard -v
-```
-
-### Agent (Pure Auto-Discovery)
-The agent requires **no configuration files** and auto-discovers all system components:
+### Build

 ```bash
-# Basic agent startup (auto-detects everything)
-sudo cm-dashboard-agent
+# With Nix (recommended)
+nix-shell -p openssl pkg-config --run "cargo build --workspace"

-# With verbose logging for troubleshooting
-sudo cm-dashboard-agent -v
+# Or with system dependencies
+sudo apt install libssl-dev pkg-config # Ubuntu/Debian
+cargo build --workspace
 ```

-The agent automatically:
-- **Discovers storage devices** for SMART monitoring
-- **Detects running systemd services** for resource tracking
-- **Configures collection intervals** based on system capabilities
-- **Sets up email notifications** using hostname@cmtec.se
+### Run
+
+```bash
+# Start agent (requires configuration file)
+./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml
+
+# Start dashboard
+./target/debug/cm-dashboard --config /path/to/dashboard.toml
+```

 ## Configuration

-### Dashboard Configuration
-The dashboard creates `config/dashboard.toml` on first run:
+### Agent Configuration (`agent.toml`)
+
+The agent requires a comprehensive TOML configuration file:

 ```toml
-[hosts]
-default_host = "srv01"
+collection_interval_seconds = 2

-[[hosts.hosts]]
-name = "srv01"
+[zmq]
+publisher_port = 6130
+command_port = 6131
+bind_address = "0.0.0.0"
+timeout_ms = 5000
+heartbeat_interval_ms = 30000
+
+[collectors.cpu]
 enabled = true
+interval_seconds = 2
+load_warning_threshold = 9.0
+load_critical_threshold = 10.0
+temperature_warning_threshold = 100.0
+temperature_critical_threshold = 110.0

-[[hosts.hosts]]
-name = "cmbox"
+[collectors.memory]
 enabled = true
+interval_seconds = 2
+usage_warning_percent = 80.0
+usage_critical_percent = 95.0

-[dashboard]
-tick_rate_ms = 250
-history_duration_minutes = 60
+[collectors.disk]
+enabled = true
+interval_seconds = 300
+usage_warning_percent = 80.0
+usage_critical_percent = 90.0

-[data_source]
-kind = "zmq"
+[[collectors.disk.filesystems]]
+name = "root"
+uuid = "4cade5ce-85a5-4a03-83c8-dfd1d3888d79"
+mount_point = "/"
+fs_type = "ext4"
+monitor = true

-[data_source.zmq]
-endpoints = ["tcp://127.0.0.1:6130"]
+[collectors.systemd]
+enabled = true
+interval_seconds = 10
+memory_warning_mb = 1000.0
+memory_critical_mb = 2000.0
+service_name_filters = [
+  "nginx", "postgresql", "redis", "docker", "sshd"
+]
+excluded_services = [
+  "nginx-config-reload", "sshd-keygen"
+]
+
+[notifications]
+enabled = true
+smtp_host = "localhost"
+smtp_port = 25
+from_email = "{hostname}@example.com"
+to_email = "admin@example.com"
+rate_limit_minutes = 0
+trigger_on_warnings = true
+trigger_on_failures = true
+recovery_requires_all_ok = true
+suppress_individual_recoveries = true
+
+[status_aggregation]
+enabled = true
+aggregation_method = "worst_case"
+notification_interval_seconds = 30
+
+[cache]
+persist_path = "/var/lib/cm-dashboard/cache.json"
 ```

-### Agent Configuration (Optional)
-The agent works without configuration but supports optional settings:
+### Dashboard Configuration (`dashboard.toml`)

-```bash
-# Generate example configuration
-cm-dashboard-agent --help
+```toml
+[zmq]
+hosts = [
+  { name = "server1", address = "192.168.1.100", port = 6130 },
+  { name = "server2", address = "192.168.1.101", port = 6130 }
+]
+connection_timeout_ms = 5000
+reconnect_interval_ms = 10000

-# Override specific settings
-sudo cm-dashboard-agent \
-  --hostname cmbox \
-  --bind tcp://*:6130 \
-  --interval 5000
+[ui]
+refresh_interval_ms = 1000
+theme = "dark"
 ```

-## Widget Layout
+## Collectors

-### Services Widget Structure
-The Services widget now displays both system metrics and services in a unified table:
+The agent implements several specialized collectors:

-```
-┌Services────────────────────────────────────────────────────┐
-│ Service Memory (GB) CPU Disk │
-│✔ Service Memory 7.1/23899.7 MiB — │ ← System metric as service row
-│✔ Disk Usage — — 45/100 GB │ ← System metric as service row
-│⚠ CPU Load — 2.18 — │ ← System metric as service row
-│✔ CPU Temperature — 47.0°C — │ ← System metric as service row
-│✔ docker-registry 0.0 GB 0.0% <1 MB │ ← Regular service
-│✔ nginx 0.0/1.0 GB 0.0% <1 MB │ ← Regular service
-│✔ ├─ docker.cmtec.se │ ← Nginx site (sub-service)
-│✔ ├─ git.cmtec.se │ ← Nginx site (sub-service)
-│✔ └─ gitea.cmtec.se │ ← Nginx site (sub-service)
-│✔ sshd 0.0 GB 0.0% <1 MB │ ← Regular service
-│ 1 SSH connection │ ← Service description
-└────────────────────────────────────────────────────────────┘
-```
+### CPU Collector (`cpu.rs`)
+- Load average (1, 5, 15 minute)
+- CPU temperature monitoring
+- Real-time process monitoring (top CPU consumers)
+- Status calculation with configurable thresholds

-**Row Types:**
-- **System Metrics**: CPU Load, Service Memory, Disk Usage, CPU Temperature with status indicators
-- **Regular Services**: Full resource data (memory, CPU, disk) with optional description lines
-- **Sub-services**: Nginx sites with tree structure, status indicators only (no resource columns)
-- **Description Lines**: Connection counts and service-specific info without status indicators
+### Memory Collector (`memory.rs`)
+- RAM usage (total, used, available)
+- Swap monitoring
+- Real-time process monitoring (top RAM consumers)
+- Memory pressure detection

-### Hosts Widget (formerly Alerts)
-The Hosts widget provides a summary view of all monitored hosts:
+### Disk Collector (`disk.rs`)
+- Filesystem usage per mount point
+- SMART health monitoring
+- Temperature and wear tracking
+- Configurable filesystem monitoring

-```
-┌Hosts────────────────────────────────────────────────────────┐
-│ Host Status Timestamp │
-│✔ cmbox ok 2025-10-13 05:45:28 │
-│✔ srv01 ok 2025-10-13 05:45:28 │
-│? labbox No data received — │
-└─────────────────────────────────────────────────────────────┘
-```
+### Systemd Collector (`systemd.rs`)
+- Service status monitoring (`active`, `inactive`, `failed`)
+- Memory usage per service
+- Service filtering and exclusions
+- Handles transitional states (`Status::Pending`)

-## Monitoring Components
-
-### System Collector
-- **CPU Load**: 1/5/15 minute averages with warning/critical thresholds
-- **Memory Usage**: Used/total with percentage calculation
-- **CPU Temperature**: x86_pkg_temp prioritized for accuracy
-- **C-States**: Power management state distribution (C0-C10)
-
-### Service Collector
-- **System Metrics as Services**: CPU Load, Service Memory, Disk Usage, CPU Temperature displayed as individual service rows
-- **Systemd Services**: Auto-discovery of interesting services with resource monitoring
-- **Nginx Site Monitoring**: Individual rows for each nginx virtual host with tree structure (`├─` and `└─`)
-- **Resource Usage**: Per-service memory, CPU, and disk consumption
-- **Service Health**: Running/stopped/degraded status with detailed failure info
-- **Connection Tracking**: SSH connections, database connections as description lines
-
-### SMART Collector
-- **NVMe Health**: Temperature, wear leveling, spare blocks
-- **Drive Capacity**: Total/used space with percentage
-- **SMART Attributes**: Critical health indicators
-
-### Backup Collector
-- **Restic Integration**: Backup status and history
-- **Health Monitoring**: Success/failure tracking
-- **Storage Metrics**: Backup size and retention
-
-## Keyboard Controls
-
-| Key | Action |
-|-----|--------|
-| `←` / `h` | Previous host |
-| `→` / `l` / `Tab` | Next host |
-| `?` | Toggle help overlay |
-| `r` | Force refresh |
-| `q` / `Esc` | Quit |
+### Backup Collector (`backup.rs`)
+- Reads TOML status files from backup systems
+- Archive age verification
+- Disk usage tracking
+- Repository health monitoring

 ## Email Notifications

-### Notification Triggers
-- **Status Degradation**: Any status change to warning/critical
-- **Recovery**: Warning/critical status returning to ok
-- **Service Failures**: Individual service stop/start events
+### Intelligent Batching
+
+The system implements smart notification batching to prevent email spam:
+
+- **Real-time dashboard updates** - Status changes appear immediately
+- **Batched email notifications** - Aggregated every 30 seconds
+- **Detailed groupings** - Services organized by severity
+
+### Example Alert Email

-### Example Recovery Email
 ```
-✅ RESOLVED: system cpu on cmbox
+Subject: Status Alert: 2 critical, 1 warning, 15 started

-Status Change Alert
+Status Summary (30s duration)
+Host Status: Ok → Warning

-Host: cmbox
-Component: system
-Metric: cpu
-Status Change: warning → ok
-Time: 2025-10-12 22:15:30 CET
+🔴 CRITICAL ISSUES (2):
+  postgresql: Ok → Critical
+  nginx: Warning → Critical

-Details:
-Recovered from: CPU load (1/5/15min): 6.20 / 5.80 / 4.50
-Current status: CPU load (1/5/15min): 3.30 / 3.17 / 2.84
+🟡 WARNINGS (1):
+  redis: Ok → Warning (memory usage 85%)
+
+✅ RECOVERIES (0):
+
+🟢 SERVICE STARTUPS (15):
+  docker: Unknown → Ok
+  sshd: Unknown → Ok
+  ...

 --
 CM Dashboard Agent
-Generated at 2025-10-12 22:15:30 CET
+Generated at 2025-10-21 19:42:42 CET
 ```

-### Rate Limiting
-- **Default**: 30 minutes between notifications per component
-- **Testing**: Set to 0 for immediate notifications
-- **Configurable**: Adjustable per deployment needs
+## Individual Metrics Architecture
+
+The system follows a **metrics-first architecture**:
+
+### Agent Side
+```rust
+// Agent collects individual metrics
+vec![
+    Metric::new("cpu_load_1min".to_string(), MetricValue::Float(2.5), Status::Ok),
+    Metric::new("memory_usage_percent".to_string(), MetricValue::Float(78.5), Status::Warning),
+    Metric::new("service_nginx_status".to_string(), MetricValue::String("active".to_string()), Status::Ok),
+]
+```
+
+### Dashboard Side
+```rust
+// Widgets subscribe to specific metrics
+impl Widget for CpuWidget {
+    fn update_from_metrics(&mut self, metrics: &[&Metric]) {
+        for metric in metrics {
+            match metric.name.as_str() {
+                "cpu_load_1min" => self.load_1min = metric.value.as_f32(),
+                "cpu_load_5min" => self.load_5min = metric.value.as_f32(),
+                "cpu_temperature_celsius" => self.temperature = metric.value.as_f32(),
+                _ => {}
+            }
+        }
+    }
+}
+```
+
+## Persistent Cache
+
+The cache system prevents false notifications:
+
+- **Automatic saving** - Saves when service status changes
+- **Persistent storage** - Maintains state across agent restarts
+- **Simple design** - No complex TTL or cleanup logic
+- **Status preservation** - Prevents duplicate notifications

 ## Development

 ### Project Structure
+
 ```
 cm-dashboard/
-├── agent/ # Monitoring agent
+├── agent/ # Metrics collection agent
 │ ├── src/
-│ │ ├── collectors/ # Data collection modules
-│ │ ├── notifications.rs # Email notification system
-│ │ └── simple_agent.rs # Main agent logic
-├── dashboard/ # TUI dashboard
+│ │ ├── collectors/ # CPU, memory, disk, systemd, backup
+│ │ ├── status/ # Status aggregation and notifications
+│ │ ├── cache/ # Persistent metric caching
+│ │ ├── config/ # TOML configuration loading
+│ │ └── notifications/ # Email notification system
+├── dashboard/ # TUI dashboard application
 │ ├── src/
-│ │ ├── ui/ # Widget implementations
-│ │ ├── data/ # Data structures
-│ │ └── app.rs # Application state
-├── shared/ # Common data structures
-└── config/ # Configuration files
+│ │ ├── ui/widgets/ # CPU, memory, services, backup widgets
+│ │ ├── metrics/ # Metric storage and filtering
+│ │ └── communication/ # ZMQ metric consumption
+├── shared/ # Shared types and utilities
+│ └── src/
+│ ├── metrics.rs # Metric, Status, and Value types
+│ ├── protocol.rs # ZMQ message format
+│ └── cache.rs # Cache configuration
+└── README.md # This file
 ```

-### Development Commands
 ```bash
-# Format code
-cargo fmt
+### Building

-# Check all packages
-cargo check
+```bash
+# Debug build
+cargo build --workspace
+
+# Release build
+cargo build --workspace --release

 # Run tests
-cargo test
+cargo test --workspace

-# Build release
-cargo build --release
+# Check code formatting
+cargo fmt --all -- --check

-# Run with logging
-RUST_LOG=debug cargo run -p cm-dashboard-agent
+# Run clippy linter
+cargo clippy --workspace -- -D warnings
 ```

-### Architecture Principles
+### Dependencies

-#### Status Calculation Rules
-- **Agent calculates all status** using predefined thresholds
-- **Dashboard never calculates status** - only displays agent data
-- **No hardcoded thresholds in dashboard** widgets
-- **Use "unknown" when agent status missing** (never default to "ok")
-
-#### Data Flow
-```
-System Metrics → Agent Collectors → Status Calculation → ZMQ → Dashboard → Display
- ↓
- Email Notifications
-```
-
-#### Pure Auto-Discovery
-- **No config files required** for basic operation
-- **Runtime discovery** of system capabilities
-- **Service auto-detection** via systemd patterns
-- **Storage device enumeration** via /sys filesystem
-
-## Troubleshooting
-
-### Common Issues
-
-#### Agent Won't Start
-```bash
-# Check permissions (agent requires root)
-sudo cm-dashboard-agent -v
-
-# Verify ZMQ binding
-sudo netstat -tulpn | grep 6130
-
-# Check system access
-sudo smartctl --scan
-```
-
-#### Dashboard Connection Issues
-```bash
-# Test ZMQ connectivity
-cm-dashboard --zmq-endpoint tcp://target-host:6130 -v
-
-# Check network connectivity
-telnet target-host 6130
-```
-
-#### Email Notifications Not Working
-```bash
-# Check postfix status
-sudo systemctl status postfix
-
-# Test SMTP manually
-telnet localhost 25
-
-# Verify notification settings
-sudo cm-dashboard-agent -v | grep notification
-```
-
-### Logging
-Set `RUST_LOG=debug` for detailed logging:
-```bash
-RUST_LOG=debug sudo cm-dashboard-agent
-RUST_LOG=debug cm-dashboard
-```
-
-## License
-
-MIT License - see LICENSE file for details.
-
-## Contributing
-
-1. Fork the repository
-2. Create feature branch (`git checkout -b feature/amazing-feature`)
-3. Commit changes (`git commit -m 'Add amazing feature'`)
-4. Push to branch (`git push origin feature/amazing-feature`)
-5. Open Pull Request
-
-For bugs and feature requests, please use GitHub Issues.
+- **tokio** - Async runtime
+- **zmq** - Message passing between agent and dashboard
+- **ratatui** - Terminal user interface
+- **serde** - Serialization for metrics and config
+- **anyhow/thiserror** - Error handling
+- **tracing** - Structured logging
+- **lettre** - SMTP email notifications
+- **clap** - Command-line argument parsing
+- **toml** - Configuration file parsing

 ## NixOS Integration

-### Updating cm-dashboard in NixOS Configuration
+This project is designed for declarative deployment via NixOS:

-When new code is pushed to the cm-dashboard repository, follow these steps to update the NixOS configuration:
+### Configuration Generation

-#### 1. Get the Latest Commit Hash
-```bash
-# Get the latest commit from the API
-curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | head -20
+The NixOS module automatically generates the agent configuration:

-# Or use git
-git log --oneline -1
-```
-
-#### 2. Update the NixOS Configuration
-Edit `hosts/common/cm-dashboard.nix` and update the `rev` field:
 ```nix
-src = pkgs.fetchFromGitea {
-  domain = "gitea.cmtec.se";
-  owner = "cm";
-  repo = "cm-dashboard";
-  rev = "f786d054f2ece80823f85e46933857af96e241b2"; # Update this
-  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # Reset temporarily
+# hosts/common/cm-dashboard.nix
+services.cm-dashboard-agent = {
+  enable = true;
+  port = 6130;
 };
 ```

-#### 3. Get the Correct Hash
-Build with placeholder hash to get the actual hash:
-```bash
-nix-build --no-out-link -E 'with import {}; fetchFromGitea {
-  domain = "gitea.cmtec.se";
-  owner = "cm";
-  repo = "cm-dashboard";
-  rev = "YOUR_COMMIT_HASH";
-  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
-}' 2>&1 | grep "got:"
-```
-
-Example output:
-```
-error: hash mismatch in fixed-output derivation '/nix/store/...':
-  specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
-  got: sha256-x8crxNusOUYRrkP9mYEOG+Ga3JCPIdJLkEAc5P1ZxdQ=
-```
-
-#### 4. Update the Hash
-Replace the placeholder with the correct hash from the error message (the "got:" line):
-```nix
-hash = "sha256-vjy+j91iDCHUf0RE43anK4WZ+rKcyohP/3SykwZGof8="; # Use actual hash
-```
-
-#### 5. Update Cargo Dependencies (if needed)
-If Cargo.lock has changed, you may need to update `cargoHash`:
-```bash
-# Build to get cargo hash error
-nix-build --no-out-link --expr 'with import {}; rustPlatform.buildRustPackage rec {
-  pname = "cm-dashboard";
-  version = "0.1.0";
-  src = fetchFromGitea {
-    domain = "gitea.cmtec.se";
-    owner = "cm";
-    repo = "cm-dashboard";
-    rev = "YOUR_COMMIT_HASH";
-    hash = "YOUR_SOURCE_HASH";
-  };
-  cargoHash = "";
-  nativeBuildInputs = [ pkg-config ];
-  buildInputs = [ openssl ];
-  buildAndTestSubdir = ".";
-  cargoBuildFlags = [ "--workspace" ];
-}' 2>&1 | grep "got:"
-```
-
-Then update `cargoHash` in the configuration.
-
-#### 6. Commit the Changes
+### Deployment
+
 ```bash
+# Update NixOS configuration
 git add hosts/common/cm-dashboard.nix
-git commit -m "Update cm-dashboard to latest version"
+git commit -m "Update cm-dashboard configuration"
 git push
+
+# Rebuild system (user-performed)
+sudo nixos-rebuild switch --flake .
 ```

-### Example Update Process
-```bash
-# 1. Get latest commit
-LATEST_COMMIT=$(curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | grep '"sha"' | head -1 | cut -d'"' -f4)
-
-# 2. Get source hash
-SOURCE_HASH=$(nix-build --no-out-link -E "with import {}; fetchFromGitea { domain = \"gitea.cmtec.se\"; owner = \"cm\"; repo = \"cm-dashboard\"; rev = \"$LATEST_COMMIT\"; hash = \"sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=\"; }" 2>&1 | grep "got:" | cut -d' ' -f12)
+## Monitoring Intervals

-# 3. Update configuration and commit
-echo "Latest commit: $LATEST_COMMIT"
-echo "Source hash: $SOURCE_HASH"
-```
\ No newline at end of file
+- **CPU/Memory**: 2 seconds (real-time monitoring)
+- **Disk usage**: 300 seconds (5 minutes)
+- **Systemd services**: 10 seconds
+- **SMART health**: 600 seconds (10 minutes)
+- **Backup status**: 60 seconds (1 minute)
+- **Email notifications**: 30 seconds (batched)
+- **Dashboard updates**: 1 second (real-time display)
+
+## License
+
+MIT License - see LICENSE file for details
\ No newline at end of file
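
The README introduced by this patch describes threshold-based status calculation, an `aggregation_method = "worst_case"` setting, and batched alert emails grouped by severity, but its code samples only cover metric collection and widget updates. The following is a minimal Rust sketch of that status logic under stated assumptions. It is not taken from the cm-dashboard sources. The helper names `status_from_thresholds`, `aggregate_worst_case` and `email_section` are hypothetical. The severity ranking Ok < Pending < Unknown < Warning < Critical is our own choice, because the README lists the five levels without ordering them.

```rust
// Illustrative sketch only, not the cm-dashboard implementation.
// `Status` mirrors the five levels in the README's "Status Levels" section.
// ASSUMPTION: the derived ordering (declaration order) is the severity ranking.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Ok,
    Pending,
    Unknown,
    Warning,
    Critical,
}

/// Hypothetical helper in the style of `usage_warning_percent` /
/// `usage_critical_percent`: a value at or above a threshold escalates.
fn status_from_thresholds(value: f32, warning: f32, critical: f32) -> Status {
    if value >= critical {
        Status::Critical
    } else if value >= warning {
        Status::Warning
    } else {
        Status::Ok
    }
}

/// Hypothetical `worst_case` aggregation: the host status is the most severe
/// service status. An empty input yields `Unknown` instead of `Ok`.
fn aggregate_worst_case(statuses: &[Status]) -> Status {
    statuses.iter().copied().max().unwrap_or(Status::Unknown)
}

/// Hypothetical mapping of a (from, to) transition onto the sections of the
/// example alert email; anything unmatched falls into "OTHER".
fn email_section(from: Status, to: Status) -> &'static str {
    match (from, to) {
        (_, Status::Critical) => "CRITICAL ISSUES",
        (_, Status::Warning) => "WARNINGS",
        (Status::Warning | Status::Critical, Status::Ok) => "RECOVERIES",
        (Status::Unknown, Status::Ok) => "SERVICE STARTUPS",
        _ => "OTHER",
    }
}

fn main() {
    // Thresholds copied from the sample `[collectors.memory]` section.
    let (warning, critical) = (80.0, 95.0);
    let services = [
        ("nginx", Status::Ok),
        ("mysql", Status::Pending),
        // 85% memory usage, as in the example email's redis warning.
        ("redis", status_from_thresholds(85.0, warning, critical)),
    ];

    // Collect the per-service statuses and reduce them to one host status.
    let statuses: Vec<Status> = services.iter().map(|&(_, s)| s).collect();
    println!("host status: {:?}", aggregate_worst_case(&statuses));

    // Classify redis's Ok -> Warning transition for the batched email.
    let (name, to) = services[2];
    println!("{name}: Ok -> {to:?} goes under {}", email_section(Status::Ok, to));
}
```

Ranking `Unknown` above `Ok` follows the rule in the removed README text to never default a missing status to "ok"; in this sketch an empty service list therefore aggregates to `Unknown`.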