diff --git a/README.md b/README.md
index fde4abe..89383c3 100644
--- a/README.md
+++ b/README.md
@@ -5,6 +5,7 @@ A real-time infrastructure monitoring system with intelligent status aggregation
 ## Current Implementation
 
 This is a complete rewrite implementing an **individual metrics architecture** where:
+
 - **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
 - **Dashboard** subscribes to specific metrics and composes widgets
 - **Status Aggregation** provides intelligent email notifications with batching
@@ -14,39 +15,39 @@ This is a complete rewrite implementing an **individual metrics architecture** w
 
 ```
 cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
-┌system───────────────────────────────────────────┐┌services────────────────────────────────────────────────────┐
-│CPU: ││Service: Status: RAM: Disk: │
-│● Load: 0.10 0.52 0.88 • 400.0 MHz ││● docker active 27M 496MB │
-│RAM: ││● docker-registry active 19M 496MB │
-│● Used: 30% 2.3GB/7.6GB ││● gitea active 579M 2.6GB │
-│● tmp: 0.0% 0B/2.0GB ││● gitea-runner-default active 11M 2.6GB │
-│Disk nvme0n1: ││● haasp-core active 9M 1MB │
-│● Health: PASSED ││● haasp-mqtt active 3M 1MB │
-│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid active 10M 1MB │
-│● Usage @boot: 5.9% • 0.1/1.0 GB ││● immich-server active 240M 45.1GB │
-│ ││● mosquitto active 1M 1MB │
-│ ││● mysql active 38M 225MB │
-│ ││● nginx active 28M 24MB │
-│ ││ ├─ ● gitea.cmtec.se 51ms │
-│ ││ ├─ ● haasp.cmtec.se 43ms │
-│ ││ ├─ ● haasp.net 43ms │
-│ ││ ├─ ● pages.cmtec.se 45ms │
-└─────────────────────────────────────────────────┘│ ├─ ● photos.cmtec.se 41ms │
-┌backup───────────────────────────────────────────┐│ ├─ ● unifi.cmtec.se 46ms │
-│Latest backup: ││ ├─ ● vault.cmtec.se 47ms │
-│● Status: OK ││ ├─ ● www.kryddorten.se 81ms │
-│Duration: 54s • Last: 4h ago ││ ├─ ● www.mariehall2.se 86ms │
-│Disk usage: 48.2GB/915.8GB ││● postgresql active 112M 357MB │
-│P/N: Samsung SSD 870 QVO 1TB ││● redis-immich active 8M 45.1GB │
-│S/N: S5RRNF0W800639Y ││● sshd active 2M 0 │
-│● gitea 2 archives 2.7GB ││● unifi active 594M 495MB │
-│● immich 2 archives 45.0GB ││● vaultwarden active 12M 1MB │
-│● kryddorten 2 archives 67.6MB ││ │
-│● mariehall2 2 archives 321.8MB ││ │
-│● nixosbox 2 archives 4.5MB ││ │
-│● unifi 2 archives 2.9MB ││ │
-│● vaultwarden 2 archives 305kB ││ │
-└─────────────────────────────────────────────────┘└────────────────────────────────────────────────────────────┘
+┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
+│CPU: ││Service: Status: RAM: Disk: │
+│● Load: 0.10 0.52 0.88 • 400.0 MHz ││● docker active 27M 496MB │
+│RAM: ││● docker-registry active 19M 496MB │
+│● Used: 30% 2.3GB/7.6GB ││● gitea active 579M 2.6GB │
+│● tmp: 0.0% 0B/2.0GB ││● gitea-runner-default active 11M 2.6GB │
+│Disk nvme0n1: ││● haasp-core active 9M 1MB │
+│● Health: PASSED ││● haasp-mqtt active 3M 1MB │
+│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid active 10M 1MB │
+│● Usage @boot: 5.9% • 0.1/1.0 GB ││● immich-server active 240M 45.1GB │
+│ ││● mosquitto active 1M 1MB │
+│ ││● mysql active 38M 225MB │
+│ ││● nginx active 28M 24MB │
+│ ││ ├─ ● gitea.cmtec.se 51ms │
+│ ││ ├─ ● haasp.cmtec.se 43ms │
+│ ││ ├─ ● haasp.net 43ms │
+│ ││ ├─ ● pages.cmtec.se 45ms │
+└────────────────────────────────────┘│ ├─ ● photos.cmtec.se 41ms │
+┌backup──────────────────────────────┐│ ├─ ● unifi.cmtec.se 46ms │
+│Latest backup: ││ ├─ ● vault.cmtec.se 47ms │
+│● Status: OK ││ ├─ ● www.kryddorten.se 81ms │
+│Duration: 54s • Last: 4h ago ││ ├─ ● www.mariehall2.se 86ms │
+│Disk usage: 48.2GB/915.8GB ││● postgresql active 112M 357MB │
+│P/N: Samsung SSD 870 QVO 1TB ││● redis-immich active 8M 45.1GB │
+│S/N: S5RRNF0W800639Y ││● sshd active 2M 0 │
+│● gitea 2 archives 2.7GB ││● unifi active 594M 495MB │
+│● immich 2 archives 45.0GB ││● vaultwarden active 12M 1MB │
+│● kryddorten 2 archives 67.6MB ││ │
+│● mariehall2 2 archives 321.8MB ││ │
+│● nixosbox 2 archives 4.5MB ││ │
+│● unifi 2 archives 2.9MB ││ │
+│● vaultwarden 2 archives 305kB ││ │
+└────────────────────────────────────┘└─────────────────────────────────────────────────┘
 ```
 
 **Navigation**: `←→` switch hosts, `r` refresh, `q` quit
@@ -54,7 +55,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
 ## Features
 
 - **Real-time monitoring** - Dashboard updates every 1-2 seconds
-- **Individual metric collection** - Granular data for flexible dashboard composition
+- **Individual metric collection** - Granular data for flexible dashboard composition
 - **Intelligent status aggregation** - Host-level status calculated from all services
 - **Smart email notifications** - Batched, detailed alerts with service groupings
 - **Persistent state** - Prevents false notifications on restarts
@@ -66,7 +67,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
 ### Core Components
 
 - **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
-- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
+- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
 - **Shared** (`cm-dashboard-shared`) - Common types and protocol
 - **Status Aggregation** - Intelligent batching and notification management
 - **Persistent Cache** - Maintains state across restarts
@@ -74,7 +75,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
 ### Status Levels
 
 - **🟢 Ok** - Service running normally
-- **🔵 Pending** - Service starting/stopping/reloading
+- **🔵 Pending** - Service starting/stopping/reloading
 - **🟡 Warning** - Service issues (high load, memory, disk usage)
 - **🔴 Critical** - Service failed or critical thresholds exceeded
 - **❓ Unknown** - Service state cannot be determined
@@ -98,7 +99,7 @@ cargo build --workspace
 # Start agent (requires configuration file)
 ./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml
 
-# Start dashboard
+# Start dashboard
 ./target/debug/cm-dashboard --config /path/to/dashboard.toml
 ```
 
@@ -199,30 +200,35 @@ theme = "dark"
 The agent implements several specialized collectors:
 
 ### CPU Collector (`cpu.rs`)
+
 - Load average (1, 5, 15 minute)
 - CPU temperature monitoring
 - Real-time process monitoring (top CPU consumers)
 - Status calculation with configurable thresholds
 
-### Memory Collector (`memory.rs`)
+### Memory Collector (`memory.rs`)
+
 - RAM usage (total, used, available)
 - Swap monitoring
 - Real-time process monitoring (top RAM consumers)
 - Memory pressure detection
 
 ### Disk Collector (`disk.rs`)
+
 - Filesystem usage per mount point
 - SMART health monitoring
 - Temperature and wear tracking
 - Configurable filesystem monitoring
 
 ### Systemd Collector (`systemd.rs`)
+
 - Service status monitoring (`active`, `inactive`, `failed`)
 - Memory usage per service
 - Service filtering and exclusions
 - Handles transitional states (`Status::Pending`)
 
 ### Backup Collector (`backup.rs`)
+
 - Reads TOML status files from backup systems
 - Archive age verification
 - Disk usage tracking
@@ -270,6 +276,7 @@ Generated at 2025-10-21 19:42:42 CET
 The system follows a **metrics-first architecture**:
 
 ### Agent Side
+
 ```rust
 // Agent collects individual metrics
 vec![
@@ -280,6 +287,7 @@ vec![
 ```
 
 ### Dashboard Side
+
 ```rust
 // Widgets subscribe to specific metrics
 impl Widget for CpuWidget {
@@ -337,7 +345,7 @@ cm-dashboard/
 # Debug build
 cargo build --workspace
 
-# Release build
+# Release build
 cargo build --workspace --release
 
 # Run tests
@@ -395,11 +403,12 @@ sudo nixos-rebuild switch --flake .
 - **CPU/Memory**: 2 seconds (real-time monitoring)
 - **Disk usage**: 300 seconds (5 minutes)
 - **Systemd services**: 10 seconds
-- **SMART health**: 600 seconds (10 minutes)
+- **SMART health**: 600 seconds (10 minutes)
 - **Backup status**: 60 seconds (1 minute)
 - **Email notifications**: 30 seconds (batched)
 - **Dashboard updates**: 1 second (real-time display)
 
 ## License
 
-MIT License - see LICENSE file for details
\ No newline at end of file
+MIT License - see LICENSE file for details
+