# CM Dashboard
A real-time infrastructure monitoring system with intelligent status aggregation and email notifications, built with Rust and ZMQ.
## Current Implementation

This is a complete rewrite implementing an individual metrics architecture where:

- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
- **Dashboard** subscribes to specific metrics and composes widgets
- **Status Aggregation** provides intelligent email notifications with batching
- **Persistent Cache** prevents false notifications on restart
## Dashboard Interface

```
cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
│CPU:                                ││Service:                 Status:   RAM:   Disk:  │
│● Load: 0.10 0.52 0.88 • 400.0 MHz  ││● docker                 active    27M    496MB  │
│RAM:                                ││● docker-registry        active    19M    496MB  │
│● Used: 30% 2.3GB/7.6GB             ││● gitea                  active    579M   2.6GB  │
│● tmp: 0.0% 0B/2.0GB                ││● gitea-runner-default   active    11M    2.6GB  │
│Disk nvme0n1:                       ││● haasp-core             active    9M     1MB    │
│● Health: PASSED                    ││● haasp-mqtt             active    3M     1MB    │
│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid          active    10M    1MB    │
│● Usage @boot: 5.9% • 0.1/1.0 GB    ││● immich-server          active    240M   45.1GB │
│                                    ││● mosquitto              active    1M     1MB    │
│                                    ││● mysql                  active    38M    225MB  │
│                                    ││● nginx                  active    28M    24MB   │
│                                    ││  ├─ ● gitea.cmtec.se              51ms          │
│                                    ││  ├─ ● haasp.cmtec.se              43ms          │
│                                    ││  ├─ ● haasp.net                   43ms          │
│                                    ││  ├─ ● pages.cmtec.se              45ms          │
└────────────────────────────────────┘│  ├─ ● photos.cmtec.se             41ms          │
┌backup──────────────────────────────┐│  ├─ ● unifi.cmtec.se              46ms          │
│Latest backup:                      ││  ├─ ● vault.cmtec.se              47ms          │
│● Status: OK                        ││  ├─ ● www.kryddorten.se           81ms          │
│Duration: 54s • Last: 4h ago        ││  ├─ ● www.mariehall2.se           86ms          │
│Disk usage: 48.2GB/915.8GB          ││● postgresql             active    112M   357MB  │
│P/N: Samsung SSD 870 QVO 1TB        ││● redis-immich           active    8M     45.1GB │
│S/N: S5RRNF0W800639Y                ││● sshd                   active    2M     0      │
│● gitea        2 archives  2.7GB    ││● unifi                  active    594M   495MB  │
│● immich       2 archives  45.0GB   ││● vaultwarden            active    12M    1MB    │
│● kryddorten   2 archives  67.6MB   ││                                                 │
│● mariehall2   2 archives  321.8MB  ││                                                 │
│● nixosbox     2 archives  4.5MB    ││                                                 │
│● unifi        2 archives  2.9MB    ││                                                 │
│● vaultwarden  2 archives  305kB    ││                                                 │
└────────────────────────────────────┘└─────────────────────────────────────────────────┘

Navigation: ←→ switch hosts, r refresh, q quit
```
## Features

- **Real-time monitoring** - Dashboard updates every 1-2 seconds
- **Individual metric collection** - Granular data for flexible dashboard composition
- **Intelligent status aggregation** - Host-level status calculated from all services
- **Smart email notifications** - Batched, detailed alerts with service groupings
- **Persistent state** - Prevents false notifications on restarts
- **ZMQ communication** - Efficient agent-to-dashboard messaging
- **Clean TUI** - Terminal-based dashboard with color-coded status indicators
## Architecture

### Core Components

- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
- **Shared** (`cm-dashboard-shared`) - Common types and protocol
- **Status Aggregation** - Intelligent batching and notification management
- **Persistent Cache** - Maintains state across restarts

### Status Levels

- 🟢 **Ok** - Service running normally
- 🔵 **Pending** - Service starting/stopping/reloading
- 🟡 **Warning** - Service issues (high load, memory, disk usage)
- 🔴 **Critical** - Service failed or critical thresholds exceeded
- ❓ **Unknown** - Service state cannot be determined
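The `worst_case` aggregation used for host status (see `aggregation_method = "worst_case"` in the configuration below) follows naturally from these levels. The following is a minimal sketch, not the project's actual types; in particular, ranking `Unknown` above `Critical` is an assumption:

```rust
// Sketch: status levels as an ordered enum, so the host status is simply the
// worst (maximum) status across all services. The ordering below, including
// Unknown ranking worst, is an illustrative assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

fn aggregate_worst_case(statuses: &[Status]) -> Status {
    // An empty metric set yields Unknown rather than Ok.
    statuses.iter().copied().max().unwrap_or(Status::Unknown)
}

fn main() {
    let services = [Status::Ok, Status::Warning, Status::Ok];
    assert_eq!(aggregate_worst_case(&services), Status::Warning);
    println!("host status: {:?}", aggregate_worst_case(&services));
}
```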
## Quick Start

### Build

```bash
# With Nix (recommended)
nix-shell -p openssl pkg-config --run "cargo build --workspace"

# Or with system dependencies
sudo apt install libssl-dev pkg-config  # Ubuntu/Debian
cargo build --workspace
```

### Run

```bash
# Start agent (requires configuration file)
./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml

# Start dashboard
./target/debug/cm-dashboard --config /path/to/dashboard.toml
```
## Configuration

### Agent Configuration (`agent.toml`)

The agent requires a comprehensive TOML configuration file:

```toml
collection_interval_seconds = 2

[zmq]
publisher_port = 6130
command_port = 6131
bind_address = "0.0.0.0"
timeout_ms = 5000
heartbeat_interval_ms = 30000

[collectors.cpu]
enabled = true
interval_seconds = 2
load_warning_threshold = 9.0
load_critical_threshold = 10.0
temperature_warning_threshold = 100.0
temperature_critical_threshold = 110.0

[collectors.memory]
enabled = true
interval_seconds = 2
usage_warning_percent = 80.0
usage_critical_percent = 95.0

[collectors.disk]
enabled = true
interval_seconds = 300
usage_warning_percent = 80.0
usage_critical_percent = 90.0

[[collectors.disk.filesystems]]
name = "root"
uuid = "4cade5ce-85a5-4a03-83c8-dfd1d3888d79"
mount_point = "/"
fs_type = "ext4"
monitor = true

[collectors.systemd]
enabled = true
interval_seconds = 10
memory_warning_mb = 1000.0
memory_critical_mb = 2000.0
service_name_filters = [
    "nginx*", "postgresql*", "redis*", "docker*", "sshd*",
    "gitea*", "immich*", "haasp*", "mosquitto*", "mysql*",
    "unifi*", "vaultwarden*"
]
excluded_services = [
    "nginx-config-reload", "sshd-keygen", "systemd-",
    "getty@", "user@", "dbus-", "NetworkManager-"
]

[notifications]
enabled = true
smtp_host = "localhost"
smtp_port = 25
from_email = "{hostname}@example.com"
to_email = "admin@example.com"
rate_limit_minutes = 0
trigger_on_warnings = true
trigger_on_failures = true
recovery_requires_all_ok = true
suppress_individual_recoveries = true

[status_aggregation]
enabled = true
aggregation_method = "worst_case"
notification_interval_seconds = 30

[cache]
persist_path = "/var/lib/cm-dashboard/cache.json"
```
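The `service_name_filters` entries end in `*`. The README does not show the actual matching code, so the sketch below is only a plausible prefix-wildcard matcher, not the agent's real implementation:

```rust
// Hypothetical matcher for the trailing-`*` patterns in service_name_filters;
// the agent's real matching logic may differ.
fn matches_filter(service: &str, pattern: &str) -> bool {
    match pattern.strip_suffix('*') {
        // "nginx*" matches any service name starting with "nginx"
        Some(prefix) => service.starts_with(prefix),
        // Patterns without '*' must match exactly
        None => service == pattern,
    }
}

fn main() {
    assert!(matches_filter("nginx", "nginx*"));
    assert!(matches_filter("gitea-runner-default", "gitea*"));
    assert!(!matches_filter("cron", "nginx*"));
}
```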
### Dashboard Configuration (`dashboard.toml`)

```toml
[zmq]
hosts = [
    { name = "server1", address = "192.168.1.100", port = 6130 },
    { name = "server2", address = "192.168.1.101", port = 6130 }
]
connection_timeout_ms = 5000
reconnect_interval_ms = 10000

[ui]
refresh_interval_ms = 1000
theme = "dark"
```
## Collectors

The agent implements several specialized collectors:

### CPU Collector (`cpu.rs`)
- Load average (1, 5, 15 minute)
- CPU temperature monitoring
- Real-time process monitoring (top CPU consumers)
- Status calculation with configurable thresholds
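The threshold logic implied by `load_warning_threshold` and `load_critical_threshold` can be sketched as follows; the function name and exact comparison operators are assumptions:

```rust
// Sketch of threshold-based status calculation, as configured in agent.toml
// (e.g. load_warning_threshold = 9.0, load_critical_threshold = 10.0).
fn status_for(value: f64, warning: f64, critical: f64) -> &'static str {
    if value >= critical {
        "Critical"
    } else if value >= warning {
        "Warning"
    } else {
        "Ok"
    }
}

fn main() {
    assert_eq!(status_for(0.5, 9.0, 10.0), "Ok");
    assert_eq!(status_for(9.5, 9.0, 10.0), "Warning");
    assert_eq!(status_for(12.0, 9.0, 10.0), "Critical");
}
```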
### Memory Collector (`memory.rs`)
- RAM usage (total, used, available)
- Swap monitoring
- Real-time process monitoring (top RAM consumers)
- Memory pressure detection
### Disk Collector (`disk.rs`)
- Filesystem usage per mount point
- SMART health monitoring
- Temperature and wear tracking
- Configurable filesystem monitoring
### Systemd Collector (`systemd.rs`)
- Service status monitoring (`active`, `inactive`, `failed`)
- Memory usage per service
- Service filtering and exclusions
- Handles transitional states (`Status::Pending`)
### Backup Collector (`backup.rs`)
- Reads TOML status files from backup systems
- Archive age verification
- Disk usage tracking
- Repository health monitoring
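The backup status files are TOML, but their schema is not documented in this README. As a dependency-free illustration of the idea only, a naive reader for flat `key = "value"` lines might look like this (the real collector presumably uses a proper TOML parser, and the field names below are invented):

```rust
// Illustrative only: reads flat `key = "value"` lines from a status file.
// The actual status-file schema and parsing in backup.rs may differ.
fn read_flat_value(contents: &str, key: &str) -> Option<String> {
    contents.lines().find_map(|line| {
        let (k, v) = line.split_once('=')?;
        if k.trim() == key {
            Some(v.trim().trim_matches('"').to_string())
        } else {
            None
        }
    })
}

fn main() {
    // Hypothetical example content, not the real file format.
    let file = "status = \"OK\"\nduration_seconds = 54\n";
    assert_eq!(read_flat_value(file, "status").as_deref(), Some("OK"));
    assert_eq!(read_flat_value(file, "duration_seconds").as_deref(), Some("54"));
}
```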
## Email Notifications

### Intelligent Batching

The system implements smart notification batching to prevent email spam:

- **Real-time dashboard updates** - Status changes appear immediately
- **Batched email notifications** - Aggregated every 30 seconds
- **Detailed groupings** - Services organized by severity
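The severity groupings can be sketched as a simple accumulate-then-group step over the status changes collected during one notification interval (the types here are illustrative, not the project's actual ones):

```rust
use std::collections::BTreeMap;

// Sketch: status changes accumulated over the 30-second interval are grouped
// by their new status before a single batched email is composed.
#[derive(Debug, Clone)]
struct StatusChange {
    service: String,
    from: &'static str,
    to: &'static str,
}

fn group_by_new_status(changes: &[StatusChange]) -> BTreeMap<&'static str, Vec<String>> {
    let mut groups: BTreeMap<&'static str, Vec<String>> = BTreeMap::new();
    for c in changes {
        groups
            .entry(c.to)
            .or_default()
            .push(format!("{}: {} → {}", c.service, c.from, c.to));
    }
    groups
}

fn main() {
    let changes = vec![
        StatusChange { service: "postgresql".into(), from: "Ok", to: "Critical" },
        StatusChange { service: "redis".into(), from: "Ok", to: "Warning" },
        StatusChange { service: "nginx".into(), from: "Warning", to: "Critical" },
    ];
    let groups = group_by_new_status(&changes);
    assert_eq!(groups["Critical"].len(), 2);
    assert_eq!(groups["Warning"].len(), 1);
}
```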
### Example Alert Email

```
Subject: Status Alert: 2 critical, 1 warning, 15 started

Status Summary (30s duration)
Host Status: Ok → Warning

🔴 CRITICAL ISSUES (2):
postgresql: Ok → Critical
nginx: Warning → Critical

🟡 WARNINGS (1):
redis: Ok → Warning (memory usage 85%)

✅ RECOVERIES (0):

🟢 SERVICE STARTUPS (15):
docker: Unknown → Ok
sshd: Unknown → Ok
...

--
CM Dashboard Agent
Generated at 2025-10-21 19:42:42 CET
```
## Individual Metrics Architecture

The system follows a metrics-first architecture:

### Agent Side

```rust
// Agent collects individual metrics
vec![
    Metric::new("cpu_load_1min".to_string(), MetricValue::Float(2.5), Status::Ok),
    Metric::new("memory_usage_percent".to_string(), MetricValue::Float(78.5), Status::Warning),
    Metric::new("service_nginx_status".to_string(), MetricValue::String("active".to_string()), Status::Ok),
]
```
### Dashboard Side

```rust
// Widgets subscribe to specific metrics
impl Widget for CpuWidget {
    fn update_from_metrics(&mut self, metrics: &[&Metric]) {
        for metric in metrics {
            match metric.name.as_str() {
                "cpu_load_1min" => self.load_1min = metric.value.as_f32(),
                "cpu_load_5min" => self.load_5min = metric.value.as_f32(),
                "cpu_temperature_celsius" => self.temperature = metric.value.as_f32(),
                _ => {}
            }
        }
    }
}
```
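For context, the `metric.value.as_f32()` call above implies a `MetricValue` shaped roughly like the following. This is a guess at the shape; the real definition lives in `shared/src/metrics.rs` and may differ, for example in how non-numeric values convert:

```rust
// Hypothetical MetricValue; the real type is defined in cm-dashboard-shared.
#[derive(Debug, Clone, PartialEq)]
enum MetricValue {
    Float(f64),
    String(String),
}

impl MetricValue {
    // Assumed conversion: non-numeric values fall back to 0.0.
    fn as_f32(&self) -> f32 {
        match self {
            MetricValue::Float(f) => *f as f32,
            MetricValue::String(_) => 0.0,
        }
    }
}

fn main() {
    assert_eq!(MetricValue::Float(2.5).as_f32(), 2.5);
    assert_eq!(MetricValue::String("active".into()).as_f32(), 0.0);
}
```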
## Persistent Cache

The cache system prevents false notifications:

- **Automatic saving** - Saves when service status changes
- **Persistent storage** - Maintains state across agent restarts
- **Simple design** - No complex TTL or cleanup logic
- **Status preservation** - Prevents duplicate notifications
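The save/load cycle can be sketched as below. The real cache is JSON at `/var/lib/cm-dashboard/cache.json`; this sketch uses a flat `key=value` format purely to stay dependency-free:

```rust
use std::collections::HashMap;
use std::fs;

// Illustrative sketch of a persistent status cache. The actual cache is JSON
// (persist_path = "/var/lib/cm-dashboard/cache.json"); key=value lines are
// used here only to avoid a serde dependency in the example.
fn save_cache(path: &str, statuses: &HashMap<String, String>) -> std::io::Result<()> {
    let body: String = statuses
        .iter()
        .map(|(k, v)| format!("{}={}\n", k, v))
        .collect();
    fs::write(path, body)
}

fn load_cache(path: &str) -> HashMap<String, String> {
    // A missing file yields an empty cache rather than an error.
    fs::read_to_string(path)
        .unwrap_or_default()
        .lines()
        .filter_map(|l| l.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let mut m = HashMap::new();
    m.insert("service_nginx_status".to_string(), "Ok".to_string());
    let path = std::env::temp_dir().join("cm_cache_demo.txt");
    let path = path.to_str().unwrap().to_string();
    save_cache(&path, &m).unwrap();
    // After a "restart", the previous statuses are still known.
    assert_eq!(load_cache(&path)["service_nginx_status"], "Ok");
}
```

On restart the agent can compare freshly collected statuses against the loaded cache and notify only on genuine changes.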
## Development

### Project Structure

```
cm-dashboard/
├── agent/                 # Metrics collection agent
│   └── src/
│       ├── collectors/    # CPU, memory, disk, systemd, backup
│       ├── status/        # Status aggregation and notifications
│       ├── cache/         # Persistent metric caching
│       ├── config/        # TOML configuration loading
│       └── notifications/ # Email notification system
├── dashboard/             # TUI dashboard application
│   └── src/
│       ├── ui/widgets/    # CPU, memory, services, backup widgets
│       ├── metrics/       # Metric storage and filtering
│       └── communication/ # ZMQ metric consumption
├── shared/                # Shared types and utilities
│   └── src/
│       ├── metrics.rs     # Metric, Status, and Value types
│       ├── protocol.rs    # ZMQ message format
│       └── cache.rs       # Cache configuration
└── README.md              # This file
```
### Building

```bash
# Debug build
cargo build --workspace

# Release build
cargo build --workspace --release

# Run tests
cargo test --workspace

# Check code formatting
cargo fmt --all -- --check

# Run clippy lints
cargo clippy --workspace -- -D warnings
```
### Dependencies

- `tokio` - Async runtime
- `zmq` - Message passing between agent and dashboard
- `ratatui` - Terminal user interface
- `serde` - Serialization for metrics and config
- `anyhow`/`thiserror` - Error handling
- `tracing` - Structured logging
- `lettre` - SMTP email notifications
- `clap` - Command-line argument parsing
- `toml` - Configuration file parsing
## NixOS Integration

This project is designed for declarative deployment via NixOS:

### Configuration Generation

The NixOS module automatically generates the agent configuration:

```nix
# hosts/common/cm-dashboard.nix
services.cm-dashboard-agent = {
  enable = true;
  port = 6130;
};
```
### Deployment

```bash
# Update NixOS configuration
git add hosts/common/cm-dashboard.nix
git commit -m "Update cm-dashboard configuration"
git push

# Rebuild system (user-performed)
sudo nixos-rebuild switch --flake .
```
## Monitoring Intervals
- CPU/Memory: 2 seconds (real-time monitoring)
- Disk usage: 300 seconds (5 minutes)
- Systemd services: 10 seconds
- SMART health: 600 seconds (10 minutes)
- Backup status: 60 seconds (1 minute)
- Email notifications: 30 seconds (batched)
- Dashboard updates: 1 second (real-time display)
## License
MIT License - see LICENSE file for details