Christoffer Martinsson aaec8e691c
Bump version to 0.1.259
2025-12-07 14:52:12 +01:00

CM Dashboard

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure, built on ZMQ-based metric collection and an individual-metrics architecture.

Features

Core Monitoring

  • Real-time metrics: CPU, RAM, Storage, and Service status
  • Multi-host support: Monitor multiple servers from a single dashboard
  • Service management: Start/stop services with intelligent status tracking
  • NixOS integration: System rebuild via SSH + tmux popup
  • Backup monitoring: Borgbackup status and scheduling
  • Email notifications: Intelligent batching prevents spam

User-Stopped Service Tracking

Services stopped via the dashboard are intelligently tracked to prevent false alerts:

  • Smart status reporting: User-stopped services show as Status::OK instead of Warning
  • Persistent storage: Tracking survives agent restarts via JSON storage
  • Automatic management: Flags are cleared when services are restarted via the dashboard
  • Maintenance friendly: No false alerts during intentional service operations

Architecture

Individual Metrics Philosophy

  • Agent: Collects individual metrics, calculates status using thresholds
  • Dashboard: Subscribes to specific metrics, composes widgets from individual data
  • ZMQ Communication: Efficient real-time metric transmission
  • Status Aggregation: Host-level status calculated from all service metrics
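The status aggregation bullet can be read as a worst-of fold over service statuses. The sketch below assumes a hypothetical three-level `Status` enum declared from least to most severe, and it assumes that host status is simply the most severe service status. The real `Status` type lives in shared/src/metrics.rs and may have other variants or rules.

```rust
/// Hypothetical status levels, declared from least to most severe so that
/// the derived `Ord` ranks `Critical` highest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Ok,
    Warning,
    Critical,
}

/// Host-level status: the most severe status among all service metrics.
/// A host with no metrics yet is treated as `Ok` in this sketch.
fn host_status(service_statuses: &[Status]) -> Status {
    service_statuses.iter().copied().max().unwrap_or(Status::Ok)
}

fn main() {
    let services = [Status::Ok, Status::Warning, Status::Ok];
    println!("host status: {:?}", host_status(&services)); // prints "host status: Warning"
    println!("{:?}", host_status(&[Status::Critical, Status::Ok])); // prints "Critical"
}
```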

Components

┌─────────────────┐    ZMQ     ┌─────────────────┐
│                 │◄──────────►│                 │
│   Agent         │  Metrics   │   Dashboard     │
│   - Collectors  │            │   - TUI         │
│   - Status      │            │   - Widgets     │
│   - Tracking    │            │   - Commands    │
│                 │            │                 │
└─────────────────┘            └─────────────────┘
         │                              │
         ▼                              ▼
┌─────────────────┐            ┌─────────────────┐
│ JSON Storage    │            │ SSH + tmux      │
│ - User-stopped  │            │ - Remote rebuild│
│ - Cache         │            │ - Process       │
│ - State         │            │   isolation     │
└─────────────────┘            └─────────────────┘

Service Control Flow

  1. User Action: Dashboard sends UserStart/UserStop commands
  2. Agent Processing:
    • Marks service as user-stopped (if stopping)
    • Executes systemctl start/stop <service>
    • Syncs state to global tracker
  3. Status Calculation:
    • Systemd collector checks user-stopped flag
    • Reports Status::OK for user-stopped inactive services
    • Reports the normal Warning status for system failures
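Steps 2 and 3 can be sketched with a plain `HashSet` standing in for the tracker. This is a minimal illustration under several assumptions. `handle_command` and `service_status` are hypothetical names. The sketch returns the systemctl arguments instead of running them. It models only the active and inactive cases, so failed units, which the real collector may rate more severely, are left out.

```rust
use std::collections::HashSet;

/// The two dashboard-issued variants of the README's `ServiceAction`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServiceAction {
    UserStart,
    UserStop,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    Warning,
}

/// Step 2: update the user-stopped set, then return the systemctl arguments
/// the agent would run (nothing is executed in this sketch).
fn handle_command(
    action: ServiceAction,
    service: &str,
    user_stopped: &mut HashSet<String>,
) -> Vec<String> {
    let verb = match action {
        ServiceAction::UserStop => {
            // Marked before the stop runs, matching the order in step 2.
            user_stopped.insert(service.to_string());
            "stop"
        }
        ServiceAction::UserStart => {
            user_stopped.remove(service);
            "start"
        }
    };
    vec![verb.to_string(), service.to_string()]
}

/// Step 3: an inactive service is Ok if the user stopped it, otherwise Warning.
fn service_status(service: &str, is_active: bool, user_stopped: &HashSet<String>) -> Status {
    if is_active || user_stopped.contains(service) {
        Status::Ok
    } else {
        Status::Warning
    }
}

fn main() {
    let mut user_stopped = HashSet::new();
    let args = handle_command(ServiceAction::UserStop, "redis-immich", &mut user_stopped);
    println!("systemctl {}", args.join(" ")); // prints "systemctl stop redis-immich"
    println!("{:?}", service_status("redis-immich", false, &user_stopped)); // prints "Ok"
}
```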

Interface

cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
│NixOS:                              ││Service:                  Status:  RAM:   Disk:  │
│Build: 25.05.20251004.3bcc93c       ││● docker                  active   27M    496MB  │
│Agent: v0.1.43                      ││● gitea                   active   579M   2.6GB  │
│Active users: cm, simon             ││● nginx                   active   28M    24MB   │
│CPU:                                ││  ├─ ● gitea.cmtec.se     51ms                   │
│● Load: 0.10 0.52 0.88 • 3000MHz    ││  ├─ ● photos.cmtec.se    41ms                   │
│RAM:                                ││● postgresql              active   112M   357MB  │
│● Usage: 33% 2.6GB/7.6GB            ││● redis-immich            user-stopped           │
│● /tmp: 0% 0B/2.0GB                 ││● sshd                    active   2M     0      │
│Storage:                            ││● unifi                   active   594M   495MB  │
│● root (Single):                    ││                                                 │
│ ├─ ● nvme0n1 W: 1%                 ││                                                 │
│ └─ ● 18% 167.4GB/928.2GB           ││                                                 │
└────────────────────────────────────┘└─────────────────────────────────────────────────┘

Navigation

  • Tab: Switch between hosts
  • ↑↓ or j/k: Navigate services
  • s: Start selected service (UserStart)
  • S: Stop selected service (UserStop)
  • J: Show service logs (journalctl in tmux popup)
  • L: Show custom log files (tail -f custom paths in tmux popup)
  • R: Rebuild current host
  • B: Run backup on current host
  • q: Quit

Status Indicators

  • Green ●: Active service
  • Yellow ◐: Inactive service (system issue)
  • Red ◯: Failed service
  • Blue arrows: Service transitioning (↑ starting, ↓ stopping, ↻ restarting)
  • "user-stopped": Service stopped via dashboard (Status::OK)

Quick Start

Building

# With Nix (recommended)
nix-shell -p openssl pkg-config --run "cargo build --workspace"

# Or with system dependencies
sudo apt install libssl-dev pkg-config  # Ubuntu/Debian
cargo build --workspace

Running

# Start agent (requires configuration)
./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml

# Start dashboard (inside tmux session)
tmux
./target/debug/cm-dashboard --config /etc/cm-dashboard/dashboard.toml

Configuration

Agent Configuration

collection_interval_seconds = 2

[zmq]
publisher_port = 6130
command_port = 6131
bind_address = "0.0.0.0"
transmission_interval_seconds = 2

[collectors.cpu]
enabled = true
interval_seconds = 2
load_warning_threshold = 5.0
load_critical_threshold = 10.0

[collectors.memory]
enabled = true
interval_seconds = 2
usage_warning_percent = 80.0
usage_critical_percent = 90.0

[collectors.systemd]
enabled = true
interval_seconds = 10
service_name_filters = ["nginx*", "postgresql*", "docker*", "sshd*"]
excluded_services = ["nginx-config-reload", "systemd-", "getty@"]
nginx_latency_critical_ms = 1000.0
http_timeout_seconds = 10

[notifications]
enabled = true
smtp_host = "localhost"
smtp_port = 25
from_email = "{hostname}@example.com"
to_email = "admin@example.com"
aggregation_interval_seconds = 30
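Each `*_warning_*` / `*_critical_*` pair in the example above implies a two-level classification. The sketch below uses a hypothetical `classify` helper. It assumes that a reading at or above a threshold triggers that level, since the README does not say whether the comparison is `>=` or `>`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical helper: classify a reading against a warning/critical pair.
/// Assumes "at or above the threshold" triggers the level.
fn classify(value: f64, warning: f64, critical: f64) -> Status {
    if value >= critical {
        Status::Critical
    } else if value >= warning {
        Status::Warning
    } else {
        Status::Ok
    }
}

fn main() {
    // Thresholds from the [collectors.cpu] and [collectors.memory] examples.
    let load = classify(8.5, 5.0, 10.0);
    let memory = classify(95.0, 80.0, 90.0);
    println!("load: {load:?}, memory: {memory:?}"); // prints "load: Warning, memory: Critical"
}
```

With these thresholds, a load of 8.5 yields Warning and 95% memory usage yields Critical, which matches the transitions in the example alert further down.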

Dashboard Configuration

[zmq]
subscriber_ports = [6130]

[hosts]
predefined_hosts = ["cmbox", "srv01", "srv02"]

[ssh]
rebuild_user = "cm"
rebuild_alias = "nixos-rebuild-cmtec"
backup_alias = "cm-backup-run"
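One plausible way to turn the `[ssh]` settings into the rebuild popup is sketched below. The README does not document the exact command line cm-dashboard builds. This sketch assumes it runs `ssh <rebuild_user>@<host> <rebuild_alias>` inside `tmux display-popup -E` (popups need tmux 3.2 or later). `rebuild_popup_command` is a hypothetical name.

```rust
/// Hypothetical builder for the tmux popup that runs a remote rebuild.
/// Returns the program and arguments instead of spawning them.
fn rebuild_popup_command(rebuild_user: &str, rebuild_alias: &str, host: &str) -> Vec<String> {
    let remote = format!("ssh {rebuild_user}@{host} {rebuild_alias}");
    vec![
        "tmux".to_string(),
        "display-popup".to_string(),
        "-E".to_string(), // close the popup when the command exits
        remote,
    ]
}

fn main() {
    let cmd = rebuild_popup_command("cm", "nixos-rebuild-cmtec", "srv01");
    println!("{}", cmd.join(" ")); // prints "tmux display-popup -E ssh cm@srv01 nixos-rebuild-cmtec"
}
```

To launch it, pass the vector to `std::process::Command::new(&cmd[0]).args(&cmd[1..])`. The same shape would also work for `backup_alias`.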

Technical Implementation

Collectors

Systemd Collector

  • Service Discovery: Uses systemctl list-unit-files + list-units --all
  • Status Calculation: Checks user-stopped flag before assigning Warning status
  • Memory Tracking: Per-service memory usage via systemctl show
  • Sub-services: Nginx site latency, Docker containers
  • User-stopped Integration: UserStoppedServiceTracker::is_service_user_stopped()
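Service discovery is narrowed by `service_name_filters` and `excluded_services`. The sketch below makes two assumptions that the README does not confirm. First, a trailing `*` in a filter acts as a prefix wildcard, and a filter without one must match exactly. Second, exclusions are prefix matches, which is what entries like "systemd-" and "getty@" suggest. Both function names are hypothetical.

```rust
/// Hypothetical glob check: a trailing `*` matches any suffix; otherwise exact.
fn matches_filter(name: &str, pattern: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => name.starts_with(prefix),
        None => name == pattern,
    }
}

/// A unit is monitored if it matches any filter and no exclusion prefix.
fn is_monitored(name: &str, filters: &[&str], excluded: &[&str]) -> bool {
    filters.iter().any(|p| matches_filter(name, p))
        && !excluded.iter().any(|e| name.starts_with(*e))
}

fn main() {
    let filters = ["nginx*", "postgresql*", "docker*", "sshd*"];
    let excluded = ["nginx-config-reload", "systemd-", "getty@"];
    for unit in ["nginx", "nginx-config-reload", "postgresql", "gitea"] {
        // nginx and postgresql are kept; the reload unit is excluded; gitea matches no filter.
        println!("{unit}: {}", is_monitored(unit, &filters, &excluded));
    }
}
```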

User-Stopped Service Tracker

  • Storage: /var/lib/cm-dashboard/user-stopped-services.json
  • Thread Safety: Global singleton with Arc<Mutex<>>
  • Persistence: Automatic save on state changes
  • Global Access: Static methods for collector integration
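A std-only sketch of a global tracker with static methods follows. It substitutes `std::sync::OnceLock` holding a `Mutex<HashSet<String>>` for the README's `Arc<Mutex<>>`, and it leaves out the JSON persistence to /var/lib/cm-dashboard/user-stopped-services.json. The README names only `is_service_user_stopped`; the other method names are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Process-wide set of user-stopped service names, created on first use.
static USER_STOPPED: OnceLock<Mutex<HashSet<String>>> = OnceLock::new();

struct UserStoppedServiceTracker;

impl UserStoppedServiceTracker {
    fn inner() -> &'static Mutex<HashSet<String>> {
        USER_STOPPED.get_or_init(|| Mutex::new(HashSet::new()))
    }

    /// Hypothetical: called when the agent handles UserStop.
    fn mark_user_stopped(service: &str) {
        Self::inner().lock().unwrap().insert(service.to_string());
        // A persistent implementation would write the JSON file here.
    }

    /// Hypothetical: called when the agent handles UserStart.
    fn clear_user_stopped(service: &str) {
        Self::inner().lock().unwrap().remove(service);
    }

    /// Name taken from the README; queried by the systemd collector.
    fn is_service_user_stopped(service: &str) -> bool {
        Self::inner().lock().unwrap().contains(service)
    }
}

fn main() {
    UserStoppedServiceTracker::mark_user_stopped("redis-immich");
    println!("{}", UserStoppedServiceTracker::is_service_user_stopped("redis-immich")); // prints "true"
    UserStoppedServiceTracker::clear_user_stopped("redis-immich");
    println!("{}", UserStoppedServiceTracker::is_service_user_stopped("redis-immich")); // prints "false"
}
```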

Other Collectors

  • CPU: Load average, temperature, frequency monitoring
  • Memory: RAM/swap usage, tmpfs monitoring
  • Disk: Filesystem usage, SMART health data
  • NixOS: Build version, active users, agent version
  • Backup: Borgbackup repository status and metrics

ZMQ Protocol

// Excerpt of the ZMQ message format (shared/src/protocol.rs)
use serde::{Deserialize, Serialize};
use crate::metrics::Metric; // Metric type from shared/src/metrics.rs

// Metric Message
#[derive(Serialize, Deserialize)]
pub struct MetricMessage {
    pub hostname: String,
    pub timestamp: u64,
    pub metrics: Vec<Metric>,
}

// Service Commands
pub enum AgentCommand {
    ServiceControl {
        service_name: String,
        action: ServiceAction,
    },
    SystemRebuild { /* SSH config */ },
    CollectNow,
}

pub enum ServiceAction {
    Start,           // System-initiated
    Stop,            // System-initiated
    UserStart,       // User via dashboard (clears user-stopped)
    UserStop,        // User via dashboard (marks user-stopped)
    Status,
}
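On the dashboard side, incoming messages are reduced to individual values that widgets look up. The sketch below keeps the latest value per host and metric name. It assumes a simplified, hypothetical `Metric { name, value }`; the real type carries more, such as a status. It also leaves out the ZMQ transport and serde, and the metric name "cpu_load_1min" is made up for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-ins for the protocol types (hypothetical fields).
struct Metric {
    name: String,
    value: f64,
}

struct MetricMessage {
    hostname: String,
    timestamp: u64,
    metrics: Vec<Metric>,
}

/// Latest value per (hostname, metric name) and last timestamp per host.
#[derive(Default)]
struct MetricStore {
    values: HashMap<(String, String), f64>,
    last_seen: HashMap<String, u64>,
}

impl MetricStore {
    /// Overwrite each metric with the newest value received for that host.
    fn apply(&mut self, msg: MetricMessage) {
        self.last_seen.insert(msg.hostname.clone(), msg.timestamp);
        for m in msg.metrics {
            self.values.insert((msg.hostname.clone(), m.name), m.value);
        }
    }

    /// What a widget would query when it renders.
    fn get(&self, host: &str, name: &str) -> Option<f64> {
        self.values.get(&(host.to_string(), name.to_string())).copied()
    }

    fn last_seen(&self, host: &str) -> Option<u64> {
        self.last_seen.get(host).copied()
    }
}

fn main() {
    let mut store = MetricStore::default();
    store.apply(MetricMessage {
        hostname: "srv01".into(),
        timestamp: 1_700_000_000,
        metrics: vec![Metric { name: "cpu_load_1min".into(), value: 0.10 }],
    });
    println!("{:?}", store.get("srv01", "cpu_load_1min")); // prints "Some(0.1)"
    println!("{:?}", store.last_seen("srv01")); // prints "Some(1700000000)"
}
```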

Maintenance Mode

Suppress notifications during planned maintenance:

# Enable maintenance mode
touch /tmp/cm-maintenance

# Perform maintenance
systemctl stop service
# ... work ...
systemctl start service  

# Disable maintenance mode
rm /tmp/cm-maintenance
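Before sending mail, the agent has to check whether the flag file exists. The sketch below shows that gate together with an assumed expansion of the `{hostname}` placeholder in `from_email`. Both helpers are hypothetical and are not the agent's actual code.

```rust
use std::path::Path;

/// Hypothetical gate: suppress email while the maintenance flag file exists.
fn notifications_suppressed(flag: &Path) -> bool {
    flag.exists()
}

/// Hypothetical expansion of the `{hostname}` placeholder in `from_email`.
fn sender_address(from_template: &str, hostname: &str) -> String {
    from_template.replace("{hostname}", hostname)
}

fn main() {
    let flag = Path::new("/tmp/cm-maintenance");
    if notifications_suppressed(flag) {
        println!("maintenance mode: email suppressed");
    } else {
        // prints "would send from srv01@example.com" when no flag file exists
        println!("would send from {}", sender_address("{hostname}@example.com", "srv01"));
    }
}
```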

Email Notifications

Intelligent Batching

  • Real-time dashboard: Immediate status updates
  • Batched emails: Aggregated every 30 seconds
  • Smart grouping: Services organized by severity
  • Recovery suppression: Reduces notification spam
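A single batch can be formatted by counting and grouping the status transitions collected during one aggregation window. The sketch below covers only that formatting step; gathering transitions over `aggregation_interval_seconds` is not shown. It uses a hypothetical `Transition` type and treats every change to Ok as a recovery.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    Warning,
    Critical,
}

/// One status change observed during the aggregation window (hypothetical).
struct Transition {
    service: String,
    from: Status,
    to: Status,
}

/// Subject line counting transitions by their new severity.
fn alert_subject(batch: &[Transition]) -> String {
    let count = |s: Status| batch.iter().filter(|t| t.to == s).count();
    let (critical, warnings) = (count(Status::Critical), count(Status::Warning));
    let recoveries = count(Status::Ok); // any change back to Ok
    format!("Status Alert: {critical} critical, {warnings} warnings, {recoveries} recoveries")
}

/// Body grouped by severity, most severe first.
fn alert_body(batch: &[Transition]) -> String {
    let mut out = String::new();
    for (label, level) in [("CRITICAL", Status::Critical), ("WARNINGS", Status::Warning)] {
        let items: Vec<_> = batch.iter().filter(|t| t.to == level).collect();
        out.push_str(&format!("{label} ({}):\n", items.len()));
        for t in items {
            out.push_str(&format!("  {}: {:?} → {:?}\n", t.service, t.from, t.to));
        }
    }
    out
}

fn main() {
    let batch = vec![
        Transition { service: "postgresql".into(), from: Status::Ok, to: Status::Critical },
        Transition { service: "nginx".into(), from: Status::Ok, to: Status::Warning },
    ];
    println!("{}", alert_subject(&batch)); // prints "Status Alert: 1 critical, 1 warnings, 0 recoveries"
    print!("{}", alert_body(&batch));
}
```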

Example Alert

Subject: Status Alert: 1 critical, 2 warnings, 0 recoveries

Status Summary (30s duration)
Host Status: Ok → Warning

🔴 CRITICAL ISSUES (1):
  postgresql: Ok → Critical (memory usage 95%)

🟡 WARNINGS (2):
  nginx: Ok → Warning (high load 8.5)
  redis: user-stopped → Warning (restarted by system)

✅ RECOVERIES (0):

--
CM Dashboard Agent v0.1.43

Development

Project Structure

cm-dashboard/
├── agent/                     # Metrics collection agent
│   ├── src/
│   │   ├── collectors/        # CPU, memory, disk, systemd, backup, nixos
│   │   ├── service_tracker.rs # User-stopped service tracking
│   │   ├── status/            # Status aggregation and notifications
│   │   ├── config/            # TOML configuration loading
│   │   └── communication/     # ZMQ message handling
├── dashboard/                 # TUI dashboard application  
│   ├── src/
│   │   ├── ui/widgets/        # CPU, memory, services, backup, system
│   │   ├── communication/     # ZMQ consumption and commands
│   │   └── app.rs             # Main application loop
├── shared/                    # Shared types and utilities
│   └── src/
│       ├── metrics.rs         # Metric, Status, StatusTracker types
│       ├── protocol.rs        # ZMQ message format
│       └── cache.rs           # Cache configuration
└── CLAUDE.md                  # Development guidelines and rules

Testing

# Build and test
nix-shell -p openssl pkg-config --run "cargo build --workspace"
nix-shell -p openssl pkg-config --run "cargo test --workspace"

# Code quality
cargo fmt --all
cargo clippy --workspace -- -D warnings

Deployment

Automated Binary Releases

# Create new release
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X

This triggers automated:

  • Static binary compilation with RUSTFLAGS="-C target-feature=+crt-static"
  • GitHub-style release creation
  • Tarball upload to Gitea

NixOS Integration

Update ~/projects/nixosbox/hosts/services/cm-dashboard.nix:

version = "v0.1.43";
src = pkgs.fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-HASH";
};

Get the hash by building with a placeholder hash and reading the "got:" line from the mismatch error:

cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
  url = "URL_HERE";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"

Monitoring Intervals

  • Metrics Collection: 2 seconds (CPU, memory, services)
  • Metric Transmission: 2 seconds (ZMQ publish)
  • Dashboard Updates: 1 second (UI refresh)
  • Email Notifications: 30 seconds (batched)
  • Disk Monitoring: 300 seconds (5 minutes)
  • Service Discovery: 300 seconds (5 minutes cache)
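Because the intervals differ per collector, a single fast tick can decide which collectors are due. The sketch below uses a simulated clock in seconds and hypothetical `Schedule`/`due_collectors` names. It is not the agent's actual scheduler.

```rust
/// A collector and how often it should run, in seconds (hypothetical).
struct Schedule {
    name: &'static str,
    interval_secs: u64,
    last_run: Option<u64>,
}

/// Return the collectors due at `now` and record that they ran.
fn due_collectors(schedules: &mut [Schedule], now: u64) -> Vec<&'static str> {
    let mut due = Vec::new();
    for s in schedules.iter_mut() {
        let is_due = match s.last_run {
            None => true, // never run yet
            Some(t) => now.saturating_sub(t) >= s.interval_secs,
        };
        if is_due {
            s.last_run = Some(now);
            due.push(s.name);
        }
    }
    due
}

fn main() {
    let mut schedules = vec![
        Schedule { name: "cpu", interval_secs: 2, last_run: None },
        Schedule { name: "systemd", interval_secs: 10, last_run: None },
        Schedule { name: "disk", interval_secs: 300, last_run: None },
    ];
    // Simulate a 2-second tick for the first 12 seconds.
    for now in (0..=12).step_by(2) {
        println!("t={now}s: {:?}", due_collectors(&mut schedules, now));
    }
}
```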

License

MIT License - see LICENSE file for details.

Description
Linux TUI dashboard for host health overview
Languages
Rust 100%