Christoffer Martinsson dca3642e46 Implement multi-host autoconnect with consolidated host configuration
- Add DEFAULT_HOSTS constant in config.rs for centralized host management
- Update ZMQ endpoint generation to connect to all configured hosts
- Implement graceful connection handling for unreachable endpoints
- Dashboard now auto-discovers and connects to available agents on cmbox, labbox, simonbox, steambox, srv01
2025-10-14 00:44:38 +02:00

CM Dashboard - Infrastructure Monitoring TUI

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. It replaces Glance with a custom solution tailored to CMTEC's monitoring needs and API integrations. It monitors all infrastructure components in real time, calculates status automatically on each agent, and sends rate-limited email notifications on status changes.

System Widget

┌System───────────────────────────────────────────────────────┐
│  Memory usage                                               │
│✔ 3.0 / 7.8 GB                                               │
│  CPU load            CPU temp                               │
│✔ 1.05 • 0.96 • 0.58  64.0°C                                 │
│  C1E    C3     C6     C8     C9     C10                     │
│✔ 0.5%   0.5%   10.4%  10.2%  0.4%   77.9%                   │
│  GPU load  GPU temp                                         │
│✔ —         —                                                │
└─────────────────────────────────────────────────────────────┘

Services Widget (Enhanced)

┌Services────────────────────────────────────────────────────┐
│  Service          Memory (GB)  CPU    Disk                 │
│✔ Service Memory   7.1/23899.7 MiB     —                   │
│✔ Disk Usage       —           —       45/100 GB           │
│⚠ CPU Load         —           2.18    —                   │
│✔ CPU Temperature  —           47.0°C  —                   │
│✔ docker-registry  0.0 GB       0.0%   <1 MB               │
│✔ gitea            0.4/4.1 GB   0.2%   970 MB               │
│  1 active connections                                      │
│✔ nginx            0.0/1.0 GB   0.0%   <1 MB                │
│✔  ├─ docker.cmtec.se                                      │
│✔  ├─ git.cmtec.se                                         │
│✔  ├─ gitea.cmtec.se                                       │
│✔  ├─ haasp.cmtec.se                                       │
│✔  ├─ pages.cmtec.se                                       │
│✔  └─ www.kryddorten.se                                    │
│✔ postgresql       0.1 GB       0.0%   378 MB               │
│  1 active connections                                      │
│✔ redis-immich     0.0 GB       0.4%   <1 MB                │
│✔ sshd             0.0 GB       0.0%   <1 MB                │
│  1 SSH connection                                          │
│✔ unifi            0.9/2.0 GB   0.4%   391 MB               │
└────────────────────────────────────────────────────────────┘

Storage Widget

┌Storage──────────────────────────────────────────────────────┐
│  Drive    Temp   Wear   Spare  Hours  Capacity  Usage       │
│✔ nvme0n1  57°C   4%     100%   11463  932G      23G (2%)    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Backups Widget

┌Backups──────────────────────────────────────────────────────┐
│  Backup  Status  Details                                    │
│✔ Latest  3h ago  1.4 GiB                                    │
│  8 archives, 2.4 GiB total                                  │
│✔ Disk    ok      2.4/468 GB (1%)                            │
└─────────────────────────────────────────────────────────────┘

Hosts Widget

┌Hosts────────────────────────────────────────────────────────┐
│  Host    Status            Timestamp                        │
│✔ cmbox   ok                2025-10-13 05:45:28              │
│✔ srv01   ok                2025-10-13 05:45:28              │
│? labbox  No data received  —                                │
└─────────────────────────────────────────────────────────────┘

Navigation: ←→ hosts, r refresh, q quit

Key Features

Real-time Monitoring

  • Multi-host support for cmbox, labbox, simonbox, steambox, srv01
  • Performance-focused with minimal resource usage
  • Keyboard-driven interface for power users
  • ZMQ gossip network for efficient data distribution

Infrastructure Monitoring

  • NVMe health monitoring with wear prediction and temperature tracking
  • CPU/Memory/GPU telemetry with automatic thresholding
  • Service resource monitoring with per-service CPU and RAM usage
  • Disk usage overview for root filesystems
  • Backup status with detailed metrics and history
  • C-state monitoring for CPU power management analysis

Intelligent Alerting

  • Agent-calculated status with predefined thresholds
  • Email notifications via SMTP with rate limiting
  • Recovery notifications with context about original issues
  • Stockholm timezone support for email timestamps
  • Unified alert pipeline summarizing host health

Architecture

Agent-Dashboard Separation

The system follows a strict separation of concerns:

  • Agent: Single source of truth for all status calculations using defined thresholds
  • Dashboard: Display-only interface that shows agent-provided status
  • Data Flow: Agent (calculations) → Status → Dashboard (display) → Colors

Agent Thresholds (Production)

  • CPU Load: Warning ≥ 5.0, Critical ≥ 8.0
  • Memory Usage: Warning ≥ 80%, Critical ≥ 95%
  • CPU Temperature: Warning ≥ 100°C, Critical ≥ 100°C (effectively disabled)
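
The thresholds above map a raw reading to a status. A minimal sketch of that mapping follows; the `Status` enum, the `Thresholds` struct, and the `evaluate` function are hypothetical names chosen for illustration and are not taken from the agent's source.

```rust
/// Status levels, ordered from healthy to critical.
/// Hypothetical type for illustration; the agent's real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// Warning and critical limits for one metric (inclusive lower bounds).
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub warning: f64,
    pub critical: f64,
}

/// Values copied from the production thresholds listed above.
pub const CPU_LOAD: Thresholds = Thresholds { warning: 5.0, critical: 8.0 };
pub const MEMORY_PERCENT: Thresholds = Thresholds { warning: 80.0, critical: 95.0 };

/// Compare a reading against its thresholds. Critical is checked first so
/// a value above both limits is reported as critical.
pub fn evaluate(value: f64, t: Thresholds) -> Status {
    if value >= t.critical {
        Status::Critical
    } else if value >= t.warning {
        Status::Warning
    } else {
        Status::Ok
    }
}
```

Under these assumptions, `evaluate(5.0, CPU_LOAD)` yields `Status::Warning` because the limits are inclusive.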

Email Notification System

  • From: {hostname}@cmtec.se (e.g., cmbox@cmtec.se)
  • To: cm@cmtec.se
  • SMTP: localhost:25 (postfix)
  • Rate Limiting: 30 minutes (configurable)
  • Triggers: Status degradation and recovery with detailed context

Installation

Requirements

  • Rust toolchain 1.75+ (install via rustup)
  • Root privileges for agent (hardware monitoring access)
  • Network access for ZMQ communication (default port 6130)
  • SMTP server for notifications (postfix recommended)

Build from Source

git clone https://github.com/cmtec/cm-dashboard.git
cd cm-dashboard
cargo build --release

Optimized binaries available at:

  • Dashboard: target/release/cm-dashboard
  • Agent: target/release/cm-dashboard-agent

Install with Cargo

# Install dashboard
cargo install --path dashboard

# Install agent (requires root for hardware access)
sudo cargo install --path agent

Quick Start

Dashboard

# Run with default configuration
cm-dashboard

# Specify host to monitor
cm-dashboard --host cmbox

# Override ZMQ endpoints
cm-dashboard --zmq-endpoint tcp://srv01:6130,tcp://labbox:6130

# Increase logging verbosity
cm-dashboard -v

Agent (Pure Auto-Discovery)

The agent requires no configuration files and auto-discovers all system components:

# Basic agent startup (auto-detects everything)
sudo cm-dashboard-agent

# With verbose logging for troubleshooting
sudo cm-dashboard-agent -v

The agent automatically:

  • Discovers storage devices for SMART monitoring
  • Detects running systemd services for resource tracking
  • Configures collection intervals based on system capabilities
  • Sets up email notifications using hostname@cmtec.se

Configuration

Dashboard Configuration

The dashboard creates config/dashboard.toml on first run:

[hosts]
default_host = "srv01"

[[hosts.hosts]]
name = "srv01"
enabled = true

[[hosts.hosts]]
name = "cmbox"
enabled = true

[dashboard]
tick_rate_ms = 250
history_duration_minutes = 60

[data_source]
kind = "zmq"

[data_source.zmq]
endpoints = ["tcp://127.0.0.1:6130"]
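
The commit note at the top of this page mentions a DEFAULT_HOSTS constant and endpoint generation for all configured hosts. A rough sketch of how a host list could be turned into the `endpoints` values above is shown here; the shape of `DEFAULT_HOSTS`, the `DEFAULT_PORT` constant, and the `zmq_endpoints` function are assumptions, not the project's actual code.

```rust
/// Hosts named in this README. The real DEFAULT_HOSTS constant lives in
/// config.rs; its exact type and contents here are an assumption.
pub const DEFAULT_HOSTS: &[&str] = &["cmbox", "labbox", "simonbox", "steambox", "srv01"];

/// Default agent port from the Requirements section.
pub const DEFAULT_PORT: u16 = 6130;

/// Build one ZMQ TCP endpoint string per host, e.g. "tcp://cmbox:6130".
/// Hypothetical helper for illustration.
pub fn zmq_endpoints(hosts: &[&str], port: u16) -> Vec<String> {
    hosts
        .iter()
        .map(|host| format!("tcp://{host}:{port}"))
        .collect()
}
```

Calling `zmq_endpoints(DEFAULT_HOSTS, DEFAULT_PORT)` produces one endpoint per host in the same format accepted by `--zmq-endpoint`.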

Agent Configuration (Optional)

The agent works without configuration but supports optional settings:

# List available command-line options
cm-dashboard-agent --help

# Override specific settings
sudo cm-dashboard-agent \
    --hostname cmbox \
    --bind tcp://*:6130 \
    --interval 5000

Widget Layout

Services Widget Structure

The Services widget now displays both system metrics and services in a unified table:

┌Services────────────────────────────────────────────────────┐
│  Service          Memory (GB)  CPU    Disk                 │
│✔ Service Memory   7.1/23899.7 MiB     —                   │ ← System metric as service row
│✔ Disk Usage       —           —       45/100 GB           │ ← System metric as service row  
│⚠ CPU Load         —           2.18    —                   │ ← System metric as service row
│✔ CPU Temperature  —           47.0°C  —                   │ ← System metric as service row
│✔ docker-registry  0.0 GB      0.0%    <1 MB               │ ← Regular service
│✔ nginx            0.0/1.0 GB  0.0%    <1 MB               │ ← Regular service
│✔  ├─ docker.cmtec.se                                      │ ← Nginx site (sub-service)
│✔  ├─ git.cmtec.se                                         │ ← Nginx site (sub-service)  
│✔  └─ gitea.cmtec.se                                       │ ← Nginx site (sub-service)
│✔ sshd             0.0 GB      0.0%    <1 MB               │ ← Regular service
│  1 SSH connection                                          │ ← Service description
└────────────────────────────────────────────────────────────┘

Row Types:

  • System Metrics: CPU Load, Service Memory, Disk Usage, CPU Temperature with status indicators
  • Regular Services: Full resource data (memory, CPU, disk) with optional description lines
  • Sub-services: Nginx sites with tree structure, status indicators only (no resource columns)
  • Description Lines: Connection counts and service-specific info without status indicators
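
The tree structure used for sub-service rows can be sketched as below: every entry gets "├─" except the last, which gets "└─". The `tree_rows` function is a hypothetical helper, and the single leading space is an assumption about the widget's indentation.

```rust
/// Render sub-service names with tree connectors: "├─" for every entry
/// except the last one, which gets "└─".
/// Hypothetical helper; the dashboard's real rendering code may differ.
pub fn tree_rows(sites: &[&str]) -> Vec<String> {
    // Index of the final entry; saturating_sub keeps an empty slice safe.
    let last = sites.len().saturating_sub(1);
    sites
        .iter()
        .enumerate()
        .map(|(i, site)| {
            let connector = if i == last { "└─" } else { "├─" };
            format!(" {connector} {site}")
        })
        .collect()
}
```

An empty slice yields no rows, so a service without sub-services renders as a regular service row only.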

Hosts Widget (formerly Alerts)

The Hosts widget provides a summary view of all monitored hosts:

┌Hosts────────────────────────────────────────────────────────┐
│  Host    Status            Timestamp                        │
│✔ cmbox   ok                2025-10-13 05:45:28              │
│✔ srv01   ok                2025-10-13 05:45:28              │
│? labbox  No data received  —                                │
└─────────────────────────────────────────────────────────────┘

Monitoring Components

System Collector

  • CPU Load: 1/5/15 minute averages with warning/critical thresholds
  • Memory Usage: Used/total with percentage calculation
  • CPU Temperature: x86_pkg_temp prioritized for accuracy
  • C-States: Power management state distribution (C0-C10)
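
On Linux, the 1/5/15 minute load averages are exposed as the first three fields of /proc/loadavg. Whether the agent reads that file directly is an assumption; the sketch below only shows how such a line could be parsed, and `parse_loadavg` is a hypothetical name.

```rust
/// Parse the 1, 5 and 15 minute load averages from the text of
/// /proc/loadavg, e.g. "1.05 0.96 0.58 2/1234 56789".
/// Returns None if a field is missing or not a number.
pub fn parse_loadavg(content: &str) -> Option<(f64, f64, f64)> {
    let mut fields = content.split_whitespace();
    let one = fields.next()?.parse::<f64>().ok()?;
    let five = fields.next()?.parse::<f64>().ok()?;
    let fifteen = fields.next()?.parse::<f64>().ok()?;
    Some((one, five, fifteen))
}
```

A collector could pass the result of `std::fs::read_to_string("/proc/loadavg")` to this function and compare the values against the CPU load thresholds.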

Service Collector

  • System Metrics as Services: CPU Load, Service Memory, Disk Usage, CPU Temperature displayed as individual service rows
  • Systemd Services: Auto-discovery of interesting services with resource monitoring
  • Nginx Site Monitoring: Individual rows for each nginx virtual host with tree structure (├─ and └─)
  • Resource Usage: Per-service memory, CPU, and disk consumption
  • Service Health: Running/stopped/degraded status with detailed failure info
  • Connection Tracking: SSH connections, database connections as description lines

SMART Collector

  • NVMe Health: Temperature, wear leveling, spare blocks
  • Drive Capacity: Total/used space with percentage
  • SMART Attributes: Critical health indicators

Backup Collector

  • Restic Integration: Backup status and history
  • Health Monitoring: Success/failure tracking
  • Storage Metrics: Backup size and retention

Keyboard Controls

Key            Action
← / h          Previous host
→ / l / Tab    Next host
?              Toggle help overlay
r              Force refresh
q / Esc        Quit

Email Notifications

Notification Triggers

  • Status Degradation: Any status change to warning/critical
  • Recovery: Warning/critical status returning to ok
  • Service Failures: Individual service stop/start events
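
The first two triggers can be expressed as a comparison between the previous and current status. The sketch below follows the wording above literally, so a change from critical to warning also counts as a change "to warning"; that reading, along with the `Level`, `Trigger`, and `classify` names, is an assumption for illustration.

```rust
/// Status levels as used in the trigger rules above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Ok,
    Warning,
    Critical,
}

/// Which kind of notification a status change produces (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trigger {
    Degradation,
    Recovery,
}

/// Decide whether a status change should trigger a notification.
/// An unchanged status triggers nothing; a change to Ok is a recovery;
/// any other change lands on Warning or Critical and is a degradation.
pub fn classify(previous: Level, current: Level) -> Option<Trigger> {
    if previous == current {
        None
    } else if current == Level::Ok {
        Some(Trigger::Recovery)
    } else {
        Some(Trigger::Degradation)
    }
}
```

The recovery email shown in the next section corresponds to `classify(Level::Warning, Level::Ok)` in this sketch.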

Example Recovery Email

✅ RESOLVED: system cpu on cmbox

Status Change Alert

Host: cmbox
Component: system
Metric: cpu
Status Change: warning → ok
Time: 2025-10-12 22:15:30 CET

Details:
Recovered from: CPU load (1/5/15min): 6.20 / 5.80 / 4.50
Current status: CPU load (1/5/15min): 3.30 / 3.17 / 2.84

--
CM Dashboard Agent
Generated at 2025-10-12 22:15:30 CET

Rate Limiting

  • Default: 30 minutes between notifications per component
  • Testing: Set to 0 for immediate notifications
  • Configurable: Adjustable per deployment needs
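
Per-component rate limiting of this kind can be sketched with a map from component name to the time of the last notification. `RateLimiter` and its methods are hypothetical names; the agent's real implementation may be structured differently.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks when each component last triggered a notification.
/// Hypothetical helper for illustration only.
pub struct RateLimiter {
    min_interval: Duration,
    last_sent: HashMap<String, Instant>,
}

impl RateLimiter {
    /// `min_interval` is the minimum gap between two notifications for
    /// the same component. A zero interval allows every notification.
    pub fn new(min_interval: Duration) -> Self {
        Self {
            min_interval,
            last_sent: HashMap::new(),
        }
    }

    /// Returns true if a notification for `component` may be sent at
    /// `now`, and records `now` as the last send time when it does.
    pub fn allow(&mut self, component: &str, now: Instant) -> bool {
        if let Some(&last) = self.last_sent.get(component) {
            if now.duration_since(last) < self.min_interval {
                return false;
            }
        }
        self.last_sent.insert(component.to_string(), now);
        true
    }
}
```

With `RateLimiter::new(Duration::from_secs(30 * 60))`, a second alert for the same component within 30 minutes is suppressed while alerts for other components still pass. Passing `now` explicitly keeps the logic easy to test without sleeping.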

Development

Project Structure

cm-dashboard/
├── agent/                 # Monitoring agent
│   ├── src/
│   │   ├── collectors/    # Data collection modules
│   │   ├── notifications.rs # Email notification system
│   │   └── simple_agent.rs # Main agent logic
├── dashboard/             # TUI dashboard
│   ├── src/
│   │   ├── ui/           # Widget implementations
│   │   ├── data/         # Data structures
│   │   └── app.rs        # Application state
├── shared/               # Common data structures
└── config/              # Configuration files

Development Commands

# Format code
cargo fmt

# Check all packages
cargo check

# Run tests
cargo test

# Build release
cargo build --release

# Run with logging
RUST_LOG=debug cargo run -p cm-dashboard-agent

Architecture Principles

Status Calculation Rules

  • Agent calculates all status using predefined thresholds
  • Dashboard never calculates status - only displays agent data
  • No hardcoded thresholds in dashboard widgets
  • Use "unknown" when agent status missing (never default to "ok")
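
The last rule can be illustrated with a small mapping on the dashboard side. The status strings and the `DisplayStatus` and `display_status` names are assumptions; the point of the sketch is only that a missing or unrecognized value never becomes "ok".

```rust
/// Status as rendered by the dashboard (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DisplayStatus {
    Ok,
    Warning,
    Critical,
    Unknown,
}

/// Map an agent-provided status string to a display status.
/// The string values are assumed; the real wire format may differ.
pub fn display_status(agent_status: Option<&str>) -> DisplayStatus {
    match agent_status {
        Some("ok") => DisplayStatus::Ok,
        Some("warning") => DisplayStatus::Warning,
        Some("critical") => DisplayStatus::Critical,
        // Missing or unrecognized values fall through to Unknown,
        // never to Ok.
        _ => DisplayStatus::Unknown,
    }
}
```

No threshold appears anywhere in this function, which matches the rule that the dashboard only displays what the agent reports.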

Data Flow

System Metrics → Agent Collectors → Status Calculation → ZMQ → Dashboard → Display
                                         ↓
                                 Email Notifications

Pure Auto-Discovery

  • No config files required for basic operation
  • Runtime discovery of system capabilities
  • Service auto-detection via systemd patterns
  • Storage device enumeration via /sys filesystem
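
As a rough illustration of storage enumeration via /sys, the sketch below lists the entries of /sys/block and skips names that usually belong to virtual devices. The skip list and both function names are assumptions and do not reflect the agent's actual discovery rules.

```rust
use std::fs;
use std::io;

/// Decide whether a /sys/block entry name looks like a physical drive.
/// The prefix list is an assumption chosen for illustration.
pub fn is_physical_drive(name: &str) -> bool {
    const VIRTUAL_PREFIXES: &[&str] = &["loop", "ram", "zram", "dm-", "md", "sr"];
    !VIRTUAL_PREFIXES.iter().any(|p| name.starts_with(*p))
}

/// List block devices under /sys/block, keeping only names that pass
/// `is_physical_drive`. Returns an error if the directory is unreadable.
pub fn list_drives() -> io::Result<Vec<String>> {
    let mut drives = Vec::new();
    for entry in fs::read_dir("/sys/block")? {
        let name = entry?.file_name().to_string_lossy().into_owned();
        if is_physical_drive(&name) {
            drives.push(name);
        }
    }
    // Sort for a stable display order.
    drives.sort();
    Ok(drives)
}
```

Keeping the name filter separate from the directory walk lets the filter be tested on machines without a /sys filesystem.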

Troubleshooting

Common Issues

Agent Won't Start

# Check permissions (agent requires root)
sudo cm-dashboard-agent -v

# Verify ZMQ binding
sudo netstat -tulpn | grep 6130

# Check system access
sudo smartctl --scan

Dashboard Connection Issues

# Test ZMQ connectivity
cm-dashboard --zmq-endpoint tcp://target-host:6130 -v

# Check network connectivity
telnet target-host 6130

Email Notifications Not Working

# Check postfix status
sudo systemctl status postfix

# Test SMTP manually
telnet localhost 25

# Verify notification settings
sudo cm-dashboard-agent -v | grep notification

Logging

Set RUST_LOG=debug for detailed logging:

# sudo usually resets the environment, so set the variable after sudo
sudo RUST_LOG=debug cm-dashboard-agent
RUST_LOG=debug cm-dashboard

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

For bugs and feature requests, please use GitHub Issues.

NixOS Integration

Updating cm-dashboard in NixOS Configuration

When new code is pushed to the cm-dashboard repository, follow these steps to update the NixOS configuration:

1. Get the Latest Commit Hash

# Get the latest commit from the API
curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | head -20

# Or use git
git log --oneline -1

2. Update the NixOS Configuration

Edit hosts/common/cm-dashboard.nix and update the rev field:

src = pkgs.fetchFromGitea {
  domain = "gitea.cmtec.se";
  owner = "cm";
  repo = "cm-dashboard";
  rev = "f786d054f2ece80823f85e46933857af96e241b2";  # Update this
  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";  # Reset temporarily
};

3. Get the Correct Hash

Build with placeholder hash to get the actual hash:

nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchFromGitea { 
  domain = "gitea.cmtec.se"; 
  owner = "cm"; 
  repo = "cm-dashboard"; 
  rev = "YOUR_COMMIT_HASH"; 
  hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; 
}' 2>&1 | grep "got:"

4. Update the Hash

Replace the placeholder with the correct hash:

hash = "sha256-vjy+j91iDCHUf0RE43anK4WZ+rKcyohP/3SykwZGof8=";  # Use actual hash

5. Update Cargo Dependencies (if needed)

If Cargo.lock has changed, you may need to update cargoHash:

# Build to get cargo hash error
nix-build --no-out-link --expr 'with import <nixpkgs> {}; rustPlatform.buildRustPackage rec { 
  pname = "cm-dashboard"; 
  version = "0.1.0"; 
  src = fetchFromGitea { 
    domain = "gitea.cmtec.se"; 
    owner = "cm"; 
    repo = "cm-dashboard"; 
    rev = "YOUR_COMMIT_HASH"; 
    hash = "YOUR_SOURCE_HASH"; 
  }; 
  cargoHash = ""; 
  nativeBuildInputs = [ pkg-config ]; 
  buildInputs = [ openssl ]; 
  buildAndTestSubdir = "."; 
  cargoBuildFlags = [ "--workspace" ]; 
}' 2>&1 | grep "got:"

Then update cargoHash in the configuration.

6. Commit the Changes

git add hosts/common/cm-dashboard.nix
git commit -m "Update cm-dashboard to latest version"
git push

Example Update Process

# 1. Get latest commit
LATEST_COMMIT=$(curl -s "https://gitea.cmtec.se/api/v1/repos/cm/cm-dashboard/commits?sha=main&limit=1" | grep '"sha"' | head -1 | cut -d'"' -f4)

# 2. Get source hash
SOURCE_HASH=$(nix-build --no-out-link -E "with import <nixpkgs> {}; fetchFromGitea { domain = \"gitea.cmtec.se\"; owner = \"cm\"; repo = \"cm-dashboard\"; rev = \"$LATEST_COMMIT\"; hash = \"sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=\"; }" 2>&1 | grep "got:" | awk '{print $2}')

# 3. Update configuration and commit
echo "Latest commit: $LATEST_COMMIT"
echo "Source hash: $SOURCE_HASH"