CM Dashboard - Infrastructure Monitoring TUI

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure, built to replace Glance with a custom solution tailored to CMTEC's monitoring needs and API integrations. It monitors every configured host in real time, lets the agent calculate status automatically, and sends rate-limited email notifications when status changes.

┌───────────────────────────────────────────────────────────────────────┐
│ CM Dashboard • cmbox                                                  │
├───────────────────────────────────────────────────────────────────────┤
│ Storage • ok:1 warn:0 crit:0      │ Services • ok:1 warn:0 fail:0     │
│ ┌───────────────────────────────┐ │ ┌───────────────────────────────┐ │
│ │Drive    Temp  Wear Spare Hours│ │ │Service memory: 7.1/23899.7 MiB│ │
│ │nvme0n1  28°C  1%   100%  14489│ │ │Disk usage: —                  │ │
│ │         Capacity Usage        │ │ │  Service  Memory     Disk     │ │
│ │         954G     77G (8%)     │ │ │✔ sshd     7.1 MiB    —        │ │
│ └───────────────────────────────┘ │ └───────────────────────────────┘ │
├───────────────────────────────────────────────────────────────────────┤
│ CPU / Memory • warn               │ Backups                           │
│ System memory: 5251.7/23899.7 MiB │ Host cmbox awaiting backup        │
│ CPU load (1/5/15): 2.18 2.66 2.56 │ metrics                           │
│ CPU freq: 1100.1 MHz              │                                   │
│ CPU temp: 47.0°C                  │                                   │
├───────────────────────────────────────────────────────────────────────┤
│ Alerts • ok:0 warn:3 fail:0       │ Status • ZMQ connected            │
│ cmbox: warning: CPU load 2.18     │ Monitoring • hosts: 3             │
│ srv01: pending: awaiting metrics  │ Data source: ZMQ connected        │
│ labbox: pending: awaiting metrics │ Active host: cmbox (1/3)          │
└───────────────────────────────────────────────────────────────────────┘
Keys: [←→] hosts [r]efresh [q]uit

Key Features

Real-time Monitoring

  • Multi-host support for cmbox, labbox, simonbox, steambox, srv01
  • Performance-focused with minimal resource usage
  • Keyboard-driven interface for power users
  • ZMQ gossip network for efficient data distribution

Infrastructure Monitoring

  • NVMe health monitoring with wear prediction and temperature tracking
  • CPU/Memory/GPU telemetry with automatic thresholding
  • Service resource monitoring with per-service memory and disk usage
  • Disk usage overview for root filesystems
  • Backup status with detailed metrics and history
  • C-state monitoring for CPU power management analysis

Intelligent Alerting

  • Agent-calculated status with predefined thresholds
  • Email notifications via SMTP with rate limiting
  • Recovery notifications with context about original issues
  • Stockholm timezone support for email timestamps
  • Unified alert pipeline summarizing host health

Architecture

Agent-Dashboard Separation

The system follows a strict separation of concerns:

  • Agent: Single source of truth for all status calculations using defined thresholds
  • Dashboard: Display-only interface that shows agent-provided status
  • Data Flow: Agent (calculations) → Status → Dashboard (display) → Colors

Agent Thresholds (Production)

  • CPU Load: Warning ≥ 5.0, Critical ≥ 8.0
  • Memory Usage: Warning ≥ 80%, Critical ≥ 95%
  • CPU Temperature: Warning ≥ 100°C, Critical ≥ 100°C (effectively disabled)
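The threshold rules above can be sketched as a small classification function. This is a minimal illustration, not the agent's actual code: the `Status` and `Thresholds` types and the `classify` function are hypothetical names, and only the numeric values come from the list above.

```rust
/// Health status, ordered by severity (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// A warning/critical threshold pair (hypothetical type for illustration).
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub warning: f64,
    pub critical: f64,
}

// Values taken from the production threshold list above.
pub const CPU_LOAD: Thresholds = Thresholds { warning: 5.0, critical: 8.0 };
pub const MEMORY_PERCENT: Thresholds = Thresholds { warning: 80.0, critical: 95.0 };
pub const CPU_TEMP_C: Thresholds = Thresholds { warning: 100.0, critical: 100.0 };

/// Classify a reading. The critical check runs first, so it wins whenever
/// both thresholds are met.
pub fn classify(value: f64, t: Thresholds) -> Status {
    if value >= t.critical {
        Status::Critical
    } else if value >= t.warning {
        Status::Warning
    } else {
        Status::Ok
    }
}
```

In this sketch a CPU load of 5.0 classifies as `Warning` and 8.0 as `Critical`. Because the CPU temperature thresholds are identical, a reading of 100°C or more goes straight to `Critical` and the warning tier never fires.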

Email Notification System

  • From: {hostname}@cmtec.se (e.g., cmbox@cmtec.se)
  • To: cm@cmtec.se
  • SMTP: localhost:25 (postfix)
  • Rate Limiting: 30 minutes (configurable)
  • Triggers: Status degradation and recovery with detailed context

Installation

Requirements

  • Rust toolchain 1.75+ (install via rustup)
  • Root privileges for agent (hardware monitoring access)
  • Network access for ZMQ communication (default port 6130)
  • SMTP server for notifications (postfix recommended)

Build from Source

git clone https://github.com/cmtec/cm-dashboard.git
cd cm-dashboard
cargo build --release

Optimized binaries available at:

  • Dashboard: target/release/cm-dashboard
  • Agent: target/release/cm-dashboard-agent

Install Binaries

# Install dashboard
cargo install --path dashboard

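# Install the release-built agent system-wide so root can find it
# (only running the agent needs root, not building it)
sudo install -m 0755 target/release/cm-dashboard-agent /usr/local/bin/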

Quick Start

Dashboard

# Run with default configuration
cm-dashboard

# Specify host to monitor
cm-dashboard --host cmbox

# Override ZMQ endpoints
cm-dashboard --zmq-endpoint tcp://srv01:6130,tcp://labbox:6130

# Increase logging verbosity
cm-dashboard -v
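The `--zmq-endpoint` example above passes several endpoints as one comma-separated value. A minimal sketch of how such a value might be split is shown here; `parse_endpoints` is a hypothetical helper, and the dashboard's real argument handling may differ.

```rust
/// Split a comma-separated endpoint list into individual endpoints,
/// trimming whitespace and dropping empty entries.
/// (Hypothetical helper; not taken from the dashboard source.)
pub fn parse_endpoints(arg: &str) -> Vec<String> {
    arg.split(',')
        .map(str::trim)
        .filter(|endpoint| !endpoint.is_empty())
        .map(String::from)
        .collect()
}
```

Dropping empty entries keeps a trailing comma or an empty argument from producing an endpoint that can never connect.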

Agent (Pure Auto-Discovery)

The agent requires no configuration files and auto-discovers all system components:

# Basic agent startup (auto-detects everything)
sudo cm-dashboard-agent

# With verbose logging for troubleshooting
sudo cm-dashboard-agent -v

The agent automatically:

  • Discovers storage devices for SMART monitoring
  • Detects running systemd services for resource tracking
  • Configures collection intervals based on system capabilities
  • Sets up email notifications using hostname@cmtec.se

Configuration

Dashboard Configuration

The dashboard creates config/dashboard.toml on first run:

[hosts]
default_host = "srv01"

[[hosts.hosts]]
name = "srv01"
enabled = true

[[hosts.hosts]]
name = "cmbox"
enabled = true

[dashboard]
tick_rate_ms = 250
history_duration_minutes = 60

[data_source]
kind = "zmq"

[data_source.zmq]
endpoints = ["tcp://127.0.0.1:6130"]

Agent Configuration (Optional)

The agent works without configuration but supports optional settings:

# List the available options
cm-dashboard-agent --help

# Override specific settings
sudo cm-dashboard-agent \
    --hostname cmbox \
    --bind tcp://*:6130 \
    --interval 5000

Monitoring Components

System Collector

  • CPU Load: 1/5/15 minute averages with warning/critical thresholds
  • Memory Usage: Used/total with percentage calculation
  • CPU Temperature: x86_pkg_temp prioritized for accuracy
  • C-States: Power management state distribution (C0-C10)
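As an illustration of the CPU load item, the 1/5/15 minute averages can be read from `/proc/loadavg` on Linux. The sketch below assumes that source; the README does not say how the agent actually collects load, and `parse_loadavg` and `read_loadavg` are hypothetical names.

```rust
/// Parse the 1, 5 and 15 minute load averages from the contents of
/// `/proc/loadavg`, e.g. "2.18 2.66 2.56 1/234 5678".
/// Returns `None` if any of the first three fields is missing or not a number.
pub fn parse_loadavg(contents: &str) -> Option<(f64, f64, f64)> {
    let mut fields = contents.split_whitespace();
    let one = fields.next()?.parse::<f64>().ok()?;
    let five = fields.next()?.parse::<f64>().ok()?;
    let fifteen = fields.next()?.parse::<f64>().ok()?;
    Some((one, five, fifteen))
}

/// Read the current load averages; `None` if the file is unreadable
/// (for example on a non-Linux system) or malformed.
pub fn read_loadavg() -> Option<(f64, f64, f64)> {
    let contents = std::fs::read_to_string("/proc/loadavg").ok()?;
    parse_loadavg(&contents)
}
```

Keeping the parser separate from the file read makes the parsing logic testable without touching `/proc`.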

Service Collector

  • Systemd Services: Auto-discovery of interesting services
  • Resource Usage: Per-service memory and disk consumption
  • Service Health: Running/stopped status with detailed failure info

SMART Collector

  • NVMe Health: Temperature, wear leveling, spare blocks
  • Drive Capacity: Total/used space with percentage
  • SMART Attributes: Critical health indicators

Backup Collector

  • Restic Integration: Backup status and history
  • Health Monitoring: Success/failure tracking
  • Storage Metrics: Backup size and retention

Keyboard Controls

Key            Action
← / h          Previous host
→ / l / Tab    Next host
?              Toggle help overlay
r              Force refresh
q / Esc        Quit

Email Notifications

Notification Triggers

  • Status Degradation: Any status change to warning/critical
  • Recovery: Warning/critical status returning to ok
  • Service Failures: Individual service stop/start events

Example Recovery Email

✅ RESOLVED: system cpu on cmbox

Status Change Alert

Host: cmbox
Component: system
Metric: cpu
Status Change: warning → ok
Time: 2025-10-12 22:15:30 CEST

Details:
Recovered from: CPU load (1/5/15min): 6.20 / 5.80 / 4.50
Current status: CPU load (1/5/15min): 3.30 / 3.17 / 2.84

--
CM Dashboard Agent
Generated at 2025-10-12 22:15:30 CEST

Rate Limiting

  • Default: 30 minutes between notifications per component
  • Testing: Set to 0 for immediate notifications
  • Configurable: Adjustable per deployment needs
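A per-component rate limit like the one described above can be sketched with a map from component name to the time of the last notification. `RateLimiter` and its methods are hypothetical names for illustration; the agent's own implementation is not shown in this README.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Per-component notification rate limiter (hypothetical type for illustration).
pub struct RateLimiter {
    min_interval: Duration,
    last_sent: HashMap<String, Instant>,
}

impl RateLimiter {
    /// `min_interval` is the minimum time between two notifications for the
    /// same component; `Duration::ZERO` disables limiting.
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_sent: HashMap::new() }
    }

    /// Returns true and records `now` if a notification for `component` may
    /// be sent; returns false if the previous one was sent less than
    /// `min_interval` ago.
    pub fn allow(&mut self, component: &str, now: Instant) -> bool {
        if let Some(&last) = self.last_sent.get(component) {
            if now.duration_since(last) < self.min_interval {
                return false;
            }
        }
        self.last_sent.insert(component.to_string(), now);
        true
    }
}
```

With `Duration::from_secs(30 * 60)` a second alert for the same component within 30 minutes is suppressed while other components are unaffected. Passing `now` explicitly rather than calling `Instant::now()` inside keeps the limiter easy to test.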

Development

Project Structure

cm-dashboard/
├── agent/                    # Monitoring agent
│   └── src/
│       ├── collectors/       # Data collection modules
│       ├── notifications.rs  # Email notification system
│       └── simple_agent.rs   # Main agent logic
├── dashboard/                # TUI dashboard
│   └── src/
│       ├── ui/               # Widget implementations
│       ├── data/             # Data structures
│       └── app.rs            # Application state
├── shared/                   # Common data structures
└── config/                   # Configuration files

Development Commands

# Format code
cargo fmt

# Check all packages
cargo check

# Run tests
cargo test

# Build release
cargo build --release

# Run with logging
RUST_LOG=debug cargo run -p cm-dashboard-agent

Architecture Principles

Status Calculation Rules

  • Agent calculates all status using predefined thresholds
  • Dashboard never calculates status - only displays agent data
  • No hardcoded thresholds in dashboard widgets
  • Use "unknown" when the agent status is missing (never default to "ok")
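On the dashboard side, these rules amount to mapping whatever the agent sent to a display state without recomputing anything. The sketch below is illustrative only: the wire strings `"ok"`, `"warning"` and `"critical"`, the color names, and the function names are assumptions, not the project's actual protocol.

```rust
/// Display status on the dashboard side (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
    Unknown,
}

/// Map the agent-provided status string to a display status.
/// A missing or unrecognized value becomes `Unknown`, never `Ok`.
/// The string values are assumed for this sketch.
pub fn status_from_agent(raw: Option<&str>) -> Status {
    match raw.map(str::trim) {
        Some("ok") => Status::Ok,
        Some("warning") => Status::Warning,
        Some("critical") => Status::Critical,
        _ => Status::Unknown,
    }
}

/// Pick a color name for a status; no thresholds are involved here.
pub fn color_name(status: Status) -> &'static str {
    match status {
        Status::Ok => "green",
        Status::Warning => "yellow",
        Status::Critical => "red",
        Status::Unknown => "gray",
    }
}
```

The catch-all arm is the important part: any value the dashboard does not recognize is shown as unknown instead of being silently treated as healthy.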

Data Flow

System Metrics → Agent Collectors → Status Calculation → ZMQ → Dashboard → Display
                                         ↓
                                 Email Notifications

Pure Auto-Discovery

  • No config files required for basic operation
  • Runtime discovery of system capabilities
  • Service auto-detection via systemd patterns
  • Storage device enumeration via /sys filesystem
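Storage enumeration through the `/sys` filesystem could look roughly like the following. This is a sketch under assumptions: the README only says `/sys` is used, so the `/sys/block` path, the list of skipped virtual-device prefixes, and both function names are illustrative choices rather than the agent's actual rules.

```rust
use std::fs;

/// Heuristic filter that skips common virtual block devices.
/// (Assumed rules for illustration; the agent may filter differently.)
pub fn is_candidate_device(name: &str) -> bool {
    const VIRTUAL_PREFIXES: [&str; 4] = ["loop", "ram", "zram", "dm-"];
    !name.is_empty() && !VIRTUAL_PREFIXES.iter().any(|prefix| name.starts_with(prefix))
}

/// List candidate block devices found under `/sys/block`, sorted by name.
/// Returns an empty list if the directory cannot be read.
pub fn discover_block_devices() -> Vec<String> {
    let entries = match fs::read_dir("/sys/block") {
        Ok(entries) => entries,
        Err(_) => return Vec::new(),
    };
    let mut names: Vec<String> = entries
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| entry.file_name().into_string().ok())
        .filter(|name| is_candidate_device(name))
        .collect();
    names.sort();
    names
}
```

Returning an empty list on failure matches the no-configuration idea: a host without readable `/sys/block` simply reports no drives instead of failing at startup.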

Troubleshooting

Common Issues

Agent Won't Start

# Check permissions (agent requires root)
sudo cm-dashboard-agent -v

# Verify ZMQ binding
sudo netstat -tulpn | grep 6130

# Check system access
sudo smartctl --scan

Dashboard Connection Issues

# Test ZMQ connectivity
cm-dashboard --zmq-endpoint tcp://target-host:6130 -v

# Check network connectivity
telnet target-host 6130

Email Notifications Not Working

# Check postfix status
sudo systemctl status postfix

# Test SMTP manually
telnet localhost 25

# Verify notification settings (2>&1 also captures log lines written to stderr)
sudo cm-dashboard-agent -v 2>&1 | grep -i notification

Logging

Set RUST_LOG=debug for detailed logging:

sudo RUST_LOG=debug cm-dashboard-agent
RUST_LOG=debug cm-dashboard

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

For bugs and feature requests, please use GitHub Issues.