CM Dashboard

A high-performance, Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built on ZMQ-based metric collection and an individual-metrics architecture.

Features

Core Monitoring

  • Real-time metrics: CPU, RAM, Storage, and Service status
  • Multi-host support: Monitor multiple servers from a single dashboard
  • Service management: Start/stop services with intelligent status tracking
  • NixOS integration: System rebuild via SSH + tmux popup
  • Backup monitoring: Borgbackup status and scheduling
  • Email notifications: Intelligent batching prevents spam

User-Stopped Service Tracking

Services stopped via the dashboard are intelligently tracked to prevent false alerts:

  • Smart status reporting: User-stopped services show as Status::OK instead of Warning
  • Persistent storage: Tracking survives agent restarts via JSON storage
  • Automatic management: Flags cleared when services restarted via dashboard
  • Maintenance friendly: No false alerts during intentional service operations

Architecture

Individual Metrics Philosophy

  • Agent: Collects individual metrics, calculates status using thresholds
  • Dashboard: Subscribes to specific metrics, composes widgets from individual data
  • ZMQ Communication: Efficient real-time metric transmission
  • Status Aggregation: Host-level status calculated from all service metrics
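
As a simplified illustration of this split, the agent side might compute a per-metric status from its thresholds along these lines; the type and field names here are assumptions for the sketch, not the actual shared crate:

// Sketch only: illustrative types, not the actual shared-crate API.
#[derive(Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

pub struct Thresholds {
    pub warning: f64,
    pub critical: f64,
}

pub struct Metric {
    pub name: String,   // e.g. "memory.usage_percent"
    pub value: f64,
    pub status: Status, // computed on the agent, rendered by the dashboard
}

impl Metric {
    /// Agent side: derive the status from thresholds so the dashboard
    /// only composes widgets from already-judged individual metrics.
    pub fn with_thresholds(name: &str, value: f64, t: &Thresholds) -> Self {
        let status = if value >= t.critical {
            Status::Critical
        } else if value >= t.warning {
            Status::Warning
        } else {
            Status::Ok
        };
        Metric { name: name.to_string(), value, status }
    }
}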

Components

┌─────────────────┐    ZMQ     ┌─────────────────┐
│                 │◄──────────►│                 │
│   Agent         │  Metrics   │   Dashboard     │
│   - Collectors  │            │   - TUI         │
│   - Status      │            │   - Widgets     │
│   - Tracking    │            │   - Commands    │
│                 │            │                 │
└─────────────────┘            └─────────────────┘
         │                              │
         ▼                              ▼
┌─────────────────┐            ┌─────────────────┐
│ JSON Storage    │            │ SSH + tmux      │
│ - User-stopped  │            │ - Remote rebuild│
│ - Cache         │            │ - Process       │
│ - State         │            │   isolation     │
└─────────────────┘            └─────────────────┘

Service Control Flow

  1. User Action: Dashboard sends UserStart/UserStop commands
  2. Agent Processing:
    • Marks service as user-stopped (if stopping)
    • Executes systemctl start/stop service
    • Syncs state to global tracker
  3. Status Calculation:
    • Systemd collector checks user-stopped flag
    • Reports Status::OK for user-stopped inactive services
    • Normal Warning status for system failures
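
A minimal sketch of the agent side of this flow, assuming helper names that are illustrative rather than the real implementation:

// Sketch only: assumed helper names, loosely following the steps above.
use std::process::Command;

enum ServiceAction { UserStart, UserStop } // reduced to the two user actions for this sketch

// Stand-in for the tracker described under "Technical Implementation" below.
struct UserStoppedTracker;
impl UserStoppedTracker {
    fn mark_user_stopped(&mut self, _service: &str) { /* record + persist to JSON */ }
    fn clear_user_stopped(&mut self, _service: &str) { /* remove + persist to JSON */ }
}

fn handle_user_action(service: &str, action: ServiceAction, tracker: &mut UserStoppedTracker) {
    match action {
        ServiceAction::UserStop => {
            // 1) remember the intent so the systemd collector reports Status::OK later
            tracker.mark_user_stopped(service);
            // 2) then actually stop the unit
            let _ = Command::new("systemctl").args(["stop", service]).status();
        }
        ServiceAction::UserStart => {
            tracker.clear_user_stopped(service);
            let _ = Command::new("systemctl").args(["start", service]).status();
        }
    }
}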

Interface

cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
│NixOS:                              ││Service:                  Status:  RAM:   Disk:  │
│Build: 25.05.20251004.3bcc93c       ││● docker                  active   27M    496MB  │
│Agent: v0.1.43                      ││● gitea                   active   579M   2.6GB  │
│Active users: cm, simon             ││● nginx                   active   28M    24MB   │
│CPU:                                ││  ├─ ● gitea.cmtec.se     51ms                   │
│● Load: 0.10 0.52 0.88 • 3000MHz    ││  ├─ ● photos.cmtec.se    41ms                   │
│RAM:                                ││● postgresql              active   112M   357MB  │
│● Usage: 33% 2.6GB/7.6GB            ││● redis-immich            user-stopped           │
│● /tmp: 0% 0B/2.0GB                 ││● sshd                    active   2M     0      │
│Storage:                            ││● unifi                   active   594M   495MB  │
│● root (Single):                    ││                                                 │
│ ├─ ● nvme0n1 W: 1%                 ││                                                 │
│ └─ ● 18% 167.4GB/928.2GB           ││                                                 │
└────────────────────────────────────┘└─────────────────────────────────────────────────┘

Navigation

  • Tab: Switch between hosts
  • ↑↓ or j/k: Navigate services
  • s: Start selected service (UserStart)
  • S: Stop selected service (UserStop)
  • J: Show service logs (journalctl in tmux popup)
  • L: Show custom log files (tail -f custom paths in tmux popup)
  • R: Rebuild current host
  • B: Run backup on current host
  • q: Quit

Status Indicators

  • Green ●: Active service
  • Yellow ◐: Inactive service (system issue)
  • Red ◯: Failed service
  • Blue arrows: Service transitioning (↑ starting, ↓ stopping, ↻ restarting)
  • "user-stopped": Service stopped via dashboard (Status::OK)

Quick Start

Building

# With Nix (recommended)
nix-shell -p openssl pkg-config --run "cargo build --workspace"

# Or with system dependencies
sudo apt install libssl-dev pkg-config  # Ubuntu/Debian
cargo build --workspace

Running

# Start agent (requires configuration)
./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml

# Start dashboard (inside tmux session)
tmux
./target/debug/cm-dashboard --config /etc/cm-dashboard/dashboard.toml

Configuration

Agent Configuration

collection_interval_seconds = 2

[zmq]
publisher_port = 6130
command_port = 6131
bind_address = "0.0.0.0"
transmission_interval_seconds = 2

[collectors.cpu]
enabled = true
interval_seconds = 2
load_warning_threshold = 5.0
load_critical_threshold = 10.0

[collectors.memory]
enabled = true
interval_seconds = 2
usage_warning_percent = 80.0
usage_critical_percent = 90.0

[collectors.systemd]
enabled = true
interval_seconds = 10
service_name_filters = ["nginx*", "postgresql*", "docker*", "sshd*"]
excluded_services = ["nginx-config-reload", "systemd-", "getty@"]
nginx_latency_critical_ms = 1000.0
http_timeout_seconds = 10

[notifications]
enabled = true
smtp_host = "localhost"
smtp_port = 25
from_email = "{hostname}@example.com"
to_email = "admin@example.com"
aggregation_interval_seconds = 30
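
Loading this file could look roughly like the sketch below; the struct and field names mirror the TOML above but are assumptions about the real config module:

// Sketch only: assumed config structs mirroring part of the agent TOML above.
use serde::Deserialize;

#[derive(Deserialize)]
struct AgentConfig {
    collection_interval_seconds: u64,
    zmq: ZmqConfig,
    // [collectors.*] and [notifications] sections omitted in this sketch;
    // serde ignores unknown keys by default, so a partial struct still loads.
}

#[derive(Deserialize)]
struct ZmqConfig {
    publisher_port: u16,
    command_port: u16,
    bind_address: String,
    transmission_interval_seconds: u64,
}

fn load_agent_config(path: &str) -> Result<AgentConfig, Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(path)?;
    Ok(toml::from_str(&raw)?)
}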

Dashboard Configuration

[zmq]
subscriber_ports = [6130]

[hosts]
predefined_hosts = ["cmbox", "srv01", "srv02"]

[ssh]
rebuild_user = "cm"
rebuild_alias = "nixos-rebuild-cmtec"
backup_alias = "cm-backup-run"

Technical Implementation

Collectors

Systemd Collector

  • Service Discovery: Uses systemctl list-unit-files + list-units --all
  • Status Calculation: Checks user-stopped flag before assigning Warning status
  • Memory Tracking: Per-service memory usage via systemctl show
  • Sub-services: Nginx site latency, Docker containers
  • User-stopped Integration: UserStoppedServiceTracker::is_service_user_stopped()

User-Stopped Service Tracker

  • Storage: /var/lib/cm-dashboard/user-stopped-services.json
  • Thread Safety: Global singleton with Arc<Mutex<>>
  • Persistence: Automatic save on state changes
  • Global Access: Static methods for collector integration
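
A minimal sketch of the tracker shape described above, assuming the stored state is a JSON set of service names (method bodies and struct layout are illustrative):

// Sketch only: assumed shape for the tracker described above.
use std::collections::HashSet;
use std::sync::{Arc, Mutex, OnceLock};

const STORE: &str = "/var/lib/cm-dashboard/user-stopped-services.json";

struct UserStoppedServiceTracker {
    services: HashSet<String>,
}

impl UserStoppedServiceTracker {
    /// Global singleton behind Arc<Mutex<>>, loaded from JSON on first use.
    fn global() -> &'static Arc<Mutex<UserStoppedServiceTracker>> {
        static TRACKER: OnceLock<Arc<Mutex<UserStoppedServiceTracker>>> = OnceLock::new();
        TRACKER.get_or_init(|| {
            let services = std::fs::read_to_string(STORE)
                .ok()
                .and_then(|s| serde_json::from_str(&s).ok())
                .unwrap_or_default();
            Arc::new(Mutex::new(UserStoppedServiceTracker { services }))
        })
    }

    /// Static accessor used by collectors (mirrors the name mentioned above).
    fn is_service_user_stopped(name: &str) -> bool {
        Self::global().lock().unwrap().services.contains(name)
    }

    fn mark_user_stopped(name: &str) {
        let mut t = Self::global().lock().unwrap();
        t.services.insert(name.to_string());
        // persist on every state change
        let _ = std::fs::write(STORE, serde_json::to_string(&t.services).unwrap());
    }
}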

Other Collectors

  • CPU: Load average, temperature, frequency monitoring
  • Memory: RAM/swap usage, tmpfs monitoring
  • Disk: Filesystem usage, SMART health data
  • NixOS: Build version, active users, agent version
  • Backup: Borgbackup repository status and metrics
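
One way these collectors could share an interface is a simple trait like the following; this is a hypothetical sketch, and the actual agent may organize its collectors differently:

// Sketch only: a hypothetical shared interface for the collectors above.
use std::time::Duration;

struct Metric; // stand-in for the Metric type in shared/src/metrics.rs

trait Collector {
    /// Name used for logging and per-collector [collectors.*] config, e.g. "cpu".
    fn name(&self) -> &'static str;
    /// Per-collector cadence (2s for CPU/memory, 300s for disk; see Monitoring Intervals).
    fn interval(&self) -> Duration;
    /// Produce individual metrics with status already computed from thresholds.
    fn collect(&mut self) -> Vec<Metric>;
}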

ZMQ Protocol

// Metric Message
#[derive(Serialize, Deserialize)]
pub struct MetricMessage {
    pub hostname: String,
    pub timestamp: u64,
    pub metrics: Vec<Metric>,
}

// Service Commands
pub enum AgentCommand {
    ServiceControl {
        service_name: String,
        action: ServiceAction,
    },
    SystemRebuild { /* SSH config */ },
    CollectNow,
}

pub enum ServiceAction {
    Start,           // System-initiated
    Stop,            // System-initiated  
    UserStart,       // User via dashboard (clears user-stopped)
    UserStop,        // User via dashboard (marks user-stopped)
    Status,
}
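
With these types, publishing a batch of metrics could look roughly like the sketch below, using the zmq and serde_json crates; JSON on the wire and the per-call socket setup are assumptions (a real agent would create and reuse one PUB socket):

// Sketch only: publish one MetricMessage on the agent's PUB socket.
fn publish(hostname: &str, metrics: Vec<Metric>) -> Result<(), Box<dyn std::error::Error>> {
    let ctx = zmq::Context::new();
    let socket = ctx.socket(zmq::PUB)?;
    socket.bind("tcp://0.0.0.0:6130")?; // publisher_port / bind_address from agent.toml

    let msg = MetricMessage {
        hostname: hostname.to_string(),
        timestamp: std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)?
            .as_secs(),
        metrics,
    };
    socket.send(serde_json::to_vec(&msg)?, 0)?; // assumed JSON encoding
    Ok(())
}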

Maintenance Mode

Suppress notifications during planned maintenance:

# Enable maintenance mode
touch /tmp/cm-maintenance

# Perform maintenance
systemctl stop service
# ... work ...
systemctl start service  

# Disable maintenance mode
rm /tmp/cm-maintenance
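
On the agent side, suppressing notifications could then be as simple as checking for the flag file before sending; this is a sketch, and the actual notification path may differ:

// Sketch only: skip notifications while the maintenance flag file exists.
fn maintenance_mode_active() -> bool {
    std::path::Path::new("/tmp/cm-maintenance").exists()
}

fn maybe_notify(send: impl FnOnce()) {
    if maintenance_mode_active() {
        return; // planned maintenance: stay quiet
    }
    send();
}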

Email Notifications

Intelligent Batching

  • Real-time dashboard: Immediate status updates
  • Batched emails: Aggregated every 30 seconds
  • Smart grouping: Services organized by severity
  • Recovery suppression: Reduces notification spam
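
A rough sketch of the batching idea, assuming status changes are queued and flushed once per 30-second aggregation interval; all names here are illustrative:

// Sketch only: queue status changes and flush them as one summary email.
use std::time::{Duration, Instant};

enum Status { Ok, Warning, Critical } // stand-in for the shared Status type

struct StatusChange {
    service: String,
    from: Status,
    to: Status,
}

struct NotificationBatcher {
    pending: Vec<StatusChange>,
    last_flush: Instant,
    interval: Duration, // aggregation_interval_seconds from agent.toml (30s)
}

impl NotificationBatcher {
    fn record(&mut self, change: StatusChange) {
        self.pending.push(change);
    }

    /// Called periodically; sends at most one email per aggregation interval.
    fn tick(&mut self) {
        if self.last_flush.elapsed() >= self.interval && !self.pending.is_empty() {
            send_summary_email(&self.pending); // grouped by severity in the real agent
            self.pending.clear();
            self.last_flush = Instant::now();
        }
    }
}

fn send_summary_email(_changes: &[StatusChange]) { /* SMTP delivery elided */ }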

Example Alert

Subject: Status Alert: 1 critical, 2 warnings, 0 recoveries

Status Summary (30s duration)
Host Status: Ok → Warning

🔴 CRITICAL ISSUES (1):
  postgresql: Ok → Critical (memory usage 95%)

🟡 WARNINGS (2):
  nginx: Ok → Warning (high load 8.5)
  redis: user-stopped → Warning (restarted by system)

✅ RECOVERIES (0):

--
CM Dashboard Agent v0.1.43

Development

Project Structure

cm-dashboard/
├── agent/                     # Metrics collection agent
│   ├── src/
│   │   ├── collectors/        # CPU, memory, disk, systemd, backup, nixos
│   │   ├── service_tracker.rs # User-stopped service tracking
│   │   ├── status/            # Status aggregation and notifications
│   │   ├── config/            # TOML configuration loading
│   │   └── communication/     # ZMQ message handling
├── dashboard/                 # TUI dashboard application  
│   ├── src/
│   │   ├── ui/widgets/        # CPU, memory, services, backup, system
│   │   ├── communication/     # ZMQ consumption and commands
│   │   └── app.rs            # Main application loop
├── shared/                    # Shared types and utilities
│   └── src/
│       ├── metrics.rs         # Metric, Status, StatusTracker types
│       ├── protocol.rs        # ZMQ message format
│       └── cache.rs           # Cache configuration
└── CLAUDE.md                  # Development guidelines and rules

Testing

# Build and test
nix-shell -p openssl pkg-config --run "cargo build --workspace"
nix-shell -p openssl pkg-config --run "cargo test --workspace"

# Code quality
cargo fmt --all
cargo clippy --workspace -- -D warnings

Deployment

Automated Binary Releases

# Create new release
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X

This triggers automated:

  • Static binary compilation with RUSTFLAGS="-C target-feature=+crt-static"
  • GitHub-style release creation
  • Tarball upload to Gitea

NixOS Integration

Update ~/projects/nixosbox/hosts/services/cm-dashboard.nix:

version = "v0.1.43";
src = pkgs.fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-HASH";
};

Get hash via:

cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
  url = "URL_HERE";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"

Monitoring Intervals

  • Metrics Collection: 2 seconds (CPU, memory, services)
  • Metric Transmission: 2 seconds (ZMQ publish)
  • Dashboard Updates: 1 second (UI refresh)
  • Email Notifications: 30 seconds (batched)
  • Disk Monitoring: 300 seconds (5 minutes)
  • Service Discovery: 300 seconds (5 minutes cache)

License

MIT License - see LICENSE file for details.
