CM Dashboard - Infrastructure Monitoring TUI
Overview
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure, built on ZMQ-based metric collection with a structured JSON data architecture.
Current Features
Core Functionality
- Real-time Monitoring: CPU, RAM, Storage, and Service status
- Service Management: Start/stop services with user-stopped tracking
- Multi-host Support: Monitor multiple servers from single dashboard
- NixOS Integration: System rebuild via SSH + tmux popup
- Backup Monitoring: Borgbackup status and scheduling
User-Stopped Service Tracking
- Services stopped via dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
- Persistent storage survives agent restarts
- Automatic flag clearing when services are restarted via dashboard
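To make this concrete, below is a minimal sketch of how the agent could persist these flags; the type name, file format, and storage location are illustrative, not the actual implementation:
use std::collections::HashSet;
use std::fs;
use std::io;

/// Hypothetical store for user-stopped flags, persisted as a
/// newline-separated file so the set survives agent restarts.
struct UserStoppedFlags {
    path: String,
    services: HashSet<String>,
}

impl UserStoppedFlags {
    fn load(path: &str) -> Self {
        let services = fs::read_to_string(path)
            .map(|s| s.lines().map(str::to_owned).collect())
            .unwrap_or_default();
        Self { path: path.to_owned(), services }
    }

    /// UserStop from the dashboard: mark the service and persist.
    fn mark_stopped(&mut self, service: &str) -> io::Result<()> {
        self.services.insert(service.to_owned());
        self.persist()
    }

    /// UserStart from the dashboard: clear the flag automatically.
    fn clear(&mut self, service: &str) -> io::Result<()> {
        self.services.remove(service);
        self.persist()
    }

    /// While flagged, an inactive service reports OK instead of Warning.
    fn is_user_stopped(&self, service: &str) -> bool {
        self.services.contains(service)
    }

    fn persist(&self) -> io::Result<()> {
        let entries: Vec<&str> = self.services.iter().map(String::as_str).collect();
        fs::write(&self.path, entries.join("\n"))
    }
}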
Custom Service Logs
- Configure service-specific log file paths per host in dashboard config
- Press L on any service to view custom log files via tail -f
- Configuration format in dashboard config:
[service_logs]
hostname1 = [
{ service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
{ service_name = "app", log_file_path = "/var/log/myapp/app.log" }
]
hostname2 = [
{ service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
]
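Assuming a serde-based config loader, this section maps naturally to a table of per-host entries; the struct names below are illustrative:
use std::collections::HashMap;
use serde::Deserialize;

/// Sketch of the [service_logs] section above (illustrative names).
#[derive(Deserialize)]
struct DashboardConfig {
    #[serde(default)]
    service_logs: HashMap<String, Vec<ServiceLogEntry>>,
}

#[derive(Deserialize)]
struct ServiceLogEntry {
    service_name: String,
    log_file_path: String,
}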
Service Management
- Direct Control: Arrow keys (↑↓) or vim keys (j/k) navigate services
- Service Actions:
- s - Start service (sends UserStart command)
- S - Stop service (sends UserStop command)
- J - Show service logs (journalctl in tmux popup)
- L - Show custom log files (tail -f custom paths in tmux popup)
- R - Rebuild current host
- Visual Status: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
- Transitional Icons: Blue arrows during operations
Navigation
- Tab: Switch between hosts
- ↑↓ or j/k: Select services
- s: Start selected service (UserStart)
- S: Stop selected service (UserStop)
- J: Show service logs (journalctl)
- L: Show custom log files
- R: Rebuild current host
- B: Run backup on current host
- q: Quit dashboard
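For illustration, the whole keymap collapses into one dispatch function; the Action enum below is hypothetical, and arrow keys are omitted since they need a real key-event type rather than a char:
/// Hypothetical action set for the keymap above.
enum Action {
    NextHost,      // Tab
    SelectPrev,    // k / ↑
    SelectNext,    // j / ↓
    StartService,  // s → UserStart
    StopService,   // S → UserStop
    JournalLogs,   // J → journalctl popup
    CustomLogs,    // L → tail -f configured path
    Rebuild,       // R
    Backup,        // B
    Quit,          // q
}

fn dispatch(key: char) -> Option<Action> {
    match key {
        '\t' => Some(Action::NextHost),
        'k' => Some(Action::SelectPrev),
        'j' => Some(Action::SelectNext),
        's' => Some(Action::StartService),
        'S' => Some(Action::StopService),
        'J' => Some(Action::JournalLogs),
        'L' => Some(Action::CustomLogs),
        'R' => Some(Action::Rebuild),
        'B' => Some(Action::Backup),
        'q' => Some(Action::Quit),
        _ => None,
    }
}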
Core Architecture Principles
Structured Data Architecture (✅ IMPLEMENTED v0.1.131)
Complete migration from string-based metrics to structured JSON data. Eliminates all string parsing bugs and provides type-safe data access.
Previous (String Metrics):
- ❌ Agent sent individual metrics with string names like disk_nvme0n1_temperature
- ❌ Dashboard parsed metric names with underscore counting and string splitting
- ❌ Complex and error-prone metric filtering and extraction logic
Current (Structured Data):
{
"hostname": "cmbox",
"agent_version": "v0.1.131",
"timestamp": 1763926877,
"system": {
"cpu": {
"load_1min": 3.5,
"load_5min": 3.57,
"load_15min": 3.58,
"frequency_mhz": 1500,
"temperature_celsius": 45.2
},
"memory": {
"usage_percent": 25.0,
"total_gb": 23.3,
"used_gb": 5.9,
"swap_total_gb": 10.7,
"swap_used_gb": 0.99,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.0,
"used_gb": 0.3,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 29.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "/",
"usage_percent": 24.0,
"used_gb": 224.9,
"total_gb": 928.2
}
]
}
],
"pools": [
{
"name": "srv_media",
"mount": "/srv/media",
"type": "mergerfs",
"health": "healthy",
"usage_percent": 63.0,
"used_gb": 2355.2,
"total_gb": 3686.4,
"data_drives": [{ "name": "sdb", "temperature_celsius": 24.0 }],
"parity_drives": [{ "name": "sdc", "temperature_celsius": 24.0 }]
}
]
}
},
"services": [
{ "name": "sshd", "status": "active", "memory_mb": 4.5, "disk_gb": 0.0 }
],
"backup": {
"status": "completed",
"last_run": 1763920000,
"next_scheduled": 1764006400,
"total_size_gb": 150.5,
"repository_health": "ok"
}
}
- ✅ Agent sends structured JSON over ZMQ (no legacy support)
- ✅ Type-safe data access: data.system.storage.drives[0].temperature_celsius
- ✅ Complete metric coverage: CPU, memory, storage, services, backup
- ✅ Backward compatibility via bridge conversion to existing UI widgets
- ✅ All string parsing bugs eliminated
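On the dashboard side, the payload deserializes straight into typed structs. The subset below (assuming serde; only some fields shown) illustrates how field access replaces metric-name parsing:
use serde::Deserialize;

/// Partial sketch of the message above; only a subset of fields is shown.
#[derive(Deserialize)]
struct AgentMessage {
    hostname: String,
    agent_version: String,
    timestamp: u64,
    system: System,
}

#[derive(Deserialize)]
struct System {
    cpu: Cpu,
    storage: Storage,
}

#[derive(Deserialize)]
struct Cpu {
    load_1min: f64,
    load_5min: f64,
    load_15min: f64,
    frequency_mhz: u32,
    temperature_celsius: f64,
}

#[derive(Deserialize)]
struct Storage {
    drives: Vec<Drive>,
}

#[derive(Deserialize)]
struct Drive {
    name: String,
    health: String,
    temperature_celsius: f64,
    wear_percent: f64,
}

/// Replaces string parsing like "disk_nvme0n1_temperature".
fn first_drive_temperature(msg: &AgentMessage) -> Option<f64> {
    msg.system.storage.drives.first().map(|d| d.temperature_celsius)
}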
Maintenance Mode
- Agent checks for the /tmp/cm-maintenance file before sending notifications
- File presence suppresses all email notifications while continuing monitoring
- Dashboard continues to show real status, only notifications are blocked
Usage:
# Enable maintenance mode
touch /tmp/cm-maintenance
# Run maintenance tasks
systemctl stop service
# ... maintenance work ...
systemctl start service
# Disable maintenance mode
rm /tmp/cm-maintenance
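On the agent side this reduces to a file-existence gate in the notification path; send_email below is a hypothetical helper standing in for the real delivery code:
use std::path::Path;

/// Monitoring continues unconditionally; only notifications are gated.
fn notifications_enabled() -> bool {
    !Path::new("/tmp/cm-maintenance").exists()
}

fn maybe_notify(subject: &str, body: &str) {
    if notifications_enabled() {
        send_email(subject, body); // hypothetical helper
    }
}

fn send_email(_subject: &str, _body: &str) {
    // actual delivery elided
}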
Development and Deployment Architecture
Development Path
- Location: ~/projects/cm-dashboard
- Purpose: Development workflow only - for committing new code
- Access: Only for developers to commit changes
Deployment Path
- Location: /var/lib/cm-dashboard/nixos-config
- Purpose: Production deployment only - agent clones/pulls from git
- Workflow: git pull → /var/lib/cm-dashboard/nixos-config → nixos-rebuild
Git Flow
Development: ~/projects/cm-dashboard → git commit → git push
Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
Automated Binary Release System
CM Dashboard uses automated binary releases instead of source builds.
Creating New Releases
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
This automatically:
- Builds static binaries with RUSTFLAGS="-C target-feature=+crt-static"
- Creates GitHub-style release with tarball
- Uploads binaries via Gitea API
NixOS Configuration Updates
Edit ~/projects/nixosbox/hosts/services/cm-dashboard.nix:
version = "v0.1.X";
src = pkgs.fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-NEW_HASH_HERE";
};
Get Release Hash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
Building
Testing & Building:
- Workspace builds: nix-shell -p openssl pkg-config --run "cargo build --workspace"
- Clean compilation: Remove target/ between major changes
Enhanced Storage Pool Visualization
Auto-Discovery Architecture
The dashboard uses automatic storage discovery to eliminate manual configuration complexity while providing intelligent storage pool grouping.
Discovery Process
At Agent Startup:
- Parse /proc/mounts to identify all mounted filesystems
- Detect MergerFS pools by analyzing fuse.mergerfs mount sources (sketched below)
- Identify member disks and potential parity relationships via heuristics
- Store discovered storage topology for continuous monitoring
- Generate pool-aware metrics with hierarchical relationships
Continuous Monitoring:
- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, wear
- Calculate pool-level health based on member drive status
- Generate enhanced metrics for dashboard visualization
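A minimal sketch of the startup discovery described above: it keeps only the fuse.mergerfs entries from /proc/mounts and splits the mount source into member branches. Struct and function names are illustrative:
use std::fs;

struct DiscoveredPool {
    mount: String,
    members: Vec<String>,
}

fn discover_mergerfs_pools() -> Vec<DiscoveredPool> {
    let mounts = fs::read_to_string("/proc/mounts").unwrap_or_default();
    mounts
        .lines()
        .filter_map(|line| {
            // /proc/mounts fields: source, mount point, fs type, options, ...
            let mut fields = line.split_whitespace();
            let source = fields.next()?;
            let mount = fields.next()?;
            let fs_type = fields.next()?;
            (fs_type == "fuse.mergerfs").then(|| DiscoveredPool {
                mount: mount.to_owned(),
                // e.g. "/mnt/disk1:/mnt/disk2" → ["/mnt/disk1", "/mnt/disk2"]
                members: source.split(':').map(str::to_owned).collect(),
            })
        })
        .collect()
}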
Supported Storage Types
Single Disks:
- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.
MergerFS Pools:
- Auto-detect from /proc/mounts fuse.mergerfs entries
- Parse source paths to identify member disks (e.g., "/mnt/disk1:/mnt/disk2")
- Heuristic parity disk detection (sequential device names, "parity" in path)
- Pool health calculation (healthy/degraded/critical)
- Hierarchical tree display with data/parity disk grouping
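The parity heuristic can be as simple as a substring check on the branch path, with sequential-device-name rules layered on top; this sketch shows only the simplest form:
/// Treat a branch as parity if "parity" appears in its path.
fn is_parity_branch(path: &str) -> bool {
    path.to_ascii_lowercase().contains("parity")
}

/// Split pool members into (data, parity) branches.
fn split_branches(members: &[String]) -> (Vec<&String>, Vec<&String>) {
    members.iter().partition(|m| !is_parity_branch(m.as_str()))
}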
Future Extensions Ready:
- RAID arrays via /proc/mdstat parsing
- ZFS pools via zpool status integration
- LVM logical volumes via lvs discovery
Configuration
[collectors.disk]
enabled = true
auto_discover = true # Default: true
# Optional exclusions for special filesystems
exclude_mount_points = ["/tmp", "/proc", "/sys", "/dev"]
exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
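Assuming a serde-based loader, the section above maps to something like the struct below; field names mirror the TOML, and the default helpers encode "Default: true" (the enabled default is an assumption):
use serde::Deserialize;

/// Sketch of [collectors.disk]; field names mirror the TOML above.
#[derive(Deserialize)]
struct DiskCollectorConfig {
    #[serde(default = "default_true")]
    enabled: bool,
    #[serde(default = "default_true")]
    auto_discover: bool,
    #[serde(default)]
    exclude_mount_points: Vec<String>,
    #[serde(default)]
    exclude_fs_types: Vec<String>,
}

fn default_true() -> bool {
    true
}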
Display Format
Network:
● eno1:
├─ ip: 192.168.30.105
└─ tailscale0: 100.125.108.16
● eno2:
└─ ip: 192.168.32.105
CPU:
● Load: 0.23 0.21 0.13
└─ Freq: 1048 MHz
RAM:
● Usage: 25% 5.8GB/23.3GB
├─ ● /tmp: 2% 0.5GB/2GB
└─ ● /var/tmp: 0% 0GB/1.0GB
Storage:
● 844B9A25 T: 25°C W: 4%
├─ ● /: 55% 250.5GB/456.4GB
└─ ● /boot: 26% 0.3GB/1.0GB
● mergerfs /srv/media:
├─ ● 63% 2355.2GB/3686.4GB
├─ ● Data_1: WDZQ8H8D T: 28°C
├─ ● Data_2: GGA04461 T: 28°C
└─ ● Parity: WDZS8RY0 T: 29°C
Backup:
● WD-WCC7K1234567 T: 32°C W: 12%
├─ Last: 2h ago (12.3GB)
├─ Next: in 22h
└─ ● Usage: 45% 678GB/1.5TB
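All of these widgets share the same tree prefixes; a purely illustrative sketch of the line assembly:
/// Children get "├─ " except the last, which gets "└─ ".
fn render_tree(header: &str, children: &[String]) -> Vec<String> {
    let mut lines = vec![header.to_owned()];
    for (i, child) in children.iter().enumerate() {
        let branch = if i + 1 == children.len() { "└─ " } else { "├─ " };
        lines.push(format!("{branch}{child}"));
    }
    lines
}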
Important Communication Guidelines
Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
Commit Message Guidelines
NEVER mention:
- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation
ALWAYS:
- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer
Examples:
- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
Implementation Rules
- Agent Status Authority: Agent calculates status for each metric using thresholds
- Dashboard Composition: Dashboard widgets subscribe to specific metrics by name
- Status Aggregation: Dashboard aggregates individual metric statuses for widget status
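A sketch of that aggregation rule: the widget takes the worst status among its subscribed metrics. The ordering below is an assumption about the real Status type:
/// Ordered so that a later variant is worse; max() picks the worst.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Ok,
    Warning,
    Critical,
}

fn widget_status(metrics: &[Status]) -> Status {
    metrics.iter().copied().max().unwrap_or(Status::Ok)
}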
NEVER:
- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
- Create files unless absolutely necessary for achieving goals
- Create documentation files unless explicitly requested
ALWAYS:
- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices