CM Dashboard - Infrastructure Monitoring TUI
Overview
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure, built on ZMQ-based metric collection and an individual-metrics architecture.
Current Features
Core Functionality
- Real-time Monitoring: CPU, RAM, Storage, and Service status
- Service Management: Start/stop services with user-stopped tracking
- Multi-host Support: Monitor multiple servers from single dashboard
- NixOS Integration: System rebuild via SSH + tmux popup
- Backup Monitoring: Borgbackup status and scheduling
User-Stopped Service Tracking
- Services stopped via dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
- Persistent storage survives agent restarts
- Automatic flag clearing when services are restarted via dashboard
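A minimal sketch of this flag handling, assuming the flags are persisted as a flat file of service names; the path and type names are illustrative, not the agent's actual implementation:

use std::collections::HashSet;
use std::fs;

// Hypothetical persistence path; the real agent may store the flags elsewhere.
const USER_STOPPED_FILE: &str = "/var/lib/cm-agent/user-stopped-services";

#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Warning,
}

/// Load the set of services that were intentionally stopped via the dashboard.
/// Reading from disk is what lets the flags survive agent restarts.
fn load_user_stopped() -> HashSet<String> {
    fs::read_to_string(USER_STOPPED_FILE)
        .map(|contents| contents.lines().map(|line| line.to_owned()).collect())
        .unwrap_or_default()
}

/// An inactive service only produces a warning if it was not stopped on purpose.
fn service_status(name: &str, is_active: bool, user_stopped: &HashSet<String>) -> Status {
    if is_active || user_stopped.contains(name) {
        Status::Ok
    } else {
        Status::Warning
    }
}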
Custom Service Logs
- Configure service-specific log file paths per host in dashboard config
- Press L on any service to view custom log files via tail -f
- Configuration format in dashboard config:
[service_logs]
hostname1 = [
{ service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
{ service_name = "app", log_file_path = "/var/log/myapp/app.log" }
]
hostname2 = [
{ service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
]
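As an illustration of how this table could be deserialized, here is a sketch assuming the serde and toml crates; the struct and field names mirror the config keys above but are not the dashboard's actual types:

use std::collections::HashMap;

use serde::Deserialize;

// Illustrative types only; the dashboard's real config structs may differ.
#[derive(Debug, Deserialize)]
struct ServiceLog {
    service_name: String,
    log_file_path: String,
}

#[derive(Debug, Deserialize)]
struct DashboardConfig {
    // host name -> per-service log file entries
    #[serde(default)]
    service_logs: HashMap<String, Vec<ServiceLog>>,
}

fn parse_config(raw: &str) -> Result<DashboardConfig, toml::de::Error> {
    toml::from_str(raw)
}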
Service Management
- Direct Control: Arrow keys (↑↓) or vim keys (j/k) navigate services
- Service Actions:
  - s - Start service (sends UserStart command)
  - S - Stop service (sends UserStop command)
  - J - Show service logs (journalctl in tmux popup)
  - L - Show custom log files (tail -f custom paths in tmux popup)
  - R - Rebuild current host
- Visual Status: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
- Transitional Icons: Blue arrows during operations
Navigation
- Tab: Switch between hosts
- ↑↓ or j/k: Select services
- s: Start selected service (UserStart)
- S: Stop selected service (UserStop)
- J: Show service logs (journalctl)
- L: Show custom log files
- R: Rebuild current host
- B: Run backup on current host
- q: Quit dashboard
Core Architecture Principles
Individual Metrics Philosophy
- Agent collects individual metrics, dashboard composes widgets
- Each metric collected, transmitted, and stored individually
- Agent calculates status for each metric using thresholds
- Dashboard aggregates individual metric statuses for widget status
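A small sketch of this split, with illustrative types: the agent attaches a status to each metric, and the dashboard only picks the worst status per widget rather than recalculating anything.

/// Status travels with each metric; the agent computes it, the dashboard only aggregates.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Ok,
    Warning,
    Critical,
}

struct Metric {
    name: String,
    value: f64,
    status: Status,
}

/// Agent side: classify a single value against warning/critical thresholds.
fn classify(value: f64, warn: f64, crit: f64) -> Status {
    if value >= crit {
        Status::Critical
    } else if value >= warn {
        Status::Warning
    } else {
        Status::Ok
    }
}

/// Dashboard side: a widget's status is the worst status among the metrics it shows.
fn widget_status(metrics: &[Metric]) -> Status {
    metrics.iter().map(|m| m.status).max().unwrap_or(Status::Ok)
}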
Maintenance Mode
- Agent checks for the /tmp/cm-maintenance file before sending notifications
- File presence suppresses all email notifications while monitoring continues
- Dashboard continues to show real status, only notifications are blocked
Usage:
# Enable maintenance mode
touch /tmp/cm-maintenance
# Run maintenance tasks
systemctl stop service
# ... maintenance work ...
systemctl start service
# Disable maintenance mode
rm /tmp/cm-maintenance
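A minimal sketch of the notification gate, assuming a simple existence check on the flag file; the email transport is only a placeholder here:

use std::path::Path;

const MAINTENANCE_FLAG: &str = "/tmp/cm-maintenance";

/// Monitoring keeps running either way; only the notification path is gated.
fn notifications_enabled() -> bool {
    !Path::new(MAINTENANCE_FLAG).exists()
}

fn maybe_notify(subject: &str, body: &str) {
    if notifications_enabled() {
        send_email(subject, body);
    }
}

/// Placeholder for the agent's actual email transport (not shown here).
fn send_email(_subject: &str, _body: &str) {}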
Development and Deployment Architecture
Development Path
- Location: ~/projects/cm-dashboard
- Purpose: Development workflow only - for committing new code
- Access: Only for developers to commit changes
Deployment Path
- Location: /var/lib/cm-dashboard/nixos-config
- Purpose: Production deployment only - agent clones/pulls from git
- Workflow: git pull → /var/lib/cm-dashboard/nixos-config → nixos-rebuild
Git Flow
Development: ~/projects/cm-dashboard → git commit → git push
Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
Automated Binary Release System
CM Dashboard uses automated binary releases instead of source builds.
Creating New Releases
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
This automatically:
- Builds static binaries with RUSTFLAGS="-C target-feature=+crt-static"
- Creates GitHub-style release with tarball
- Uploads binaries via Gitea API
NixOS Configuration Updates
Edit ~/projects/nixosbox/hosts/services/cm-dashboard.nix:
version = "v0.1.X";
src = pkgs.fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-NEW_HASH_HERE";
};
Get Release Hash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
Building
Testing & Building:
- Workspace builds: nix-shell -p openssl pkg-config --run "cargo build --workspace"
- Clean compilation: Remove target/ between major changes
Enhanced Storage Pool Visualization
Auto-Discovery Architecture
The dashboard uses automatic storage discovery to eliminate manual configuration complexity while providing intelligent storage pool grouping.
Discovery Process
At Agent Startup:
- Parse /proc/mounts to identify all mounted filesystems
- Detect MergerFS pools by analyzing fuse.mergerfs mount sources
- Identify member disks and potential parity relationships via heuristics
- Store discovered storage topology for continuous monitoring
- Generate pool-aware metrics with hierarchical relationships
Continuous Monitoring:
- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, wear
- Calculate pool-level health based on member drive status
- Generate enhanced metrics for dashboard visualization
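A sketch of the mergerfs part of this discovery, assuming colon-separated branch paths in the mount source; field handling is deliberately simplified:

use std::fs;

/// One discovered mergerfs pool: its mount point plus the member branch paths.
#[derive(Debug)]
struct MergerfsPool {
    mount_point: String,
    branches: Vec<String>,
}

/// Scan /proc/mounts for fuse.mergerfs entries and split the source field
/// (e.g. "/mnt/disk1:/mnt/disk2") into individual member branches.
fn discover_mergerfs_pools() -> Vec<MergerfsPool> {
    let mounts = fs::read_to_string("/proc/mounts").unwrap_or_default();
    mounts
        .lines()
        .filter_map(|line| {
            // /proc/mounts fields: source, mount point, fs type, options, dump, pass
            let fields: Vec<&str> = line.split_whitespace().collect();
            match fields.as_slice() {
                [source, mount_point, "fuse.mergerfs", ..] => Some(MergerfsPool {
                    mount_point: mount_point.to_string(),
                    branches: source.split(':').map(|b| b.to_string()).collect(),
                }),
                _ => None,
            }
        })
        .collect()
}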
Supported Storage Types
Single Disks:
- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.
MergerFS Pools:
- Auto-detect from /proc/mounts fuse.mergerfs entries
- Parse source paths to identify member disks (e.g., "/mnt/disk1:/mnt/disk2")
- Heuristic parity disk detection (sequential device names, "parity" in path)
- Pool health calculation (healthy/degraded/critical)
- Hierarchical tree display with data/parity disk grouping
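The parity heuristic and pool health calculation could look roughly like this; the exact cutoffs and naming rules are assumptions, not the agent's confirmed logic:

/// Pool-level health derived from the state of the member drives.
#[derive(Debug, PartialEq)]
enum PoolHealth {
    Healthy,
    Degraded,
    Critical,
}

/// Heuristic from the discovery step: a branch whose path mentions "parity"
/// is treated as the parity disk rather than a data disk.
fn looks_like_parity(branch_path: &str) -> bool {
    branch_path.to_lowercase().contains("parity")
}

/// Illustrative thresholds: all members healthy -> Healthy, one failed member
/// -> Degraded, more than one -> Critical. The real cutoffs may differ.
fn pool_health(failed_members: usize) -> PoolHealth {
    match failed_members {
        0 => PoolHealth::Healthy,
        1 => PoolHealth::Degraded,
        _ => PoolHealth::Critical,
    }
}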
Future Extensions Ready:
- RAID arrays via /proc/mdstat parsing
- ZFS pools via zpool status integration
- LVM logical volumes via lvs discovery
Configuration
[collectors.disk]
enabled = true
auto_discover = true # Default: true
# Optional exclusions for special filesystems
exclude_mount_points = ["/tmp", "/proc", "/sys", "/dev"]
exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
Display Format
Storage:
● /srv/media (mergerfs (2+1)):
├─ Pool Status: ● Healthy (3 drives)
├─ Total: ● 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ Parity: ● sdc T: 24°C
● /:
├─ ● nvme0n1 W: 13%
└─ ● 7% 14.5GB/218.5GB
Implementation Benefits
- Zero Configuration: No manual pool definitions required
- Always Accurate: Reflects actual system state automatically
- Scales Automatically: Handles any number of pools without config changes
- Backwards Compatible: Single disks continue working unchanged
- Future Ready: Easy extension for additional storage technologies
Current Status (v0.1.100)
✅ Completed:
- Auto-discovery system implemented and deployed
- /proc/mounts parsing with smart heuristics for parity detection
- Storage topology stored at agent startup for efficient monitoring
- Universal zero-configuration for all hosts (cmbox, steambox, simonbox, srv01, srv02, srv03)
- Enhanced pool health calculation (healthy/degraded/critical)
- Hierarchical tree visualization with data/parity disk separation
🔄 In Progress - Complete Disk Collector Rewrite:
The current disk collector has grown complex, mixing legacy and auto-discovery approaches. A complete rewrite is planned around a clean, simple workflow that supports both physical drives and mergerfs pools.
New Clean Architecture:
Discovery Workflow:
- lsblk to detect all mount points and backing devices
- df to get filesystem usage for each mount point
- Group by physical drive (nvme0n1, sda, etc.)
- Parse /proc/mounts for mergerfs pools
- Generate unified metrics for both storage types
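A sketch of the first two pipeline steps, shelling out to lsblk and df; the specific lsblk columns and df options chosen here are assumptions about the planned implementation, not confirmed details:

use std::process::Command;

/// A mounted filesystem together with its backing physical device.
#[derive(Debug)]
struct MountedFilesystem {
    device: String,      // e.g. "nvme0n1p2"
    parent: String,      // e.g. "nvme0n1", used for grouping
    mount_point: String, // e.g. "/"
}

/// Step 1: ask lsblk for device name, parent device and mount point.
/// Entries with an empty PKNAME or MOUNTPOINT are skipped by this simple parser.
fn list_mounted_filesystems() -> Vec<MountedFilesystem> {
    let output = Command::new("lsblk")
        .args(["-rn", "-o", "NAME,PKNAME,MOUNTPOINT"])
        .output()
        .expect("failed to run lsblk");

    String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            Some(MountedFilesystem {
                device: fields.next()?.to_string(),
                parent: fields.next()?.to_string(),
                mount_point: fields.next()?.to_string(),
            })
        })
        .collect()
}

/// Step 2: ask df for the usage of one mount point, returning (used, total) in bytes.
fn filesystem_usage(mount_point: &str) -> Option<(u64, u64)> {
    let output = Command::new("df")
        .args(["-B1", "--output=used,size", mount_point])
        .output()
        .ok()?;
    let text = String::from_utf8_lossy(&output.stdout);
    let mut fields = text.lines().nth(1)?.split_whitespace();
    let used = fields.next()?.parse().ok()?;
    let total = fields.next()?.parse().ok()?;
    Some((used, total))
}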
Physical Drive Display:
● nvme0n1:
├─ ● Drive: T: 35°C W: 1%
├─ ● Total: 23% 218.0GB/928.2GB
├─ ● /boot: 11% 0.1GB/1.0GB
└─ ● /: 23% 214.9GB/928.2GB
MergerFS Pool Display:
● /srv/media (mergerfs):
├─ ● Pool: 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ ● sdc T: 24°C (parity)
Implementation Benefits:
- Pure auto-discovery: No configuration needed
- Clean code paths: Single workflow for all storage types
- Consistent display: Status icons on every line, no redundant text
- Simple pipeline: lsblk → df → group → metrics
- Support for both: Physical drives and mergerfs pools
Important Communication Guidelines
Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
Commit Message Guidelines
NEVER mention:
- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation
ALWAYS:
- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer
Examples:
- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
Implementation Rules
- Individual Metrics: Each metric is collected, transmitted, and stored individually
- Agent Status Authority: Agent calculates status for each metric using thresholds
- Dashboard Composition: Dashboard widgets subscribe to specific metrics by name
- Status Aggregation: Dashboard aggregates individual metric statuses for widget status
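An illustrative sketch of these rules: the widget subscribes to a const array of metric names and takes the worst agent-provided status. The metric names, store type, and status encoding are placeholders, not the dashboard's actual code.

/// Illustrative metric names only; the real agent's metric names may differ.
const CPU_WIDGET_METRICS: &[&str] = &[
    "cpu_usage_percent",
    "cpu_load_1m",
    "cpu_temperature",
];

/// Stand-in for the dashboard's metric store; statuses arrive pre-computed from the agent.
struct MetricStore;

impl MetricStore {
    /// 0 = OK, 1 = Warning, 2 = Critical (illustrative encoding).
    fn status_of(&self, _metric_name: &str) -> Option<u8> {
        None
    }
}

/// The widget's status is simply the worst agent-provided status among the
/// metrics it subscribes to; the widget never recalculates status itself.
fn cpu_widget_status(store: &MetricStore) -> u8 {
    CPU_WIDGET_METRICS
        .iter()
        .filter_map(|name| store.status_of(name))
        .max()
        .unwrap_or(0)
}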
NEVER:
- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
- Create files unless absolutely necessary for achieving goals
- Create documentation files unless explicitly requested
ALWAYS:
- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices