CM Dashboard - Infrastructure Monitoring TUI
Overview
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure, built on ZMQ-based metric collection and an individual-metrics architecture.
Current Features
Core Functionality
- Real-time Monitoring: CPU, RAM, Storage, and Service status
- Service Management: Start/stop services with user-stopped tracking
- Multi-host Support: Monitor multiple servers from single dashboard
- NixOS Integration: System rebuild via SSH + tmux popup
- Backup Monitoring: Borgbackup status and scheduling
User-Stopped Service Tracking
- Services stopped via dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
- Persistent storage survives agent restarts
- Automatic flag clearing when services are restarted via dashboard
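A minimal sketch of how this flag could work, assuming an in-memory set keyed by service name (the type and method names are illustrative and the persistence layer that survives agent restarts is omitted):

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Warning,
}

#[derive(Default)]
struct UserStoppedTracker {
    services: HashSet<String>,
}

impl UserStoppedTracker {
    /// Mark a service as intentionally stopped via the dashboard (UserStop).
    fn mark_stopped(&mut self, service: &str) {
        self.services.insert(service.to_string());
    }

    /// Clear the flag when the service is started again via the dashboard (UserStart).
    fn clear(&mut self, service: &str) {
        self.services.remove(service);
    }

    /// An inactive service reports Ok if the stop was user-initiated,
    /// otherwise it is reported as a Warning.
    fn status_for(&self, service: &str, is_active: bool) -> Status {
        if is_active || self.services.contains(service) {
            Status::Ok
        } else {
            Status::Warning
        }
    }
}

fn main() {
    let mut tracker = UserStoppedTracker::default();
    tracker.mark_stopped("nginx");
    assert_eq!(tracker.status_for("nginx", false), Status::Ok);
    tracker.clear("nginx");
    assert_eq!(tracker.status_for("nginx", false), Status::Warning);
}
```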
Custom Service Logs
- Configure service-specific log file paths per host in dashboard config
- Press L on any service to view custom log files via tail -f
- Configuration format in dashboard config:
[service_logs]
hostname1 = [
{ service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
{ service_name = "app", log_file_path = "/var/log/myapp/app.log" }
]
hostname2 = [
{ service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
]
Service Management
- Direct Control: Arrow keys (↑↓) or vim keys (j/k) navigate services
- Service Actions:
  - s - Start service (sends UserStart command)
  - S - Stop service (sends UserStop command)
  - J - Show service logs (journalctl in tmux popup)
  - L - Show custom log files (tail -f custom paths in tmux popup)
  - R - Rebuild current host
- Visual Status: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
- Transitional Icons: Blue arrows during operations
Navigation
- Tab: Switch between hosts
- ↑↓ or j/k: Select services
- s: Start selected service (UserStart)
- S: Stop selected service (UserStop)
- J: Show service logs (journalctl)
- L: Show custom log files
- R: Rebuild current host
- B: Run backup on current host
- q: Quit dashboard
Core Architecture Principles
Individual Metrics Philosophy
- Agent collects individual metrics, dashboard composes widgets
- Each metric collected, transmitted, and stored individually
- Agent calculates status for each metric using thresholds
- Dashboard aggregates individual metric statuses for widget status
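As a rough illustration of the agent side of this split, each metric can be evaluated against its own thresholds before transmission, so the dashboard never recomputes status. The metric name, threshold values, and types below are placeholders, not the actual implementation:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Ok,
    Warning,
    Critical,
}

struct Thresholds {
    warning: f64,
    critical: f64,
}

struct Metric {
    name: &'static str,
    value: f64,
    status: Status, // assigned by the agent, never by the dashboard
}

/// Evaluate one metric against its thresholds and attach the resulting status.
fn evaluate(name: &'static str, value: f64, t: &Thresholds) -> Metric {
    let status = if value >= t.critical {
        Status::Critical
    } else if value >= t.warning {
        Status::Warning
    } else {
        Status::Ok
    };
    Metric { name, value, status }
}

fn main() {
    let ram = evaluate(
        "ram_used_percent",
        91.0,
        &Thresholds { warning: 85.0, critical: 95.0 },
    );
    println!("{} = {} -> {:?}", ram.name, ram.value, ram.status);
}
```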
Maintenance Mode
- Agent checks for the /tmp/cm-maintenance file before sending notifications
- File presence suppresses all email notifications while monitoring continues
- Dashboard continues to show real status, only notifications are blocked
Usage:
# Enable maintenance mode
touch /tmp/cm-maintenance
# Run maintenance tasks
systemctl stop service
# ... maintenance work ...
systemctl start service
# Disable maintenance mode
rm /tmp/cm-maintenance
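On the agent side, the gate described above can be as simple as a file-existence check before dispatching a notification. This is a hedged sketch; the notification backend and function names are placeholders:

```rust
use std::path::Path;

/// Maintenance mode is active while the documented flag file exists.
fn maintenance_mode_active() -> bool {
    Path::new("/tmp/cm-maintenance").exists()
}

fn notify(message: &str) {
    if maintenance_mode_active() {
        // Monitoring continues, but email notifications are suppressed.
        return;
    }
    // Placeholder for the real email/notification backend.
    println!("sending notification: {message}");
}

fn main() {
    notify("service nginx entered failed state");
}
```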
Development and Deployment Architecture
Development Path
- Location: ~/projects/cm-dashboard
- Purpose: Development workflow only - for committing new code
- Access: Only for developers to commit changes
Deployment Path
- Location: /var/lib/cm-dashboard/nixos-config
- Purpose: Production deployment only - agent clones/pulls from git
- Workflow: git pull → /var/lib/cm-dashboard/nixos-config → nixos-rebuild
Git Flow
Development: ~/projects/cm-dashboard → git commit → git push
Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
Automated Binary Release System
CM Dashboard uses automated binary releases instead of source builds.
Creating New Releases
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
This automatically:
- Builds static binaries with RUSTFLAGS="-C target-feature=+crt-static"
- Creates GitHub-style release with tarball
- Uploads binaries via Gitea API
NixOS Configuration Updates
Edit ~/projects/nixosbox/hosts/services/cm-dashboard.nix:
version = "v0.1.X";
src = pkgs.fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-NEW_HASH_HERE";
};
Get Release Hash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
Building
Testing & Building:
- Workspace builds: nix-shell -p openssl pkg-config --run "cargo build --workspace"
- Clean compilation: Remove target/ between major changes
Enhanced Storage Pool Visualization
Auto-Discovery Architecture
The dashboard uses automatic storage discovery to eliminate manual configuration complexity while providing intelligent storage pool grouping.
Discovery Process
At Agent Startup:
- Parse /proc/mounts to identify all mounted filesystems
- Detect MergerFS pools by analyzing fuse.mergerfs mount sources
- Identify member disks and potential parity relationships via heuristics
- Store discovered storage topology for continuous monitoring
- Generate pool-aware metrics with hierarchical relationships
Continuous Monitoring:
- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, wear
- Calculate pool-level health based on member drive status
- Generate enhanced metrics for dashboard visualization
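A simplified sketch of the startup parsing step, assuming only the first three whitespace-separated fields of each /proc/mounts line are needed; struct and function names are illustrative, not the actual agent code:

```rust
use std::fs;

#[derive(Debug)]
struct MountEntry {
    source: String,
    mount_point: String,
    fs_type: String,
}

/// Read /proc/mounts and keep the source, mount point, and filesystem type.
fn parse_proc_mounts() -> std::io::Result<Vec<MountEntry>> {
    let contents = fs::read_to_string("/proc/mounts")?;
    Ok(contents
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            Some(MountEntry {
                source: fields.next()?.to_string(),
                mount_point: fields.next()?.to_string(),
                fs_type: fields.next()?.to_string(),
            })
        })
        .collect())
}

fn main() -> std::io::Result<()> {
    for entry in parse_proc_mounts()? {
        if entry.fs_type == "fuse.mergerfs" {
            // mergerfs lists its member branches in the source field,
            // e.g. "/mnt/disk1:/mnt/disk2".
            println!("mergerfs pool at {}: members {}", entry.mount_point, entry.source);
        }
    }
    Ok(())
}
```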
Supported Storage Types
Single Disks:
- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.
MergerFS Pools:
- Auto-detect from /proc/mounts fuse.mergerfs entries
- Parse source paths to identify member disks (e.g., "/mnt/disk1:/mnt/disk2")
- Heuristic parity disk detection (sequential device names, "parity" in path)
- Pool health calculation (healthy/degraded/critical)
- Hierarchical tree display with data/parity disk grouping
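A sketch of the member split and pool health calculation described above. Only the "parity in path" heuristic is shown, and the health rule (any failed drive degrades the pool, all drives failed is critical) is an assumption; all names are illustrative:

```rust
#[derive(Debug, PartialEq)]
enum PoolHealth {
    Healthy,
    Degraded,
    Critical,
}

/// Split a mergerfs source string like "/mnt/disk1:/mnt/disk2:/mnt/parity1"
/// into (data disks, parity disks) using the "parity" path heuristic.
fn split_members(source: &str) -> (Vec<&str>, Vec<&str>) {
    source.split(':').partition(|path| !path.contains("parity"))
}

/// Assumed health rule: no failures is healthy, partial failure is degraded,
/// total failure is critical.
fn pool_health(total_drives: usize, failed_drives: usize) -> PoolHealth {
    match failed_drives {
        0 => PoolHealth::Healthy,
        n if n < total_drives => PoolHealth::Degraded,
        _ => PoolHealth::Critical,
    }
}

fn main() {
    let (data, parity) = split_members("/mnt/disk1:/mnt/disk2:/mnt/parity1");
    assert_eq!(data, vec!["/mnt/disk1", "/mnt/disk2"]);
    assert_eq!(parity, vec!["/mnt/parity1"]);
    assert_eq!(pool_health(3, 0), PoolHealth::Healthy);
}
```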
Future Extensions Ready:
- RAID arrays via /proc/mdstat parsing
- ZFS pools via zpool status integration
- LVM logical volumes via lvs discovery
Configuration
[collectors.disk]
enabled = true
auto_discover = true # Default: true
# Optional exclusions for special filesystems
exclude_mount_points = ["/tmp", "/proc", "/sys", "/dev"]
exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
Display Format
Storage:
● /srv/media (mergerfs (2+1)):
├─ Pool Status: ● Healthy (3 drives)
├─ Total: ● 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ Parity: ● sdc T: 24°C
● /:
├─ ● nvme0n1 W: 13%
└─ ● 7% 14.5GB/218.5GB
Implementation Benefits
- Zero Configuration: No manual pool definitions required
- Always Accurate: Reflects actual system state automatically
- Scales Automatically: Handles any number of pools without config changes
- Backwards Compatible: Single disks continue working unchanged
- Future Ready: Easy extension for additional storage technologies
Current Status (v0.1.100)
✅ Completed:
- Auto-discovery system implemented and deployed
- /proc/mounts parsing with smart heuristics for parity detection
- Storage topology stored at agent startup for efficient monitoring
- Universal zero-configuration for all hosts (cmbox, steambox, simonbox, srv01, srv02, srv03)
- Enhanced pool health calculation (healthy/degraded/critical)
- Hierarchical tree visualization with data/parity disk separation
🔄 In Progress - Unified Pool Visualization:
Auto-discovery currently works, but filesystems are displayed separately instead of being grouped by physical drive. The unified pool concept, where single drives are also treated as pools, still needs to be implemented.
Current Display (needs improvement):
● /boot: (separate entry)
● /nix_store: (separate entry)
● /: (separate entry)
Target Display (unified pools):
● nvme0n1:
├─ Drive: T: 35°C W: 1%
├─ /boot: 11% 0.1GB/1.0GB
├─ /nix_store: 23% 214.9GB/928.2GB
└─ /: 23% 214.9GB/928.2GB
Required Changes:
- Enhanced Auto-Discovery: Group filesystems by backing physical drive during discovery
- UI Pool Logic: Treat single drives as "pools" with drive name as header
- Drive Info Display: Show temperature, wear, health at pool level for single drives
- Filesystem Children: Display mount points as children under their physical drives
- Hybrid Rendering: Physical grouping for single drives, logical grouping for mergerfs pools
Expected Result: Consistent hierarchical storage visualization where everything follows pool->children pattern, regardless of underlying storage technology.
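A sketch of the grouping step this would require, assuming discovery can already resolve each mount point to its backing device; all names here are hypothetical:

```rust
use std::collections::BTreeMap;

struct Filesystem {
    mount_point: String,
    backing_drive: String, // e.g. "nvme0n1", resolved during discovery
}

/// Group filesystems by physical drive; each drive becomes a pool header
/// and its mount points become children in the storage widget.
fn group_by_drive(filesystems: Vec<Filesystem>) -> BTreeMap<String, Vec<String>> {
    let mut pools: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for fs in filesystems {
        pools.entry(fs.backing_drive).or_default().push(fs.mount_point);
    }
    pools
}

fn main() {
    let pools = group_by_drive(vec![
        Filesystem { mount_point: "/boot".into(), backing_drive: "nvme0n1".into() },
        Filesystem { mount_point: "/".into(), backing_drive: "nvme0n1".into() },
    ]);
    assert_eq!(pools["nvme0n1"], vec!["/boot", "/"]);
}
```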
Important Communication Guidelines
Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
Commit Message Guidelines
NEVER mention:
- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation
ALWAYS:
- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer
Examples:
- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
Implementation Rules
- Individual Metrics: Each metric is collected, transmitted, and stored individually
- Agent Status Authority: Agent calculates status for each metric using thresholds
- Dashboard Composition: Dashboard widgets subscribe to specific metrics by name
- Status Aggregation: Dashboard aggregates individual metric statuses for widget status
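As an illustration of the composition and aggregation rules (and the const-array convention noted below), a widget might declare its metric names once and fold their agent-supplied statuses into a single widget status. Everything here is a hypothetical sketch, not the actual widget code:

```rust
// The widget's metric names live in one const array; nothing is hardcoded
// inside the rendering logic.
const CPU_WIDGET_METRICS: &[&str] = &["cpu_load_1m", "cpu_load_5m", "cpu_temperature"];

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum Status {
    Ok,
    Warning,
    Critical,
}

/// Widget status is the worst agent-supplied status among subscribed metrics;
/// the dashboard never evaluates thresholds itself.
fn widget_status(lookup: impl Fn(&str) -> Option<Status>) -> Status {
    CPU_WIDGET_METRICS
        .iter()
        .copied()
        .filter_map(|name| lookup(name))
        .fold(Status::Ok, |worst, s| if s > worst { s } else { worst })
}

fn main() {
    let status = widget_status(|name| match name {
        "cpu_temperature" => Some(Status::Warning),
        _ => Some(Status::Ok),
    });
    assert_eq!(status, Status::Warning);
}
```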
NEVER:
- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
- Create files unless absolutely necessary for achieving goals
- Create documentation files unless explicitly requested
ALWAYS:
- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices