CM Dashboard - Infrastructure Monitoring TUI
Overview
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for our specific monitoring needs and ZMQ-based metric collection.
Implementation Strategy
Next Phase: Systemd Collector Optimization (Based on TODO.md)
Current Status: Reverted to working baseline (commit 245e546) after optimization broke service discovery.
Planned Implementation Steps (step-by-step to avoid breaking functionality):
Phase 1: Exact Name Filtering
- Replace `contains()` matching with exact name matching for service filters
- Change `service_name.contains(pattern) || pattern.contains(service_name)` to `service_name == pattern`
- Test: Ensure cmbox remains visible with exact service names in config
- Commit and test after each change
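A minimal sketch of the Phase 1 change, showing the old substring behaviour next to the exact match. The function names and the `&[String]` filter type are assumptions for illustration; the real collector may hold its filters differently.

```rust
/// Old behaviour: bidirectional substring matching.
/// A filter "nginx" also accepts "nginx-config-reload".
fn matches_filter_substring(service_name: &str, filters: &[String]) -> bool {
    filters
        .iter()
        .any(|pattern| service_name.contains(pattern.as_str()) || pattern.contains(service_name))
}

/// Phase 1 behaviour: a service is accepted only when its name
/// is identical to one of the configured filters.
fn matches_filter_exact(service_name: &str, filters: &[String]) -> bool {
    filters.iter().any(|pattern| service_name == pattern.as_str())
}
```

With filters `["nginx", "sshd"]`, the exact variant accepts `nginx` but rejects `nginx-config-reload`, so any service that was previously picked up by accident must be listed explicitly in the config before this phase is committed.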
Phase 2: Remove User Service Collection
- Remove all `sudo -u` systemctl commands for user services
- Remove user_unit_files_output and user_units_output logic
- Keep only system service discovery via `systemctl list-units --type=service`
- Test: Verify system services still discovered correctly
Phase 3: Add Wildcard Support
- Implement glob pattern matching for service filters
- Support patterns like "nginx*" to match "nginx", "nginx-config-reload", etc.
- Use fnmatch or similar for wildcard expansion
- Test: Verify patterns work as expected
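The wildcard step could be sketched without an external crate as below. This is an illustration only: it supports `*` and nothing else (no `?` or character classes), and `glob_match` is a hypothetical helper name. A glob crate could be swapped in instead if fuller fnmatch-style syntax is wanted.

```rust
/// Minimal glob match supporting only `*`, which matches any run of
/// characters, including an empty one. All other characters are literal.
fn glob_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0usize, 0usize);
    // Index of the most recent `*` in the pattern, and the name index
    // from which that `*` is currently assumed to stop matching.
    let mut star: Option<(usize, usize)> = None;

    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            // Record the star and first try matching it against nothing.
            star = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && p[pi] == n[ni] {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Mismatch: backtrack and let the last `*` absorb one more char.
            pi = sp + 1;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // The name is consumed; only trailing `*` may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}
```

Under this sketch `nginx*` matches both `nginx` and `nginx-config-reload`, while a pattern without `*` behaves exactly like the Phase 1 equality check, so existing exact filters keep working.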
Phase 4: Optimize systemctl Calls
- Cache service status information during discovery
- Eliminate redundant `systemctl is-active` and `systemctl show` calls per service
- Parse status from `systemctl list-units` output directly
- Test: Ensure performance improvement without functionality loss
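One way to parse status straight from the discovery output is sketched below. It assumes the command is run as `systemctl list-units --type=service --all --no-legend --plain`, so each line starts with the whitespace-separated columns UNIT, LOAD, ACTIVE and SUB, followed by a free-text description. `UnitState` and `parse_list_units` are hypothetical names, not the collector's actual types.

```rust
use std::collections::HashMap;

/// Cached state for one unit, taken from a single `list-units` line.
#[derive(Debug, Clone, PartialEq)]
struct UnitState {
    load: String,   // e.g. "loaded", "not-found"
    active: String, // e.g. "active", "inactive", "failed"
    sub: String,    // e.g. "running", "exited", "dead"
}

/// Parse `list-units` output into a map keyed by service name
/// (without the ".service" suffix). Malformed or non-service lines are skipped.
fn parse_list_units(output: &str) -> HashMap<String, UnitState> {
    let mut units = HashMap::new();
    for line in output.lines() {
        let mut cols = line.split_whitespace();
        if let (Some(unit), Some(load), Some(active), Some(sub)) =
            (cols.next(), cols.next(), cols.next(), cols.next())
        {
            if let Some(name) = unit.strip_suffix(".service") {
                units.insert(
                    name.to_string(),
                    UnitState {
                        load: load.to_string(),
                        active: active.to_string(),
                        sub: sub.to_string(),
                    },
                );
            }
        }
    }
    units
}
```

The resulting map can be filled once per collection cycle and consulted for every configured service, which replaces the per-service `is-active` and `show` invocations with a single process spawn. Fields that only `systemctl show` provides would still need a separate call.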
Phase 5: Include-Only Discovery
- Remove auto-discovery of all services
- Only check services explicitly listed in service_name_filters
- Skip systemctl discovery entirely, use configured list directly
- Test: Verify only configured services are monitored
Critical Requirements:
- Each phase must be tested independently
- cmbox must remain visible in dashboard after each change
- No functionality regressions allowed
- Commit each phase separately with descriptive messages
Rollback Strategy:
- If any phase breaks functionality, immediately revert that specific commit
- Do not attempt to "fix forward" - revert and redesign the problematic step
- Each phase should be atomic and independently revertible
Core Architecture Principles - CRITICAL
Individual Metrics Philosophy
NEW ARCHITECTURE: Agent collects individual metrics, dashboard composes widgets from those metrics.
Maintenance Mode
Purpose:
- Suppress email notifications during planned maintenance or backups
- Prevent false alerts when services are intentionally stopped
Implementation:
- Agent checks for `/tmp/cm-maintenance` file before sending notifications
- File presence suppresses all email notifications while continuing monitoring
- Dashboard continues to show real status, only notifications are blocked
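The check described above could look roughly like this on the agent side. Only the `/tmp/cm-maintenance` path comes from this document; the function names and the `status_changed` flag are assumptions used to show where the gate sits.

```rust
use std::path::Path;

/// Marker file whose presence enables maintenance mode.
const MAINTENANCE_FILE: &str = "/tmp/cm-maintenance";

/// True while the given marker file exists.
fn maintenance_active(marker: &Path) -> bool {
    marker.exists()
}

/// Decide whether an email should go out for a status change.
/// Metric collection and status calculation are unaffected;
/// only the notification is gated on the marker file.
fn should_notify(status_changed: bool, marker: &Path) -> bool {
    status_changed && !maintenance_active(marker)
}

/// Convenience wrapper using the default marker path.
fn should_notify_default(status_changed: bool) -> bool {
    should_notify(status_changed, Path::new(MAINTENANCE_FILE))
}
```

Because the marker is checked at send time rather than cached, removing the file re-enables notifications on the next status change without restarting the agent.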
Usage:
# Enable maintenance mode
touch /tmp/cm-maintenance
# Run maintenance tasks (backups, service restarts, etc.)
systemctl stop service
# ... maintenance work ...
systemctl start service
# Disable maintenance mode
rm /tmp/cm-maintenance
NixOS Integration:
- Borgbackup script automatically creates/removes maintenance file
- Automatic cleanup via trap ensures maintenance mode doesn't stick
- All configuration shall be done from the NixOS config
ARCHITECTURE ENFORCEMENT:
- ZERO legacy code reuse - Fresh implementation following ARCHITECT.md exactly
- Individual metrics only - NO grouped metric structures
- Reference-only legacy - Study old functionality, implement new architecture
- Clean slate mindset - Build as if legacy codebase never existed
Implementation Rules:
- Individual Metrics: Each metric is collected, transmitted, and stored individually
- Agent Status Authority: Agent calculates status for each metric using thresholds
- Dashboard Composition: Dashboard widgets subscribe to specific metrics by name
- Status Aggregation: Dashboard aggregates individual metric statuses for widget status
Testing & Building:
- Workspace builds: `cargo build --workspace` for all testing
- Clean compilation: Remove `target/` between architecture changes
- ZMQ testing: Test agent-dashboard communication independently
- Widget testing: Verify UI layout matches legacy appearance exactly
NEVER in New Implementation:
- Copy/paste ANY code from legacy backup
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
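The implementation rules above can be illustrated with a small sketch. Everything named here is hypothetical: the `Metric` and `Status` types, the metric names, the threshold values, and in particular the "worst status wins" aggregation, which is one plausible reading of "aggregates" and not something ARCHITECT.md is known to specify.

```rust
/// Status levels, ordered so that `max()` yields the most severe one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Status {
    Ok,
    Warning,
    Critical,
}

/// One individually collected and transmitted metric.
struct Metric {
    name: String,
    value: f64,
    status: Status,
}

/// Agent side: the agent is the status authority and applies thresholds.
fn evaluate(value: f64, warn: f64, crit: f64) -> Status {
    if value >= crit {
        Status::Critical
    } else if value >= warn {
        Status::Warning
    } else {
        Status::Ok
    }
}

/// Dashboard side: a widget names its metrics in a const array
/// instead of scattering string literals through the rendering code.
const CPU_WIDGET_METRICS: [&str; 2] = ["cpu_load_1min", "cpu_temperature_celsius"];

/// Dashboard side: the widget status is the worst status among the
/// metrics it subscribes to. No thresholds are evaluated here.
fn widget_status(metrics: &[Metric], wanted: &[&str]) -> Option<Status> {
    metrics
        .iter()
        .filter(|m| wanted.contains(&m.name.as_str()))
        .map(|m| m.status)
        .max()
}
```

The dashboard function never looks at `value`, which keeps status calculation out of the widgets; it returns `None` when none of the subscribed metrics has arrived yet, so a widget can render an "unknown" state instead of a misleading OK.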
Important Communication Guidelines
NEVER write that you have "successfully implemented" something or generate extensive summary text without first verifying with the user that the implementation is correct. This wastes tokens. Keep responses concise.
NEVER implement code without first getting explicit user agreement on the approach. Always ask for confirmation before proceeding with implementation.
Commit Message Guidelines
NEVER mention:
- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation
ALWAYS:
- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer
Examples:
- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
NixOS Configuration Updates
When code changes are made to cm-dashboard, the NixOS configuration at ~/nixosbox must be updated to deploy the changes.
Update Process
- Get Latest Commit Hash

    git log -1 --format="%H"

- Update NixOS Configuration

  Edit ~/nixosbox/hosts/common/cm-dashboard.nix:

    src = pkgs.fetchgit {
      url = "https://gitea.cmtec.se/cm/cm-dashboard.git";
      rev = "NEW_COMMIT_HASH_HERE";
      sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="; # Placeholder
    };

- Get Correct Source Hash

  Build with placeholder hash to get the actual hash:

    cd ~/nixosbox
    nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchgit {
      url = "https://gitea.cmtec.se/cm/cm-dashboard.git";
      rev = "NEW_COMMIT_HASH";
      sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
    }' 2>&1 | grep "got:"

  Example output:

    error: hash mismatch in fixed-output derivation '/nix/store/...':
      specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
      got: sha256-x8crxNusOUYRrkP9mYEOG+Ga3JCPIdJLkEAc5P1ZxdQ=

- Update Configuration with Correct Hash

  Replace the placeholder with the hash from the error message (the "got:" line).

- Commit NixOS Configuration

    cd ~/nixosbox
    git add hosts/common/cm-dashboard.nix
    git commit -m "Update cm-dashboard to latest version (SHORT_HASH)"
    git push

- Rebuild System

  The user handles the system rebuild step - this cannot be automated.