Replace ZMQ-based service start/stop commands with SSH execution in tmux
popups. This provides better user feedback with real-time systemctl output
while eliminating blocking operations from the main message processing loop.
Changes:
- Service start/stop now use SSH with progress display
- Added backup functionality with 'B' key
- Preserved transitional icons (↑/↓) for immediate visual feedback
- Removed all ZMQ service control commands and handlers
- Updated configuration to include backup_alias setting
- All operations (rebuild, backup, services) now use consistent SSH interface
This ensures stable heartbeat processing while providing superior user
experience with live command output and service status feedback.
Eliminates automatic CollectNow command sending on host connection which
was blocking the main message processing loop for up to 5 seconds per
command. Since agents transmit cached data every 2 seconds anyway, the
CollectNow optimization provided minimal benefit while causing heartbeat
detection issues. Also removes unused send_command wrapper method.
This should completely resolve intermittent host connection dropping.
Implement configurable network routing for both local and Tailscale networks.
Dashboard now supports intelligent connection selection with automatic fallback
between network types. Add IP configuration fields and connection routing logic
for ZMQ and SSH operations.
Features:
- Host configuration with local and Tailscale IP addresses
- Configurable connection types (local/tailscale/auto)
- Automatic fallback between network connections
- Updated ZMQ connection logic with retry support
- SSH command routing through configured IP addresses
- Add dedicated heartbeat transmission every 5 seconds independent of metric collection
- Fix host offline detection by clearing metrics for disconnected hosts
- Move exclude_email_metrics to NotificationConfig for better organization
- Add cleanup_offline_hosts method to remove stale metrics after heartbeat timeout
- Ensure offline hosts show proper status icons and visual indicators
Version 0.1.63
- Add independent heartbeat checking timer (1 second interval) separate from metric reception
- Move get_connected_hosts() call outside metric receive condition to run periodically
- Remove duplicate update_hosts() call from metric processing to avoid redundancy
- Ensure offline host detection works even when no new metrics are received
- Fix issue where hosts going offline were never detected due to conditional heartbeat check
- Heartbeat timeouts now properly detected within configured timeout + 1 second
- Bump version to 0.1.62
- Add immediate terminal.draw() call after input handling in main loop
- Eliminate delay between Tab key press and visual host switching
- Provide instant visual feedback for all navigation inputs
- Maintain existing metric update render cycle without duplication
- Fix UI update timing issue where changes only appeared on metric intervals
- Bump version to 0.1.59
- Add agent_heartbeat metric to agent transmission for reliable host detection
- Update dashboard to track heartbeat timestamps per host instead of general metrics
- Add configurable heartbeat_timeout_seconds to dashboard ZMQ config (default 10s)
- Remove unused timeout_ms from agent config and revert to non-blocking command reception
- Remove unused heartbeat_interval_ms from agent configuration
- Host disconnect detection now uses dedicated heartbeat metrics for improved reliability
- Bump version to 0.1.57
- Add Status::Offline enum variant for disconnected hosts
- All configured hosts now always visible showing offline status when disconnected
- Add WakeOnLAN support using wake-on-lan Rust crate
- Implement w key binding to wake offline hosts with MAC addresses
- Simplify configuration to single [hosts] section with MAC addresses only
- Change critical status icon from ◯ to ! for better visibility
- Add proper MAC address parsing and error handling
- Silent WakeOnLAN operation with logging for success/failure
Configuration format:
[hosts]
hostname = { mac_address = "AA:BB:CC:DD:EE:FF" }
Add comprehensive tracking for services stopped via dashboard to prevent
false alerts when users intentionally stop services.
Features:
- User-stopped services report Status::Ok instead of Warning
- Persistent storage survives agent restarts
- Dashboard sends UserStart/UserStop commands
- Agent tracks and syncs user-stopped state globally
- Systemd collector respects user-stopped flags
Implementation:
- New service_tracker module with persistent JSON storage
- Enhanced ServiceAction enum with UserStart/UserStop variants
- Global singleton tracker accessible by collectors
- Service status logic updated to check user-stopped flag
- Dashboard version now uses CARGO_PKG_VERSION automatically
Bump version to v0.1.43
Simplified keyboard controls by removing service restart functionality:
- Removed 'r' key restart functionality from Services panel
- Made 'R' key always trigger system rebuild regardless of focused panel
- Updated context shortcuts to show 'R: Rebuild Host' globally
- Removed all ServiceRestart enum variants and associated code:
- UiCommand::ServiceRestart
- CommandType::ServiceRestart
- ServiceAction::Restart
- Cleaned up pending transition logic to only handle Start/Stop commands
The 'R' key now consistently rebuilds the current host from any panel,
while 's' and 'S' continue to handle service start/stop in Services panel.
- Add service state detection before executing start/stop/restart commands
- Prevent redundant operations (start active services, stop inactive services)
- Show immediate directional arrows for command feedback (↑ starting, ↓ stopping, ↻ restarting)
- Add get_service_status() method to ServicesWidget for state access
- Remove unused TerminalPopup code and dangling methods
- Clean up warnings and unused code throughout codebase
Service commands now validate current state and provide instant UX feedback while
preserving existing status icons and colors during transitions.
- Remove all SystemRebuild command infrastructure from agent and dashboard
- Replace with direct tmux popup execution: ssh {user}@{host} {alias}
- Add configurable SSH user and rebuild alias in dashboard config
- Eliminate agent process crashes during rebuilds
- Simplify architecture by removing ZMQ command streaming complexity
- Clean up all related dead code and fix compilation warnings
Benefits:
- Process isolation: rebuild runs independently via SSH
- Crash resilience: agent/dashboard can restart without affecting rebuilds
- Configuration flexibility: SSH user and alias configurable per deployment
- Operational simplicity: standard tmux popup interface
- Remove auto-close behavior from terminal popup for manual review
- Fix system panel to show correct NixOS section layout
- Add missing Active users line after Agent version
- Switch agent version from nix store hash to actual version number (v0.1.11)
- Display full version string without truncation for clear version tracking
- Add terminal popup UI component with 80% screen coverage and terminal styling
- Extend ZMQ protocol with CommandOutputMessage for streaming output
- Implement real-time output streaming in agent system rebuild handler
- Add keyboard controls (ESC/Q to close, ↑↓ to scroll) for popup interaction
- Fix system panel Build display to show actual NixOS build instead of config hash
- Update service filters in README with wildcard patterns for better matching
- Add periodic progress updates during nixos-rebuild execution
- Integrate command output handling in dashboard main loop
- MetricStore tracks agent versions from all hosts
- Detects version mismatches using most common version as reference
- Dashboard logs warnings for hosts with outdated agents
- Foundation for visual version mismatch indicators in UI
- Helps identify deployment inconsistencies across infrastructure
- Add blue circular arrow (↻) status icon during SystemRebuild commands
- Keep rebuilding hosts visible in dashboard even when temporarily offline
- Extend connection timeout to 5 minutes for hosts undergoing rebuild
- Prevent host switching during rebuild operations
- Update status bar to show rebuild progress immediately when R key pressed
- Add nixos_config_api_key_file option to NixOS configuration
- Support reading API token from file for private repositories
- Automatically inject token into HTTPS URLs (https://token@host/repo.git)
- Graceful fallback to original URL if key file missing/empty
- Default key file location: /var/lib/cm-dashboard/git-api-key
Usage: echo 'your-api-token' | sudo tee /var/lib/cm-dashboard/git-api-key
Replace direct directory access with git clone/pull approach:
- Add git configuration options (url, branch, working_dir) to NixOS module
- Update SystemConfig and AgentCommand to use git parameters
- Implement ensure_git_repository() method for clone/pull operations
- Agent clones nixosbox to /var/lib/cm-dashboard/nixos-config
- Maintains security while solving permission denied issues
The agent now manages its own copy of the configuration without
needing access to /home/cm directory.
- Change Space key toggle to 's' for start and 'S' for stop
- Remove problematic service status detection logic
- Update statusbar shortcuts to show S/Shift+S instead of Space
- Split UiCommand::ServiceStartStop into ServiceStart and ServiceStop
- Simplify command handling with explicit actions
This resolves the toggle bug where Space always sent start commands
regardless of current service status.
- Implement proper service status checking for start/stop toggle
- Space key now checks current service status and toggles appropriately
- Active services get stopped, inactive services get started
- Fix nixos-rebuild sudo permissions (remove invalid package reference)
- Use only /run/current-system/sw/bin/nixos-rebuild path
This implements the core functionality for executing remote commands through
the dashboard and providing real-time visual feedback to users.
Key Features:
- Remote service control (start/stop/restart) via existing keyboard shortcuts
- System rebuild command with maintenance mode integration
- Real-time visual feedback with service status transitions
- ZMQ command protocol extension for service and system operations
Implementation Details:
- Extended AgentCommand enum with ServiceControl and SystemRebuild variants
- Added agent-side handlers for systemctl and nixos-rebuild execution
- Implemented command status tracking system for visual feedback
- Enhanced services widget to show progress states (⏳ restarting)
- Integrated command execution with existing keyboard navigation
Keyboard Controls:
- Services Panel: Space (start/stop), R (restart)
- System Panel: R (nixos-rebuild switch)
- Backup Panel: B (trigger backup)
Technical Architecture:
- Command flow: UI → Dashboard → ZMQ → Agent → systemctl/nixos-rebuild
- Status tracking: InProgress/Success/Failed states with visual indicators
- Maintenance mode: Automatic /tmp/cm-maintenance file management
- Service feedback: Icon transitions (● → ⏳ → ● with status text)
The new keybindings (Shift+Tab, Up, Down, BackTab) weren't working because
the main app.rs event loop was only passing specific keys to the TUI app.
Added handling for:
- KeyCode::BackTab (Shift+Tab panel switching)
- KeyCode::Up (scroll up in focused panel)
- KeyCode::Down (scroll down in focused panel)
Now all keyboard navigation features work correctly.
- Dashboard now automatically looks for /etc/cm-dashboard/dashboard.toml
- No need to specify --config flag when using standard NixOS deployment
- Fallback to manual config path if default not found
- Update help text to reflect optional config parameter
- Simplifies dashboard usage - just run 'cm-dashboard' without arguments
- Remove all unused configuration options from dashboard config module
- Eliminate hardcoded defaults - dashboard now requires config file like agent
- Keep only actually used config: zmq.subscriber_ports and hosts.predefined_hosts
- Remove unused get_host_metrics function from metric store
- Clean up missing module imports (hosts, utils)
- Make dashboard fail fast if no configuration provided
- Align dashboard config approach with agent configuration pattern
Add comprehensive hysteresis support to prevent status oscillation near
threshold boundaries while maintaining responsive alerting.
Key Features:
- HysteresisThresholds with configurable upper/lower limits
- StatusTracker for per-metric status history
- Default gaps: CPU load 10%, memory 5%, disk temp 5°C
Updated Components:
- CPU load collector (5-minute average with hysteresis)
- Memory usage collector (percentage-based thresholds)
- Disk temperature collector (SMART data monitoring)
- All collectors updated to support StatusTracker interface
Cache Interval Adjustments:
- Service status: 60s → 10s (faster response)
- Disk usage: 300s → 60s (more frequent checks)
- Backup status: 900s → 60s (quicker updates)
- SMART data: moved to 600s tier (10 minutes)
Architecture:
- Individual metric status calculation in collectors
- Centralized StatusTracker in MetricCollectionManager
- Status aggregation preserved in dashboard widgets
Removed unused widget subscription system, cache utilities, error variants,
theme functions, and struct fields. Replaced subscription-based widgets
with direct metric filtering. Build now completes with zero warnings.
- Add KeyCode::Tab support to main dashboard event loop
- Add Tab key handling to TuiApp handle_input method
- Tab key now cycles to next host using existing navigate_host logic
- Host switching infrastructure was already implemented, just needed Tab key support
- Current host displayed in bold in title bar, other hosts shown normally
- Metrics filtered by selected host, full navigation working
This commit addresses several key issues identified during development:
Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using ps command
- Fix disk metrics permission issues in systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information
Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"
Service Collection Optimization:
- Removed CPU monitoring from systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy
UI and Dashboard Improvements:
- Reorganized dashboard layout with btop-inspired multi-panel design
- Updated system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface
Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details
- Remove 'r' key handler that was causing hang on refresh
- Remove RefreshRequested event and check_refresh_request method
- Remove send_refresh_commands function and ZMQ command protocol
- Remove refresh_requested field from App struct
- Clean up status line text (refresh -> tick)
The refresh functionality was causing the dashboard to become unresponsive
when pressing 'r' key. This removes all refresh-related code to fix the issue.
Implements ZMQ command protocol for dashboard-to-agent communication:
- Agents listen on port 6131 for REQ/REP commands
- Dashboard sends "refresh" command when 'r' key is pressed
- Agents force immediate collection of all metrics via force_refresh_all()
- Fresh data is broadcast immediately to dashboard
- Updated help text to show "r: Refresh all metrics"
Also includes metric-level caching architecture foundation for future
granular control over individual metric update frequencies.
- Add DEFAULT_HOSTS constant in config.rs for centralized host management
- Update ZMQ endpoint generation to connect to all configured hosts
- Implement graceful connection handling for unreachable endpoints
- Dashboard now auto-discovers and connects to available agents on cmbox, labbox, simonbox, steambox, srv01
- Storage widget: Restructure with Name/Temp/Wear/Usage columns, SMART details as descriptions
- Host navigation: Only cycle through connected hosts, no disconnected hosts
- Auto-discovery: Skip config files, use predefined CMTEC host list
- Maintenance mode: Suppress notifications during backup via /tmp/cm-maintenance file
- CPU thresholds: Update to warning ≥9.0, critical ≥10.0 for production use
- Agent-dashboard separation: Agent provides descriptions, dashboard displays only
Replaced system-wide disk usage with accurate per-service tracking by scanning
service-specific directories. Services like sshd now correctly show minimal
disk usage instead of misleading system totals.
- Rename storage widget and add drive capacity/usage columns
- Move host display to main dashboard title for cleaner layout
- Replace separate alert displays with color-coded row highlighting
- Add per-service disk usage collection using du command
- Update services widget formatting to handle small disk values
- Restructure into workspace with dedicated agent and dashboard packages