Eliminates test_tcp_connectivity function that was causing 5-10 second
startup delays. ZMQ connections are non-blocking and we rely entirely
on heartbeat mechanism for connectivity detection. This restores fast
dashboard startup time.
Simplifies host connection configuration by removing tailscale_ip field,
connection_type preferences, and fallback retry logic. Now uses only the
ip field or hostname as fallback. Eliminates blocking TCP connectivity
tests that interfered with heartbeat processing.
This resolves intermittent host lost/found issues by removing the
connection retry timeouts that blocked the ZMQ message processing loop.
Implement configurable network routing for both local and Tailscale networks.
Dashboard now supports intelligent connection selection with automatic fallback
between network types. Add IP configuration fields and connection routing logic
for ZMQ and SSH operations.
Features:
- Host configuration with local and Tailscale IP addresses
- Configurable connection types (local/tailscale/auto)
- Automatic fallback between network connections
- Updated ZMQ connection logic with retry support
- SSH command routing through configured IP addresses
- Change receive_metrics() from blocking to DONTWAIT to prevent main loop freezing
- Eliminate 1-second ZMQ socket timeout that was blocking UI after Tab key press
- Main loop now continues immediately after immediate render instead of waiting
- Maintain heartbeat-based host detection while fixing visual responsiveness
- Fix blocking operation introduced when implementing heartbeat timeout mechanism
- Tab navigation now truly immediate without any network operation delays
- Bump version to 0.1.61
- Add agent_heartbeat metric to agent transmission for reliable host detection
- Update dashboard to track heartbeat timestamps per host instead of general metrics
- Add configurable heartbeat_timeout_seconds to dashboard ZMQ config (default 10s)
- Remove unused timeout_ms from agent config and revert to non-blocking command reception
- Remove unused heartbeat_interval_ms from agent configuration
- Host disconnect detection now uses dedicated heartbeat metrics for improved reliability
- Bump version to 0.1.57
Add comprehensive tracking for services stopped via dashboard to prevent
false alerts when users intentionally stop services.
Features:
- User-stopped services report Status::Ok instead of Warning
- Persistent storage survives agent restarts
- Dashboard sends UserStart/UserStop commands
- Agent tracks and syncs user-stopped state globally
- Systemd collector respects user-stopped flags
Implementation:
- New service_tracker module with persistent JSON storage
- Enhanced ServiceAction enum with UserStart/UserStop variants
- Global singleton tracker accessible by collectors
- Service status logic updated to check user-stopped flag
- Dashboard version now uses CARGO_PKG_VERSION automatically
Bump version to v0.1.43
Simplified keyboard controls by removing service restart functionality:
- Removed 'r' key restart functionality from Services panel
- Made 'R' key always trigger system rebuild regardless of focused panel
- Updated context shortcuts to show 'R: Rebuild Host' globally
- Removed all ServiceRestart enum variants and associated code:
- UiCommand::ServiceRestart
- CommandType::ServiceRestart
- ServiceAction::Restart
- Cleaned up pending transition logic to only handle Start/Stop commands
The 'R' key now consistently rebuilds the current host from any panel,
while 's' and 'S' continue to handle service start/stop in Services panel.
- Add terminal popup UI component with 80% screen coverage and terminal styling
- Extend ZMQ protocol with CommandOutputMessage for streaming output
- Implement real-time output streaming in agent system rebuild handler
- Add keyboard controls (ESC/Q to close, ↑↓ to scroll) for popup interaction
- Fix system panel Build display to show actual NixOS build instead of config hash
- Update service filters in README with wildcard patterns for better matching
- Add periodic progress updates during nixos-rebuild execution
- Integrate command output handling in dashboard main loop
- Add nixos_config_api_key_file option to NixOS configuration
- Support reading API token from file for private repositories
- Automatically inject token into HTTPS URLs (https://token@host/repo.git)
- Graceful fallback to original URL if key file missing/empty
- Default key file location: /var/lib/cm-dashboard/git-api-key
Usage: echo 'your-api-token' | sudo tee /var/lib/cm-dashboard/git-api-key
Replace direct directory access with git clone/pull approach:
- Add git configuration options (url, branch, working_dir) to NixOS module
- Update SystemConfig and AgentCommand to use git parameters
- Implement ensure_git_repository() method for clone/pull operations
- Agent clones nixosbox to /var/lib/cm-dashboard/nixos-config
- Maintains security while solving permission denied issues
The agent now manages its own copy of the configuration without
needing access to /home/cm directory.
This implements the core functionality for executing remote commands through
the dashboard and providing real-time visual feedback to users.
Key Features:
- Remote service control (start/stop/restart) via existing keyboard shortcuts
- System rebuild command with maintenance mode integration
- Real-time visual feedback with service status transitions
- ZMQ command protocol extension for service and system operations
Implementation Details:
- Extended AgentCommand enum with ServiceControl and SystemRebuild variants
- Added agent-side handlers for systemctl and nixos-rebuild execution
- Implemented command status tracking system for visual feedback
- Enhanced services widget to show progress states (⏳ restarting)
- Integrated command execution with existing keyboard navigation
Keyboard Controls:
- Services Panel: Space (start/stop), R (restart)
- System Panel: R (nixos-rebuild switch)
- Backup Panel: B (trigger backup)
Technical Architecture:
- Command flow: UI → Dashboard → ZMQ → Agent → systemctl/nixos-rebuild
- Status tracking: InProgress/Success/Failed states with visual indicators
- Maintenance mode: Automatic /tmp/cm-maintenance file management
- Service feedback: Icon transitions (● → ⏳ → ● with status text)
Add comprehensive hysteresis support to prevent status oscillation near
threshold boundaries while maintaining responsive alerting.
Key Features:
- HysteresisThresholds with configurable upper/lower limits
- StatusTracker for per-metric status history
- Default gaps: CPU load 10%, memory 5%, disk temp 5°C
Updated Components:
- CPU load collector (5-minute average with hysteresis)
- Memory usage collector (percentage-based thresholds)
- Disk temperature collector (SMART data monitoring)
- All collectors updated to support StatusTracker interface
Cache Interval Adjustments:
- Service status: 60s → 10s (faster response)
- Disk usage: 300s → 60s (more frequent checks)
- Backup status: 900s → 60s (quicker updates)
- SMART data: moved to 600s tier (10 minutes)
Architecture:
- Individual metric status calculation in collectors
- Centralized StatusTracker in MetricCollectionManager
- Status aggregation preserved in dashboard widgets
This commit addresses several key issues identified during development:
Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using ps command
- Fix disk metrics permission issues in systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information
Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"
Service Collection Optimization:
- Removed CPU monitoring from systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy
UI and Dashboard Improvements:
- Reorganized dashboard layout with btop-inspired multi-panel design
- Updated system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface
Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details