- Add blue circular arrow (↻) status icon during SystemRebuild commands
- Keep rebuilding hosts visible in dashboard even when temporarily offline
- Extend connection timeout to 5 minutes for hosts undergoing rebuild
- Prevent host switching during rebuild operations
- Update status bar to show rebuild progress immediately when R key pressed
This implements the core functionality for executing remote commands through
the dashboard and providing real-time visual feedback to users.
Key Features:
- Remote service control (start/stop/restart) via existing keyboard shortcuts
- System rebuild command with maintenance mode integration
- Real-time visual feedback with service status transitions
- ZMQ command protocol extension for service and system operations
Implementation Details:
- Extended AgentCommand enum with ServiceControl and SystemRebuild variants
- Added agent-side handlers for systemctl and nixos-rebuild execution
- Implemented command status tracking system for visual feedback
- Enhanced services widget to show progress states (⏳ restarting)
- Integrated command execution with existing keyboard navigation
Keyboard Controls:
- Services Panel: Space (start/stop), R (restart)
- System Panel: R (nixos-rebuild switch)
- Backup Panel: B (trigger backup)
Technical Architecture:
- Command flow: UI → Dashboard → ZMQ → Agent → systemctl/nixos-rebuild
- Status tracking: InProgress/Success/Failed states with visual indicators
- Maintenance mode: Automatic /tmp/cm-maintenance file management
- Service feedback: Icon transitions (● → ⏳ → ● with status text)
- Mark keyboard navigation and service management as completed
- Document all implemented navigation controls and features
- Update current status to reflect working build display
- Replace old future priorities with actual service management tasks
- Add focus-aware selection and visual feedback documentation
- Create NixOS collector for version and active users detection
- Add SystemWidget combining all system information in TODO.md layout
- Replace separate CPU/Memory widgets with unified system display
- Add tree structure for storage with drive temperature/wear info
- Support NixOS version, active users, load averages, memory usage
- Follow exact decimal formatting from specification
Add proper hierarchical tree display for storage pools and drives:
- Pool headers with status icons and type indication (Single/multi-drive)
- Individual drive lines with ├─ tree symbols and health status
- Usage summary with └─ end symbol and capacity status
- T: and W: prefixes for temperature and wear level metrics
- Themed status icons using StatusIcons::get_icon() with proper colors
- 2-space indentation for clean tree structure appearance
Replace flat storage display with beautiful tree format:
● Storage steampool (multi-drive):
├─ ● sdb T:35°C W:12%
├─ ● sdc T:38°C W:8%
└─ ● 78.1% 1250.3GB/1600.0GB
Uses agent-calculated status from NixOS-configured thresholds.
Update CLAUDE.md with complete implementation specification.
- Remove all Default implementations from agent configuration structs
- Make configuration file required for agent startup
- Update NixOS module to generate complete agent.toml configuration
- Add comprehensive configuration options to NixOS module including:
- Service include/exclude patterns for systemd collector
- All thresholds and intervals
- ZMQ communication settings
- Notification and cache configuration
- Agent now fails fast if no configuration provided
- Eliminates configuration drift between defaults and NixOS settings
Add comprehensive hysteresis support to prevent status oscillation near
threshold boundaries while maintaining responsive alerting.
Key Features:
- HysteresisThresholds with configurable upper/lower limits
- StatusTracker for per-metric status history
- Default gaps: CPU load 10%, memory 5%, disk temp 5°C
Updated Components:
- CPU load collector (5-minute average with hysteresis)
- Memory usage collector (percentage-based thresholds)
- Disk temperature collector (SMART data monitoring)
- All collectors updated to support StatusTracker interface
Cache Interval Adjustments:
- Service status: 60s → 10s (faster response)
- Disk usage: 300s → 60s (more frequent checks)
- Backup status: 900s → 60s (quicker updates)
- SMART data: moved to 600s tier (10 minutes)
Architecture:
- Individual metric status calculation in collectors
- Centralized StatusTracker in MetricCollectionManager
- Status aggregation preserved in dashboard widgets
- Update test expectations from 5s to 2s intervals for realtime tier
- Fix comment to reflect actual 2s interval instead of outdated 5s reference
- All tests now pass correctly
This commit addresses several key issues identified during development:
Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using ps command
- Fix disk metrics permission issues in systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information
Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"
Service Collection Optimization:
- Removed CPU monitoring from systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy
UI and Dashboard Improvements:
- Reorganized dashboard layout with btop-inspired multi-panel design
- Updated system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface
Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details
Implements ZMQ command protocol for dashboard-to-agent communication:
- Agents listen on port 6131 for REQ/REP commands
- Dashboard sends "refresh" command when 'r' key is pressed
- Agents force immediate collection of all metrics via force_refresh_all()
- Fresh data is broadcast immediately to dashboard
- Updated help text to show "r: Refresh all metrics"
Also includes metric-level caching architecture foundation for future
granular control over individual metric update frequencies.
Replace traditional 5-second polling with tiered collection strategy:
- RealTime (5s): CPU load, memory usage
- Medium (5min): Service status, disk usage
- Slow (15min): SMART data, backup status
Key improvements:
- Reduce CPU usage from 9.5% to <2%
- Cache warming for instant dashboard responsiveness
- Background refresh at 80% of tier intervals
- Thread-safe cache with automatic cleanup
Remove legacy polling code - smart caching is now the default and only mode.
Agent startup enhanced with parallel cache population for immediate data availability.
Architecture: SmartCache + CachedCollector + tiered CollectionScheduler
- Remove unreachable descriptions from failed nginx sites
- Show complete site URLs instead of truncating at first dot
- Implement service-specific disk quotas (docker: 4GB, immich: 4GB, others: 1-2GB)
- Truncate process names to show only executable name without full path
- Display only highest C-state instead of all C-states for cleaner output
- Format system RAM as xxxMB/GB (totalGB) to match services format
Services widget:
- Fix disk quota formatting with proper rounding instead of truncation
- Remove decimals from RAM quotas and use GB instead of G
- Change quota display to use GB consistently
Backups widget:
- Change GiB to GB for consistency
- Remove spaces between numbers and units
- Update disk usage format to match other widgets: used (totalGB)
- Remove percentage display for cleaner format
System widget:
- Add support for logged-in users in description lines
- Format C-states with "C-State:" prefix on first line, indent subsequent lines
- Add logged_in_users field to SystemSummary data structure
Documentation:
- Add example hash error output to NixOS update instructions
Services widget:
- Remove SB (sandbox) column and related formatting function
- Fix quota formatting to show decimals when needed (1.5G not 1G)
- Remove spaces in unit display (128MB not 128 MB)
Storage widget:
- Change usage format to 23GB (932GB) for better readability
Documentation:
- Add NixOS configuration update process to CLAUDE.md
Replaced system-wide disk usage with accurate per-service tracking by scanning
service-specific directories. Services like sshd now correctly show minimal
disk usage instead of misleading system totals.
- Rename storage widget and add drive capacity/usage columns
- Move host display to main dashboard title for cleaner layout
- Replace separate alert displays with color-coded row highlighting
- Add per-service disk usage collection using du command
- Update services widget formatting to handle small disk values
- Restructure into workspace with dedicated agent and dashboard packages