- Dashboard now automatically looks for /etc/cm-dashboard/dashboard.toml
- No need to specify --config flag when using standard NixOS deployment
- Fallback to manual config path if default not found
- Update help text to reflect optional config parameter
- Simplifies dashboard usage - just run 'cm-dashboard' without arguments
- Remove all unused configuration options from dashboard config module
- Eliminate hardcoded defaults - dashboard now requires config file like agent
- Keep only actually used config: zmq.subscriber_ports and hosts.predefined_hosts
- Remove unused get_host_metrics function from metric store
- Clean up missing module imports (hosts, utils)
- Make dashboard fail fast if no configuration provided
- Align dashboard config approach with agent configuration pattern
Add comprehensive hysteresis support to prevent status oscillation near
threshold boundaries while maintaining responsive alerting.
Key Features:
- HysteresisThresholds with configurable upper/lower limits
- StatusTracker for per-metric status history
- Default gaps: CPU load 10%, memory 5%, disk temp 5°C
Updated Components:
- CPU load collector (5-minute average with hysteresis)
- Memory usage collector (percentage-based thresholds)
- Disk temperature collector (SMART data monitoring)
- All collectors updated to support StatusTracker interface
Cache Interval Adjustments:
- Service status: 60s → 10s (faster response)
- Disk usage: 300s → 60s (more frequent checks)
- Backup status: 900s → 60s (quicker updates)
- SMART data: moved to 600s tier (10 minutes)
Architecture:
- Individual metric status calculation in collectors
- Centralized StatusTracker in MetricCollectionManager
- Status aggregation preserved in dashboard widgets
This commit addresses several key issues identified during development:
Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using ps command
- Fix disk metrics permission issues in systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information
Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"
Service Collection Optimization:
- Removed CPU monitoring from systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy
UI and Dashboard Improvements:
- Reorganized dashboard layout with btop-inspired multi-panel design
- Updated system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface
Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details
- Remove 'r' key handler that was causing hang on refresh
- Remove RefreshRequested event and check_refresh_request method
- Remove send_refresh_commands function and ZMQ command protocol
- Remove refresh_requested field from App struct
- Clean up status line text (refresh -> tick)
The refresh functionality was causing the dashboard to become unresponsive
when pressing 'r' key. This removes all refresh-related code to fix the issue.
Implements ZMQ command protocol for dashboard-to-agent communication:
- Agents listen on port 6131 for REQ/REP commands
- Dashboard sends "refresh" command when 'r' key is pressed
- Agents force immediate collection of all metrics via force_refresh_all()
- Fresh data is broadcast immediately to dashboard
- Updated help text to show "r: Refresh all metrics"
Also includes metric-level caching architecture foundation for future
granular control over individual metric update frequencies.
- Add DEFAULT_HOSTS constant in config.rs for centralized host management
- Update ZMQ endpoint generation to connect to all configured hosts
- Implement graceful connection handling for unreachable endpoints
- Dashboard now auto-discovers and connects to available agents on cmbox, labbox, simonbox, steambox, srv01
Replaced system-wide disk usage with accurate per-service tracking by scanning
service-specific directories. Services like sshd now correctly show minimal
disk usage instead of misleading system totals.
- Rename storage widget and add drive capacity/usage columns
- Move host display to main dashboard title for cleaner layout
- Replace separate alert displays with color-coded row highlighting
- Add per-service disk usage collection using du command
- Update services widget formatting to handle small disk values
- Restructure into workspace with dedicated agent and dashboard packages