cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	00a8ed3da2	Implement hysteresis for metric status changes to prevent flapping Add comprehensive hysteresis support to prevent status oscillation near threshold boundaries while maintaining responsive alerting. Key Features: - HysteresisThresholds with configurable upper/lower limits - StatusTracker for per-metric status history - Default gaps: CPU load 10%, memory 5%, disk temp 5°C Updated Components: - CPU load collector (5-minute average with hysteresis) - Memory usage collector (percentage-based thresholds) - Disk temperature collector (SMART data monitoring) - All collectors updated to support StatusTracker interface Cache Interval Adjustments: - Service status: 60s → 10s (faster response) - Disk usage: 300s → 60s (more frequent checks) - Backup status: 900s → 60s (quicker updates) - SMART data: moved to 600s tier (10 minutes) Architecture: - Individual metric status calculation in collectors - Centralized StatusTracker in MetricCollectionManager - Status aggregation preserved in dashboard widgets	2025-10-20 18:45:41 +02:00
Christoffer Martinsson	dcca5bbea3	Fix cache tier test to match actual configuration - Update test expectations from 5s to 2s intervals for realtime tier - Fix comment to reflect actual 2s interval instead of outdated 5s reference - All tests now pass correctly	2025-10-18 18:44:13 +02:00
Christoffer Martinsson	8a36472a3d	Implement real-time process monitoring and fix UI hardcoded data This commit addresses several key issues identified during development: Major Changes: - Replace hardcoded top CPU/RAM process display with real system data - Add intelligent process monitoring to CpuCollector using ps command - Fix disk metrics permission issues in systemd collector - Optimize service collection to focus on status, memory, and disk only - Update dashboard widgets to display live process information Process Monitoring Implementation: - Added collect_top_cpu_process() and collect_top_ram_process() methods - Implemented ps-based monitoring with accurate CPU percentages - Added filtering to prevent self-monitoring artifacts (ps commands) - Enhanced error handling and validation for process data - Dashboard now shows realistic values like "claude (PID 2974) 11.0%" Service Collection Optimization: - Removed CPU monitoring from systemd collector for efficiency - Enhanced service directory permission error logging - Simplified services widget to show essential metrics only - Fixed service-to-directory mapping accuracy UI and Dashboard Improvements: - Reorganized dashboard layout with btop-inspired multi-panel design - Updated system panel to include real top CPU/RAM process display - Enhanced widget formatting and data presentation - Removed placeholder/hardcoded data throughout the interface Technical Details: - Updated agent/src/collectors/cpu.rs with process monitoring - Modified dashboard/src/ui/mod.rs for real-time process display - Enhanced systemd collector error handling and disk metrics - Updated CLAUDE.md documentation with implementation details	2025-10-16 23:55:05 +02:00
Christoffer Martinsson	6bc7f97375	Add refresh shortkey 'r' for on-demand metrics refresh Implements ZMQ command protocol for dashboard-to-agent communication: - Agents listen on port 6131 for REQ/REP commands - Dashboard sends "refresh" command when 'r' key is pressed - Agents force immediate collection of all metrics via force_refresh_all() - Fresh data is broadcast immediately to dashboard - Updated help text to show "r: Refresh all metrics" Also includes metric-level caching architecture foundation for future granular control over individual metric update frequencies.	2025-10-15 22:30:04 +02:00
Christoffer Martinsson	1b572c5c1d	Implement intelligent caching system for optimal CPU performance Replace traditional 5-second polling with tiered collection strategy: - RealTime (5s): CPU load, memory usage - Medium (5min): Service status, disk usage - Slow (15min): SMART data, backup status Key improvements: - Reduce CPU usage from 9.5% to <2% - Cache warming for instant dashboard responsiveness - Background refresh at 80% of tier intervals - Thread-safe cache with automatic cleanup Remove legacy polling code - smart caching is now the default and only mode. Agent startup enhanced with parallel cache population for immediate data availability. Architecture: SmartCache + CachedCollector + tiered CollectionScheduler	2025-10-15 11:21:36 +02:00
Christoffer Martinsson	efdd713f62	Improve dashboard display and fix service issues - Remove unreachable descriptions from failed nginx sites - Show complete site URLs instead of truncating at first dot - Implement service-specific disk quotas (docker: 4GB, immich: 4GB, others: 1-2GB) - Truncate process names to show only executable name without full path - Display only highest C-state instead of all C-states for cleaner output - Format system RAM as xxxMB/GB (totalGB) to match services format	2025-10-15 09:36:03 +02:00
Christoffer Martinsson	1ee398e648	Improve widget formatting and add logged-in users support Services widget: - Fix disk quota formatting with proper rounding instead of truncation - Remove decimals from RAM quotas and use GB instead of G - Change quota display to use GB consistently Backups widget: - Change GiB to GB for consistency - Remove spaces between numbers and units - Update disk usage format to match other widgets: used (totalGB) - Remove percentage display for cleaner format System widget: - Add support for logged-in users in description lines - Format C-states with "C-State:" prefix on first line, indent subsequent lines - Add logged_in_users field to SystemSummary data structure Documentation: - Add example hash error output to NixOS update instructions	2025-10-14 18:59:31 +02:00
Christoffer Martinsson	3e5e91f078	Remove SB column and improve widget formatting Services widget: - Remove SB (sandbox) column and related formatting function - Fix quota formatting to show decimals when needed (1.5G not 1G) - Remove spaces in unit display (128MB not 128 MB) Storage widget: - Change usage format to 23GB (932GB) for better readability Documentation: - Add NixOS configuration update process to CLAUDE.md	2025-10-14 18:40:12 +02:00
Christoffer Martinsson	b0d7d5ce35	Testing	2025-10-13 11:23:49 +02:00
Christoffer Martinsson	c68ccf023e	Testing	2025-10-13 00:28:06 +02:00
Christoffer Martinsson	57b676ad25	Testing	2025-10-13 00:16:24 +02:00
Christoffer Martinsson	9e344fb66d	Testing	2025-10-12 22:31:46 +02:00
Christoffer Martinsson	c3dbaeead2	Optimize agent CPU usage with throttled service descriptions	2025-10-12 15:13:42 +02:00
Christoffer Martinsson	2239badc8a	Testing	2025-10-12 14:53:27 +02:00
Christoffer Martinsson	2581435b10	Implement per-service disk usage monitoring Replaced system-wide disk usage with accurate per-service tracking by scanning service-specific directories. Services like sshd now correctly show minimal disk usage instead of misleading system totals. - Rename storage widget and add drive capacity/usage columns - Move host display to main dashboard title for cleaner layout - Replace separate alert displays with color-coded row highlighting - Add per-service disk usage collection using du command - Update services widget formatting to handle small disk values - Restructure into workspace with dedicated agent and dashboard packages	2025-10-11 22:59:16 +02:00
Christoffer Martinsson	656cb5943b	Switch dashboard to ZMQ gossip data source	2025-10-11 13:36:46 +02:00

16 Commits