cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	eb268922bd	Remove all unused code and fix build warnings - Remove unused struct fields: tier, config_name, last_collection_time - Remove unused structs: PerformanceMetrics, PerfMonitor - Remove unused methods: get_performance_metrics, get_collector_names, get_stats - Remove unused utility functions and system helpers - Remove unused config fields from CPU and Memory collectors - Keep config fields that are actually used (DiskCollector, etc.) - Remove unused proxy_pass_url variable and assignments - Fix duplicate hostname variable declaration - Achieve zero build warnings without functionality changes	2025-10-20 20:20:47 +02:00
Christoffer Martinsson	00a8ed3da2	Implement hysteresis for metric status changes to prevent flapping Add comprehensive hysteresis support to prevent status oscillation near threshold boundaries while maintaining responsive alerting. Key Features: - HysteresisThresholds with configurable upper/lower limits - StatusTracker for per-metric status history - Default gaps: CPU load 10%, memory 5%, disk temp 5°C Updated Components: - CPU load collector (5-minute average with hysteresis) - Memory usage collector (percentage-based thresholds) - Disk temperature collector (SMART data monitoring) - All collectors updated to support StatusTracker interface Cache Interval Adjustments: - Service status: 60s → 10s (faster response) - Disk usage: 300s → 60s (more frequent checks) - Backup status: 900s → 60s (quicker updates) - SMART data: moved to 600s tier (10 minutes) Architecture: - Individual metric status calculation in collectors - Centralized StatusTracker in MetricCollectionManager - Status aggregation preserved in dashboard widgets	2025-10-20 18:45:41 +02:00
Christoffer Martinsson	e998679901	Revert nginx monitoring to check all sites via public HTTPS URLs - Remove proxy_pass backend checking - All sites now checked using https://server_name format - Maintains 10-second timeout for external site checks - Simplifies monitoring to consistent external health checks	2025-10-20 15:06:42 +02:00
Christoffer Martinsson	2ccfc4256a	Fix nginx monitoring and services panel alignment - Add support for both proxied and static nginx sites - Proxied sites show 'P' prefix and check backend URLs - Static sites check external HTTPS URLs - Fix services panel column alignment for main services - Keep 10-second timeout for all site checks	2025-10-20 14:56:26 +02:00
Christoffer Martinsson	66a79574e0	Implement comprehensive monitoring improvements - Add full email notifications with lettre and Stockholm timezone - Add status persistence to prevent notification spam on restart - Change nginx monitoring to check backend proxy_pass URLs instead of frontend domains - Increase nginx site timeout to 10 seconds for backend health checks - Fix cache intervals: disk (5min), backup (10min), systemd (30s), cpu/memory (5s) - Remove rate limiting for immediate notifications on all status changes - Store metric status in /var/lib/cm-dashboard/last-status.json	2025-10-20 14:32:44 +02:00
Christoffer Martinsson	47a7d5ae62	Simplify service disk usage detection - remove all estimation fallbacks - Replace complex multi-strategy detection with single deterministic method - Remove estimate_service_disk_usage and all fallback strategies - Use simple get_service_disk_usage method with clear logic: * Defined path exists → use only that path * Defined path fails → return None (shows as '-') * No defined path → use systemctl WorkingDirectory * No estimates or guessing ever Fixes misleading 5MB estimates when defined paths fail due to permissions.	2025-10-20 11:06:49 +02:00
Christoffer Martinsson	fe18ace767	Fix service disk usage detection to use sudo du for permission access ARK service directories require elevated permissions to access. The NixOS configuration already allows sudo du with NOPASSWD, so use sudo du instead of direct du command to properly detect disk usage for restricted directories.	2025-10-20 10:58:17 +02:00
Christoffer Martinsson	a1c980ad31	Implement deterministic service disk usage detection with defined paths - Prioritize defined service directories over systemctl WorkingDirectory fallbacks - Add ARK Survival Ascended server mappings to correct NixOS-configured paths - Remove legacy get_service_disk_usage method to eliminate code duplication - Ensure deterministic behavior with single-purpose detection logic Fixes ARK service disk usage reporting on srv02 by using actual data paths from NixOS configuration instead of systemctl working directory detection.	2025-10-20 10:45:30 +02:00
Christoffer Martinsson	a3c9ac3617	Add ARK server directory mappings for accurate disk usage detection Map each ARK service to its specific data directory: - ark-island -> /var/lib/ark-servers/island - ark-scorched -> /var/lib/ark-servers/scorched - ark-center -> /var/lib/ark-servers/center - ark-aberration -> /var/lib/ark-servers/aberration - ark-extinction -> /var/lib/ark-servers/extinction - ark-ragnarok -> /var/lib/ark-servers/ragnarok - ark-valguero -> /var/lib/ark-servers/valguero Based on NixOS configuration in srv02/configuration.nix.	2025-10-20 10:15:30 +02:00
Christoffer Martinsson	f67779be9d	Add ARK game servers to systemd service monitoring	2025-10-19 19:23:51 +02:00
Christoffer Martinsson	5d52c5b1aa	Fix SMART data and site latency checking issues - Add sudo to disk collector smartctl commands for proper SMART data access - Add reqwest dependency with blocking feature for HTTP site checks - Replace curl-based site latency with reqwest HTTP client implementation - Maintain 2-second connect timeout and 5-second total timeout - Fix disk health UNKNOWN status by enabling proper SMART permissions - Fix nginx site timeout issues by using proper HTTP client with redirect support	2025-10-18 19:14:29 +02:00
Christoffer Martinsson	125111ee99	Implement comprehensive backup monitoring and fix timestamp issues - Add BackupCollector for reading TOML status files with disk space metrics - Implement BackupWidget with disk usage display and service status details - Fix backup script disk space parsing by adding missing capture_output=True - Update backup widget to show actual disk usage instead of repository size - Fix timestamp parsing to use backup completion time instead of start time - Resolve timezone issues by using UTC timestamps in backup script - Add disk identification metrics (product name, serial number) to backup status - Enhance UI layout with proper backup monitoring integration	2025-10-18 18:33:41 +02:00
Christoffer Martinsson	8a36472a3d	Implement real-time process monitoring and fix UI hardcoded data This commit addresses several key issues identified during development: Major Changes: - Replace hardcoded top CPU/RAM process display with real system data - Add intelligent process monitoring to CpuCollector using ps command - Fix disk metrics permission issues in systemd collector - Optimize service collection to focus on status, memory, and disk only - Update dashboard widgets to display live process information Process Monitoring Implementation: - Added collect_top_cpu_process() and collect_top_ram_process() methods - Implemented ps-based monitoring with accurate CPU percentages - Added filtering to prevent self-monitoring artifacts (ps commands) - Enhanced error handling and validation for process data - Dashboard now shows realistic values like "claude (PID 2974) 11.0%" Service Collection Optimization: - Removed CPU monitoring from systemd collector for efficiency - Enhanced service directory permission error logging - Simplified services widget to show essential metrics only - Fixed service-to-directory mapping accuracy UI and Dashboard Improvements: - Reorganized dashboard layout with btop-inspired multi-panel design - Updated system panel to include real top CPU/RAM process display - Enhanced widget formatting and data presentation - Removed placeholder/hardcoded data throughout the interface Technical Details: - Updated agent/src/collectors/cpu.rs with process monitoring - Modified dashboard/src/ui/mod.rs for real-time process display - Enhanced systemd collector error handling and disk metrics - Updated CLAUDE.md documentation with implementation details	2025-10-16 23:55:05 +02:00
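
Note on commit 00a8ed3da2: the hysteresis behaviour it describes can be illustrated with a minimal sketch. Only the type names `HysteresisThresholds` and `StatusTracker` come from the commit message. The fields, the `with_gap` and `evaluate` helpers, the `Status` enum, and the example numbers are assumptions made for illustration, not the repository's actual code.

```rust
use std::collections::HashMap;

/// Status levels, ordered from best to worst (assumed; not taken from the repository).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical layout: a level is entered at its upper limit and only
/// left again once the value falls below its lower limit.
#[derive(Debug, Clone, Copy)]
pub struct HysteresisThresholds {
    pub warning_high: f32,
    pub warning_low: f32,
    pub critical_high: f32,
    pub critical_low: f32,
}

impl HysteresisThresholds {
    /// Build thresholds whose lower limits sit `gap` below the upper limits.
    pub fn with_gap(warning: f32, critical: f32, gap: f32) -> Self {
        Self {
            warning_high: warning,
            warning_low: warning - gap,
            critical_high: critical,
            critical_low: critical - gap,
        }
    }

    /// Compute the new status from the current value and the previous status.
    pub fn evaluate(&self, value: f32, previous: Status) -> Status {
        match previous {
            // Escalation always uses the upper limits.
            Status::Ok => {
                if value >= self.critical_high {
                    Status::Critical
                } else if value >= self.warning_high {
                    Status::Warning
                } else {
                    Status::Ok
                }
            }
            // Warning is held until the value drops below the lower warning limit.
            Status::Warning => {
                if value >= self.critical_high {
                    Status::Critical
                } else if value < self.warning_low {
                    Status::Ok
                } else {
                    Status::Warning
                }
            }
            // Critical is held until the value drops below the lower critical limit.
            Status::Critical => {
                if value >= self.critical_low {
                    Status::Critical
                } else if value >= self.warning_low {
                    Status::Warning
                } else {
                    Status::Ok
                }
            }
        }
    }
}

/// Remembers the last status per metric name so `evaluate` has a history to work with.
#[derive(Default)]
pub struct StatusTracker {
    last: HashMap<String, Status>,
}

impl StatusTracker {
    /// Evaluate `value` for `metric`, store the result, and return it.
    /// A metric that has never been seen starts from `Status::Ok`.
    pub fn update(&mut self, metric: &str, value: f32, thresholds: &HysteresisThresholds) -> Status {
        let previous = self.last.get(metric).copied().unwrap_or(Status::Ok);
        let next = thresholds.evaluate(value, previous);
        self.last.insert(metric.to_string(), next);
        next
    }
}
```

With `HysteresisThresholds::with_gap(80.0, 90.0, 5.0)`, a memory reading that moves 79 → 81 → 78 enters Warning at 81 and stays there at 78, because 78 is still above the lower limit of 75. Without the gap, the same sequence would flip Ok → Warning → Ok, which is the flapping the commit sets out to prevent.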


63 Commits