Implement hysteresis for metric status changes to prevent flapping

Add hysteresis to metric status calculation so that values hovering near a
threshold boundary no longer flip the status back and forth, while alerting
stays responsive.

Key Features:
- HysteresisThresholds with configurable upper/lower limits
- StatusTracker for per-metric status history
- Default gaps: CPU load 10%, memory 5%, disk temp 5°C
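
The upper/lower limit idea can be sketched as follows. This is a minimal illustration, not the code from this commit: the field names, the `with_gap` constructor, the `evaluate` method, and the use of an absolute gap are all assumptions, and the 80/90 memory thresholds in the usage note are made-up numbers. A status is entered at the upper limit and only left once the value falls below the lower limit.

```rust
/// Status levels, ordered by severity (assumed shape, not the project's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical layout: each level has an upper limit (enter) and a lower limit (leave).
#[derive(Debug, Clone, Copy)]
pub struct HysteresisThresholds {
    pub warning_upper: f64,
    pub warning_lower: f64,
    pub critical_upper: f64,
    pub critical_lower: f64,
}

impl HysteresisThresholds {
    /// Place each lower limit `gap` below its upper limit (absolute gap assumed).
    pub fn with_gap(warning: f64, critical: f64, gap: f64) -> Self {
        Self {
            warning_upper: warning,
            warning_lower: warning - gap,
            critical_upper: critical,
            critical_lower: critical - gap,
        }
    }

    /// Decide the new status from the current value and the previous status.
    pub fn evaluate(&self, value: f64, previous: Status) -> Status {
        // A level that was already reached is held until the value drops
        // below its lower limit; a level not yet reached needs the upper limit.
        let critical_limit = if previous == Status::Critical {
            self.critical_lower
        } else {
            self.critical_upper
        };
        let warning_limit = if previous >= Status::Warning {
            self.warning_lower
        } else {
            self.warning_upper
        };

        if value >= critical_limit {
            Status::Critical
        } else if value >= warning_limit {
            Status::Warning
        } else {
            Status::Ok
        }
    }
}
```

With `HysteresisThresholds::with_gap(80.0, 90.0, 5.0)`, a reading of 81 moves a metric from ok to warning, and it stays at warning until a reading falls below 75, so samples bouncing between 78 and 82 no longer flap.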

Updated Components:
- CPU load collector (5-minute average with hysteresis)
- Memory usage collector (percentage-based thresholds)
- Disk temperature collector (SMART data monitoring)
- All collectors updated to support StatusTracker interface

Cache Interval Adjustments:
- Service status: 60s → 10s (faster response)
- Disk usage: 300s → 60s (more frequent checks)
- Backup status: 900s → 60s (quicker updates)
- SMART data: moved to 600s tier (10 minutes)

Architecture:
- Individual metric status calculation in collectors
- Centralized StatusTracker in MetricCollectionManager
- Status aggregation preserved in dashboard widgets
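
One way to picture the split between collectors and the central tracker is sketched below. It is an assumption-laden illustration rather than the committed code: the `update` and `previous` methods, the closure-based decision hook, and the `aggregate` helper are hypothetical names, and a metric with no history is assumed to start at ok. The tracker only remembers the last status per metric name; the threshold decision is passed in by the caller.

```rust
use std::collections::HashMap;

/// Status levels, ordered by severity (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical tracker: remembers the last status of each metric by name.
#[derive(Debug, Default)]
pub struct StatusTracker {
    last: HashMap<String, Status>,
}

impl StatusTracker {
    /// Last recorded status; metrics never seen before are assumed to be Ok.
    pub fn previous(&self, metric: &str) -> Status {
        self.last.get(metric).copied().unwrap_or(Status::Ok)
    }

    /// Run the caller's decision function with the previous status and
    /// store the result as the new status for this metric.
    pub fn update<F>(&mut self, metric: &str, value: f64, decide: F) -> Status
    where
        F: FnOnce(f64, Status) -> Status,
    {
        let next = decide(value, self.previous(metric));
        self.last.insert(metric.to_string(), next);
        next
    }
}

/// Widget-side aggregation sketch: the worst individual status wins.
pub fn aggregate(statuses: &[Status]) -> Status {
    statuses.iter().copied().max().unwrap_or(Status::Ok)
}
```

In this sketch a collector would call `tracker.update("memory_usage_percent", value, |v, prev| ...)` once per sample, and a widget would pass the resulting statuses to `aggregate`. Keeping the history in one place means collectors themselves can stay stateless.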
2025-10-20 18:45:41 +02:00
parent e998679901
commit 00a8ed3da2
34 changed files with 1037 additions and 770 deletions

@@ -329,7 +329,7 @@ Agent → ["cpu_load_1min", "memory_usage_percent", ...] → Dashboard → Widge
- [x] All collectors output standardized status strings (ok/warning/critical/unknown)
- [x] Dashboard connection loss detection with 5-second keep-alive
- [x] Removed excessive logging from agent
- [x] Fixed all compiler warnings in both agent and dashboard
- [x] Reduced initial compiler warnings from excessive logging cleanup
- [x] **SystemCollector architecture refactoring completed (2025-10-12)**
- [x] Created SystemCollector for CPU load, memory, temperature, C-states
- [x] Moved system metrics from ServiceCollector to SystemCollector
@@ -376,6 +376,12 @@ Agent → ["cpu_load_1min", "memory_usage_percent", ...] → Dashboard → Widge
- [x] Resolved timezone issues by using UTC timestamps in backup script
- [x] Added disk identification metrics (product name, serial number) to backup status
- [x] Enhanced UI layout with proper backup monitoring integration
- [x] **Complete warning elimination and code cleanup (2025-10-18)**
- [x] Removed all unused code including widget subscription system and WidgetType enum
- [x] Eliminated unused cache utilities, error variants, and theme functions
- [x] Removed unused struct fields and imports throughout codebase
- [x] Fixed lifetime warnings and replaced subscription-based widgets with direct metric filtering
- [x] Achieved zero build warnings in both agent and dashboard (down from 46 total warnings)
**Production Configuration:**
- CPU load thresholds: Warning ≥ 9.0, Critical ≥ 10.0