Implement hysteresis for metric status changes to prevent flapping

Add comprehensive hysteresis support to prevent status oscillation near threshold boundaries while maintaining responsive alerting. Key Features: - HysteresisThresholds with configurable upper/lower limits - StatusTracker for per-metric status history - Default gaps: CPU load 10%, memory 5%, disk temp 5°C Updated Components: - CPU load collector (5-minute average with hysteresis) - Memory usage collector (percentage-based thresholds) - Disk temperature collector (SMART data monitoring) - All collectors updated to support StatusTracker interface Cache Interval Adjustments: - Service status: 60s → 10s (faster response) - Disk usage: 300s → 60s (more frequent checks) - Backup status: 900s → 60s (quicker updates) - SMART data: moved to 600s tier (10 minutes) Architecture: - Individual metric status calculation in collectors - Centralized StatusTracker in MetricCollectionManager - Status aggregation preserved in dashboard widgets
2025-10-20 18:45:41 +02:00
parent e998679901
commit 00a8ed3da2
34 changed files with 1037 additions and 770 deletions
--- a/agent/src/collectors/systemd.rs
+++ b/agent/src/collectors/systemd.rs
@@ -1,6 +1,6 @@
 use anyhow::Result;
 use async_trait::async_trait;
-use cm_dashboard_shared::{Metric, MetricValue, Status};
+use cm_dashboard_shared::{Metric, MetricValue, Status, StatusTracker};
 use std::process::Command;
 use std::sync::RwLock;
 use std::time::Instant;
@@ -401,7 +401,7 @@ impl Collector for SystemdCollector {
        "systemd"
    }

-    async fn collect(&self) -> Result<Vec<Metric>, CollectorError> {
+    async fn collect(&self, _status_tracker: &mut StatusTracker) -> Result<Vec<Metric>, CollectorError> {
        let start_time = Instant::now();
        debug!("Collecting systemd services metrics");