Christoffer Martinsson 2025-10-12 22:31:46 +02:00
parent 4d8bacef50
commit 9e344fb66d
17 changed files with 283 additions and 297 deletions

CLAUDE.md

@ -184,25 +184,103 @@ Keys: [←→] hosts [r]efresh [q]uit
 Keys: [Enter] details [r]efresh [s]ort [f]ilter [q]uit
 ```
-## Development Status
-### Immediate TODOs
-- Refactor all dashboard widgets to use a shared table/layout helper so icons, padding, and titles remain consistent across panels
+## Architecture Principles - CRITICAL
+### Agent-Dashboard Separation of Concerns
+**AGENT IS SINGLE SOURCE OF TRUTH FOR ALL STATUS CALCULATIONS**
+- Agent calculates status ("ok"/"warning"/"critical"/"unknown") using defined thresholds
+- Agent sends status to dashboard via ZMQ
+- Dashboard NEVER calculates status - only displays what agent provides
-- Investigate why the backup metrics agent is not publishing data to the dashboard
-- Resize the services widget so it can display more services without truncation
-- Remove the dedicated status widget and redistribute the layout space
-- Add responsive scaling within each widget so columns and content adapt dynamically
+**Data Flow Architecture:**
+```
+Agent (calculations + thresholds) → Status → Dashboard (display only) → TableBuilder (colors)
+```
-### Phase 3: Advanced Features 🚧 IN PROGRESS
+**Status Handling Rules:**
+- Agent provides status → Dashboard uses agent status
+- Agent doesn't provide status → Dashboard shows "unknown" (NOT "ok")
+- Dashboard widgets NEVER contain hardcoded thresholds
+- TableBuilder converts status to colors for display
-- [x] ZMQ gossip network implementation
-- [x] Comprehensive error handling
-- [x] Performance optimizations
-- [ ] Predictive analytics for wear levels
-- [ ] Custom alert rules engine
-- [ ] Historical data export capabilities
+### Current Agent Thresholds (as of 2025-10-12)
+**CPU Load (service.rs:392-400):**
+- Warning: ≥ 2.0 (testing value, was 5.0)
+- Critical: ≥ 4.0 (testing value, was 8.0)
+**CPU Temperature (service.rs:412-420):**
+- Warning: ≥ 70.0°C
+- Critical: ≥ 80.0°C
+**Memory Usage (service.rs:402-410):**
+- Warning: ≥ 80%
+- Critical: ≥ 95%
+### Email Notifications
+**System Configuration:**
+- From: `{hostname}@cmtec.se` (e.g., cmbox@cmtec.se)
+- To: `cm@cmtec.se`
+- SMTP: localhost:25 (postfix)
+- Timezone: Europe/Stockholm (not UTC)
+**Notification Triggers:**
+- Status degradation: any → "warning" or "critical"
+- Recovery: "warning"/"critical" → "ok"
+- Rate limiting: configurable (set to 0 for testing, 30 minutes for production)
+**Monitored Components:**
+- system.cpu (load status)
+- system.cpu_temp (temperature status)
+- system.memory (usage status)
+- system.services (service health status)
+- storage.smart (drive health)
+- backup.overall (backup status)
+### Pure Auto-Discovery Implementation
+**Agent Configuration:**
+- No config files required
+- Auto-detects storage devices, services, backup systems
+- Runtime discovery of system capabilities
+- CLI: `cm-dashboard-agent [-v]` (only verbose flag)
+**Service Discovery:**
+- Scans running systemd services
+- Filters by predefined interesting patterns (gitea, nginx, docker, etc.)
+- No host-specific hardcoded service lists
+### Current Implementation Status
+**Completed:**
+- [x] Pure auto-discovery agent (no config files)
+- [x] Agent-side status calculations with defined thresholds
+- [x] Dashboard displays agent status (no dashboard calculations)
+- [x] Email notifications with Stockholm timezone
+- [x] CPU temperature monitoring and notifications
+- [x] ZMQ message format standardization
+- [x] Removed all hardcoded dashboard thresholds
+**Testing Configuration (REVERT FOR PRODUCTION):**
+- CPU thresholds lowered to 2.0/4.0 for easy testing
+- Email rate limiting disabled (0 minutes)
+### Development Guidelines
+**When Adding New Metrics:**
+1. Agent calculates status with thresholds
+2. Agent adds `{metric}_status` field to JSON output
+3. Dashboard data structure adds `{metric}_status: Option<String>`
+4. Dashboard uses `status_level_from_agent_status()` for display
+5. Agent adds notification monitoring for status changes
+**NEVER:**
+- Add hardcoded thresholds to dashboard widgets
+- Calculate status in dashboard with different thresholds than agent
+- Use "ok" as default when agent status is missing (use "unknown")
+- Calculate colors in widgets (TableBuilder's responsibility)
 # Important Communication Guidelines
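
The status handling rules in the CLAUDE.md hunk above can be sketched as a single dashboard-side mapping. This is a minimal sketch, not the project's actual `status_level_from_agent_status()`, whose body is not part of this diff: the function name `level_from_agent_status`, the `Option<&str>` parameter, and the treatment of unrecognized strings are assumptions. The `StatusLevel` variants are the ones the removed widget code in this commit matched on.

```rust
/// Display levels, mirroring the `StatusLevel` variants that appear in this diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusLevel {
    Ok,
    Warning,
    Error,
    Unknown,
}

/// Hypothetical helper: translate the agent's status string into a display level.
/// A missing status maps to `Unknown`, never to `Ok`.
pub fn level_from_agent_status(status: Option<&str>) -> StatusLevel {
    match status {
        Some("ok") => StatusLevel::Ok,
        Some("warning") => StatusLevel::Warning,
        Some("critical") => StatusLevel::Error,
        // Assumption: unrecognized strings are treated like a missing status.
        Some(_) | None => StatusLevel::Unknown,
    }
}
```

For an `Option<String>` field such as `cpu_temp_status`, a call like `level_from_agent_status(summary.cpu_temp_status.as_deref())` would fit this signature. Keeping the fallback in one place means no widget can silently default to "ok".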

Cargo.lock generated

@ -220,6 +220,28 @@ dependencies = [
"windows-link", "windows-link",
] ]
[[package]]
name = "chrono-tz"
version = "0.8.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d59ae0466b83e838b81a54256c39d5d7c20b9d7daa10510a242d9b75abd5936e"
dependencies = [
"chrono",
"chrono-tz-build",
"phf",
]
[[package]]
name = "chrono-tz-build"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "433e39f13c9a060046954e0592a8d0a4bcb1040125cbf91cb8ee58964cfb350f"
dependencies = [
"parse-zoneinfo",
"phf",
"phf_codegen",
]
[[package]] [[package]]
name = "chumsky" name = "chumsky"
version = "0.9.3" version = "0.9.3"
@ -298,6 +320,7 @@ dependencies = [
"anyhow", "anyhow",
"async-trait", "async-trait",
"chrono", "chrono",
"chrono-tz",
"clap", "clap",
"cm-dashboard-shared", "cm-dashboard-shared",
"futures", "futures",
@ -1078,6 +1101,15 @@ dependencies = [
"windows-link", "windows-link",
] ]
[[package]]
name = "parse-zoneinfo"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1f2a05b18d44e2957b88f96ba460715e295bc1d7510468a2f3d3b44535d26c24"
dependencies = [
"regex",
]
[[package]] [[package]]
name = "paste" name = "paste"
version = "1.0.15" version = "1.0.15"
@ -1090,6 +1122,44 @@ version = "2.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220" checksum = "9b4f627cb1b25917193a259e49bdad08f671f8d9708acfd5fe0a8c1455d87220"
[[package]]
name = "phf"
version = "0.11.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fd6780a80ae0c52cc120a26a1a42c1ae51b247a253e4e06113d23d2c2edd078"
dependencies = [
"phf_shared",
]
[[package]]
name = "phf_codegen"
version = "0.11.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "aef8048c789fa5e851558d709946d6d79a8ff88c0440c587967f8e94bfb1216a"
dependencies = [
"phf_generator",
"phf_shared",
]
[[package]]
name = "phf_generator"
version = "0.11.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3c80231409c20246a13fddb31776fb942c38553c51e871f8cbd687a4cfb5843d"
dependencies = [
"phf_shared",
"rand",
]
[[package]]
name = "phf_shared"
version = "0.11.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "67eabc2ef2a60eb7faa00097bd1ffdb5bd28e62bf39990626a582201b7a754e5"
dependencies = [
"siphasher",
]
[[package]] [[package]]
name = "pin-project-lite" name = "pin-project-lite"
version = "0.2.16" version = "0.2.16"
@ -1248,6 +1318,18 @@ dependencies = [
"bitflags 2.9.4", "bitflags 2.9.4",
] ]
[[package]]
name = "regex"
version = "1.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4a52d8d02cacdb176ef4678de6c052efb4b3da14b78e4db683a4252762be5433"
dependencies = [
"aho-corasick",
"memchr",
"regex-automata",
"regex-syntax",
]
[[package]] [[package]]
name = "regex-automata" name = "regex-automata"
version = "0.4.12" version = "0.4.12"
@ -1395,6 +1477,12 @@ dependencies = [
"libc", "libc",
] ]
[[package]]
name = "siphasher"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "56199f7ddabf13fe5074ce809e7d3f42b42ae711800501b5b16ea82ad029c39d"
[[package]] [[package]]
name = "slab" name = "slab"
version = "0.4.11" version = "0.4.11"

View File

@ -10,7 +10,8 @@ async-trait = "0.1"
clap = { version = "4.0", features = ["derive"] } clap = { version = "4.0", features = ["derive"] }
serde = { version = "1.0", features = ["derive"] } serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0" serde_json = "1.0"
chrono = { version = "0.4", features = ["serde"] } chrono = { version = "0.4", features = ["serde", "clock"] }
chrono-tz = "0.8"
thiserror = "1.0" thiserror = "1.0"
tracing = "0.1" tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["fmt", "env-filter"] } tracing-subscriber = { version = "0.3", features = ["fmt", "env-filter"] }

View File

@ -12,7 +12,6 @@ use super::{AgentType, Collector, CollectorError, CollectorOutput};
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
pub struct BackupCollector { pub struct BackupCollector {
pub enabled: bool,
pub interval: Duration, pub interval: Duration,
pub restic_repo: Option<String>, pub restic_repo: Option<String>,
pub backup_service: String, pub backup_service: String,
@ -21,13 +20,12 @@ pub struct BackupCollector {
impl BackupCollector { impl BackupCollector {
pub fn new( pub fn new(
enabled: bool, _enabled: bool,
interval_ms: u64, interval_ms: u64,
restic_repo: Option<String>, restic_repo: Option<String>,
backup_service: String, backup_service: String,
) -> Self { ) -> Self {
Self { Self {
enabled,
interval: Duration::from_millis(interval_ms), interval: Duration::from_millis(interval_ms),
restic_repo, restic_repo,
backup_service, backup_service,
@ -300,13 +298,6 @@ impl Collector for BackupCollector {
self.interval self.interval
} }
fn is_enabled(&self) -> bool {
self.enabled
}
fn requires_root(&self) -> bool {
false // Depends on restic repo permissions
}
async fn collect(&self) -> Result<CollectorOutput, CollectorError> { async fn collect(&self) -> Result<CollectorOutput, CollectorError> {
// Try to get borgbackup metrics first, fall back to restic if not available // Try to get borgbackup metrics first, fall back to restic if not available
@ -383,9 +374,17 @@ impl Collector for BackupCollector {
last_message: None, last_message: None,
}); });
// Convert BackupStatus to standardized string format
let status_string = match overall_status {
BackupStatus::Healthy => "ok",
BackupStatus::Warning => "warning",
BackupStatus::Failed => "critical",
BackupStatus::Unknown => "unknown",
};
// Add disk information if available from borgbackup metrics // Add disk information if available from borgbackup metrics
let mut backup_json = json!({ let mut backup_json = json!({
"overall_status": overall_status, "overall_status": status_string,
"backup": backup_info, "backup": backup_info,
"service": service_data, "service": service_data,
"timestamp": Utc::now() "timestamp": Utc::now()
@ -407,7 +406,6 @@ impl Collector for BackupCollector {
Ok(CollectorOutput { Ok(CollectorOutput {
agent_type: AgentType::Backup, agent_type: AgentType::Backup,
data: backup_metrics, data: backup_metrics,
timestamp: Utc::now(),
}) })
} }
} }
@ -457,39 +455,25 @@ struct JournalEntry {
// Borgbackup metrics structure from backup script // Borgbackup metrics structure from backup script
#[derive(Debug, Deserialize)] #[derive(Debug, Deserialize)]
struct BorgbackupMetrics { struct BorgbackupMetrics {
backup_name: String,
start_time: String,
end_time: String,
duration_seconds: i64,
status: String, status: String,
exit_codes: ExitCodes,
repository: Repository, repository: Repository,
backup_disk: BackupDisk, backup_disk: BackupDisk,
timestamp: i64, timestamp: i64,
} }
#[derive(Debug, Deserialize)]
struct ExitCodes {
global: i32,
backup: i32,
prune: i32,
compact: i32,
}
#[derive(Debug, Deserialize)] #[derive(Debug, Deserialize)]
struct Repository { struct Repository {
total_archives: i32, total_archives: i32,
latest_archive_size_bytes: i64, latest_archive_size_bytes: i64,
total_repository_size_bytes: i64, total_repository_size_bytes: i64,
path: String,
} }
#[derive(Debug, Deserialize)] #[derive(Debug, Deserialize)]
struct BackupDisk { struct BackupDisk {
device: String, device: String,
health: String, health: String,
total_bytes: i64, total_bytes: i64,
used_bytes: i64, used_bytes: i64,
available_bytes: i64,
usage_percent: f32, usage_percent: f32,
} }
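
The enum-to-string conversion added in `collect()` could also live on the enum itself, so every call site shares one mapping. The following is a sketch under assumptions: the enum is redeclared here only to keep the example self-contained, using the four variants visible in the hunk, and `as_agent_str` is a hypothetical method name that does not appear in the commit.

```rust
/// Stand-in for the collector's `BackupStatus`, with the variants seen in this diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackupStatus {
    Healthy,
    Warning,
    Failed,
    Unknown,
}

impl BackupStatus {
    /// Hypothetical method: the standardized status string sent to the dashboard.
    pub fn as_agent_str(self) -> &'static str {
        match self {
            BackupStatus::Healthy => "ok",
            BackupStatus::Warning => "warning",
            BackupStatus::Failed => "critical",
            BackupStatus::Unknown => "unknown",
        }
    }
}
```

Because the `match` is exhaustive, adding a new variant later would fail to compile until its wire string is chosen.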

View File

@ -1,5 +1,4 @@
use async_trait::async_trait; use async_trait::async_trait;
use chrono::{DateTime, Utc};
use serde_json::Value; use serde_json::Value;
use std::time::Duration; use std::time::Duration;
@ -17,7 +16,6 @@ pub use cm_dashboard_shared::envelope::AgentType;
pub struct CollectorOutput { pub struct CollectorOutput {
pub agent_type: AgentType, pub agent_type: AgentType,
pub data: Value, pub data: Value,
pub timestamp: DateTime<Utc>,
} }
#[async_trait] #[async_trait]
@ -26,10 +24,4 @@ pub trait Collector: Send + Sync {
fn agent_type(&self) -> AgentType; fn agent_type(&self) -> AgentType;
fn collect_interval(&self) -> Duration; fn collect_interval(&self) -> Duration;
async fn collect(&self) -> Result<CollectorOutput, CollectorError>; async fn collect(&self) -> Result<CollectorOutput, CollectorError>;
fn is_enabled(&self) -> bool {
true
}
fn requires_root(&self) -> bool {
false
}
} }

View File

@ -13,7 +13,6 @@ use super::{AgentType, Collector, CollectorError, CollectorOutput};
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
pub struct ServiceCollector { pub struct ServiceCollector {
pub enabled: bool,
pub interval: Duration, pub interval: Duration,
pub services: Vec<String>, pub services: Vec<String>,
pub timeout_ms: u64, pub timeout_ms: u64,
@ -29,9 +28,8 @@ pub(crate) struct CpuSample {
} }
impl ServiceCollector { impl ServiceCollector {
pub fn new(enabled: bool, interval_ms: u64, services: Vec<String>) -> Self { pub fn new(_enabled: bool, interval_ms: u64, services: Vec<String>) -> Self {
Self { Self {
enabled,
interval: Duration::from_millis(interval_ms), interval: Duration::from_millis(interval_ms),
services, services,
timeout_ms: 10000, // 10 second timeout for service checks timeout_ms: 10000, // 10 second timeout for service checks
@ -409,6 +407,16 @@ impl ServiceCollector {
} }
} }
fn determine_cpu_temp_status(&self, temp_c: f32) -> String {
if temp_c >= 80.0 {
"critical".to_string()
} else if temp_c >= 70.0 {
"warning".to_string()
} else {
"ok".to_string()
}
}
fn determine_services_status(&self, healthy: usize, degraded: usize, failed: usize) -> String { fn determine_services_status(&self, healthy: usize, degraded: usize, failed: usize) -> String {
if failed > 0 { if failed > 0 {
"critical".to_string() "critical".to_string()
@ -929,13 +937,6 @@ impl Collector for ServiceCollector {
self.interval self.interval
} }
fn is_enabled(&self) -> bool {
self.enabled
}
fn requires_root(&self) -> bool {
false // Most systemctl commands work without root
}
async fn collect(&self) -> Result<CollectorOutput, CollectorError> { async fn collect(&self) -> Result<CollectorOutput, CollectorError> {
let mut services = Vec::new(); let mut services = Vec::new();
@ -1013,6 +1014,7 @@ impl Collector for ServiceCollector {
let cpu_cstate_info = self.get_cpu_cstate_info().await; let cpu_cstate_info = self.get_cpu_cstate_info().await;
let cpu_temp_c = self.get_cpu_temperature_c().await; let cpu_temp_c = self.get_cpu_temperature_c().await;
let cpu_temp_status = cpu_temp_c.map(|temp| self.determine_cpu_temp_status(temp));
let (gpu_load_percent, gpu_temp_c) = self.get_gpu_metrics().await; let (gpu_load_percent, gpu_temp_c) = self.get_gpu_metrics().await;
// If no specific quotas are set, use system memory as reference // If no specific quotas are set, use system memory as reference
@ -1039,6 +1041,7 @@ impl Collector for ServiceCollector {
"cpu_status": cpu_status, "cpu_status": cpu_status,
"cpu_cstate": cpu_cstate_info, "cpu_cstate": cpu_cstate_info,
"cpu_temp_c": cpu_temp_c, "cpu_temp_c": cpu_temp_c,
"cpu_temp_status": cpu_temp_status,
"gpu_load_percent": gpu_load_percent, "gpu_load_percent": gpu_load_percent,
"gpu_temp_c": gpu_temp_c, "gpu_temp_c": gpu_temp_c,
}, },
@ -1049,7 +1052,6 @@ impl Collector for ServiceCollector {
Ok(CollectorOutput { Ok(CollectorOutput {
agent_type: AgentType::Service, agent_type: AgentType::Service,
data: service_metrics, data: service_metrics,
timestamp: Utc::now(),
}) })
} }
} }
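
The new `determine_cpu_temp_status` follows the same warning/critical pattern that CLAUDE.md documents for CPU load and memory. A generic version might look like the sketch below. `Thresholds`, `classify`, `classify_optional`, and the constant names are hypothetical; the numeric values are copied from the CLAUDE.md hunk in this commit, and the CPU load values are the testing values it flags for reverting.

```rust
/// Hypothetical pair of inclusive lower bounds for the two alert levels.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub warning: f32,
    pub critical: f32,
}

// Values as documented in CLAUDE.md (CPU load uses the testing values).
pub const CPU_LOAD: Thresholds = Thresholds { warning: 2.0, critical: 4.0 };
pub const CPU_TEMP_C: Thresholds = Thresholds { warning: 70.0, critical: 80.0 };
pub const MEMORY_PERCENT: Thresholds = Thresholds { warning: 80.0, critical: 95.0 };

/// Classify a reading; the critical bound is checked first so it wins over warning.
/// Note that a NaN reading fails both comparisons and falls through to "ok".
pub fn classify(value: f32, t: Thresholds) -> &'static str {
    if value >= t.critical {
        "critical"
    } else if value >= t.warning {
        "warning"
    } else {
        "ok"
    }
}

/// A missing reading yields no status at all, which leaves the
/// dashboard to show "unknown" rather than "ok".
pub fn classify_optional(value: Option<f32>, t: Thresholds) -> Option<&'static str> {
    value.map(|v| classify(v, t))
}
```

Keeping the thresholds in named constants would also make the "REVERT FOR PRODUCTION" step a one-line change per metric.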

View File

@ -12,16 +12,14 @@ use super::{AgentType, Collector, CollectorError, CollectorOutput};
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
pub struct SmartCollector { pub struct SmartCollector {
pub enabled: bool,
pub interval: Duration, pub interval: Duration,
pub devices: Vec<String>, pub devices: Vec<String>,
pub timeout_ms: u64, pub timeout_ms: u64,
} }
impl SmartCollector { impl SmartCollector {
pub fn new(enabled: bool, interval_ms: u64, devices: Vec<String>) -> Self { pub fn new(_enabled: bool, interval_ms: u64, devices: Vec<String>) -> Self {
Self { Self {
enabled,
interval: Duration::from_millis(interval_ms), interval: Duration::from_millis(interval_ms),
devices, devices,
timeout_ms: 30000, // 30 second timeout for smartctl timeout_ms: 30000, // 30 second timeout for smartctl
@ -274,13 +272,6 @@ impl Collector for SmartCollector {
self.interval self.interval
} }
fn is_enabled(&self) -> bool {
self.enabled
}
fn requires_root(&self) -> bool {
true // smartctl typically requires root access
}
async fn collect(&self) -> Result<CollectorOutput, CollectorError> { async fn collect(&self) -> Result<CollectorOutput, CollectorError> {
let mut drives = Vec::new(); let mut drives = Vec::new();
@ -327,11 +318,11 @@ impl Collector for SmartCollector {
let disk_usage = self.get_disk_usage().await?; let disk_usage = self.get_disk_usage().await?;
let status = if critical > 0 { let status = if critical > 0 {
"CRITICAL" "critical"
} else if warning > 0 { } else if warning > 0 {
"WARNING" "warning"
} else { } else {
"HEALTHY" "ok"
}; };
let smart_metrics = json!({ let smart_metrics = json!({
@ -352,7 +343,6 @@ impl Collector for SmartCollector {
Ok(CollectorOutput { Ok(CollectorOutput {
agent_type: AgentType::Smart, agent_type: AgentType::Smart,
data: smart_metrics, data: smart_metrics,
timestamp: Utc::now(),
}) })
} }
} }

View File

@ -1,5 +1,6 @@
use std::collections::HashMap; use std::collections::HashMap;
use chrono::{DateTime, Utc}; use chrono::{DateTime, Utc};
use chrono_tz::Europe::Stockholm;
use lettre::{Message, SmtpTransport, Transport}; use lettre::{Message, SmtpTransport, Transport};
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use tracing::{info, error, warn}; use tracing::{info, error, warn};
@ -81,15 +82,21 @@ impl NotificationManager {
fn should_notify(&mut self, change: &StatusChange) -> bool { fn should_notify(&mut self, change: &StatusChange) -> bool {
if !self.config.enabled { if !self.config.enabled {
info!("Notifications disabled, skipping {}.{}", change.component, change.metric);
return false; return false;
} }
// Only notify on transitions to warning/critical, or recovery to ok // Only notify on transitions to warning/critical, or recovery to ok
match (change.old_status.as_str(), change.new_status.as_str()) { let should_send = match (change.old_status.as_str(), change.new_status.as_str()) {
(_, "warning") | (_, "critical") => true, (_, "warning") | (_, "critical") => true,
("warning" | "critical", "ok") => true, ("warning" | "critical", "ok") => true,
_ => false, _ => false,
} };
info!("Status change {}.{}: {} -> {} (notify: {})",
change.component, change.metric, change.old_status, change.new_status, should_send);
should_send
} }
fn is_rate_limited(&mut self, change: &StatusChange) -> bool { fn is_rate_limited(&mut self, change: &StatusChange) -> bool {
@ -98,11 +105,14 @@ impl NotificationManager {
if let Some(last_time) = self.last_notification.get(&key) { if let Some(last_time) = self.last_notification.get(&key) {
let minutes_since = Utc::now().signed_duration_since(*last_time).num_minutes(); let minutes_since = Utc::now().signed_duration_since(*last_time).num_minutes();
if minutes_since < self.config.rate_limit_minutes as i64 { if minutes_since < self.config.rate_limit_minutes as i64 {
info!("Rate limiting {}.{}: {} minutes since last notification (limit: {})",
change.component, change.metric, minutes_since, self.config.rate_limit_minutes);
return true; return true;
} }
} }
self.last_notification.insert(key, Utc::now()); self.last_notification.insert(key.clone(), Utc::now());
info!("Not rate limited {}.{}, sending notification", change.component, change.metric);
false false
} }
@ -161,8 +171,8 @@ impl NotificationManager {
change.metric, change.metric,
change.old_status, change.old_status,
change.new_status, change.new_status,
change.timestamp.format("%Y-%m-%d %H:%M:%S UTC"), change.timestamp.with_timezone(&Stockholm).format("%Y-%m-%d %H:%M:%S CET/CEST"),
Utc::now().format("%Y-%m-%d %H:%M:%S UTC") Utc::now().with_timezone(&Stockholm).format("%Y-%m-%d %H:%M:%S CET/CEST")
) )
} }
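
The two decisions in this file, whether a transition is worth an email and whether it is rate limited, can be isolated as pure functions for testing. This is a sketch, not the code from the commit: the free functions below take plain values instead of `&mut self` and a `StatusChange`, and the `Option<i64>` parameter for "minutes since the last notification" is an assumption. The match arms and the `<` comparison are taken from the hunks above.

```rust
/// Notify on any transition into "warning" or "critical",
/// and on recovery from either of those back to "ok".
pub fn should_notify(old_status: &str, new_status: &str) -> bool {
    match (old_status, new_status) {
        (_, "warning") | (_, "critical") => true,
        ("warning" | "critical", "ok") => true,
        _ => false,
    }
}

/// Rate limited only when a previous notification exists and fewer than
/// `rate_limit_minutes` have elapsed since it was sent.
pub fn is_rate_limited(minutes_since_last: Option<i64>, rate_limit_minutes: u64) -> bool {
    match minutes_since_last {
        Some(minutes) => minutes < rate_limit_minutes as i64,
        None => false,
    }
}
```

With `rate_limit_minutes` set to 0, a non-negative elapsed time is never below the limit, which is why the testing configuration sends every notification.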

View File

@ -41,7 +41,7 @@ impl SimpleAgent {
smtp_port: 25, smtp_port: 25,
from_email: format!("{}@cmtec.se", hostname), from_email: format!("{}@cmtec.se", hostname),
to_email: "cm@cmtec.se".to_string(), to_email: "cm@cmtec.se".to_string(),
rate_limit_minutes: 30, rate_limit_minutes: 0, // Disabled for testing
}; };
let notification_manager = NotificationManager::new(notification_config.clone()); let notification_manager = NotificationManager::new(notification_config.clone());
info!("Notifications: {} -> {}", notification_config.from_email, notification_config.to_email); info!("Notifications: {} -> {}", notification_config.from_email, notification_config.to_email);
@ -164,6 +164,7 @@ impl SimpleAgent {
// Check CPU status // Check CPU status
if let Some(cpu_status) = summary.get("cpu_status").and_then(|v| v.as_str()) { if let Some(cpu_status) = summary.get("cpu_status").and_then(|v| v.as_str()) {
if let Some(change) = self.notification_manager.update_status("system", "cpu", cpu_status) { if let Some(change) = self.notification_manager.update_status("system", "cpu", cpu_status) {
info!("CPU status change detected: {} -> {}", change.old_status, change.new_status);
self.notification_manager.send_notification(change).await; self.notification_manager.send_notification(change).await;
} }
} }
@ -175,6 +176,14 @@ impl SimpleAgent {
} }
} }
// Check CPU temperature status
if let Some(cpu_temp_status) = summary.get("cpu_temp_status").and_then(|v| v.as_str()) {
if let Some(change) = self.notification_manager.update_status("system", "cpu_temp", cpu_temp_status) {
info!("CPU temp status change detected: {} -> {}", change.old_status, change.new_status);
self.notification_manager.send_notification(change).await;
}
}
// Check services status // Check services status
if let Some(services_status) = summary.get("services_status").and_then(|v| v.as_str()) { if let Some(services_status) = summary.get("services_status").and_then(|v| v.as_str()) {
if let Some(change) = self.notification_manager.update_status("system", "services", services_status) { if let Some(change) = self.notification_manager.update_status("system", "services", services_status) {
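
`update_status` is called for each component here, but its body lies outside this diff. One way such change detection might work is sketched below. `StatusTracker`, the `component.metric` key format, and the choice to treat the first observation as a change from "unknown" are all assumptions rather than the project's implementation, and the `timestamp` field of the real `StatusChange` is omitted to keep the example free of external crates.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the agent's `StatusChange` (no timestamp).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StatusChange {
    pub component: String,
    pub metric: String,
    pub old_status: String,
    pub new_status: String,
}

/// Hypothetical tracker that remembers the last status per component/metric.
#[derive(Debug, Default)]
pub struct StatusTracker {
    last_status: HashMap<String, String>,
}

impl StatusTracker {
    /// Record the new status and return a change only if it differs from the
    /// previous one. Assumption: an unseen key starts out as "unknown".
    pub fn update_status(
        &mut self,
        component: &str,
        metric: &str,
        status: &str,
    ) -> Option<StatusChange> {
        let key = format!("{}.{}", component, metric);
        let old_status = self
            .last_status
            .insert(key, status.to_string())
            .unwrap_or_else(|| "unknown".to_string());
        if old_status == status {
            return None;
        }
        Some(StatusChange {
            component: component.to_string(),
            metric: metric.to_string(),
            old_status,
            new_status: status.to_string(),
        })
    }
}
```

Returning `Option<StatusChange>` lets the caller keep the `if let Some(change) = ...` shape used throughout this hunk.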

View File

@ -259,7 +259,7 @@ impl App {
if service_metrics.timestamp != timestamp { if service_metrics.timestamp != timestamp {
service_metrics.timestamp = timestamp; service_metrics.timestamp = timestamp;
} }
let mut snapshot = service_metrics.clone(); let snapshot = service_metrics.clone();
// No more need for dashboard-side description caching since agent handles it // No more need for dashboard-side description caching since agent handles it

View File

@ -71,6 +71,8 @@ pub struct ServiceSummary {
#[serde(default)] #[serde(default)]
pub cpu_temp_c: Option<f32>, pub cpu_temp_c: Option<f32>,
#[serde(default)] #[serde(default)]
pub cpu_temp_status: Option<String>,
#[serde(default)]
pub gpu_load_percent: Option<f32>, pub gpu_load_percent: Option<f32>,
#[serde(default)] #[serde(default)]
pub gpu_temp_c: Option<f32>, pub gpu_temp_c: Option<f32>,
@ -100,7 +102,7 @@ pub enum ServiceStatus {
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BackupMetrics { pub struct BackupMetrics {
pub overall_status: BackupStatus, pub overall_status: String,
pub backup: BackupInfo, pub backup: BackupInfo,
pub service: BackupServiceInfo, pub service: BackupServiceInfo,
#[serde(default)] #[serde(default)]

View File

@ -1,6 +1,5 @@
use chrono::{DateTime, Utc}; use chrono::{DateTime, Utc};
use ratatui::layout::{Constraint, Rect}; use ratatui::layout::Rect;
use ratatui::style::Color;
use ratatui::Frame; use ratatui::Frame;
use crate::app::HostDisplayData; use crate::app::HostDisplayData;
@ -8,17 +7,7 @@ use crate::ui::system::{evaluate_performance, PerfSeverity};
use crate::ui::widget::{render_widget_data, WidgetData, WidgetStatus, StatusLevel}; use crate::ui::widget::{render_widget_data, WidgetData, WidgetStatus, StatusLevel};
pub fn render(frame: &mut Frame, hosts: &[HostDisplayData], area: Rect) { pub fn render(frame: &mut Frame, hosts: &[HostDisplayData], area: Rect) {
let (severity, ok_count, warn_count, fail_count) = classify_hosts(hosts); let (severity, _ok_count, _warn_count, _fail_count) = classify_hosts(hosts);
let mut color = match severity {
AlertSeverity::Critical => Color::Red,
AlertSeverity::Warning => Color::Yellow,
AlertSeverity::Healthy => Color::Green,
AlertSeverity::Unknown => Color::Gray,
};
if hosts.is_empty() {
color = Color::Gray;
}
let title = "Alerts".to_string(); let title = "Alerts".to_string();
@ -140,9 +129,9 @@ fn host_severity(host: &HostDisplayData) -> AlertSeverity {
} }
if let Some(backup) = host.backup.as_ref() { if let Some(backup) = host.backup.as_ref() {
match backup.overall_status { match backup.overall_status.as_str() {
crate::data::metrics::BackupStatus::Failed => return AlertSeverity::Critical, "critical" => return AlertSeverity::Critical,
crate::data::metrics::BackupStatus::Warning => return AlertSeverity::Warning, "warning" => return AlertSeverity::Warning,
_ => {} _ => {}
} }
} }
@ -211,15 +200,15 @@ fn host_status(host: &HostDisplayData) -> (String, AlertSeverity, bool) {
} }
if let Some(backup) = host.backup.as_ref() { if let Some(backup) = host.backup.as_ref() {
match backup.overall_status { match backup.overall_status.as_str() {
crate::data::metrics::BackupStatus::Failed => { "critical" => {
return ( return (
"critical: backup failed".to_string(), "critical: backup failed".to_string(),
AlertSeverity::Critical, AlertSeverity::Critical,
true, true,
); );
} }
crate::data::metrics::BackupStatus::Warning => { "warning" => {
return ( return (
"warning: backup warning".to_string(), "warning: backup warning".to_string(),
AlertSeverity::Warning, AlertSeverity::Warning,
@ -243,14 +232,6 @@ fn host_status(host: &HostDisplayData) -> (String, AlertSeverity, bool) {
("ok".to_string(), AlertSeverity::Healthy, false) ("ok".to_string(), AlertSeverity::Healthy, false)
} }
fn severity_color(severity: AlertSeverity) -> Color {
match severity {
AlertSeverity::Critical => Color::Red,
AlertSeverity::Warning => Color::Yellow,
AlertSeverity::Healthy => Color::Green,
AlertSeverity::Unknown => Color::Gray,
}
}
fn latest_timestamp(host: &HostDisplayData) -> Option<DateTime<Utc>> { fn latest_timestamp(host: &HostDisplayData) -> Option<DateTime<Utc>> {
let mut latest = host.last_success; let mut latest = host.last_success;
@ -279,11 +260,3 @@ fn latest_timestamp(host: &HostDisplayData) -> Option<DateTime<Utc>> {
latest latest
} }
fn severity_symbol(severity: AlertSeverity) -> &'static str {
match severity {
AlertSeverity::Critical => "",
AlertSeverity::Warning => "!",
AlertSeverity::Healthy => "",
AlertSeverity::Unknown => "?",
}
}

View File

@ -1,10 +1,9 @@
use ratatui::layout::Rect; use ratatui::layout::Rect;
use ratatui::style::Color;
use ratatui::Frame; use ratatui::Frame;
use crate::app::HostDisplayData; use crate::app::HostDisplayData;
use crate::data::metrics::{BackupMetrics, BackupStatus}; use crate::data::metrics::BackupMetrics;
use crate::ui::widget::{render_placeholder, render_widget_data, WidgetData, WidgetStatus, StatusLevel}; use crate::ui::widget::{render_placeholder, render_widget_data, status_level_from_agent_status, WidgetData, WidgetStatus, StatusLevel};
pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) { pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) {
match host { match host {
@ -25,12 +24,7 @@ pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) {
} }
fn render_metrics(frame: &mut Frame, _host: &HostDisplayData, metrics: &BackupMetrics, area: Rect) { fn render_metrics(frame: &mut Frame, _host: &HostDisplayData, metrics: &BackupMetrics, area: Rect) {
let widget_status = match metrics.overall_status { let widget_status = status_level_from_agent_status(Some(&metrics.overall_status));
BackupStatus::Failed => StatusLevel::Error,
BackupStatus::Warning => StatusLevel::Warning,
BackupStatus::Unknown => StatusLevel::Unknown,
BackupStatus::Healthy => StatusLevel::Ok,
};
let mut data = WidgetData::new( let mut data = WidgetData::new(
"Backups", "Backups",
@ -93,46 +87,4 @@ fn render_metrics(frame: &mut Frame, _host: &HostDisplayData, metrics: &BackupMe
render_widget_data(frame, area, data); render_widget_data(frame, area, data);
} }
fn backup_status_color(status: &BackupStatus) -> Color {
match status {
BackupStatus::Failed => Color::Red,
BackupStatus::Warning => Color::Yellow,
BackupStatus::Unknown => Color::LightYellow,
BackupStatus::Healthy => Color::Green,
}
}
fn format_timestamp(timestamp: Option<&chrono::DateTime<chrono::Utc>>) -> String {
timestamp
.map(|ts| ts.format("%Y-%m-%d %H:%M:%S").to_string())
.unwrap_or_else(|| "".to_string())
}
fn repo_status_level(metrics: &BackupMetrics) -> StatusLevel {
match metrics.overall_status {
BackupStatus::Failed => StatusLevel::Error,
BackupStatus::Warning => StatusLevel::Warning,
_ => {
if metrics.backup.snapshot_count > 0 {
StatusLevel::Ok
} else {
StatusLevel::Warning
}
}
}
}
fn service_status_level(metrics: &BackupMetrics) -> StatusLevel {
match metrics.overall_status {
BackupStatus::Failed => StatusLevel::Error,
BackupStatus::Warning => StatusLevel::Warning,
BackupStatus::Unknown => StatusLevel::Unknown,
BackupStatus::Healthy => {
if metrics.service.enabled {
StatusLevel::Ok
} else {
StatusLevel::Warning
}
}
}
}

View File

@ -1,9 +1,8 @@
use ratatui::layout::Rect; use ratatui::layout::Rect;
use ratatui::style::Color;
use ratatui::Frame; use ratatui::Frame;
use crate::app::HostDisplayData; use crate::app::HostDisplayData;
use crate::data::metrics::{ServiceStatus, ServiceSummary}; use crate::data::metrics::ServiceStatus;
use crate::ui::widget::{render_placeholder, render_widget_data, status_level_from_agent_status, WidgetData, WidgetStatus, StatusLevel}; use crate::ui::widget::{render_placeholder, render_widget_data, status_level_from_agent_status, WidgetData, WidgetStatus, StatusLevel};
pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) { pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) {
@ -31,7 +30,6 @@ fn render_metrics(
area: Rect, area: Rect,
) { ) {
let summary = &metrics.summary; let summary = &metrics.summary;
let color = summary_color(summary);
let title = "Services".to_string(); let title = "Services".to_string();
// Use agent-calculated services status // Use agent-calculated services status
@ -105,24 +103,6 @@ fn status_weight(status: &ServiceStatus) -> i32 {
} }
} }
fn status_symbol(status: &ServiceStatus) -> (&'static str, Color) {
match status {
ServiceStatus::Running => ("", Color::Green),
ServiceStatus::Degraded => ("!", Color::Yellow),
ServiceStatus::Restarting => ("", Color::Yellow),
ServiceStatus::Stopped => ("", Color::Red),
}
}
fn summary_color(summary: &ServiceSummary) -> Color {
if summary.failed > 0 {
Color::Red
} else if summary.degraded > 0 {
Color::Yellow
} else {
Color::Green
}
}
fn format_memory_value(used: f32, quota: f32) -> String { fn format_memory_value(used: f32, quota: f32) -> String {
let used_gb = used / 1000.0; let used_gb = used / 1000.0;

View File

@@ -1,10 +1,9 @@
 use ratatui::layout::Rect;
-use ratatui::style::Color;
 use ratatui::Frame;
 use crate::app::HostDisplayData;
 use crate::data::metrics::SmartMetrics;
-use crate::ui::widget::{render_placeholder, render_widget_data, WidgetData, WidgetStatus, StatusLevel};
+use crate::ui::widget::{render_placeholder, render_widget_data, status_level_from_agent_status, WidgetData, WidgetStatus, StatusLevel};
 pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) {
     match host {
@@ -25,16 +24,9 @@ pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) {
 }
 fn render_metrics(frame: &mut Frame, _host: &HostDisplayData, metrics: &SmartMetrics, area: Rect) {
-    let color = smart_status_color(&metrics.status);
     let title = "Storage".to_string();
-    let widget_status = if metrics.summary.critical > 0 {
-        StatusLevel::Error
-    } else if metrics.summary.warning > 0 {
-        StatusLevel::Warning
-    } else {
-        StatusLevel::Ok
-    };
+    let widget_status = status_level_from_agent_status(Some(&metrics.status));
     let mut data = WidgetData::new(
         title,
@@ -95,13 +87,6 @@ fn render_metrics(frame: &mut Frame, _host: &HostDisplayData, metrics: &SmartMet
     render_widget_data(frame, area, data);
 }
-fn smart_status_color(status: &str) -> Color {
-    match status.to_uppercase().as_str() {
-        "CRITICAL" => Color::Red,
-        "WARNING" => Color::Yellow,
-        _ => Color::Green,
-    }
-}
 fn format_temperature(value: f32) -> String {
     if value.abs() < f32::EPSILON {
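
Review note: the storage widget now takes its level from `status_level_from_agent_status`, whose match arms are not visible in this diff. The block below is only a sketch of what such a mapping could look like. It assumes the agent sends the strings "ok", "warning" and "critical" (as listed in CLAUDE.md), that a missing or unrecognized value becomes `Unknown`, and that case should be ignored because the removed `smart_status_color` upper-cased its input. The name `status_level_sketch` and the local `StatusLevel` definition are hypothetical stand-ins, not the project's code.

```rust
/// Local stand-in for the dashboard's `StatusLevel`; the four variants
/// are the ones that appear in this diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusLevel {
    Ok,
    Warning,
    Error,
    Unknown,
}

/// Hypothetical helper: translate an agent-provided status string into a
/// display level without applying any thresholds on the dashboard side.
pub fn status_level_sketch(agent_status: Option<&String>) -> StatusLevel {
    // Assumption: compare case-insensitively, since the old SMART helper
    // upper-cased the status before matching.
    match agent_status.map(|s| s.to_ascii_lowercase()).as_deref() {
        Some("ok") => StatusLevel::Ok,
        Some("warning") => StatusLevel::Warning,
        Some("critical") => StatusLevel::Error,
        // Missing or unrecognized status is shown as unknown, never as ok.
        _ => StatusLevel::Unknown,
    }
}
```

With a mapping of this shape, a host whose agent has not reported a storage status would render as unknown rather than green, which matches the "NOT ok" rule stated in CLAUDE.md.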

@@ -1,12 +1,11 @@
 use ratatui::layout::Rect;
-use ratatui::style::Color;
 use ratatui::Frame;
 use crate::app::HostDisplayData;
 use crate::data::metrics::{ServiceMetrics, ServiceSummary};
 use crate::ui::widget::{
-    combined_color, render_placeholder, render_combined_widget_data, status_color_for_cpu_load, status_color_from_metric,
-    status_color_from_percentage, status_level_from_agent_status, WidgetDataSet, WidgetStatus, StatusLevel,
+    render_placeholder, render_combined_widget_data,
+    status_level_from_agent_status, WidgetDataSet, WidgetStatus, StatusLevel,
 };
 pub fn render(frame: &mut Frame, host: Option<&HostDisplayData>, area: Rect) {
@@ -44,33 +43,19 @@ fn render_metrics(
     } else {
         summary.memory_used_mb
     };
-    let usage_ratio = if system_total > 0.0 {
+    let _usage_ratio = if system_total > 0.0 {
         (system_used / system_total) * 100.0
     } else {
         0.0
     };
     let (perf_severity, _reason) = evaluate_performance(summary);
-    let border_color = match perf_severity {
-        PerfSeverity::Critical => Color::Red,
-        PerfSeverity::Warning => Color::Yellow,
-        PerfSeverity::Ok => Color::Green,
-    };
+    // Dashboard should NOT calculate border colors - agent is the source of truth
     // Use agent-calculated statuses instead of dashboard calculations
     let memory_status = status_level_from_agent_status(summary.memory_status.as_ref());
     let cpu_status = status_level_from_agent_status(summary.cpu_status.as_ref());
-    let cpu_temp_color = status_color_from_metric(summary.cpu_temp_c, 80.0, 90.0);
-    let gpu_load_color = summary
-        .gpu_load_percent
-        .map(|value| status_color_from_percentage(value, 85.0, 95.0))
-        .unwrap_or(Color::Green);
-    let gpu_temp_color = summary
-        .gpu_temp_c
-        .map(|value| status_color_from_metric(Some(value), 75.0, 85.0))
-        .unwrap_or(Color::Green);
-    let gpu_icon_color = combined_color(&[gpu_load_color, gpu_temp_color]);
+    // Dashboard should NOT calculate colors - agent is the source of truth
     // Memory dataset - use agent-calculated status
     let mut memory_dataset = WidgetDataSet::new(vec!["Memory usage".to_string()], Some(WidgetStatus::new(memory_status)));
@@ -156,7 +141,8 @@ fn render_metrics(
     }
     // GPU dataset
-    let gpu_status = status_level_from_color(gpu_icon_color);
+    // GPU status should come from agent when available
+    let gpu_status = StatusLevel::Unknown; // Default until agent provides gpu_status
     let mut gpu_dataset = WidgetDataSet::new(vec!["GPU load".to_string(), "GPU temp".to_string()], Some(WidgetStatus::new(gpu_status)));
     gpu_dataset.add_row(
         Some(WidgetStatus::new(gpu_status)),
@@ -206,13 +192,6 @@ fn format_optional_percent(value: Option<f32>) -> String {
     }
 }
-fn status_level_from_color(color: Color) -> StatusLevel {
-    match color {
-        Color::Red => StatusLevel::Error,
-        Color::Yellow => StatusLevel::Warning,
-        _ => StatusLevel::Ok,
-    }
-}
 pub(crate) fn evaluate_performance(summary: &ServiceSummary) -> (PerfSeverity, Option<String>) {
     let mem_percent = if summary.system_memory_total_mb > 0.0 {
@@ -233,43 +212,38 @@ pub(crate) fn evaluate_performance(summary: &ServiceSummary) -> (PerfSeverity, O
         }
     };
-    if mem_percent >= 95.0 {
-        consider(PerfSeverity::Critical, format!("RAM {:.0}%", mem_percent));
-    } else if mem_percent >= 80.0 {
-        consider(PerfSeverity::Warning, format!("RAM {:.0}%", mem_percent));
-    }
-    let load = summary.cpu_load_5;
-    if load >= 4.0 {
-        consider(PerfSeverity::Critical, format!("CPU load {:.2}", load));
-    } else if load >= 2.0 {
-        consider(PerfSeverity::Warning, format!("CPU load {:.2}", load));
-    }
-    if let Some(temp) = summary.cpu_temp_c {
-        if temp >= 90.0 {
-            consider(PerfSeverity::Critical, format!("CPU temp {:.0}°C", temp));
-        } else if temp >= 80.0 {
-            consider(PerfSeverity::Warning, format!("CPU temp {:.0}°C", temp));
+    // Use agent's memory status instead of hardcoded thresholds
+    if let Some(memory_status) = &summary.memory_status {
+        match memory_status.as_str() {
+            "critical" => consider(PerfSeverity::Critical, format!("RAM {:.0}%", mem_percent)),
+            "warning" => consider(PerfSeverity::Warning, format!("RAM {:.0}%", mem_percent)),
+            _ => {} // "ok" - no alert needed
         }
     }
-    if let Some(load) = summary.gpu_load_percent {
-        if load >= 95.0 {
-            consider(PerfSeverity::Critical, format!("GPU load {:.0}%", load));
-        } else if load >= 85.0 {
-            consider(PerfSeverity::Warning, format!("GPU load {:.0}%", load));
+    // Use agent's CPU status instead of hardcoded thresholds
+    if let Some(cpu_status) = &summary.cpu_status {
+        match cpu_status.as_str() {
+            "critical" => consider(PerfSeverity::Critical, format!("CPU load {:.2}", summary.cpu_load_5)),
+            "warning" => consider(PerfSeverity::Warning, format!("CPU load {:.2}", summary.cpu_load_5)),
+            _ => {} // "ok" - no alert needed
         }
     }
-    if let Some(temp) = summary.gpu_temp_c {
-        if temp >= 85.0 {
-            consider(PerfSeverity::Critical, format!("GPU temp {:.0}°C", temp));
-        } else if temp >= 75.0 {
-            consider(PerfSeverity::Warning, format!("GPU temp {:.0}°C", temp));
+    // Use agent's CPU temperature status instead of hardcoded thresholds
+    if let Some(cpu_temp_status) = &summary.cpu_temp_status {
+        if let Some(temp) = summary.cpu_temp_c {
+            match cpu_temp_status.as_str() {
+                "critical" => consider(PerfSeverity::Critical, format!("CPU temp {:.0}°C", temp)),
+                "warning" => consider(PerfSeverity::Warning, format!("CPU temp {:.0}°C", temp)),
+                _ => {} // "ok" - no alert needed
+            }
         }
     }
+    // TODO: GPU status should come from agent, not calculated here with hardcoded thresholds
+    // For now, remove hardcoded GPU thresholds until agent provides gpu_status
     if severity == PerfSeverity::Ok {
         (PerfSeverity::Ok, None)
     } else {
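
Review note: `evaluate_performance` now escalates on agent status strings instead of raw thresholds, but the `consider` closure that tracks the worst alert is outside this hunk. The following is a rough, self-contained sketch of that "keep the highest severity and its reason" pattern, under stated assumptions. `worst_alert`, its slice-of-pairs input, and the local `PerfSeverity` definition are hypothetical and only illustrate the idea. A missing status is treated as "no alert" here, mirroring the `_ => {}` arms above.

```rust
/// Local stand-in for `PerfSeverity`; variant order matters because the
/// derived `Ord` ranks Ok < Warning < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PerfSeverity {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical helper: given (label, agent status) pairs, return the worst
/// severity and the label that caused it. No thresholds are evaluated here.
pub fn worst_alert(statuses: &[(&str, Option<&str>)]) -> (PerfSeverity, Option<String>) {
    let mut severity = PerfSeverity::Ok;
    let mut reason: Option<String> = None;

    for &(label, status) in statuses {
        // Map the agent's string onto a severity; anything else is no alert.
        let level = match status {
            Some("critical") => PerfSeverity::Critical,
            Some("warning") => PerfSeverity::Warning,
            _ => PerfSeverity::Ok,
        };
        // Only a strictly worse level replaces the current reason, so the
        // first metric at a given severity wins.
        if level > severity {
            severity = level;
            reason = Some(label.to_string());
        }
    }

    (severity, reason)
}
```

A caller could pass something like `&[("RAM", Some("warning")), ("CPU load", Some("critical"))]` and would get the critical entry back as the reason.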

@@ -24,33 +24,8 @@ fn neutral_border_style(color: Color) -> Style {
     Style::default().fg(color)
 }
-pub fn status_color_from_percentage(value: f32, warn: f32, crit: f32) -> Color {
-    if value >= crit {
-        Color::Red
-    } else if value >= warn {
-        Color::Yellow
-    } else {
-        Color::Green
-    }
-}
-pub fn status_color_from_metric(value: Option<f32>, warn: f32, crit: f32) -> Color {
-    match value {
-        Some(v) if v >= crit => Color::Red,
-        Some(v) if v >= warn => Color::Yellow,
-        _ => Color::Green,
-    }
-}
-pub fn status_color_for_cpu_load(load: f32) -> Color {
-    if load >= 8.0 {
-        Color::Red
-    } else if load >= 5.0 {
-        Color::Yellow
-    } else {
-        Color::Green
-    }
-}
 pub fn status_level_from_agent_status(agent_status: Option<&String>) -> StatusLevel {
     match agent_status.map(|s| s.as_str()) {
@@ -62,15 +37,6 @@ pub fn status_level_from_agent_status(agent_status: Option<&String>) -> StatusLe
     }
 }
-pub fn combined_color(colors: &[Color]) -> Color {
-    if colors.iter().any(|&c| c == Color::Red) {
-        Color::Red
-    } else if colors.iter().any(|&c| c == Color::Yellow) {
-        Color::Yellow
-    } else {
-        Color::Green
-    }
-}
 pub fn render_placeholder(frame: &mut Frame, area: Rect, title: &str, message: &str) {