cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	1e0510be81	Add comprehensive timeouts to all blocking system commands Fixes random host disconnections caused by blocking operations preventing timely ZMQ packet transmission. Changes: - Add run_command_with_timeout() wrapper using tokio for async command execution - Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives) - Apply 5s timeout to du, lsblk, systemctl list commands - Apply 3s timeout to systemctl show/is-active, df, ip commands - Apply 2s timeout to hostname command - Use system 'timeout' command for sync operations where async not needed Critical fixes: - smartctl: Failing drives could block for 30+ seconds per drive - du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds - systemctl/docker: Commands could block indefinitely during system issues With 1-second collection interval and 10-second heartbeat timeout, any blocking operation >10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.	2025-11-27 16:34:08 +01:00
Christoffer Martinsson	fc247bd0ad	Create dedicated network collector with physical/virtual interface grouping All checks were successful Build and Release / build-and-release (push) Successful in 1m43s Details Move network collection from NixOS collector to dedicated NetworkCollector. Add link status detection for physical interfaces (up/down). Group interfaces by physical/virtual, show status icons for physical NICs only. Down interfaces show as Inactive instead of Critical. Version bump to 0.1.165	2025-11-26 19:02:50 +01:00
Christoffer Martinsson	2b2cb2da3e	Complete atomic migration to structured data architecture All checks were successful Build and Release / build-and-release (push) Successful in 1m7s Details Implements clean structured data collection eliminating all string metric parsing bugs. Collectors now populate AgentData directly with type-safe field access. Key improvements: - Mount points preserved correctly (/ and /boot instead of root/boot) - Tmpfs discovery added to memory collector - Temperature data flows as typed f32 fields - Zero string parsing overhead - Complete removal of MetricCollectionManager bridge - Direct ZMQ transmission of structured JSON All functionality maintained: service tracking, notifications, status evaluation, and multi-host monitoring.	2025-11-24 18:53:31 +01:00
Christoffer Martinsson	c8e26b9bac	Remove redundant smart collector - consolidate SMART into disk collector - Remove separate smart collector implementation - Disk collector already handles SMART data for drives - Eliminates duplicate smartctl calls causing performance issues - SMART functionality remains in logical place with disk monitoring - Fixes infinite smartctl loop issue	2025-10-25 22:25:22 +02:00
Christoffer Martinsson	83cb43bcf1	Restore missing smart collector implementation Some checks failed Build and Release / build-and-release (push) Failing after 1m24s Details - Rewrite smart collector to match current architecture - Add back to mod.rs exports - Fixes infinite smartctl loop issue - Uses simple health and temperature monitoring	2025-10-25 16:59:09 +02:00
Christoffer Martinsson	4b54a59e35	Remove unused code and eliminate compiler warnings - Remove unused fields from CommandStatus variants - Clean up unused methods and unused collector fields - Fix lifetime syntax warning in SystemWidget - Delete unused cache module completely - Remove redundant render methods from widgets All agent and dashboard warnings eliminated while preserving panel switching and scrolling functionality.	2025-10-25 14:15:52 +02:00
Christoffer Martinsson	39fc9cd22f	Implement unified system widget with NixOS info, CPU, RAM, and Storage - Create NixOS collector for version and active users detection - Add SystemWidget combining all system information in TODO.md layout - Replace separate CPU/Memory widgets with unified system display - Add tree structure for storage with drive temperature/wear info - Support NixOS version, active users, load averages, memory usage - Follow exact decimal formatting from specification	2025-10-23 14:01:14 +02:00
Christoffer Martinsson	eb268922bd	Remove all unused code and fix build warnings - Remove unused struct fields: tier, config_name, last_collection_time - Remove unused structs: PerformanceMetrics, PerfMonitor - Remove unused methods: get_performance_metrics, get_collector_names, get_stats - Remove unused utility functions and system helpers - Remove unused config fields from CPU and Memory collectors - Keep config fields that are actually used (DiskCollector, etc.) - Remove unused proxy_pass_url variable and assignments - Fix duplicate hostname variable declaration - Achieve zero build warnings without functionality changes	2025-10-20 20:20:47 +02:00
Christoffer Martinsson	00a8ed3da2	Implement hysteresis for metric status changes to prevent flapping Add comprehensive hysteresis support to prevent status oscillation near threshold boundaries while maintaining responsive alerting. Key Features: - HysteresisThresholds with configurable upper/lower limits - StatusTracker for per-metric status history - Default gaps: CPU load 10%, memory 5%, disk temp 5°C Updated Components: - CPU load collector (5-minute average with hysteresis) - Memory usage collector (percentage-based thresholds) - Disk temperature collector (SMART data monitoring) - All collectors updated to support StatusTracker interface Cache Interval Adjustments: - Service status: 60s → 10s (faster response) - Disk usage: 300s → 60s (more frequent checks) - Backup status: 900s → 60s (quicker updates) - SMART data: moved to 600s tier (10 minutes) Architecture: - Individual metric status calculation in collectors - Centralized StatusTracker in MetricCollectionManager - Status aggregation preserved in dashboard widgets	2025-10-20 18:45:41 +02:00
Christoffer Martinsson	0141a6e111	Remove unused code and eliminate build warnings Removed unused widget subscription system, cache utilities, error variants, theme functions, and struct fields. Replaced subscription-based widgets with direct metric filtering. Build now completes with zero warnings.	2025-10-18 23:50:15 +02:00
Christoffer Martinsson	7f85a6436e	Clean up unused imports and fix build warnings - Remove unused imports (Duration, HashMap, SharedError, DateTime, etc.) - Fix unused variables by prefixing with underscore - Remove redundant dashboard.toml config file - Update theme imports to use only needed components - Maintain all functionality while reducing warnings - Add srv02 to predefined hosts configuration - Remove unused broadcast_command methods	2025-10-18 23:12:07 +02:00
Christoffer Martinsson	125111ee99	Implement comprehensive backup monitoring and fix timestamp issues - Add BackupCollector for reading TOML status files with disk space metrics - Implement BackupWidget with disk usage display and service status details - Fix backup script disk space parsing by adding missing capture_output=True - Update backup widget to show actual disk usage instead of repository size - Fix timestamp parsing to use backup completion time instead of start time - Resolve timezone issues by using UTC timestamps in backup script - Add disk identification metrics (product name, serial number) to backup status - Enhance UI layout with proper backup monitoring integration	2025-10-18 18:33:41 +02:00
Christoffer Martinsson	8a36472a3d	Implement real-time process monitoring and fix UI hardcoded data This commit addresses several key issues identified during development: Major Changes: - Replace hardcoded top CPU/RAM process display with real system data - Add intelligent process monitoring to CpuCollector using ps command - Fix disk metrics permission issues in systemd collector - Optimize service collection to focus on status, memory, and disk only - Update dashboard widgets to display live process information Process Monitoring Implementation: - Added collect_top_cpu_process() and collect_top_ram_process() methods - Implemented ps-based monitoring with accurate CPU percentages - Added filtering to prevent self-monitoring artifacts (ps commands) - Enhanced error handling and validation for process data - Dashboard now shows realistic values like "claude (PID 2974) 11.0%" Service Collection Optimization: - Removed CPU monitoring from systemd collector for efficiency - Enhanced service directory permission error logging - Simplified services widget to show essential metrics only - Fixed service-to-directory mapping accuracy UI and Dashboard Improvements: - Reorganized dashboard layout with btop-inspired multi-panel design - Updated system panel to include real top CPU/RAM process display - Enhanced widget formatting and data presentation - Removed placeholder/hardcoded data throughout the interface Technical Details: - Updated agent/src/collectors/cpu.rs with process monitoring - Modified dashboard/src/ui/mod.rs for real-time process display - Enhanced systemd collector error handling and disk metrics - Updated CLAUDE.md documentation with implementation details	2025-10-16 23:55:05 +02:00
Christoffer Martinsson	57b676ad25	Testing	2025-10-13 00:16:24 +02:00
Christoffer Martinsson	9e344fb66d	Testing	2025-10-12 22:31:46 +02:00
Christoffer Martinsson	4d8bacef50	Testing	2025-10-12 20:35:09 +02:00
Christoffer Martinsson	d9edcda36c	Testing	2025-10-12 20:29:08 +02:00
Christoffer Martinsson	2581435b10	Implement per-service disk usage monitoring Replaced system-wide disk usage with accurate per-service tracking by scanning service-specific directories. Services like sshd now correctly show minimal disk usage instead of misleading system totals. - Rename storage widget and add drive capacity/usage columns - Move host display to main dashboard title for cleaner layout - Replace separate alert displays with color-coded row highlighting - Add per-service disk usage collection using du command - Update services widget formatting to handle small disk values - Restructure into workspace with dedicated agent and dashboard packages	2025-10-11 22:59:16 +02:00

18 Commits