cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	c0f7a97a6f	Remove all scrolling code and user-stopped tracking logic All checks were successful Build and Release / build-and-release (push) Successful in 2m36s Details - Remove scroll offset fields from HostWidgets struct - Replace scrolling with simple "X more below" indicators in all widgets - Remove user-stopped service tracking from agent (now uses SSH control) - Inactive services now consistently show Status::Inactive with empty circles - Simplify widget render methods by removing scroll parameters - Clean up unused imports and legacy scrolling infrastructure - Fix journalctl command to use -fu for proper log following	2025-11-19 08:32:42 +01:00
Christoffer Martinsson	d11aa11f99	Add Status::Inactive for inactive services with empty circle display All checks were successful Build and Release / build-and-release (push) Successful in 1m12s Details - Add new Status::Inactive variant to enum for better service state representation - Agent now assigns Status::Inactive instead of Status::Warning for inactive services - Dashboard displays inactive services with empty circle (○) icon in gray color - User-stopped services still show as Status::Ok with green filled circle - Inactive services treated as OK for host status aggregation - Improves visual clarity between active (●), inactive (○), and warning (◐) states	2025-11-18 17:54:51 +01:00
Christoffer Martinsson	bd20f0cae1	Fix user-stopped flag timing and service transition handling All checks were successful Build and Release / build-and-release (push) Successful in 2m9s Details Correct user-stopped service behavior during startup transitions: User-Stopped Flag Timing Fix: - Clear user-stopped flag only when service actually becomes active, not when start command succeeds - Remove premature flag clearing from service control handler - Add automatic flag clearing when service status metrics show active state - Services retain user-stopped status during activating/transitioning states Service Transition Handling: - User-stopped services in activating state now report Status::OK instead of Status::Pending - Prevents host warnings during legitimate service startup transitions - Maintains accurate status reporting throughout service lifecycle - Failed service starts preserve user-stopped flags correctly Journalctl Popup Fix: - Fix terminal corruption when using J key for service logs - Correct command quoting to prevent tmux popup interference - Stable popup display without dashboard interface corruption Result: Clean service startup experience with no false warnings and proper user-stopped tracking throughout the entire service lifecycle. Bump version to v0.1.47	2025-10-30 12:05:54 +01:00
Christoffer Martinsson	c56e9d7be2	Implement user-stopped service tracking system All checks were successful Build and Release / build-and-release (push) Successful in 2m34s Details Add comprehensive tracking for services stopped via dashboard to prevent false alerts when users intentionally stop services. Features: - User-stopped services report Status::Ok instead of Warning - Persistent storage survives agent restarts - Dashboard sends UserStart/UserStop commands - Agent tracks and syncs user-stopped state globally - Systemd collector respects user-stopped flags Implementation: - New service_tracker module with persistent JSON storage - Enhanced ServiceAction enum with UserStart/UserStop variants - Global singleton tracker accessible by collectors - Service status logic updated to check user-stopped flag - Dashboard version now uses CARGO_PKG_VERSION automatically Bump version to v0.1.43	2025-10-30 10:42:56 +01:00
Christoffer Martinsson	6509a2b91a	Make nginx site latency thresholds configurable and simplify status logic All checks were successful Build and Release / build-and-release (push) Successful in 4m25s Details - Replace hardcoded 500ms/2000ms thresholds with configurable nginx_latency_critical_ms - Simplify status logic to only OK or Critical (no Warning status) - Add validation for nginx latency threshold configuration - Re-enable nginx site collection with configurable thresholds - Resolves issue where sites showed critical at 2000ms despite 30s timeout setting - Bump version to v0.1.38	2025-10-28 21:24:34 +01:00
Christoffer Martinsson	e890c5e810	Fix service status detection with combined discovery and status approach All checks were successful Build and Release / build-and-release (push) Successful in 2m9s Details Enhanced service discovery to properly show status for all services: Changes: - Use systemctl list-unit-files for complete service discovery (finds all services) - Use systemctl list-units --all for batch runtime status fetching - Combine both datasets to get comprehensive service list with correct status - Services found in unit-files but not runtime are marked as inactive (Warning status) - Eliminates 'unknown' status issue while maintaining complete service visibility Now inactive services show as Warning (yellow ◐) and active services show as Ok (green ●) instead of all services showing as unknown (? icon).	2025-10-28 15:56:47 +01:00
Christoffer Martinsson	078c30a592	Fix service discovery to show all configured services regardless of state All checks were successful Build and Release / build-and-release (push) Successful in 2m7s Details Changed service discovery from 'systemctl list-units --all' to 'systemctl list-unit-files' to ensure ALL service unit files are discovered, including services that have never been started. Changes: - Updated systemctl command to use list-unit-files instead of list-units --all - Modified parsing logic to handle unit file format (2 fields vs 4 fields) - Set placeholder values in discovery cache, actual runtime status fetched during collection - This ensures all configured services (like inactive ARK servers) appear in dashboard The issue was that list-units --all only shows services systemd has loaded/attempted to load, but list-unit-files shows ALL service unit files regardless of their runtime state.	2025-10-28 15:41:58 +01:00
Christoffer Martinsson	627c533724	Update to v0.1.18 with per-collector intervals and tmux check All checks were successful Build and Release / build-and-release (push) Successful in 2m7s Details - Implement per-collector interval timing respecting NixOS config - Remove all hardcoded timeout/interval values and make configurable - Add tmux session requirement check for TUI mode (bypassed for headless) - Update agent to send config hash in Build field instead of nixos version - Add nginx check interval, HTTP timeouts, and ZMQ transmission interval configs - Update NixOS configuration with new configurable values Breaking changes: - Build field now shows nix store config hash (8 chars) instead of nixos version - All intervals now follow individual collector configuration instead of global New configuration fields: - systemd.nginx_check_interval_seconds - systemd.http_timeout_seconds - systemd.http_connect_timeout_seconds - zmq.transmission_interval_seconds	2025-10-28 10:08:25 +01:00
Christoffer Martinsson	4b54a59e35	Remove unused code and eliminate compiler warnings - Remove unused fields from CommandStatus variants - Clean up unused methods and unused collector fields - Fix lifetime syntax warning in SystemWidget - Delete unused cache module completely - Remove redundant render methods from widgets All agent and dashboard warnings eliminated while preserving panel switching and scrolling functionality.	2025-10-25 14:15:52 +02:00
Christoffer Martinsson	c99e0bd8ee	Remove hardcoded discovery interval in systemd collector - Use config.interval_seconds instead of hardcoded 300 seconds - Discovery now happens every 10 seconds (configurable) instead of 5 minutes - Follows configuration-driven architecture requirements	2025-10-23 13:20:48 +02:00
Christoffer Martinsson	0f12438ab4	Fix RwLock deadlock in systemd collector Phase 4 - Restructure get_monitored_services to avoid nested write locks - Split discover_services into discover_services_internal that returns data - Update state in separate scope to prevent deadlock - Fix borrow checker errors with clone() for status cache	2025-10-23 13:12:53 +02:00
Christoffer Martinsson	7607e971b8	Add debug logging to diagnose Phase 4 service discovery issue Add detailed debug logging to track: - Service discovery start - Individual service parsing - Final service count and list - Empty results indication This will help identify why cmbox disappeared from dashboard.	2025-10-23 12:57:10 +02:00
Christoffer Martinsson	da6f3c3855	Phase 4: Cache service status from discovery to eliminate per-service calls Major performance optimization: - Parse and cache service status during discovery from systemctl list-units - Eliminate per-service systemctl is-active and show calls - Reduce systemctl calls from 1+2N to just 1 call total - For 10 services: 21 calls → 1 call (95% reduction) - Add fallback to systemctl for cache misses This completes the major systemctl call reduction goal from TODO.md.	2025-10-23 12:51:17 +02:00
Christoffer Martinsson	174b27f31a	Phase 3: Add wildcard support for service pattern matching Implement glob pattern matching for service filters: - nginx* matches nginx, nginx-config-reload, etc. - backup matches any service ending with 'backup' - dockerprune matches docker-weekly-prune, etc. - Exact matches still work as before (backward compatible) Addresses TODO.md requirement for '*' filtering support.	2025-10-23 12:37:16 +02:00
Christoffer Martinsson	dc11538ae9	Phase 2b: Optimize to single systemctl command Reduce from 2 systemctl commands to 1 by using only: systemctl list-units --type=service --all This captures all services (active, inactive, failed) in one call, eliminating the redundant list-unit-files command. Achieves the TODO.md goal of reducing systemctl calls.	2025-10-23 12:34:54 +02:00
Christoffer Martinsson	9133e18090	Phase 2: Remove user service collection logic Remove all sudo -u systemctl commands and user service processing. Now only collects system services via systemctl list-units/list-unit-files. Eliminates user service discovery completely as planned in TODO.md.	2025-10-23 12:32:19 +02:00
Christoffer Martinsson	616fad2c5d	Phase 1: Implement exact name filtering for service matching Change service matching logic from contains-based to exact equality. Services now match only if service_name == pattern exactly. This is the first step in the systemd collector optimization plan.	2025-10-23 12:22:26 +02:00
Christoffer Martinsson	08d3454683	Enhance disk collector with individual drive health monitoring - Add StoragePool and DriveInfo structures for grouping drives by mount point - Implement SMART data collection for individual drives (health, temperature, wear) - Support for ext4, zfs, xfs, mergerfs, btrfs filesystem types - Generate individual drive metrics: disk_[pool]_[drive]_health/temperature/wear - Add storage_type and underlying_devices to filesystem configuration - Move hardcoded service directory mappings to NixOS configuration - Move hardcoded host-to-user mapping to NixOS configuration - Remove all unused code and fix compilation warnings - Clean implementation with zero warnings and no dead code Individual drives now show health status per storage pool: Storage root (ext4): nvme0n1 PASSED 42°C 5% wear Storage steampool (mergerfs): sda/sdb/sdc with individual health data	2025-10-22 19:59:25 +02:00
Christoffer Martinsson	34822bd835	Fix systemd collector to use Status::Pending for transitional states	2025-10-21 19:08:58 +02:00
Christoffer Martinsson	a937032eb1	Remove hardcoded defaults, require configuration file - Remove all Default implementations from agent configuration structs - Make configuration file required for agent startup - Update NixOS module to generate complete agent.toml configuration - Add comprehensive configuration options to NixOS module including: - Service include/exclude patterns for systemd collector - All thresholds and intervals - ZMQ communication settings - Notification and cache configuration - Agent now fails fast if no configuration provided - Eliminates configuration drift between defaults and NixOS settings	2025-10-21 00:01:26 +02:00
Christoffer Martinsson	1e8da8c187	Add user service discovery to systemd collector - Use systemctl --user commands to discover user-level services - Include both user unit files and loaded user units - Gracefully handle cases where user commands fail (no user session) - Treat user services same as system services in filtering - Enables monitoring of user-level Docker, development servers, etc.	2025-10-20 23:11:11 +02:00
Christoffer Martinsson	1cc31ec26a	Update service filters for better discovery - Add ark-permissions to exclusion list (maintenance service) - Add sunshine to service_name_filters (game streaming server) - Improves service discovery for game streaming infrastructure	2025-10-20 23:01:03 +02:00
Christoffer Martinsson	b580cfde8c	Add more services to exclusion list - Add docker-prune (cleanup services don't need monitoring) - Add sshd-unix-local@ and sshd@ (SSH instance services) - Add docker-registry-gar (Google Artifact Registry services) - Keep main sshd service monitored while excluding per-connection instances	2025-10-20 22:51:15 +02:00
Christoffer Martinsson	5886426dac	Fix service discovery to detect all services regardless of state - Use systemctl list-unit-files and list-units --all to find inactive services - Parse both outputs to ensure all services are discovered - Remove special SSH detection logic since sshd is in service filters - Rename interesting_services to service_name_filters for clarity - Now detects services in any state: active, inactive, failed, dead, etc.	2025-10-20 22:41:21 +02:00
Christoffer Martinsson	eb268922bd	Remove all unused code and fix build warnings - Remove unused struct fields: tier, config_name, last_collection_time - Remove unused structs: PerformanceMetrics, PerfMonitor - Remove unused methods: get_performance_metrics, get_collector_names, get_stats - Remove unused utility functions and system helpers - Remove unused config fields from CPU and Memory collectors - Keep config fields that are actually used (DiskCollector, etc.) - Remove unused proxy_pass_url variable and assignments - Fix duplicate hostname variable declaration - Achieve zero build warnings without functionality changes	2025-10-20 20:20:47 +02:00
Christoffer Martinsson	00a8ed3da2	Implement hysteresis for metric status changes to prevent flapping Add comprehensive hysteresis support to prevent status oscillation near threshold boundaries while maintaining responsive alerting. Key Features: - HysteresisThresholds with configurable upper/lower limits - StatusTracker for per-metric status history - Default gaps: CPU load 10%, memory 5%, disk temp 5°C Updated Components: - CPU load collector (5-minute average with hysteresis) - Memory usage collector (percentage-based thresholds) - Disk temperature collector (SMART data monitoring) - All collectors updated to support StatusTracker interface Cache Interval Adjustments: - Service status: 60s → 10s (faster response) - Disk usage: 300s → 60s (more frequent checks) - Backup status: 900s → 60s (quicker updates) - SMART data: moved to 600s tier (10 minutes) Architecture: - Individual metric status calculation in collectors - Centralized StatusTracker in MetricCollectionManager - Status aggregation preserved in dashboard widgets	2025-10-20 18:45:41 +02:00
Christoffer Martinsson	e998679901	Revert nginx monitoring to check all sites via public HTTPS URLs - Remove proxy_pass backend checking - All sites now checked using https://server_name format - Maintains 10-second timeout for external site checks - Simplifies monitoring to consistent external health checks	2025-10-20 15:06:42 +02:00
Christoffer Martinsson	2ccfc4256a	Fix nginx monitoring and services panel alignment - Add support for both proxied and static nginx sites - Proxied sites show 'P' prefix and check backend URLs - Static sites check external HTTPS URLs - Fix services panel column alignment for main services - Keep 10-second timeout for all site checks	2025-10-20 14:56:26 +02:00
Christoffer Martinsson	66a79574e0	Implement comprehensive monitoring improvements - Add full email notifications with lettre and Stockholm timezone - Add status persistence to prevent notification spam on restart - Change nginx monitoring to check backend proxy_pass URLs instead of frontend domains - Increase nginx site timeout to 10 seconds for backend health checks - Fix cache intervals: disk (5min), backup (10min), systemd (30s), cpu/memory (5s) - Remove rate limiting for immediate notifications on all status changes - Store metric status in /var/lib/cm-dashboard/last-status.json	2025-10-20 14:32:44 +02:00
Christoffer Martinsson	47a7d5ae62	Simplify service disk usage detection - remove all estimation fallbacks - Replace complex multi-strategy detection with single deterministic method - Remove estimate_service_disk_usage and all fallback strategies - Use simple get_service_disk_usage method with clear logic: * Defined path exists → use only that path * Defined path fails → return None (shows as '-') * No defined path → use systemctl WorkingDirectory * No estimates or guessing ever Fixes misleading 5MB estimates when defined paths fail due to permissions.	2025-10-20 11:06:49 +02:00
Christoffer Martinsson	fe18ace767	Fix service disk usage detection to use sudo du for permission access ARK service directories require elevated permissions to access. The NixOS configuration already allows sudo du with NOPASSWD, so use sudo du instead of direct du command to properly detect disk usage for restricted directories.	2025-10-20 10:58:17 +02:00
Christoffer Martinsson	a1c980ad31	Implement deterministic service disk usage detection with defined paths - Prioritize defined service directories over systemctl WorkingDirectory fallbacks - Add ARK Survival Ascended server mappings to correct NixOS-configured paths - Remove legacy get_service_disk_usage method to eliminate code duplication - Ensure deterministic behavior with single-purpose detection logic Fixes ARK service disk usage reporting on srv02 by using actual data paths from NixOS configuration instead of systemctl working directory detection.	2025-10-20 10:45:30 +02:00
Christoffer Martinsson	a3c9ac3617	Add ARK server directory mappings for accurate disk usage detection Map each ARK service to its specific data directory: - ark-island -> /var/lib/ark-servers/island - ark-scorched -> /var/lib/ark-servers/scorched - ark-center -> /var/lib/ark-servers/center - ark-aberration -> /var/lib/ark-servers/aberration - ark-extinction -> /var/lib/ark-servers/extinction - ark-ragnarok -> /var/lib/ark-servers/ragnarok - ark-valguero -> /var/lib/ark-servers/valguero Based on NixOS configuration in srv02/configuration.nix.	2025-10-20 10:15:30 +02:00
Christoffer Martinsson	f67779be9d	Add ARK game servers to systemd service monitoring	2025-10-19 19:23:51 +02:00
Christoffer Martinsson	5d52c5b1aa	Fix SMART data and site latency checking issues - Add sudo to disk collector smartctl commands for proper SMART data access - Add reqwest dependency with blocking feature for HTTP site checks - Replace curl-based site latency with reqwest HTTP client implementation - Maintain 2-second connect timeout and 5-second total timeout - Fix disk health UNKNOWN status by enabling proper SMART permissions - Fix nginx site timeout issues by using proper HTTP client with redirect support	2025-10-18 19:14:29 +02:00
Christoffer Martinsson	125111ee99	Implement comprehensive backup monitoring and fix timestamp issues - Add BackupCollector for reading TOML status files with disk space metrics - Implement BackupWidget with disk usage display and service status details - Fix backup script disk space parsing by adding missing capture_output=True - Update backup widget to show actual disk usage instead of repository size - Fix timestamp parsing to use backup completion time instead of start time - Resolve timezone issues by using UTC timestamps in backup script - Add disk identification metrics (product name, serial number) to backup status - Enhance UI layout with proper backup monitoring integration	2025-10-18 18:33:41 +02:00
Christoffer Martinsson	8a36472a3d	Implement real-time process monitoring and fix UI hardcoded data This commit addresses several key issues identified during development: Major Changes: - Replace hardcoded top CPU/RAM process display with real system data - Add intelligent process monitoring to CpuCollector using ps command - Fix disk metrics permission issues in systemd collector - Optimize service collection to focus on status, memory, and disk only - Update dashboard widgets to display live process information Process Monitoring Implementation: - Added collect_top_cpu_process() and collect_top_ram_process() methods - Implemented ps-based monitoring with accurate CPU percentages - Added filtering to prevent self-monitoring artifacts (ps commands) - Enhanced error handling and validation for process data - Dashboard now shows realistic values like "claude (PID 2974) 11.0%" Service Collection Optimization: - Removed CPU monitoring from systemd collector for efficiency - Enhanced service directory permission error logging - Simplified services widget to show essential metrics only - Fixed service-to-directory mapping accuracy UI and Dashboard Improvements: - Reorganized dashboard layout with btop-inspired multi-panel design - Updated system panel to include real top CPU/RAM process display - Enhanced widget formatting and data presentation - Removed placeholder/hardcoded data throughout the interface Technical Details: - Updated agent/src/collectors/cpu.rs with process monitoring - Modified dashboard/src/ui/mod.rs for real-time process display - Enhanced systemd collector error handling and disk metrics - Updated CLAUDE.md documentation with implementation details	2025-10-16 23:55:05 +02:00

37 Commits