cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	1591565b1b	Update storage widget for enhanced disk collector metrics Restructure storage display to handle new individual metrics architecture: - Parse disk_{pool}_* metrics instead of indexed disk_{index}_* format - Support individual drive metrics disk_{pool}_{drive}_health/temperature/wear - Display tree structure: "Storage {pool} ({type}): drive details" - Show pool usage summary with individual drive health/temp/wear status - Auto-discover storage pools and drives from metric patterns - Maintain proper status aggregation from individual metrics The dashboard now correctly displays the new enhanced disk collector output with storage pools containing multiple drives and their individual metrics.	2025-10-22 20:40:24 +02:00
Christoffer Martinsson	08d3454683	Enhance disk collector with individual drive health monitoring - Add StoragePool and DriveInfo structures for grouping drives by mount point - Implement SMART data collection for individual drives (health, temperature, wear) - Support for ext4, zfs, xfs, mergerfs, btrfs filesystem types - Generate individual drive metrics: disk_[pool]_[drive]_health/temperature/wear - Add storage_type and underlying_devices to filesystem configuration - Move hardcoded service directory mappings to NixOS configuration - Move hardcoded host-to-user mapping to NixOS configuration - Remove all unused code and fix compilation warnings - Clean implementation with zero warnings and no dead code Individual drives now show health status per storage pool: Storage root (ext4): nvme0n1 PASSED 42°C 5% wear Storage steampool (mergerfs): sda/sdb/sdc with individual health data	2025-10-22 19:59:25 +02:00
Christoffer Martinsson	a6c2983f65	Add automatic config file detection for dashboard TUI - Dashboard now automatically looks for /etc/cm-dashboard/dashboard.toml - No need to specify --config flag when using standard NixOS deployment - Fallback to manual config path if default not found - Update help text to reflect optional config parameter - Simplifies dashboard usage - just run 'cm-dashboard' without arguments	2025-10-21 22:11:35 +02:00
Christoffer Martinsson	3d2b37b26c	Remove hardcoded defaults and migrate dashboard config to NixOS - Remove all unused configuration options from dashboard config module - Eliminate hardcoded defaults - dashboard now requires config file like agent - Keep only actually used config: zmq.subscriber_ports and hosts.predefined_hosts - Remove unused get_host_metrics function from metric store - Clean up missing module imports (hosts, utils) - Make dashboard fail fast if no configuration provided - Align dashboard config approach with agent configuration pattern	2025-10-21 21:54:23 +02:00
Christoffer Martinsson	a6d2a2f086	Code cleanup	2025-10-21 21:19:21 +02:00
Christoffer Martinsson	1315ba1315	Updated readme	2025-10-21 20:47:30 +02:00
Christoffer Martinsson	0417e2c1f1	Update README with actual dashboard interface and implementation details	2025-10-21 20:36:03 +02:00
Christoffer Martinsson	a08670071c	Implement simple persistent cache with automatic saving on status changes	2025-10-21 20:12:19 +02:00
Christoffer Martinsson	338c4457a5	Remove legacy notification code and fix all warnings	2025-10-21 19:48:55 +02:00
Christoffer Martinsson	f4b5bb814d	Fix dashboard UI: correct pending color (blue) and use host_status_summary metric	2025-10-21 19:32:37 +02:00
Christoffer Martinsson	7ead8ee98a	Improve notification email format with detailed service groupings	2025-10-21 19:25:43 +02:00
Christoffer Martinsson	34822bd835	Fix systemd collector to use Status::Pending for transitional states	2025-10-21 19:08:58 +02:00
Christoffer Martinsson	98afb19945	Remove unused ProcessConfig from collector configuration	2025-10-21 18:51:31 +02:00
Christoffer Martinsson	d80f2ce811	Remove unused cache tiers system	2025-10-21 18:43:46 +02:00
Christoffer Martinsson	89afd9143f	Disable broken tests after API changes	2025-10-21 18:33:35 +02:00
Christoffer Martinsson	98e3ecb0ea	Clean up warnings and add Status::Pending support to dashboard UI	2025-10-21 18:27:11 +02:00
Christoffer Martinsson	41208aa2a0	Implement status aggregation with notification batching	2025-10-21 18:12:42 +02:00
Christoffer Martinsson	a937032eb1	Remove hardcoded defaults, require configuration file - Remove all Default implementations from agent configuration structs - Make configuration file required for agent startup - Update NixOS module to generate complete agent.toml configuration - Add comprehensive configuration options to NixOS module including: - Service include/exclude patterns for systemd collector - All thresholds and intervals - ZMQ communication settings - Notification and cache configuration - Agent now fails fast if no configuration provided - Eliminates configuration drift between defaults and NixOS settings	2025-10-21 00:01:26 +02:00
Christoffer Martinsson	1e8da8c187	Add user service discovery to systemd collector - Use systemctl --user commands to discover user-level services - Include both user unit files and loaded user units - Gracefully handle cases where user commands fail (no user session) - Treat user services same as system services in filtering - Enables monitoring of user-level Docker, development servers, etc.	2025-10-20 23:11:11 +02:00
Christoffer Martinsson	1cc31ec26a	Update service filters for better discovery - Add ark-permissions to exclusion list (maintenance service) - Add sunshine to service_name_filters (game streaming server) - Improves service discovery for game streaming infrastructure	2025-10-20 23:01:03 +02:00
Christoffer Martinsson	b580cfde8c	Add more services to exclusion list - Add docker-prune (cleanup services don't need monitoring) - Add sshd-unix-local@ and sshd@ (SSH instance services) - Add docker-registry-gar (Google Artifact Registry services) - Keep main sshd service monitored while excluding per-connection instances	2025-10-20 22:51:15 +02:00
Christoffer Martinsson	5886426dac	Fix service discovery to detect all services regardless of state - Use systemctl list-unit-files and list-units --all to find inactive services - Parse both outputs to ensure all services are discovered - Remove special SSH detection logic since sshd is in service filters - Rename interesting_services to service_name_filters for clarity - Now detects services in any state: active, inactive, failed, dead, etc.	2025-10-20 22:41:21 +02:00
Christoffer Martinsson	eb268922bd	Remove all unused code and fix build warnings - Remove unused struct fields: tier, config_name, last_collection_time - Remove unused structs: PerformanceMetrics, PerfMonitor - Remove unused methods: get_performance_metrics, get_collector_names, get_stats - Remove unused utility functions and system helpers - Remove unused config fields from CPU and Memory collectors - Keep config fields that are actually used (DiskCollector, etc.) - Remove unused proxy_pass_url variable and assignments - Fix duplicate hostname variable declaration - Achieve zero build warnings without functionality changes	2025-10-20 20:20:47 +02:00
Christoffer Martinsson	049ac53629	Simplify service recovery notification logic - Remove bloated last_meaningful_status tracking - Treat any Unknown→Ok transition as recovery - Reduce JSON persistence to only metric_statuses and metric_details - Eliminate unnecessary status history complexity	2025-10-20 19:31:13 +02:00
Christoffer Martinsson	00a8ed3da2	Implement hysteresis for metric status changes to prevent flapping Add comprehensive hysteresis support to prevent status oscillation near threshold boundaries while maintaining responsive alerting. Key Features: - HysteresisThresholds with configurable upper/lower limits - StatusTracker for per-metric status history - Default gaps: CPU load 10%, memory 5%, disk temp 5°C Updated Components: - CPU load collector (5-minute average with hysteresis) - Memory usage collector (percentage-based thresholds) - Disk temperature collector (SMART data monitoring) - All collectors updated to support StatusTracker interface Cache Interval Adjustments: - Service status: 60s → 10s (faster response) - Disk usage: 300s → 60s (more frequent checks) - Backup status: 900s → 60s (quicker updates) - SMART data: moved to 600s tier (10 minutes) Architecture: - Individual metric status calculation in collectors - Centralized StatusTracker in MetricCollectionManager - Status aggregation preserved in dashboard widgets	2025-10-20 18:45:41 +02:00
Christoffer Martinsson	e998679901	Revert nginx monitoring to check all sites via public HTTPS URLs - Remove proxy_pass backend checking - All sites now checked using https://server_name format - Maintains 10-second timeout for external site checks - Simplifies monitoring to consistent external health checks	2025-10-20 15:06:42 +02:00
Christoffer Martinsson	2ccfc4256a	Fix nginx monitoring and services panel alignment - Add support for both proxied and static nginx sites - Proxied sites show 'P' prefix and check backend URLs - Static sites check external HTTPS URLs - Fix services panel column alignment for main services - Keep 10-second timeout for all site checks	2025-10-20 14:56:26 +02:00
Christoffer Martinsson	11be496a26	Update Cargo.lock with chrono-tz dependency for NixOS build	2025-10-20 14:36:17 +02:00
Christoffer Martinsson	66a79574e0	Implement comprehensive monitoring improvements - Add full email notifications with lettre and Stockholm timezone - Add status persistence to prevent notification spam on restart - Change nginx monitoring to check backend proxy_pass URLs instead of frontend domains - Increase nginx site timeout to 10 seconds for backend health checks - Fix cache intervals: disk (5min), backup (10min), systemd (30s), cpu/memory (5s) - Remove rate limiting for immediate notifications on all status changes - Store metric status in /var/lib/cm-dashboard/last-status.json	2025-10-20 14:32:44 +02:00
Christoffer Martinsson	ecaf3aedb5	Add space between archive count and 'archives' in backup panel	2025-10-20 13:24:23 +02:00
Christoffer Martinsson	959745b51b	Fix host navigation to work with alphabetical host ordering - Fix host_index calculation for localhost to use actual position in sorted list - Remove incorrect assumption that localhost is always at index 0 - Host navigation (Tab key) now works correctly with all hosts in alphabetical order Fixes issue where only 3 of 5 hosts were accessible via Tab navigation.	2025-10-20 13:12:39 +02:00
Christoffer Martinsson	d349e2742d	Fix dashboard title host ordering to use alphabetical sort - Remove predefined host order that was causing random display order - Sort hosts alphabetically for consistent title display - Localhost is still auto-selected at startup but doesn't affect display order - Title will now show: cmbox ● labbox ● simonbox ● srv01 ● srv02 ● steambox Eliminates confusing random host order in dashboard title bar.	2025-10-20 13:07:10 +02:00
Christoffer Martinsson	d4531ef2e8	Hide backup panel when no backup data is present - Add has_data() method to BackupWidget to check if backup metrics exist - Modify dashboard layout to conditionally show backup panel only when data exists - When no backup data: system panel takes full left side height - When backup data exists: system and backup panels share left side equally Prevents empty backup panel from taking up screen space unnecessarily.	2025-10-20 13:01:42 +02:00
Christoffer Martinsson	8023da2c1e	Fix dashboard disk widget flickering by sorting disks consistently - Sort physical devices by name to prevent random HashMap iteration order - Sort partitions within each device by disk index for consistency - Eliminates flickering caused by disks changing positions randomly The dashboard storage section now maintains stable disk order across updates.	2025-10-20 11:25:45 +02:00
Christoffer Martinsson	28896d0b1b	Fix CPU load alerting to only trigger on 5-minute load average Only the 5-minute load average should trigger warning/critical alerts. 1-minute and 15-minute load averages now always show Status::Ok. Thresholds (Warning: 9.0, Critical: 10.0) apply only to cpu_load_5min metric.	2025-10-20 11:12:15 +02:00
Christoffer Martinsson	47a7d5ae62	Simplify service disk usage detection - remove all estimation fallbacks - Replace complex multi-strategy detection with single deterministic method - Remove estimate_service_disk_usage and all fallback strategies - Use simple get_service_disk_usage method with clear logic: * Defined path exists → use only that path * Defined path fails → return None (shows as '-') * No defined path → use systemctl WorkingDirectory * No estimates or guessing ever Fixes misleading 5MB estimates when defined paths fail due to permissions.	2025-10-20 11:06:49 +02:00
Christoffer Martinsson	fe18ace767	Fix service disk usage detection to use sudo du for permission access ARK service directories require elevated permissions to access. The NixOS configuration already allows sudo du with NOPASSWD, so use sudo du instead of direct du command to properly detect disk usage for restricted directories.	2025-10-20 10:58:17 +02:00
Christoffer Martinsson	a1c980ad31	Implement deterministic service disk usage detection with defined paths - Prioritize defined service directories over systemctl WorkingDirectory fallbacks - Add ARK Survival Ascended server mappings to correct NixOS-configured paths - Remove legacy get_service_disk_usage method to eliminate code duplication - Ensure deterministic behavior with single-purpose detection logic Fixes ARK service disk usage reporting on srv02 by using actual data paths from NixOS configuration instead of systemctl working directory detection.	2025-10-20 10:45:30 +02:00
Christoffer Martinsson	a3c9ac3617	Add ARK server directory mappings for accurate disk usage detection Map each ARK service to its specific data directory: - ark-island -> /var/lib/ark-servers/island - ark-scorched -> /var/lib/ark-servers/scorched - ark-center -> /var/lib/ark-servers/center - ark-aberration -> /var/lib/ark-servers/aberration - ark-extinction -> /var/lib/ark-servers/extinction - ark-ragnarok -> /var/lib/ark-servers/ragnarok - ark-valguero -> /var/lib/ark-servers/valguero Based on NixOS configuration in srv02/configuration.nix.	2025-10-20 10:15:30 +02:00
Christoffer Martinsson	dfe9c11102	Fix disk metric naming to maintain dashboard compatibility Keep numbered metric names (disk_0_, disk_1_) instead of named metrics (disk_root_, disk_boot_) to ensure existing dashboard continues working. UUID-based detection works internally but produces compatible metric names.	2025-10-20 10:07:34 +02:00
Christoffer Martinsson	e7200fb1b0	Implement UUID-based disk detection for CMTEC infrastructure Replace df-based auto-discovery with UUID-based detection using NixOS hardware configuration data. Each host now has predefined filesystem configurations with predictable metric names. - Add FilesystemConfig struct with UUID, mount point, and filesystem type - Remove auto_discover and devices fields from DiskConfig - Add host-specific UUID defaults for cmbox, srv01, srv02, simonbox, steambox - Remove legacy get_mounted_disks() df-based detection method - Update DiskCollector to use UUID resolution via /dev/disk/by-uuid/ - Generate predictable metric names: disk_root_, disk_boot_, etc. - Maintain fallback for labbox/wslbox (no UUIDs configured yet) Provides consistent metric names across reboots and reliable detection aligned with NixOS deployments without dependency on mount order.	2025-10-20 09:50:10 +02:00
Christoffer Martinsson	f67779be9d	Add ARK game servers to systemd service monitoring	2025-10-19 19:23:51 +02:00
Christoffer Martinsson	ca160c9627	Fix tab navigation to respect user choice and prevent jumping back to localhost - Add user_navigated_away flag to track manual navigation - Only auto-switch to localhost if user hasn't manually navigated away - Reset flag when host disconnects to allow auto-selection - Preserves user's tab navigation choices while still prioritizing localhost initially	2025-10-19 11:21:59 +02:00
Christoffer Martinsson	bf2f066029	Fix localhost prioritization to always switch when localhost connects - Dashboard now switches to localhost even if another host is already selected - Ensures localhost is always preferred regardless of connection order - Resolves issue where srv01 connecting first would prevent localhost selection	2025-10-19 11:12:05 +02:00
Christoffer Martinsson	07633e4e0e	Implement localhost prioritization and status display in dashboard - Always select localhost as default host at startup - Order hosts with localhost first, then predefined sequence - Display hostname status colors in title bar based on metric aggregation - Add gethostname dependency for localhost detection	2025-10-19 10:56:42 +02:00
Christoffer Martinsson	0141a6e111	Remove unused code and eliminate build warnings Removed unused widget subscription system, cache utilities, error variants, theme functions, and struct fields. Replaced subscription-based widgets with direct metric filtering. Build now completes with zero warnings.	2025-10-18 23:50:15 +02:00
Christoffer Martinsson	7f85a6436e	Clean up unused imports and fix build warnings - Remove unused imports (Duration, HashMap, SharedError, DateTime, etc.) - Fix unused variables by prefixing with underscore - Remove redundant dashboard.toml config file - Update theme imports to use only needed components - Maintain all functionality while reducing warnings - Add srv02 to predefined hosts configuration - Remove unused broadcast_command methods	2025-10-18 23:12:07 +02:00
Christoffer Martinsson	f0eec38655	Fix SMART data collection and clean up configuration - Restore sudo smartctl commands for proper SMART data collection - Add srv02 to host configuration for dashboard discovery - Remove redundant hosts.toml file, consolidate into dashboard.toml - Clean up base_url fields that were unused in ZMQ architecture The SMART data collection now works properly with systemd service by using sudo permissions configured in NixOS. Dashboard can now discover and connect to srv02 alongside existing hosts.	2025-10-18 22:22:02 +02:00
Christoffer Martinsson	8cf8d37556	Add srv02 to predefined host list	2025-10-18 20:43:25 +02:00
Christoffer Martinsson	792ad066c9	Fix per-host widget cache to prevent overwriting cached data Only update widgets when metrics are available for the current host, preventing immediate overwrite of cached widget states when switching hosts.	2025-10-18 20:20:58 +02:00
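
The storage widget commit (1591565b1b) describes auto-discovering pools and drives from metric names of the form disk_{pool}_{drive}_health/temperature/wear. The sketch below shows one way such grouping could work. It is not the dashboard's actual code: the function name discover_pools is hypothetical, and it assumes pool and drive names contain no underscores, which the log does not confirm. Sorted maps are used so iteration order stays stable, in the spirit of the flicker fix in 8023da2c1e.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Drive-level suffixes listed in the commit message.
const DRIVE_SUFFIXES: [&str; 3] = ["health", "temperature", "wear"];

/// Group metric names into pools and the drives seen under each pool.
/// Assumption (not from the log): pool and drive names contain no underscores.
pub fn discover_pools<'a, I>(metric_names: I) -> BTreeMap<String, BTreeSet<String>>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut pools: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for name in metric_names {
        let rest = match name.strip_prefix("disk_") {
            Some(rest) => rest,
            None => continue, // not a disk metric
        };
        let parts: Vec<&str> = rest.split('_').collect();
        match parts.as_slice() {
            // disk_{pool}_{drive}_{health|temperature|wear}
            [pool, drive, suffix] if DRIVE_SUFFIXES.contains(suffix) => {
                pools
                    .entry((*pool).to_string())
                    .or_default()
                    .insert((*drive).to_string());
            }
            // Any other disk_{pool}_* metric still registers the pool itself.
            [pool, _, ..] => {
                pools.entry((*pool).to_string()).or_default();
            }
            _ => {}
        }
    }
    pools
}
```

Under these assumptions, feeding in disk_steampool_sda_health and disk_steampool_sdb_temperature would yield a single pool "steampool" with drives sda and sdb. A pool-level metric whose last two segments happen to look like a drive name plus one of the three suffixes would be misread as a drive, so a real implementation would likely need a stricter naming rule.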

158 Commits
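
The hysteresis commit (00a8ed3da2) names HysteresisThresholds and StatusTracker but the log does not show their fields or methods. The following is a minimal sketch of the general technique, assuming a simplified three-level Status enum and hypothetical field names, a with_gap constructor, and an update method. A status is raised at the upper limit and only cleared once the value falls below a lower limit, which prevents flapping near the boundary.

```rust
use std::collections::HashMap;

/// Simplified status levels; the real enum likely has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical layout: each level has a raise point and a lower clear point.
#[derive(Debug, Clone, Copy)]
pub struct HysteresisThresholds {
    pub warning_high: f64,
    pub warning_low: f64,
    pub critical_high: f64,
    pub critical_low: f64,
}

impl HysteresisThresholds {
    /// Place the clear points `gap` below the raise points.
    pub fn with_gap(warning: f64, critical: f64, gap: f64) -> Self {
        Self {
            warning_high: warning,
            warning_low: warning - gap,
            critical_high: critical,
            critical_low: critical - gap,
        }
    }

    /// Compute the new status from the value and the previous status.
    pub fn evaluate(&self, value: f64, previous: Status) -> Status {
        // A level that is already active stays active until its lower limit is crossed.
        let critical_limit = if previous == Status::Critical {
            self.critical_low
        } else {
            self.critical_high
        };
        let warning_limit = if previous == Status::Ok {
            self.warning_high
        } else {
            self.warning_low
        };
        if value >= critical_limit {
            Status::Critical
        } else if value >= warning_limit {
            Status::Warning
        } else {
            Status::Ok
        }
    }
}

/// Remembers the last status per metric name (assumed design).
#[derive(Default)]
pub struct StatusTracker {
    last: HashMap<String, Status>,
}

impl StatusTracker {
    pub fn update(&mut self, metric: &str, value: f64, t: &HysteresisThresholds) -> Status {
        let previous = self.last.get(metric).copied().unwrap_or(Status::Ok);
        let next = t.evaluate(value, previous);
        self.last.insert(metric.to_string(), next);
        next
    }
}
```

With thresholds of 80 and 90 and the 5% memory gap mentioned in the commit, a value that rises to 81 becomes Warning and stays Warning at 78, returning to Ok only once it drops below 75. The 80 and 90 figures are illustrative and do not come from the log.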