cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	bd20f0cae1	Fix user-stopped flag timing and service transition handling Correct user-stopped service behavior during startup transitions: User-Stopped Flag Timing Fix: - Clear user-stopped flag only when service actually becomes active, not when start command succeeds - Remove premature flag clearing from service control handler - Add automatic flag clearing when service status metrics show active state - Services retain user-stopped status during activating/transitioning states Service Transition Handling: - User-stopped services in activating state now report Status::OK instead of Status::Pending - Prevents host warnings during legitimate service startup transitions - Maintains accurate status reporting throughout service lifecycle - Failed service starts preserve user-stopped flags correctly Journalctl Popup Fix: - Fix terminal corruption when using J key for service logs - Correct command quoting to prevent tmux popup interference - Stable popup display without dashboard interface corruption Result: Clean service startup experience with no false warnings and proper user-stopped tracking throughout the entire service lifecycle. Bump version to v0.1.47	2025-10-30 12:05:54 +01:00
Christoffer Martinsson	c56e9d7be2	Implement user-stopped service tracking system Add comprehensive tracking for services stopped via dashboard to prevent false alerts when users intentionally stop services. Features: - User-stopped services report Status::Ok instead of Warning - Persistent storage survives agent restarts - Dashboard sends UserStart/UserStop commands - Agent tracks and syncs user-stopped state globally - Systemd collector respects user-stopped flags Implementation: - New service_tracker module with persistent JSON storage - Enhanced ServiceAction enum with UserStart/UserStop variants - Global singleton tracker accessible by collectors - Service status logic updated to check user-stopped flag - Dashboard version now uses CARGO_PKG_VERSION automatically Bump version to v0.1.43	2025-10-30 10:42:56 +01:00
Christoffer Martinsson	c8f800a1e5	Implement git commit hash tracking for build display - Add get_git_commit() method to read /var/lib/cm-dashboard/git-commit - Replace NixOS build version with actual git commit hash - Show deployed commit hash as 'Build:' value for accurate tracking - Enable verification of which exact commit is deployed per host - Update version to 0.1.42	2025-10-29 15:29:02 +01:00
Christoffer Martinsson	6509a2b91a	Make nginx site latency thresholds configurable and simplify status logic - Replace hardcoded 500ms/2000ms thresholds with configurable nginx_latency_critical_ms - Simplify status logic to only OK or Critical (no Warning status) - Add validation for nginx latency threshold configuration - Re-enable nginx site collection with configurable thresholds - Resolves issue where sites showed critical at 2000ms despite 30s timeout setting - Bump version to v0.1.38	2025-10-28 21:24:34 +01:00
Christoffer Martinsson	e890c5e810	Fix service status detection with combined discovery and status approach Enhanced service discovery to properly show status for all services: Changes: - Use systemctl list-unit-files for complete service discovery (finds all services) - Use systemctl list-units --all for batch runtime status fetching - Combine both datasets to get comprehensive service list with correct status - Services found in unit-files but not runtime are marked as inactive (Warning status) - Eliminates 'unknown' status issue while maintaining complete service visibility Now inactive services show as Warning (yellow ◐) and active services show as Ok (green ●) instead of all services showing as unknown (? icon).	2025-10-28 15:56:47 +01:00
Christoffer Martinsson	078c30a592	Fix service discovery to show all configured services regardless of state Changed service discovery from 'systemctl list-units --all' to 'systemctl list-unit-files' to ensure ALL service unit files are discovered, including services that have never been started. Changes: - Updated systemctl command to use list-unit-files instead of list-units --all - Modified parsing logic to handle unit file format (2 fields vs 4 fields) - Set placeholder values in discovery cache, actual runtime status fetched during collection - This ensures all configured services (like inactive ARK servers) appear in dashboard The issue was that list-units --all only shows services systemd has loaded/attempted to load, but list-unit-files shows ALL service unit files regardless of their runtime state.	2025-10-28 15:41:58 +01:00
Christoffer Martinsson	2910b7d875	Update version to 0.1.22 and fix system metric status calculation - Fix /tmp usage status to use proper thresholds instead of hardcoded Ok status - Fix wear level status to use configurable thresholds instead of hardcoded values - Add dedicated tmp_status field to SystemWidget for proper /tmp status display - Remove host-level hourglass icon during service operations - Implement immediate service status updates after start/stop/restart commands - Remove active users display and collection from NixOS section - Fix immediate host status aggregation transmission to dashboard	2025-10-28 13:21:56 +01:00
Christoffer Martinsson	91f037aa3e	Update to v0.1.19 with event-driven status aggregation Major architectural improvements: CORE CHANGES: - Remove notification_interval_seconds - status aggregation now immediate - Status calculation moved to collection phase instead of transmission - Event-driven transmission triggers immediately on status changes - Dual transmission strategy: immediate on change + periodic backup - Real-time notifications without batching delays TECHNICAL IMPROVEMENTS: - process_metric() now returns bool indicating status change - Immediate ZMQ broadcast when status changes detected - Status aggregation happens during metric collection, not later - Legacy get_nixos_build_info() method removed (unused) - All compilation warnings fixed BEHAVIOR CHANGES: - Critical alerts sent instantly instead of waiting for intervals - Dashboard receives real-time status updates - Notifications triggered immediately on status transitions - Backup periodic transmission every 1s ensures heartbeat This provides much more responsive monitoring with instant alerting while maintaining the reliability of periodic transmission as backup.	2025-10-28 10:36:34 +01:00
Christoffer Martinsson	627c533724	Update to v0.1.18 with per-collector intervals and tmux check - Implement per-collector interval timing respecting NixOS config - Remove all hardcoded timeout/interval values and make configurable - Add tmux session requirement check for TUI mode (bypassed for headless) - Update agent to send config hash in Build field instead of nixos version - Add nginx check interval, HTTP timeouts, and ZMQ transmission interval configs - Update NixOS configuration with new configurable values Breaking changes: - Build field now shows nix store config hash (8 chars) instead of nixos version - All intervals now follow individual collector configuration instead of global New configuration fields: - systemd.nginx_check_interval_seconds - systemd.http_timeout_seconds - systemd.http_connect_timeout_seconds - zmq.transmission_interval_seconds	2025-10-28 10:08:25 +01:00
Christoffer Martinsson	8dffe18a23	Improve SATA SSD wear level calculation - Support multiple SATA SSD wear attributes (SSD_Life_Left, Media_Wearout_Indicator, etc.) - Handle manufacturer differences in wear reporting - Proper parsing of SMART table format with VALUE column - Covers Samsung, Intel, Crucial and other common SSD types - NVMe Percentage Used support maintained	2025-10-25 22:32:09 +02:00
Christoffer Martinsson	0c544753f9	Move SMART configuration into disk config - Consolidate SMART thresholds into DiskConfig structure - Remove separate SmartConfig - disk collector handles all drive data - Update NixOS configuration to use disk.temperature_* settings - Remove hardcoded temperature thresholds in disk collector - Logical grouping: disk collector owns all disk/drive configuration	2025-10-25 22:29:26 +02:00
Christoffer Martinsson	c8e26b9bac	Remove redundant smart collector - consolidate SMART into disk collector - Remove separate smart collector implementation - Disk collector already handles SMART data for drives - Eliminates duplicate smartctl calls causing performance issues - SMART functionality remains in logical place with disk monitoring - Fixes infinite smartctl loop issue	2025-10-25 22:25:22 +02:00
Christoffer Martinsson	9160fac80b	Fix smart collector compilation errors - Update to match current Metric structure - Use correct Status enum and collector interface - Fix MetricValue types and constructor usage - Builds successfully with warnings only	2025-10-25 17:13:04 +02:00
Christoffer Martinsson	83cb43bcf1	Restore missing smart collector implementation - Rewrite smart collector to match current architecture - Add back to mod.rs exports - Fixes infinite smartctl loop issue - Uses simple health and temperature monitoring	2025-10-25 16:59:09 +02:00
Christoffer Martinsson	4b54a59e35	Remove unused code and eliminate compiler warnings - Remove unused fields from CommandStatus variants - Clean up unused methods and unused collector fields - Fix lifetime syntax warning in SystemWidget - Delete unused cache module completely - Remove redundant render methods from widgets All agent and dashboard warnings eliminated while preserving panel switching and scrolling functionality.	2025-10-25 14:15:52 +02:00
Christoffer Martinsson	fb6ee6d7ae	Fix config hash to show actual deployed nix store hash - Replace git commit hash with nix store hash extraction - Read from /run/current-system symlink target - Extract first 8 characters of nix store hash: d8ivwiar - Shows actual deployed configuration, not just source - Enables proper rebuild completion detection - Accurate deployment verification	2025-10-25 12:22:17 +02:00
Christoffer Martinsson	2d3844b5dd	Add configuration hash display to system panel - Collect config hash from cloned nixos-config git repository - Display "Config: xxxxx" after "Build: xxxxx" in NixOS section - Uses /var/lib/cm-dashboard/nixos-config directory - Shows actual configuration hash vs nixpkgs build hash	2025-10-25 01:30:46 +02:00
Christoffer Martinsson	967244064f	Fix command execution permissions and eliminate backup error spam - Add sudo permissions for systemctl and nixos-rebuild commands - Use sudo in agent command execution for proper privileges - Fix backup collector to handle missing status files gracefully - Eliminate backup error spam when no backup system is configured	2025-10-23 23:07:52 +02:00
Christoffer Martinsson	d193b90ba1	Fix device detection to properly parse lsblk output - Handle lsblk tree symbols (├─, └─) in device parsing - Extract base device names from partitions (nvme0n1p2 -> nvme0n1) - Support both NVMe and traditional device naming schemes - Fixes missing device lines in storage display	2025-10-23 19:16:33 +02:00
Christoffer Martinsson	ad298ac70c	Fix device detection, tree indentation, and hide Single storage type - Replace findmnt with lsblk for efficient device name detection - Fix tree indentation to align consistently with status icon text - Hide '(Single)' label for single disk storage pools - Device detection returns actual names (nvme0n1, sda) not UUID paths	2025-10-23 19:06:52 +02:00
Christoffer Martinsson	9f34c67bfa	Fix debug log reference to removed underlying_devices field	2025-10-23 18:56:16 +02:00
Christoffer Martinsson	5134c5320a	Fix disk collector to use dynamic device detection - Remove underlying_devices field from FilesystemConfig - Add device detection at startup using findmnt command - Store detected devices in HashMap for reuse during collection - Keep all existing functionality (StoragePool, DriveInfo, SMART data) - Detect devices only once at initialization, not every collection cycle - Fixes agent startup failure due to missing underlying_devices config	2025-10-23 18:50:40 +02:00
Christoffer Martinsson	c5ec529210	Add agent hash display to system panel Implement agent version tracking to diagnose deployment issues: - Add get_agent_hash() method to extract Nix store hash from executable path - Collect system_agent_hash metric in NixOS collector - Display "Agent Hash" in system panel under NixOS section - Update metric filtering to include agent hash This helps identify which version of the agent is actually running when troubleshooting deployment or metric collection issues.	2025-10-23 17:33:45 +02:00
Christoffer Martinsson	3b1bda741b	Remove codename from NixOS build display - Strip codename part (e.g., '(Warbler)') from nixos-version output - Display clean version format: '25.05.20251004.3bcc93c' - Simplify parsing to use raw nixos-version output as requested	2025-10-23 14:55:18 +02:00
Christoffer Martinsson	64af24dc40	Update NixOS display format to show build hash and timestamp - Change from showing version to build format: 'hash dd/mm/yy H:M:S' - Parse nixos-version output to extract short hash and format date - Update system widget to display 'Build:' instead of 'Version:' - Remove version/build_date fields in favor of single build string - Follow TODO.md specification for NixOS section layout	2025-10-23 14:48:25 +02:00
Christoffer Martinsson	9e80d6b654	Remove hardcoded /tmp autodetection and implement proper tmpfs monitoring - Remove /tmp autodetection from disk collector (57 lines removed) - Add tmpfs monitoring to memory collector with get_tmpfs_metrics() method - Generate memory_tmp_* metrics for proper RAM-based tmpfs monitoring - Fix type annotations in tmpfs parsing for compilation - System widget now correctly displays tmpfs usage in RAM section	2025-10-23 14:26:15 +02:00
Christoffer Martinsson	39fc9cd22f	Implement unified system widget with NixOS info, CPU, RAM, and Storage - Create NixOS collector for version and active users detection - Add SystemWidget combining all system information in TODO.md layout - Replace separate CPU/Memory widgets with unified system display - Add tree structure for storage with drive temperature/wear info - Support NixOS version, active users, load averages, memory usage - Follow exact decimal formatting from specification	2025-10-23 14:01:14 +02:00
Christoffer Martinsson	c99e0bd8ee	Remove hardcoded discovery interval in systemd collector - Use config.interval_seconds instead of hardcoded 300 seconds - Discovery now happens every 10 seconds (configurable) instead of 5 minutes - Follows configuration-driven architecture requirements	2025-10-23 13:20:48 +02:00
Christoffer Martinsson	0f12438ab4	Fix RwLock deadlock in systemd collector Phase 4 - Restructure get_monitored_services to avoid nested write locks - Split discover_services into discover_services_internal that returns data - Update state in separate scope to prevent deadlock - Fix borrow checker errors with clone() for status cache	2025-10-23 13:12:53 +02:00
Christoffer Martinsson	7607e971b8	Add debug logging to diagnose Phase 4 service discovery issue Add detailed debug logging to track: - Service discovery start - Individual service parsing - Final service count and list - Empty results indication This will help identify why cmbox disappeared from dashboard.	2025-10-23 12:57:10 +02:00
Christoffer Martinsson	da6f3c3855	Phase 4: Cache service status from discovery to eliminate per-service calls Major performance optimization: - Parse and cache service status during discovery from systemctl list-units - Eliminate per-service systemctl is-active and show calls - Reduce systemctl calls from 1+2N to just 1 call total - For 10 services: 21 calls → 1 call (95% reduction) - Add fallback to systemctl for cache misses This completes the major systemctl call reduction goal from TODO.md.	2025-10-23 12:51:17 +02:00
Christoffer Martinsson	174b27f31a	Phase 3: Add wildcard support for service pattern matching Implement glob pattern matching for service filters: - nginx* matches nginx, nginx-config-reload, etc. - *backup matches any service ending with 'backup' - docker*prune matches docker-weekly-prune, etc. - Exact matches still work as before (backward compatible) Addresses TODO.md requirement for '*' filtering support.	2025-10-23 12:37:16 +02:00
Christoffer Martinsson	dc11538ae9	Phase 2b: Optimize to single systemctl command Reduce from 2 systemctl commands to 1 by using only: systemctl list-units --type=service --all This captures all services (active, inactive, failed) in one call, eliminating the redundant list-unit-files command. Achieves the TODO.md goal of reducing systemctl calls.	2025-10-23 12:34:54 +02:00
Christoffer Martinsson	9133e18090	Phase 2: Remove user service collection logic Remove all sudo -u systemctl commands and user service processing. Now only collects system services via systemctl list-units/list-unit-files. Eliminates user service discovery completely as planned in TODO.md.	2025-10-23 12:32:19 +02:00
Christoffer Martinsson	616fad2c5d	Phase 1: Implement exact name filtering for service matching Change service matching logic from contains-based to exact equality. Services now match only if service_name == pattern exactly. This is the first step in the systemd collector optimization plan.	2025-10-23 12:22:26 +02:00
Christoffer Martinsson	08d3454683	Enhance disk collector with individual drive health monitoring - Add StoragePool and DriveInfo structures for grouping drives by mount point - Implement SMART data collection for individual drives (health, temperature, wear) - Support for ext4, zfs, xfs, mergerfs, btrfs filesystem types - Generate individual drive metrics: disk_[pool]_[drive]_health/temperature/wear - Add storage_type and underlying_devices to filesystem configuration - Move hardcoded service directory mappings to NixOS configuration - Move hardcoded host-to-user mapping to NixOS configuration - Remove all unused code and fix compilation warnings - Clean implementation with zero warnings and no dead code Individual drives now show health status per storage pool: Storage root (ext4): nvme0n1 PASSED 42°C 5% wear Storage steampool (mergerfs): sda/sdb/sdc with individual health data	2025-10-22 19:59:25 +02:00
Christoffer Martinsson	34822bd835	Fix systemd collector to use Status::Pending for transitional states	2025-10-21 19:08:58 +02:00
Christoffer Martinsson	41208aa2a0	Implement status aggregation with notification batching	2025-10-21 18:12:42 +02:00
Christoffer Martinsson	a937032eb1	Remove hardcoded defaults, require configuration file - Remove all Default implementations from agent configuration structs - Make configuration file required for agent startup - Update NixOS module to generate complete agent.toml configuration - Add comprehensive configuration options to NixOS module including: - Service include/exclude patterns for systemd collector - All thresholds and intervals - ZMQ communication settings - Notification and cache configuration - Agent now fails fast if no configuration provided - Eliminates configuration drift between defaults and NixOS settings	2025-10-21 00:01:26 +02:00
Christoffer Martinsson	1e8da8c187	Add user service discovery to systemd collector - Use systemctl --user commands to discover user-level services - Include both user unit files and loaded user units - Gracefully handle cases where user commands fail (no user session) - Treat user services same as system services in filtering - Enables monitoring of user-level Docker, development servers, etc.	2025-10-20 23:11:11 +02:00
Christoffer Martinsson	1cc31ec26a	Update service filters for better discovery - Add ark-permissions to exclusion list (maintenance service) - Add sunshine to service_name_filters (game streaming server) - Improves service discovery for game streaming infrastructure	2025-10-20 23:01:03 +02:00
Christoffer Martinsson	b580cfde8c	Add more services to exclusion list - Add docker-prune (cleanup services don't need monitoring) - Add sshd-unix-local@ and sshd@ (SSH instance services) - Add docker-registry-gar (Google Artifact Registry services) - Keep main sshd service monitored while excluding per-connection instances	2025-10-20 22:51:15 +02:00
Christoffer Martinsson	5886426dac	Fix service discovery to detect all services regardless of state - Use systemctl list-unit-files and list-units --all to find inactive services - Parse both outputs to ensure all services are discovered - Remove special SSH detection logic since sshd is in service filters - Rename interesting_services to service_name_filters for clarity - Now detects services in any state: active, inactive, failed, dead, etc.	2025-10-20 22:41:21 +02:00
Christoffer Martinsson	eb268922bd	Remove all unused code and fix build warnings - Remove unused struct fields: tier, config_name, last_collection_time - Remove unused structs: PerformanceMetrics, PerfMonitor - Remove unused methods: get_performance_metrics, get_collector_names, get_stats - Remove unused utility functions and system helpers - Remove unused config fields from CPU and Memory collectors - Keep config fields that are actually used (DiskCollector, etc.) - Remove unused proxy_pass_url variable and assignments - Fix duplicate hostname variable declaration - Achieve zero build warnings without functionality changes	2025-10-20 20:20:47 +02:00
Christoffer Martinsson	00a8ed3da2	Implement hysteresis for metric status changes to prevent flapping Add comprehensive hysteresis support to prevent status oscillation near threshold boundaries while maintaining responsive alerting. Key Features: - HysteresisThresholds with configurable upper/lower limits - StatusTracker for per-metric status history - Default gaps: CPU load 10%, memory 5%, disk temp 5°C Updated Components: - CPU load collector (5-minute average with hysteresis) - Memory usage collector (percentage-based thresholds) - Disk temperature collector (SMART data monitoring) - All collectors updated to support StatusTracker interface Cache Interval Adjustments: - Service status: 60s → 10s (faster response) - Disk usage: 300s → 60s (more frequent checks) - Backup status: 900s → 60s (quicker updates) - SMART data: moved to 600s tier (10 minutes) Architecture: - Individual metric status calculation in collectors - Centralized StatusTracker in MetricCollectionManager - Status aggregation preserved in dashboard widgets	2025-10-20 18:45:41 +02:00
Christoffer Martinsson	e998679901	Revert nginx monitoring to check all sites via public HTTPS URLs - Remove proxy_pass backend checking - All sites now checked using https://server_name format - Maintains 10-second timeout for external site checks - Simplifies monitoring to consistent external health checks	2025-10-20 15:06:42 +02:00
Christoffer Martinsson	2ccfc4256a	Fix nginx monitoring and services panel alignment - Add support for both proxied and static nginx sites - Proxied sites show 'P' prefix and check backend URLs - Static sites check external HTTPS URLs - Fix services panel column alignment for main services - Keep 10-second timeout for all site checks	2025-10-20 14:56:26 +02:00
Christoffer Martinsson	66a79574e0	Implement comprehensive monitoring improvements - Add full email notifications with lettre and Stockholm timezone - Add status persistence to prevent notification spam on restart - Change nginx monitoring to check backend proxy_pass URLs instead of frontend domains - Increase nginx site timeout to 10 seconds for backend health checks - Fix cache intervals: disk (5min), backup (10min), systemd (30s), cpu/memory (5s) - Remove rate limiting for immediate notifications on all status changes - Store metric status in /var/lib/cm-dashboard/last-status.json	2025-10-20 14:32:44 +02:00
Christoffer Martinsson	28896d0b1b	Fix CPU load alerting to only trigger on 5-minute load average Only the 5-minute load average should trigger warning/critical alerts. 1-minute and 15-minute load averages now always show Status::Ok. Thresholds (Warning: 9.0, Critical: 10.0) apply only to cpu_load_5min metric.	2025-10-20 11:12:15 +02:00
Christoffer Martinsson	47a7d5ae62	Simplify service disk usage detection - remove all estimation fallbacks - Replace complex multi-strategy detection with single deterministic method - Remove estimate_service_disk_usage and all fallback strategies - Use simple get_service_disk_usage method with clear logic: * Defined path exists → use only that path * Defined path fails → return None (shows as '-') * No defined path → use systemctl WorkingDirectory * No estimates or guessing ever Fixes misleading 5MB estimates when defined paths fail due to permissions.	2025-10-20 11:06:49 +02:00
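
The path selection order described in commit 47a7d5ae62 can be sketched roughly as below. This is an illustration under stated assumptions, not the project's code: `resolve_usage_path`, its parameters, and the injected `readable` check are hypothetical names, and the real `get_service_disk_usage` may be structured differently.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: choose the single directory to measure for a service.
/// `configured` is the path defined in configuration (if any), `working_dir`
/// is the WorkingDirectory reported by systemctl (if any), and `readable`
/// stands in for whatever access check the agent performs.
fn resolve_usage_path(
    configured: Option<&Path>,
    working_dir: Option<&Path>,
    readable: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    match configured {
        // Defined path exists: use only that path.
        Some(p) if readable(p) => Some(p.to_path_buf()),
        // Defined path fails: return None (shown as '-'), never fall back.
        Some(_) => None,
        // No defined path: use the systemd WorkingDirectory when known.
        None => working_dir.map(Path::to_path_buf),
    }
}

fn main() {
    // Assumption for the demo: only /var/lib/demo is readable.
    let readable = |p: &Path| p == Path::new("/var/lib/demo");

    let hit = resolve_usage_path(Some(Path::new("/var/lib/demo")), None, readable);
    let denied = resolve_usage_path(Some(Path::new("/root/secret")), Some(Path::new("/srv")), readable);
    let fallback = resolve_usage_path(None, Some(Path::new("/srv")), readable);

    println!("{:?} {:?} {:?}", hit, denied, fallback);
}
```

The middle arm is the point of the change: a configured path that cannot be read yields no value at all, so no estimate can be mistaken for a measurement.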


132 Commits