Root cause: sda's temperature exceeded threshold in the past, causing
smartctl to return exit code 32 (warning: "Attributes have been <= threshold
in the past"). The agent checked output.status.success() and rejected the
entire output as failed, even though the data (serial, temperature, health)
was perfectly valid.
Smartctl exit codes are bit flags for informational warnings:
- Exit 0: No warnings
- Exit 32 (bit 5): Attributes were at/below threshold in the past
- Exit 64 (bit 6): Error log has entries
- etc.
The output data is valid regardless of these warning flags.
Solution: Parse the output as long as it is non-empty and ignore the exit code.
Only return UNKNOWN if the output is actually empty (the command truly failed).
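A minimal sketch of the check, assuming a tokio-based helper (names approximate, not the exact agent code):

```rust
use std::process::Stdio;
use tokio::process::Command;

// Treat non-empty output as valid: smartctl's non-zero exit codes are
// informational bit flags, not failures.
async fn smart_output(device: &str) -> Option<String> {
    let output = Command::new("smartctl")
        .args(["-a", device])
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .output()
        .await
        .ok()?;
    // Deliberately NOT checking output.status.success() here.
    let text = String::from_utf8_lossy(&output.stdout).into_owned();
    if text.trim().is_empty() { None } else { Some(text) }
}
```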
Result: Data_3 will now show "ZDZ4VE0B T: 31°C" instead of "? Data_3: sda"
Bump version to v0.1.225
Root cause: SMART data was collected TWICE:
1. Sequential collection during pool detection in get_drive_info_for_path()
using problematic tokio::task::block_in_place() nesting
2. Parallel collection in get_smart_data_for_drives() (v0.1.223)
The sequential collection happened FIRST during pool detection, causing
sda (Data_3) to time out due to:
- Bad async nesting: block_in_place() wrapping block_on()
- Sequential execution causing runtime issues
- sda being third in the sequence, by which point the runtime had degraded
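For reference, the removed anti-pattern looked roughly like this (SmartData and query_smart are hypothetical stand-ins for the real collector types):

```rust
// block_in_place() wrapping block_on() re-enters the runtime from inside
// an async task; later drives in the sequence inherit a degraded runtime.
// SmartData and query_smart are illustrative placeholders.
fn blocking_smart_query(device: String) -> SmartData {
    tokio::task::block_in_place(|| {
        tokio::runtime::Handle::current().block_on(query_smart(device))
    })
}
```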
Solution: Remove SMART collection from get_drive_info_for_path().
Pool drive temperatures are populated later from the parallel SMART
collection which properly uses futures::join_all.
Benefits:
- Eliminates problematic async nesting
- All SMART queries happen once in parallel only
- sda/Data_3 should now show serial (ZDZ4VE0B) and temperature
Bump version to v0.1.224
Root cause: SMART data was collected sequentially, one drive at a time.
With 5 drives taking ~500ms each, total collection time was 2.5+ seconds.
With the disk collector running every 1 second, this caused overlapping
collections and resource contention. The last drive (sda/Data_3) would
time out because the previous collection was still accessing it.
Solution: Query all drives in parallel using futures::join_all. Now all
drives get their SMART data collected simultaneously with independent
3-second timeouts, eliminating contention and reducing total collection
time from 2.5+ seconds to ~500ms (the slowest single drive).
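A sketch of the parallel path (SmartData and query_smart are hypothetical stand-ins for the agent's helpers):

```rust
use std::time::Duration;
use futures::future::join_all;
use tokio::time::timeout;

// Each drive gets an independent 3-second timeout; total wall time is
// bounded by the slowest single drive instead of the sum of all drives.
async fn collect_all(drives: Vec<String>) -> Vec<Option<SmartData>> {
    let queries = drives.into_iter().map(|dev| async move {
        timeout(Duration::from_secs(3), query_smart(&dev)).await.ok()
    });
    join_all(queries).await
}
```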
Benefits:
- All drives complete in ~500ms instead of 2.5+ seconds
- No overlapping collections causing resource contention
- Each drive gets full 3-second timeout window
- sda/Data_3 should now show temperature and serial number
Bump version to v0.1.223
Root cause: run_command_with_timeout() was calling cmd.spawn() without
configuring stdout/stderr pipes. This caused command output to go to
journald instead of being captured by wait_with_output(). The disk
collector received empty output and failed silently.
Solution: Configure stdout(Stdio::piped()) and stderr(Stdio::piped())
before spawning commands. This ensures wait_with_output() can properly
capture command output.
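The fix, sketched (simplified: the real helper also applies a timeout):

```rust
use std::process::Stdio;
use tokio::process::Command;

// Without piped stdio the child inherits the service's descriptors, so
// output lands in journald and wait_with_output() captures nothing.
async fn run_piped(program: &str, args: &[&str]) -> std::io::Result<std::process::Output> {
    let child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    child.wait_with_output().await
}
```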
Fixes: Empty Storage section, lsblk output appearing in journald
Bump version to v0.1.222
v0.1.220 broke disk collector by changing the import from
std::process::Command to tokio::process::Command, but lines 193 and
767 explicitly used std::process::Command::new() which silently failed.
Solution: Import both under aliases (TokioCommand/StdCommand) and use the
appropriate type for each operation: async commands use TokioCommand
with run_command_with_timeout, sync commands use StdCommand with the
system timeout wrapper.
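The aliasing, sketched (the lsblk arguments are illustrative):

```rust
use std::process::Command as StdCommand;
use tokio::process::Command as TokioCommand; // async path, via run_command_with_timeout

// Sync path: StdCommand wrapped in the system `timeout` binary.
fn lsblk_json() -> std::io::Result<std::process::Output> {
    StdCommand::new("timeout").args(["10", "lsblk", "-J"]).output()
}
```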
Fixes: Empty Storage section after v0.1.220 deployment
Bump version to v0.1.221
- Changed disk collector to use tokio::process::Command instead of std::process::Command
- Updated run_command_with_timeout to properly kill processes on timeout
- Fixes issue where smartctl hangs on problematic drives (/dev/sda), freezing the entire agent
- Timeout now force-kills hung processes using kill -9, preventing orphaned smartctl processes
This resolves the issue where Data_3 showed unknown status because smartctl was hanging
indefinitely trying to read from a problematic drive, blocking the entire collector.
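A sketch of the timeout handling; using tokio's kill_on_drop is one way to get the SIGKILL (`kill -9`) behavior described above, though the real helper may kill explicitly:

```rust
use std::time::Duration;
use tokio::process::Command;

async fn run_with_timeout(mut cmd: Command, secs: u64) -> Option<std::process::Output> {
    cmd.kill_on_drop(true); // hung child is force-killed when dropped
    let child = cmd.spawn().ok()?;
    tokio::time::timeout(Duration::from_secs(secs), child.wait_with_output())
        .await
        .ok()? // Err(Elapsed): future dropped -> child killed, no orphan
        .ok()
}
```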
Bump version to v0.1.220
Co-Authored-By: Claude <noreply@anthropic.com>
- Sort repositories alphabetically before rendering
- Sort backup disks by serial number
- Prevents display jumping between different orderings on updates
- Consistent display order across refreshes
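Sketched (field names approximate):

```rust
struct Repo { name: String }
struct BackupDisk { serial: String }

// Deterministic ordering before rendering, so refreshes don't reshuffle rows.
fn stable_order(repos: &mut Vec<Repo>, disks: &mut Vec<BackupDisk>) {
    repos.sort_by(|a, b| a.name.cmp(&b.name));
    disks.sort_by(|a, b| a.serial.cmp(&b.serial));
}
```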
Bump version to v0.1.214
Co-Authored-By: Claude <noreply@anthropic.com>
- Update BackupData structure to support multiple backup disks
- Scan /var/lib/backup/status/ directory for all status files
- Calculate status icons for backup and disk usage
- Aggregate repository status from all disks
- Update dashboard to display all backup disks with per-disk status
- Display repository list with count and aggregated status
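A sketch of the directory scan (path from this commit; the per-file parsing is elided):

```rust
use std::fs;
use std::path::Path;

// One status file per backup disk under /var/lib/backup/status/.
fn read_disk_statuses() -> Vec<String> {
    let mut statuses = Vec::new();
    if let Ok(entries) = fs::read_dir(Path::new("/var/lib/backup/status/")) {
        for entry in entries.flatten() {
            if let Ok(contents) = fs::read_to_string(entry.path()) {
                statuses.push(contents); // parse into per-disk status here
            }
        }
    }
    statuses
}
```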
- Agent now extracts "C" + digits pattern (C3, C10) using char parsing
- Removes suffixes like "_ACPI", "_MWAIT" at source
- Reduces JSON payload size over ZMQ
- No regex dependency - uses fast char iteration (~1μs overhead)
- Robust fallback to original name if pattern not found
- Dashboard simplified to use clean names directly
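The extraction, sketched to match the description above:

```rust
// "C3_ACPI" -> "C3", "C10_MWAIT" -> "C10"; anything else falls back to
// the original name. Plain char iteration, no regex.
fn clean_cstate_name(raw: &str) -> String {
    let mut chars = raw.chars();
    if chars.next() == Some('C') {
        let digits: String = chars.take_while(|c| c.is_ascii_digit()).collect();
        if !digits.is_empty() {
            return format!("C{digits}");
        }
    }
    raw.to_string() // robust fallback
}
```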
Bump version to v0.1.212
Co-Authored-By: Claude <noreply@anthropic.com>
- Strip suffixes like "_ACPI" from C-state names
- Display changes from "C3_ACPI:51%" to "C3:51%"
- Cleaner, more concise presentation
Bump version to v0.1.211
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed code to use zmq.transmission_interval_seconds instead of top-level collection_interval_seconds
- Removed collection_interval_seconds from AgentConfig
- Updated validation to check zmq.transmission_interval_seconds
- Improves config organization by grouping all ZMQ settings together
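The regrouped shape, roughly (other fields elided):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct AgentConfig {
    zmq: ZmqConfig,
    // collection_interval_seconds no longer lives at the top level
}

#[derive(Deserialize)]
struct ZmqConfig {
    transmission_interval_seconds: u64,
    // other ZMQ settings grouped here
}
```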
Bump version to v0.1.210
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed CpuData.cstate from String to Vec<CStateInfo>
- Added CStateInfo struct with name and percent fields
- Collector calculates percentage for each C-state based on accumulated time
- Sorts and returns top 3 C-states by usage
- Dashboard displays: "C10:79% C8:10% C6:8%"
Provides better visibility into CPU idle state distribution.
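The selection logic, sketched (percentage math simplified):

```rust
struct CStateInfo {
    name: String,
    percent: u8, // share of accumulated idle time
}

// Sort descending by usage and keep the top 3, e.g. "C10:79% C8:10% C6:8%".
fn top_cstates(mut states: Vec<CStateInfo>) -> Vec<CStateInfo> {
    states.sort_by(|a, b| b.percent.cmp(&a.percent));
    states.truncate(3);
    states
}
```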
Bump version to v0.1.209
Co-Authored-By: Claude <noreply@anthropic.com>
- Changed CpuData.frequency_mhz to CpuData.cstate (String)
- Implemented collect_cstate() to read CPU idle depth from sysfs
- Finds deepest C-state with most accumulated time (C0-C10)
- Updated dashboard to display C-state instead of frequency
- More accurate indicator of CPU activity vs power management
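A sketch of the sysfs walk (cpu0 assumed representative; `time` is accumulated microseconds per state):

```rust
use std::fs;

// Pick the C-state with the most accumulated time across state0..state10.
fn collect_cstate() -> Option<String> {
    let mut best: Option<(String, u64)> = None;
    for n in 0..=10 {
        let base = format!("/sys/devices/system/cpu/cpu0/cpuidle/state{n}");
        let Ok(name) = fs::read_to_string(format!("{base}/name")) else { continue };
        let Ok(raw) = fs::read_to_string(format!("{base}/time")) else { continue };
        let Ok(time) = raw.trim().parse::<u64>() else { continue };
        if best.as_ref().map_or(true, |(_, t)| time > *t) {
            best = Some((name.trim().to_string(), time));
        }
    }
    best.map(|(name, _)| name)
}
```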
Bump version to v0.1.208
Co-Authored-By: Claude <noreply@anthropic.com>
The service_status_cache populated during discovery only contains
active_state, with all detailed metrics set to None. During collection,
get_service_status() was returning this cached data instead of fetching
fresh systemctl show data. Now always fetch fresh data to populate
memory_bytes, restart_count, and uptime_seconds properly.
Shared:
- Add memory_bytes, restart_count, uptime_seconds to ServiceData
Agent:
- Add new fields to ServiceStatusInfo struct
- Fetch MemoryCurrent, NRestarts, ExecMainStartTimestamp from systemctl show
- Calculate uptime from start timestamp
- Parse and populate new fields in ServiceData
- Remove unused load_state and sub_state fields
Dashboard:
- Add memory_bytes, restart_count, uptime_seconds to ServiceInfo
- Update header: Service, Status, RAM, Uptime, ↻ (restarts)
- Format memory as MB/GB
- Format uptime as Xd Xh, Xh Xm, or Xm
- Show restart count with ! prefix if > 0 to indicate instability
All metrics are obtained from a single systemctl show call - zero overhead.
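The single query, sketched (parsing elided; property names as listed above):

```rust
use tokio::process::Command;

// One systemctl show call returns `Key=Value` lines for all fields:
// MemoryCurrent -> memory_bytes, NRestarts -> restart_count, and
// uptime_seconds derived from ExecMainStartTimestamp vs. now.
async fn service_props(unit: &str) -> Option<String> {
    let output = Command::new("systemctl")
        .args(["show", unit, "--property=MemoryCurrent,NRestarts,ExecMainStartTimestamp"])
        .output()
        .await
        .ok()?;
    Some(String::from_utf8_lossy(&output.stdout).into_owned())
}
```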
Changed header from 4 columns to 2 columns:
- Before: Service, Status, RAM, Disk
- After: Service, Status
Matches the removal of memory_mb and disk_gb fields.
Complete removal of service resource metrics:
Agent:
- Remove memory_mb and disk_gb fields from ServiceData struct
- Remove get_service_memory_usage() method
- Remove get_service_disk_usage() method
- Remove get_directory_size() method
- Remove unused warn import
Dashboard:
- Remove memory_mb and disk_gb from ServiceInfo struct
- Remove memory/disk display from format_parent_service_line
- Remove memory/disk parsing in legacy metric path
- Remove unused format_disk_size() function
Service resource metrics were slow, unreliable, and never worked
properly since the structured data migration. They will be handled
differently in the future.
Service control migrated to SSH, command receiver no longer needed.
- Remove command_receiver Socket from ZmqHandler
- Remove try_receive_command method
- Remove AgentCommand enum
- Remove command_port from ZmqConfig
- Remove zbus dependency from agent
- Replace D-Bus Connection calls with systemctl show commands
- Fix agent hang by eliminating blocking D-Bus operations
- get_unit_property now uses systemctl show with property flags
- Memory, disk usage, and nginx config queries use systemctl
- Simpler, more reliable service monitoring
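A sketch of the systemctl-backed lookup (function name from this commit, error handling simplified):

```rust
use tokio::process::Command;

// `systemctl show <unit> --property=<name> --value` prints just the value.
async fn get_unit_property(unit: &str, property: &str) -> Option<String> {
    let output = Command::new("systemctl")
        .arg("show")
        .arg(unit)
        .arg(format!("--property={property}"))
        .arg("--value")
        .output()
        .await
        .ok()?;
    let value = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if value.is_empty() { None } else { Some(value) }
}
```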
The D-Bus ListUnits call in discover_services_internal() was causing
the agent to hang on startup.
**Root cause:**
- D-Bus ListUnits call with complex tuple destructuring hung indefinitely
- Agent never completed first collection cycle
- No collector output in logs
**Fix:**
- Revert discover_services_internal() to use systemctl list-units/list-unit-files
- Keep D-Bus-based property queries (WorkingDirectory, MemoryCurrent, ExecStart)
- Hybrid approach: systemctl for discovery, D-Bus for individual queries
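The reverted discovery, sketched (flags are standard systemctl; parsing simplified):

```rust
use tokio::process::Command;

// `--no-legend --plain` strips headers and decoration; the first
// whitespace-separated column of each line is the unit name.
async fn discover_service_names() -> Vec<String> {
    match Command::new("systemctl")
        .args(["list-units", "--type=service", "--no-legend", "--plain"])
        .output()
        .await
    {
        Ok(out) => String::from_utf8_lossy(&out.stdout)
            .lines()
            .filter_map(|line| line.split_whitespace().next().map(str::to_string))
            .collect(),
        Err(_) => Vec::new(),
    }
}
```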
**External commands still used:**
- systemctl list-units, list-unit-files (service discovery)
- smartctl (SMART data)
- sudo du (directory sizes)
- nginx -T (config fallback)
Version bump: 0.1.198 → 0.1.199
Complete migration from systemctl subprocess calls to native D-Bus communication:
**Removed systemctl commands:**
- systemctl is-active (fallback) - use D-Bus cache from ListUnits
- systemctl show --property=LoadState,ActiveState,SubState - use D-Bus cache
- systemctl show --property=WorkingDirectory - use D-Bus Properties.Get
- systemctl show --property=MemoryCurrent - use D-Bus Properties.Get
- systemctl show nginx --property=ExecStart - use D-Bus Properties.Get
**Implementation details:**
- Added get_unit_property() helper for D-Bus property access
- Made get_nginx_site_metrics() async to support D-Bus calls
- Made get_nginx_sites_internal() async
- Made discover_nginx_sites() async
- Made get_nginx_config_from_systemd() async
- Fixed RwLock guard Send issues by using scoped locks
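The scoped-lock fix, sketched: a std RwLock guard held across an .await makes the future !Send, so the guard is confined to a block (the cached type here is illustrative):

```rust
use std::sync::RwLock;

async fn use_cache(cache: &RwLock<Vec<String>>) {
    let snapshot = {
        let guard = cache.read().unwrap();
        guard.clone()
    }; // guard dropped here, before any await point
    tokio::task::yield_now().await; // future remains Send
    println!("{} cached entries", snapshot.len());
}
```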
**Remaining external commands:**
- smartctl (disk.rs) - No Rust alternative for SMART data
- sudo du (systemd.rs) - Directory size measurement
- nginx -T (systemd.rs) - Nginx config fallback
- timeout hostname (nixos.rs) - Rare fallback only
Version bump: 0.1.197 → 0.1.198
CRITICAL FIX: The previous cached collector architecture still had ZMQ sending
in the main event loop, where it could block waiting for the RwLock while
collectors were writing. This caused the observed 3-8 second delays.
Changes:
- Move ZMQ publisher to dedicated std::thread (ZMQ sockets aren't thread-safe)
- Use try_read() instead of read() to avoid blocking on write locks
- Send previous data if cache is locked by collector
- ZMQ now sends every 2s regardless of collector timing
- Remove publisher from ZmqHandler (now only handles commands)
Architecture:
- Collectors: Independent tokio tasks updating shared cache
- ZMQ Sender: Dedicated OS thread with its own publisher socket
- Main Loop: Only handles commands and notifications
This ensures ZMQ transmission is NEVER blocked by slow collectors.
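The sender thread, sketched (socket setup elided; cache type illustrative):

```rust
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

fn spawn_zmq_sender(cache: Arc<RwLock<String>>) {
    thread::spawn(move || {
        let mut last_payload = String::new();
        loop {
            // try_read() never blocks on a collector's write lock.
            if let Ok(guard) = cache.try_read() {
                last_payload = guard.clone();
            } // else: cache busy, re-send the previous snapshot
            // publisher.send(last_payload.as_bytes(), 0); // socket owned by this thread
            thread::sleep(Duration::from_secs(2));
        }
    });
}
```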
Bump version to v0.1.195
Critical bug fix: Collectors were appending to Vecs instead of replacing them,
causing duplicate entries with each collection cycle.
Fixed by adding .clear() calls before populating:
- Memory collector: tmpfs Vec (was showing 11+ duplicates)
- Disk collector: drives and pools Vecs
- Systemd collector: services Vec
- Network collector: Already correct (assigns new Vec)
This prevents the exponential growth of duplicate entries in the dashboard UI.
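The fix pattern, sketched:

```rust
// Clear the cached Vec before repopulating so each cycle replaces
// entries instead of appending duplicates.
fn refresh(cached: &mut Vec<String>, fresh: impl IntoIterator<Item = String>) {
    cached.clear();
    cached.extend(fresh);
}
```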
Major architectural refactor to eliminate false "host offline" alerts:
- Replace sequential blocking collectors with independent async tasks
- Each collector runs at configurable interval and updates shared cache
- ZMQ sender reads cache every 1-2s regardless of collector speed
- Collector intervals: CPU/Memory (1-10s), Backup/NixOS (30-60s), Disk/Systemd (60-300s)
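One collector task, sketched (the collected value is a placeholder):

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;

// Each collector loops at its own interval and only updates the shared
// cache; the ZMQ sender reads the cache on its own schedule.
fn spawn_collector(cache: Arc<RwLock<u64>>, interval_secs: u64) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(interval_secs));
        loop {
            ticker.tick().await;
            let value = 42; // real collector gathers metrics here
            *cache.write().await = value;
        }
    });
}
```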
All intervals now configurable via NixOS config:
- collectors.*.interval_seconds (collection frequency per collector)
- collectors.*.command_timeout_seconds (timeout for shell commands)
- notifications.check_interval_seconds (status change detection rate)
Command timeouts increased from hardcoded 2-3s to configurable 10-30s:
- Disk collector: 30s (SMART operations, lsblk)
- Systemd collector: 15s (systemctl, docker, du commands)
- Network collector: 10s (ip route, ip addr)
Benefits:
- No false "offline" alerts when slow collectors take >10s
- Different update rates for different metric types
- Better resource management with longer timeouts
- Full NixOS configuration control
Bump version to v0.1.193