cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	c62c7fa698	Remove debug logging from disk collector. Removed all debug! statements from disk collector to reduce log noise. Bump version to v0.1.226	2025-11-30 00:44:38 +01:00
Christoffer Martinsson	0b1d8c0a73	Fix Data_3 showing as unknown by handling smartctl warning exit codes. Root cause: sda's temperature exceeded its threshold in the past, causing smartctl to return exit code 32 (warning: "Attributes have been <= threshold in the past"). The agent checked output.status.success() and rejected the entire output as failed, even though the data (serial, temperature, health) was perfectly valid. Smartctl exit codes are bit flags, and the higher bits are warnings rather than command failures: - Exit 0: No warnings - Exit 32 (bit 5): Attributes were at/below threshold in the past - Exit 64 (bit 6): Error log has entries - etc. The output data is valid regardless of these warning flags. Solution: Parse output as long as it's not empty; ignore the exit code. Only return UNKNOWN if output is actually empty (command truly failed). Result: Data_3 will now show "ZDZ4VE0B T: 31°C" instead of "? Data_3: sda". Bump version to v0.1.225	2025-11-30 00:35:19 +01:00
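The exit-code handling above can be sketched in Rust as follows. This is an illustration, not the agent's code: the function names are hypothetical, and the bit descriptions paraphrase the smartctl(8) man page, where bits 0-2 mean the command itself failed and bits 3-7 report drive conditions. `smart_output_usable` implements the commit's rule (trust any non-empty output); `command_failed` is an optional stricter check that the commit does not use.

```rust
/// Exit-status bits from the smartctl(8) man page (descriptions paraphrased).
/// Bits 0-2: the command itself failed. Bits 3-7: conditions reported by the drive.
const SMARTCTL_BITS: [(u32, &str); 8] = [
    (0, "command line did not parse"),
    (1, "device open failed"),
    (2, "SMART or other ATA command failed"),
    (3, "SMART status reports DISK FAILING"),
    (4, "prefail attributes <= threshold"),
    (5, "attributes <= threshold in the past"),
    (6, "device error log contains records"),
    (7, "self-test log contains errors"),
];

/// Lists the meaning of every bit set in a smartctl exit code.
fn decode_smartctl_status(code: i32) -> Vec<&'static str> {
    SMARTCTL_BITS
        .iter()
        .filter(|(bit, _)| (code & (1i32 << *bit)) != 0)
        .map(|(_, desc)| *desc)
        .collect()
}

/// Rule from the commit: any non-empty output is parsed, whatever the exit code.
fn smart_output_usable(stdout: &str) -> bool {
    !stdout.trim().is_empty()
}

/// Stricter alternative (not used by the commit): treat only bits 0-2 as failure.
fn command_failed(code: i32) -> bool {
    (code & 0b0000_0111) != 0
}
```

With this split, exit code 32 decodes to a single historical warning while the output is still accepted.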
Christoffer Martinsson	c77aa6eaaa	Fix Data_3 timeout by removing sequential SMART collection during pool detection. Root cause: SMART data was collected TWICE: 1. Sequential collection during pool detection in get_drive_info_for_path(), using problematic tokio::task::block_in_place() nesting; 2. Parallel collection in get_smart_data_for_drives() (v0.1.223). The sequential collection happened FIRST, during pool detection, causing sda (Data_3) to time out due to: - Bad async nesting: block_in_place() wrapping block_on() - Sequential execution causing runtime issues - sda being third in sequence, with the runtime degraded by then. Solution: Remove SMART collection from get_drive_info_for_path(). Pool drive temperatures are populated later from the parallel SMART collection, which properly uses futures::join_all. Benefits: - Eliminates problematic async nesting - All SMART queries happen once, in parallel only - sda/Data_3 should now show serial (ZDZ4VE0B) and temperature. Bump version to v0.1.224	2025-11-30 00:14:25 +01:00
Christoffer Martinsson	8a0e68f0e3	Fix Data_3 timeout by parallelizing SMART collection. Root cause: SMART data was collected sequentially, one drive at a time. With 5 drives taking ~500ms each, total collection time was 2.5+ seconds. Because the disk collector runs every 1 second, collections overlapped and contended for resources. The last drive (sda/Data_3) would time out because the previous collection was still accessing it. Solution: Query all drives in parallel using futures::join_all. All drives now get their SMART data collected simultaneously with independent 3-second timeouts, eliminating contention and reducing total collection time from 2.5+ seconds to ~500ms (the slowest single drive). Benefits: - All drives complete in ~500ms instead of 2.5+ seconds - No overlapping collections causing resource contention - Each drive gets a full 3-second timeout window - sda/Data_3 should now show temperature and serial number. Bump version to v0.1.223	2025-11-29 23:51:43 +01:00
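A minimal sketch of the per-drive timeout idea, assuming only the standard library: it uses one OS thread per drive and `mpsc::Receiver::recv_timeout` instead of the `futures::join_all` approach the commit describes, and `query` is a hypothetical stand-in for the actual smartctl call. Every drive's deadline is measured from the same start time, so total wall time is bounded by the timeout rather than by the sum of all drives.

```rust
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `query` for every drive at once. Each drive gets its own `timeout`,
/// all measured from one shared start time; a drive that misses it yields None.
/// `query` is a hypothetical stand-in for a smartctl invocation.
fn query_all_parallel<F>(
    drives: &[&str],
    timeout: Duration,
    query: F,
) -> Vec<(String, Option<String>)>
where
    F: Fn(&str) -> String + Send + Sync + 'static,
{
    let query = Arc::new(query);
    let start = Instant::now();

    // Launch one worker thread per drive, each with its own result channel.
    let pending: Vec<(String, mpsc::Receiver<String>)> = drives
        .iter()
        .map(|drive| {
            let (tx, rx) = mpsc::channel();
            let query = Arc::clone(&query);
            let name = drive.to_string();
            let worker_name = name.clone();
            thread::spawn(move || {
                // Sending fails only if the collector already gave up on this drive.
                let _ = tx.send((*query)(worker_name.as_str()));
            });
            (name, rx)
        })
        .collect();

    // Gather results in drive order; waiting on one drive never extends
    // another drive's deadline because every deadline is start + timeout.
    pending
        .into_iter()
        .map(|(name, rx)| {
            let remaining = timeout.saturating_sub(start.elapsed());
            (name, rx.recv_timeout(remaining).ok())
        })
        .collect()
}
```

A drive that misses its deadline is reported as `None`, but its worker thread keeps running; a real collector would also need to kill the underlying process, as run_command_with_timeout does from v0.1.220 onward.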
Christoffer Martinsson	2d653fe9ae	Fix empty Storage section by configuring stdio pipes. Root cause: run_command_with_timeout() was calling cmd.spawn() without configuring stdout/stderr pipes. This caused command output to go to journald instead of being captured by wait_with_output(). The disk collector received empty output and failed silently. Solution: Configure stdout(Stdio::piped()) and stderr(Stdio::piped()) before spawning commands. This ensures wait_with_output() can properly capture command output. Fixes: Empty Storage section, lsblk output appearing in journald. Bump version to v0.1.222	2025-11-29 23:25:17 +01:00
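The difference between inherited and piped stdio can be shown with `std::process`; `tokio::process::Command` exposes the same `stdout`/`stderr`/`spawn` builder calls and an async `wait_with_output`. `Command::spawn` inherits the parent's stdout and stderr by default (for a systemd service, that is the journal), so `wait_with_output` has nothing to capture unless both streams are piped. The helper name `run_captured` is hypothetical.

```rust
use std::io;
use std::process::{Command, Output, Stdio};

/// Spawns `program` with stdout/stderr piped so wait_with_output() can capture them.
/// Without the .stdout()/.stderr() calls, spawn() inherits the parent's handles
/// and the returned Output has empty stdout and stderr buffers.
fn run_captured(program: &str, args: &[&str]) -> io::Result<Output> {
    let child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    child.wait_with_output()
}
```

By contrast, `Command::output()` pipes stdout and stderr on its own; the explicit setup is only needed when spawning a child first, for example to apply a timeout.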
Christoffer Martinsson	caba78004e	Fix empty Storage section by properly aliasing command types. v0.1.220 broke disk collector by changing the import from std::process::Command to tokio::process::Command, but lines 193 and 767 explicitly used std::process::Command::new() which silently failed. Solution: Import both as aliases (TokioCommand/StdCommand) and use appropriate type for each operation - async commands use TokioCommand with run_command_with_timeout, sync commands use StdCommand with system timeout wrapper. Fixes: Empty Storage section after v0.1.220 deployment. Bump version to v0.1.221	2025-11-29 21:29:33 +01:00
Christoffer Martinsson	77bf08a978	Fix blocking smartctl commands with proper async/timeout handling - Changed disk collector to use tokio::process::Command instead of std::process::Command - Updated run_command_with_timeout to properly kill processes on timeout - Fixes issue where smartctl hangs on problematic drives (/dev/sda) freezing entire agent - Timeout now force-kills hung processes using kill -9, preventing orphaned smartctl processes. This resolves the issue where Data_3 showed unknown status because smartctl was hanging indefinitely trying to read from a problematic drive, blocking the entire collector. Bump version to v0.1.220. Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-29 21:09:04 +01:00
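A synchronous sketch of kill-on-timeout, assuming only `std` (the commit uses tokio, so this is a substitute technique, and `run_with_timeout` is a hypothetical name). It polls `Child::try_wait` until a deadline, then calls `Child::kill`, which on Unix sends SIGKILL (the same signal as `kill -9`), and reaps the child with `wait` so it does not linger. Stdout is drained on a helper thread so a child that writes a lot cannot block on a full pipe.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs a command, killing it if it exceeds `timeout`.
/// Returns Ok(Some(stdout)) on completion and Ok(None) on timeout.
fn run_with_timeout(program: &str, args: &[&str], timeout: Duration) -> io::Result<Option<String>> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;

    // Drain stdout on a separate thread so the child never blocks on a full pipe.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let reader = thread::spawn(move || {
        let mut buf = String::new();
        let _ = stdout.read_to_string(&mut buf);
        buf
    });

    let deadline = Instant::now() + timeout;
    loop {
        if child.try_wait()?.is_some() {
            // Process finished: its stdout is closed, so the reader completes.
            return Ok(Some(reader.join().unwrap_or_default()));
        }
        if Instant::now() >= deadline {
            // On Unix, Child::kill sends SIGKILL.
            let _ = child.kill();
            // Reap the killed process so it does not remain a zombie.
            let _ = child.wait();
            let _ = reader.join();
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```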
Christoffer Martinsson	40f3ff66d8	Show archive count range to detect inconsistencies - Display single number if all services have same count - Display min-max range if counts differ (indicates problem)	2025-11-29 17:59:24 +01:00
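The display rule reduces to comparing the minimum and maximum counts. A minimal sketch; the function name and the `-` placeholder for an empty list are assumptions.

```rust
/// Formats per-service archive counts: one number when all services agree,
/// otherwise "min-max" so a mismatch stands out.
fn format_archive_counts(counts: &[u32]) -> String {
    match (counts.iter().min(), counts.iter().max()) {
        (Some(min), Some(max)) if min == max => min.to_string(),
        (Some(min), Some(max)) => format!("{}-{}", min, max),
        // Hypothetical placeholder when no services report a count.
        _ => String::from("-"),
    }
}
```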
Christoffer Martinsson	620d1f10b6	Show archive count per service instead of total sum	2025-11-29 17:51:01 +01:00
Christoffer Martinsson	f5913dbd43	Add archive count to backup disk display	2025-11-29 17:41:11 +01:00
Christoffer Martinsson	afb8d68e03	Implement multi-disk backup support - Update BackupData structure to support multiple backup disks - Scan /var/lib/backup/status/ directory for all status files - Calculate status icons for backup and disk usage - Aggregate repository status from all disks - Update dashboard to display all backup disks with per-disk status - Display repository list with count and aggregated status	2025-11-29 16:44:50 +01:00
Christoffer Martinsson	5e08b34280	Move C-state name cleaning to agent for smaller JSON - Agent now extracts "C" + digits pattern (C3, C10) using char parsing - Removes suffixes like "_ACPI", "_MWAIT" at source - Reduces JSON payload size over ZMQ - No regex dependency - uses fast char iteration (~1μs overhead) - Robust fallback to original name if pattern not found - Dashboard simplified to use clean names directly. Bump version to v0.1.212. Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-29 14:05:55 +01:00
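The char-based extraction described above might look like this sketch (hypothetical function name). It finds the first `C` that is followed by at least one ASCII digit and keeps only those digits, falling back to the original name.

```rust
/// Extracts the first "C" + digits token from a cpuidle state name,
/// e.g. "C10_ACPI" -> "C10". Falls back to the unchanged name.
fn clean_cstate_name(name: &str) -> String {
    for (i, c) in name.char_indices() {
        if c != 'C' {
            continue;
        }
        // 'C' is one byte, so i + 1 is always a valid char boundary.
        let digits: String = name[i + 1..]
            .chars()
            .take_while(|d| d.is_ascii_digit())
            .collect();
        if !digits.is_empty() {
            return format!("C{}", digits);
        }
    }
    name.to_string()
}
```

Under this rule a name such as `C1E` also collapses to `C1`, and names without the pattern, such as `POLL`, pass through unchanged.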
Christoffer Martinsson	d84690cb3b	Move transmission interval to ZMQ config section - Changed code to use zmq.transmission_interval_seconds instead of top-level collection_interval_seconds - Removed collection_interval_seconds from AgentConfig - Updated validation to check zmq.transmission_interval_seconds - Improves config organization by grouping all ZMQ settings together. Bump version to v0.1.210. Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-29 13:31:39 +01:00
Christoffer Martinsson	7c030b33d6	Show top 3 C-states with usage percentages - Changed CpuData.cstate from String to Vec<CStateInfo> - Added CStateInfo struct with name and percent fields - Collector calculates percentage for each C-state based on accumulated time - Sorts and returns top 3 C-states by usage - Dashboard displays: "C10:79% C8:10% C6:8%". Provides better visibility into CPU idle state distribution. Bump version to v0.1.209. Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 23:45:46 +01:00
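A sketch of the top-3 calculation, assuming each state's accumulated residency is available as a number (the sysfs cpuidle `time` files report microseconds). The `CStateInfo` struct mirrors the one named in the commit, but its field types, the `top_cstates` and `format_cstates` helpers, and rounding to whole percent are assumptions.

```rust
/// Mirrors the CStateInfo struct named in the commit (field types assumed).
#[derive(Debug, Clone, PartialEq)]
struct CStateInfo {
    name: String,
    percent: f32,
}

/// Turns accumulated per-state residency into the top `n` states by share.
fn top_cstates(states: &[(&str, u64)], n: usize) -> Vec<CStateInfo> {
    let total: u64 = states.iter().map(|(_, t)| t).sum();
    if total == 0 {
        return Vec::new();
    }
    let mut infos: Vec<CStateInfo> = states
        .iter()
        .map(|(name, t)| CStateInfo {
            name: name.to_string(),
            percent: (*t as f64 * 100.0 / total as f64) as f32,
        })
        .collect();
    // Highest share first; percentages are finite, so partial_cmp never fails.
    infos.sort_by(|a, b| b.percent.partial_cmp(&a.percent).unwrap());
    infos.truncate(n);
    infos
}

/// Renders the dashboard string, e.g. "C10:79% C8:10% C6:8%".
fn format_cstates(infos: &[CStateInfo]) -> String {
    infos
        .iter()
        .map(|c| format!("{}:{:.0}%", c.name, c.percent))
        .collect::<Vec<_>>()
        .join(" ")
}
```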
Christoffer Martinsson	c6817537a8	Replace CPU frequency with C-state monitoring - Changed CpuData.frequency_mhz to CpuData.cstate (String) - Implemented collect_cstate() to read CPU idle depth from sysfs - Finds deepest C-state with most accumulated time (C0-C10) - Updated dashboard to display C-state instead of frequency - More accurate indicator of CPU activity vs power management. Bump version to v0.1.208. Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 23:30:14 +01:00
Christoffer Martinsson	28cfd5758f	Fix service metrics not showing - remove cache check. The service_status_cache from discovery only has active_state, with all detailed metrics set to None. During collection, get_service_status() was returning cached data instead of fetching fresh systemctl show data. Now always fetch fresh data to populate memory_bytes, restart_count, and uptime_seconds properly.	2025-11-28 23:15:51 +01:00
Christoffer Martinsson	0e01813ff5	Add service metrics from systemctl (memory, uptime, restarts). Shared: - Add memory_bytes, restart_count, uptime_seconds to ServiceData. Agent: - Add new fields to ServiceStatusInfo struct - Fetch MemoryCurrent, NRestarts, ExecMainStartTimestamp from systemctl show - Calculate uptime from start timestamp - Parse and populate new fields in ServiceData - Remove unused load_state and sub_state fields. Dashboard: - Add memory_bytes, restart_count, uptime_seconds to ServiceInfo - Update header: Service, Status, RAM, Uptime, ↻ (restarts) - Format memory as MB/GB - Format uptime as Xd Xh, Xh Xm, or Xm - Show restart count with ! prefix if > 0 to indicate instability. All metrics are obtained from a single systemctl show call, so no extra commands are run.	2025-11-28 23:06:13 +01:00
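Formatting helpers matching the rules listed above could be sketched as follows. The function names are hypothetical, MB/GB are assumed to be binary (1024-based) units, and `parse_systemctl_u64` treats anything non-numeric, such as the `[not set]` that `systemctl show` prints when memory accounting is unavailable, as missing.

```rust
/// Parses a numeric `systemctl show` value (e.g. MemoryCurrent, NRestarts).
/// Non-numeric values such as "[not set]" become None.
fn parse_systemctl_u64(value: &str) -> Option<u64> {
    value.trim().parse().ok()
}

/// Formats bytes as MB or GB (binary units assumed).
fn format_memory(bytes: u64) -> String {
    const MIB: f64 = 1024.0 * 1024.0;
    const GIB: f64 = MIB * 1024.0;
    let b = bytes as f64;
    if b >= GIB {
        format!("{:.1}GB", b / GIB)
    } else {
        format!("{:.0}MB", b / MIB)
    }
}

/// Formats uptime as "Xd Xh", "Xh Xm" or "Xm".
fn format_uptime(seconds: u64) -> String {
    let days = seconds / 86_400;
    let hours = (seconds % 86_400) / 3_600;
    let minutes = (seconds % 3_600) / 60;
    if days > 0 {
        format!("{}d {}h", days, hours)
    } else if hours > 0 {
        format!("{}h {}m", hours, minutes)
    } else {
        format!("{}m", minutes)
    }
}

/// Restart count, prefixed with "!" when the service has restarted at least once.
fn format_restarts(count: u32) -> String {
    if count > 0 {
        format!("!{}", count)
    } else {
        count.to_string()
    }
}
```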
Christoffer Martinsson	67b686f8c7	Remove RAM and disk collection for services. Complete removal of service resource metrics. Agent: - Remove memory_mb and disk_gb fields from ServiceData struct - Remove get_service_memory_usage() method - Remove get_service_disk_usage() method - Remove get_directory_size() method - Remove unused warn import. Dashboard: - Remove memory_mb and disk_gb from ServiceInfo struct - Remove memory/disk display from format_parent_service_line - Remove memory/disk parsing in legacy metric path - Remove unused format_disk_size() function. Service resource metrics were slow, unreliable, and have not worked properly since the structured data migration. They will be handled differently in the future.	2025-11-28 14:25:12 +01:00
Christoffer Martinsson	e3996fdb84	Fix compilation errors from command receiver removal - Remove AgentCommand import from agent.rs - Remove handle_commands() method - Remove command handling from main loop - Remove command_port validation checks	2025-11-28 13:01:36 +01:00
Christoffer Martinsson	c19ff56df8	Remove unused ZMQ command receiver (port 6131). Service control migrated to SSH, command receiver no longer needed. - Remove command_receiver Socket from ZmqHandler - Remove try_receive_command method - Remove AgentCommand enum - Remove command_port from ZmqConfig	2025-11-28 12:52:43 +01:00
Christoffer Martinsson	2f94a4b853	Add service_type field to separate data from presentation. Changes: - Add service_type field to SubServiceData: 'nginx_site', 'container', 'image' - Agent sends pure data without display formatting - Dashboard checks service_type to decide presentation - Docker images now display without status icon (service_type='image') - Remove unused image_size_str from docker images tuple. Clean separation: agent provides data, dashboard handles display logic.	2025-11-27 18:09:20 +01:00
Christoffer Martinsson	fac0188c6f	Change docker image display format and status. Changes: - Rename docker images from 'image_node:18...' to 'I node:18...' for conciseness - Change image status from 'active' to 'inactive' for neutral informational display - Images now show with gray empty circle ○ instead of green filled circle ●. Docker images are static artifacts without meaningful operational status, so using inactive status provides neutral gray display that won't trigger alerts or affect service status aggregation.	2025-11-27 17:57:24 +01:00
Christoffer Martinsson	374b126446	Reduce all command timeouts to 2-3 seconds max. With 10-second host heartbeat timeout, all command timeouts must be significantly lower to ensure total collection time stays under 10 seconds. Changed timeouts: - smartctl: 10s → 3s (critical: multiple drives queried sequentially) - du: 5s → 2s - lsblk: 5s → 2s - systemctl list commands: 5s → 3s - systemctl show/is-active: 3s → 2s - docker commands: 5s → 3s - df, ip commands: 3s → 2s. Total worst-case collection time now capped at more reasonable levels, preventing false host offline alerts from blocking operations.	2025-11-27 16:38:54 +01:00
Christoffer Martinsson	1e0510be81	Add comprehensive timeouts to all blocking system commands. Fixes random host disconnections caused by blocking operations preventing timely ZMQ packet transmission. Changes: - Add run_command_with_timeout() wrapper using tokio for async command execution - Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives) - Apply 5s timeout to du, lsblk, systemctl list commands - Apply 3s timeout to systemctl show/is-active, df, ip commands - Apply 2s timeout to hostname command - Use system 'timeout' command for sync operations where async not needed. Critical fixes: - smartctl: Failing drives could block for 30+ seconds per drive - du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds - systemctl/docker: Commands could block indefinitely during system issues. With 1-second collection interval and 10-second heartbeat timeout, any blocking operation >10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.	2025-11-27 16:34:08 +01:00
Christoffer Martinsson	6d6beb207d	Parse Docker image sizes to MB and sort services alphabetically	2025-11-27 15:57:38 +01:00
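`docker images` reports sizes with decimal suffixes such as `kB`, `MB` and `GB`. A sketch of converting them to megabytes and sorting by name, with hypothetical function names; unrecognized formats return `None` rather than a guessed value.

```rust
/// Converts a `docker images` SIZE value such as "379kB", "142MB" or "1.5GB"
/// (decimal units) to megabytes. Returns None for anything unrecognized.
fn parse_size_mb(size: &str) -> Option<f64> {
    let size = size.trim();
    // Multi-letter suffixes come first so "kB" and "MB" are not read as bare "B".
    let units = [("GB", 1e3), ("MB", 1.0), ("kB", 1e-3), ("B", 1e-6)];
    for (suffix, mb_per_unit) in units {
        if let Some(number) = size.strip_suffix(suffix) {
            return number.trim().parse::<f64>().ok().map(|n| n * mb_per_unit);
        }
    }
    None
}

/// Sorts (name, size_mb) pairs alphabetically by name (byte order,
/// so uppercase letters sort before lowercase ones).
fn sort_by_name(items: &mut [(String, f64)]) {
    items.sort_by(|a, b| a.0.cmp(&b.0));
}
```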
Christoffer Martinsson	7a68da01f5	Remove debug logging for NVMe SMART collection	2025-11-27 15:40:16 +01:00
Christoffer Martinsson	5be67fed64	Add debug logging for NVMe SMART data collection	2025-11-27 15:00:48 +01:00
Christoffer Martinsson	cac836601b	Add NVMe device type flag for SMART data collection	2025-11-27 13:34:30 +01:00
Christoffer Martinsson	bd22ce265b	Use direct smartctl with CAP_SYS_RAWIO instead of sudo	2025-11-27 13:22:13 +01:00
Christoffer Martinsson	bbc8b7b1cb	Add info-level logging for SMART data collection debugging	2025-11-27 13:15:53 +01:00
Christoffer Martinsson	5dd8cadef3	Remove debug logging from Docker collection code	2025-11-27 12:50:20 +01:00
Christoffer Martinsson	fefe30ec51	Remove sudo from docker commands - use docker group membership instead. Agent changes: - Changed docker ps and docker images commands to run without sudo - cm-agent user is already in docker group, so sudo is not needed - Fixes "unable to change to root gid: Operation not permitted" error - Systemd security restrictions were blocking sudo gid changes. This fixes Docker container and image collection on systems with systemd security hardening enabled. Updated to version 0.1.178	2025-11-27 12:35:38 +01:00
Christoffer Martinsson	fb40cce748	Add stderr logging for Docker images command failure. Agent changes: - Log stderr output when docker images command fails - This will show the actual error message (e.g., permission denied, docker not found) - Helps diagnose why docker images collection is failing. Updated to version 0.1.177	2025-11-27 12:28:55 +01:00
Christoffer Martinsson	eaa057b284	Change Docker collection logging from debug to info level. Agent changes: - Changed debug!() to info!() for Docker collection logs - This allows logs to show with default RUST_LOG=info setting - Added info import to tracing use statement. Now logs will be visible in journalctl without needing to change log level: - "Collecting Docker sub-services for service: docker" - "Found X Docker containers" - "Found X Docker images" - "Total Docker sub-services added: X". Updated to version 0.1.176	2025-11-27 12:18:17 +01:00
Christoffer Martinsson	f23a1b5cec	Add debug logging for Docker container and image collection. Agent changes: - Added debug logging to Docker images collection function - Log when Docker sub-services are being collected for a service - Log count of containers and images found - Log total sub-services added - Show command failure details instead of silently returning empty vec. This will help diagnose why Docker images aren't showing up as sub-services on some hosts. The logs will show if the docker commands are failing or if the collection is working but data isn't being transmitted properly. Updated to version 0.1.175	2025-11-27 12:04:51 +01:00
Christoffer Martinsson	3f98f68b51	Show Docker images as sub-services under docker service. Agent changes: - Added get_docker_images() function to list all Docker images - Use docker images to show stored images with repository:tag and size - Display images as sub-services under docker service with size in parentheses - Skip dangling images (<none>:<none>) - Images shown with active status (always present when listed). Example display: ● docker active 139M 1MB ├─ ● docker_gitea active ├─ ○ docker_old-app inactive ├─ ● image_nginx:latest (142MB) ├─ ● image_postgres:15 (379MB) └─ ● image_gitea:latest (256MB). Updated to version 0.1.174	2025-11-27 11:43:35 +01:00
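Assuming the image list comes back as one `repository:tag` and size per line, separated by a tab (for example via a `--format` Go template using the `.Repository`, `.Tag` and `.Size` fields; the commit does not show the exact command), the parsing and dangling-image filter could look like this sketch. `parse_docker_images` is a hypothetical name.

```rust
/// Parses lines of the form "repository:tag<TAB>size" and skips dangling
/// images, which docker lists as "<none>:<none>".
fn parse_docker_images(output: &str) -> Vec<(String, String)> {
    output
        .lines()
        .filter_map(|line| {
            // Lines without a tab (e.g. blank lines) are ignored.
            let (name, size) = line.split_once('\t')?;
            let name = name.trim();
            if name == "<none>:<none>" {
                return None;
            }
            Some((name.to_string(), size.trim().to_string()))
        })
        .collect()
}
```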
Christoffer Martinsson	3d38a7a984	Show all Docker containers as sub-services with active/inactive status. Agent changes: - Use docker ps -a to show ALL containers (running and stopped) - Map container status: Up -> active, Exited/Created -> inactive, other -> failed - Display Docker containers as sub-services under the docker service - Each container shown with proper status indicator. Example display: ● docker active 139M 1MB ├─ ● docker_gitea active ├─ ○ docker_old-app inactive └─ ● docker_immich active. Updated to version 0.1.173	2025-11-27 10:56:15 +01:00
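The status mapping listed above is a prefix match on the STATUS text that `docker ps -a` prints (for example `Up 3 hours` or `Exited (0) 2 days ago`). A sketch with a hypothetical `Status` enum and function name:

```rust
#[derive(Debug, PartialEq)]
enum Status {
    Active,
    Inactive,
    Failed,
}

/// Maps a `docker ps -a` STATUS value using the rule from the commit:
/// Up -> active, Exited/Created -> inactive, anything else -> failed.
fn map_container_status(status: &str) -> Status {
    let s = status.trim();
    if s.starts_with("Up") {
        Status::Active
    } else if s.starts_with("Exited") || s.starts_with("Created") {
        Status::Inactive
    } else {
        Status::Failed
    }
}
```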
Christoffer Martinsson	b0ee0242bd	Show all Docker containers as top-level services with active/inactive status. Agent changes: - Changed docker ps to docker ps -a to show ALL containers (running and stopped) - Map container status: Up -> active, Exited/Created -> inactive, other -> failed - Display Docker containers as individual top-level services instead of sub-services - Each container shown as "docker_{container_name}" in service list. This provides better visibility of all containers and their status directly in the services panel, making it easier to see stopped containers at a glance. Updated to version 0.1.172	2025-11-27 10:51:47 +01:00
Christoffer Martinsson	937f4ad427	Add VLAN ID display and smart parent assignment for virtual interfaces. Agent changes: - Parse /proc/net/vlan/config to extract VLAN IDs for interfaces - Detect primary physical interface via default route - Auto-assign primary interface as parent for virtual interfaces without explicit parent - Added vlan_id field to NetworkInterfaceData. Dashboard changes: - Display VLAN ID in format "interface (vlan X): IP" - Show VLAN IDs for both nested and standalone virtual interfaces. This ensures virtual interfaces (docker0, tailscale0, etc.) are properly nested under the primary physical NIC, and VLAN interfaces show their IDs. Updated to version 0.1.170	2025-11-27 09:52:45 +01:00
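A sketch of both agent-side steps, using only `std` and hypothetical function names. `/proc/net/vlan/config` lists one `device | VLAN ID | parent` row per VLAN after two header lines. The commit does not say how the default route is found, so reading `/proc/net/route` and picking the row whose Destination and Mask are both `00000000` is an assumption; checking the mask avoids mistaking split routes such as `0.0.0.0/1` for the default.

```rust
use std::collections::HashMap;

/// Parses /proc/net/vlan/config into interface -> (VLAN ID, parent device).
/// Header lines are skipped because they lack three '|' fields with a
/// numeric ID in the middle.
fn parse_vlan_config(content: &str) -> HashMap<String, (u16, String)> {
    content
        .lines()
        .filter_map(|line| {
            let fields: Vec<&str> = line.split('|').map(str::trim).collect();
            if fields.len() != 3 {
                return None;
            }
            let id = fields[1].parse::<u16>().ok()?;
            Some((fields[0].to_string(), (id, fields[2].to_string())))
        })
        .collect()
}

/// Returns the interface of the IPv4 default route in /proc/net/route content:
/// the row whose Destination (column 1) and Mask (column 7) are both zero.
/// Assumed approach; the commit does not describe its detection method.
fn default_route_iface(route_table: &str) -> Option<String> {
    route_table
        .lines()
        .skip(1) // column header row
        .map(|line| line.split_whitespace().collect::<Vec<_>>())
        .find(|cols| cols.len() > 7 && cols[1] == "00000000" && cols[7] == "00000000")
        .map(|cols| cols[0].to_string())
}
```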
Christoffer Martinsson	8aefab83ae	Fix network interface display for VLANs and physical NICs. Agent changes: - Filter out ifb* interfaces from network display - Parse @parent notation for VLAN interfaces (e.g., lan@enp0s31f6) - Show physical interfaces even without IP addresses - Only filter virtual interfaces that have no IPs - Extract parent interface relationships for proper nesting. Dashboard changes: - Nest VLAN/child interfaces under their physical parent - Show physical NICs with status icons even when down - Display child interfaces grouped under parent interface - Keep standalone virtual interfaces at root level. Updated to version 0.1.169	2025-11-26 23:47:16 +01:00
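A sketch of the name split and the display filter from this commit, with hypothetical function names. How an interface is classified as physical is left to the caller; checking for `/sys/class/net/<name>/device` is one common heuristic, but the commit does not say which test it uses.

```rust
/// Splits an interface name such as "lan@enp0s31f6" into (name, parent).
fn split_parent(ifname: &str) -> (&str, Option<&str>) {
    match ifname.split_once('@') {
        Some((name, parent)) => (name, Some(parent)),
        None => (ifname, None),
    }
}

/// Display filter from the commit: drop ifb* interfaces, always keep
/// physical NICs, and keep virtual interfaces only when they have an address.
fn should_display(name: &str, is_physical: bool, has_ip: bool) -> bool {
    if name.starts_with("ifb") {
        return false;
    }
    is_physical || has_ip
}
```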
Christoffer Martinsson	5c6b11c794	Filter out network interfaces without IP addresses. Remove interfaces that have no IPs, such as ifb0 and dummy devices. Only show interfaces with at least one IPv4 or IPv6 address. Version bump to 0.1.167	2025-11-26 19:19:21 +01:00
Christoffer Martinsson	fc247bd0ad	Create dedicated network collector with physical/virtual interface grouping. Move network collection from NixOS collector to dedicated NetworkCollector. Add link status detection for physical interfaces (up/down). Group interfaces by physical/virtual, show status icons for physical NICs only. Down interfaces show as Inactive instead of Critical. Version bump to 0.1.165	2025-11-26 19:02:50 +01:00
Christoffer Martinsson	b7ffeaced5	Add network interface collection and display. Extend NixOS collector to gather network interfaces using ip command JSON output. Display all interfaces with IPv4 and IPv6 addresses in Network section above CPU metrics. Filters out loopback and link-local addresses. Version bump to 0.1.161	2025-11-26 17:41:35 +01:00
Christoffer Martinsson	3858309a5d	Fix Docker container detection with sudo permissions. Update systemd collector to use sudo for docker ps command to resolve permission issues when cm-agent user lacks docker group membership. This ensures Docker containers are properly discovered and displayed as sub-services under the docker service. Version: 0.1.160	2025-11-25 12:40:27 +01:00
Christoffer Martinsson	df104bf940	Remove debug prints and unused code - Remove all debug println statements - Remove unused service_tracker module - Remove unused struct fields and methods - Remove empty placeholder files (cpu.rs, memory.rs, defaults.rs) - Fix all compiler warnings - Clean build with zero warnings. Version bump to 0.1.159	2025-11-25 12:19:04 +01:00
Christoffer Martinsson	d5ce36ee18	Add support for additional SMART attributes - Support Temperature_Case attribute for Intel SSDs - Support Media_Wearout_Indicator attribute for wear percentage - Parse wear value from column 3 (VALUE) for Media_Wearout_Indicator - Fixes temperature and wear display for Intel PHLA847000FL512DGN drives	2025-11-25 11:53:08 +01:00
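A sketch of pulling these values out of a `smartctl -A` attribute row, with hypothetical names. It assumes the standard ten-column table layout, reads temperatures from RAW_VALUE (index 9) and, as the commit describes, the wear figure from VALUE (index 3). For Media_Wearout_Indicator the normalized VALUE typically starts at 100 on a new drive and counts down.

```rust
/// Columns of a `smartctl -A` row (0-based): 0 ID#, 1 ATTRIBUTE_NAME, 2 FLAG,
/// 3 VALUE, 4 WORST, 5 THRESH, 6 TYPE, 7 UPDATED, 8 WHEN_FAILED, 9 RAW_VALUE.
#[derive(Debug, PartialEq)]
enum SmartReading {
    Temperature(u32),
    Wear(u32),
}

fn parse_attribute_line(line: &str) -> Option<SmartReading> {
    let cols: Vec<&str> = line.split_whitespace().collect();
    if cols.len() < 10 {
        return None;
    }
    match cols[1] {
        // RAW_VALUE; only its first token is used, since some drives append
        // extra text such as "(Min/Max 20/45)".
        "Temperature_Celsius" | "Temperature_Case" => {
            cols[9].parse().ok().map(SmartReading::Temperature)
        }
        // Normalized VALUE column, as described in the commit.
        "Media_Wearout_Indicator" => cols[3].parse().ok().map(SmartReading::Wear),
        _ => None,
    }
}
```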
Christoffer Martinsson	4f80701671	Fix NVMe serial display and improve pool health logic - Fix physical drive serial number display in dashboard - Improve pool health calculation for arrays with multiple disks - Support proper tree symbols for multiple parity drives - Read git commit hash from /var/lib/cm-dashboard/git-commit for Build display	2025-11-25 11:44:20 +01:00
Christoffer Martinsson	267654fda4	Improve NVMe serial parsing and restructure MergerFS display - Fix NVMe serial number parsing to handle whitespace variations - Move mount point to MergerFS header, remove drive count - Restructure data drives to same level as parity with Data_1, Data_2 labels - Remove "Total:" label from pool usage line - Update parity to use closing tree symbol as last item	2025-11-25 11:28:54 +01:00
Christoffer Martinsson	dc1105eefe	Display disk serial numbers instead of device names - Add serial_number field to DriveData structure - Collect serial numbers from SMART data for all drives - Display truncated serial numbers (last 8 chars) in dashboard - Fix parity drive label to show status icon before "Parity:" - Fix mount point label styling to match other labels	2025-11-25 11:06:54 +01:00
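Serial parsing that tolerates padding, plus the last-8-characters truncation, could be sketched like this; the function names are hypothetical. `smartctl -i` prints a `Serial Number:` line for both ATA and NVMe devices, with the value padded by a variable amount of whitespace.

```rust
/// Finds "Serial Number:" in `smartctl -i` output, tolerating any amount of
/// whitespace around the value.
fn parse_serial(info: &str) -> Option<String> {
    info.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("Serial Number:")?;
        let serial = rest.trim();
        (!serial.is_empty()).then(|| serial.to_string())
    })
}

/// Keeps the last 8 characters for display (shorter serials are unchanged).
fn short_serial(serial: &str) -> String {
    let chars: Vec<char> = serial.chars().collect();
    let start = chars.len().saturating_sub(8);
    chars[start..].iter().collect()
}
```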
Christoffer Martinsson	c9d12793ef	Replace device names with serial numbers in MergerFS pool display. Updates disk collector and dashboard to show drive serial numbers instead of device names (sdX) for MergerFS data/parity drives. Agent extracts serial numbers from SMART data and dashboard displays them when available, falling back to device names.	2025-11-25 10:30:37 +01:00


285 Commits