Service control has migrated to SSH, so the command receiver is no longer needed.
- Remove command_receiver Socket from ZmqHandler
- Remove try_receive_command method
- Remove AgentCommand enum
- Remove command_port from ZmqConfig
- Remove zbus dependency from agent
- Replace D-Bus Connection calls with systemctl show commands
- Fix agent hang by eliminating blocking D-Bus operations
- get_unit_property now uses systemctl show with property flags
- Memory, disk usage, and nginx config queries use systemctl
- Simpler, more reliable service monitoring
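A minimal sketch of the systemctl-backed get_unit_property(), assuming a tokio helper and systemctl's --value flag; the real flags and error handling may differ:

```rust
use tokio::process::Command;

// Hypothetical sketch: query a single systemd unit property via `systemctl show`.
async fn get_unit_property(unit: &str, property: &str) -> Option<String> {
    let output = Command::new("systemctl")
        .arg("show")
        .arg(unit)
        .arg(format!("--property={property}"))
        .arg("--value")
        .output()
        .await
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let value = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if value.is_empty() { None } else { Some(value) }
}
```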
The D-Bus ListUnits call in discover_services_internal() was causing
the agent to hang on startup.
**Root cause:**
- D-Bus ListUnits call with complex tuple destructuring hung indefinitely
- Agent never completed first collection cycle
- No collector output in logs
**Fix:**
- Revert discover_services_internal() to use systemctl list-units/list-unit-files
- Keep D-Bus-based property queries (WorkingDirectory, MemoryCurrent, ExecStart)
- Hybrid approach: systemctl for discovery, D-Bus for individual queries
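A hedged sketch of the systemctl discovery path; the exact flags and parsing used by discover_services_internal() are assumptions:

```rust
use tokio::process::Command;

// Illustrative discovery helper: list all service units without blocking on D-Bus.
async fn list_service_units() -> Vec<String> {
    let mut cmd = Command::new("systemctl");
    cmd.args([
        "list-units", "--type=service", "--all",
        "--plain", "--no-legend", "--no-pager",
    ]);
    let Ok(output) = cmd.output().await else {
        return Vec::new();
    };
    String::from_utf8_lossy(&output.stdout)
        .lines()
        // With --plain/--no-legend the first column is the unit name, e.g. "nginx.service".
        .filter_map(|line| line.split_whitespace().next())
        .map(String::from)
        .collect()
}
```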
**External commands still used:**
- systemctl list-units, list-unit-files (service discovery)
- smartctl (SMART data)
- sudo du (directory sizes)
- nginx -T (config fallback)
Version bump: 0.1.198 → 0.1.199
Complete migration from systemctl subprocess calls to native D-Bus communication:
**Removed systemctl commands:**
- systemctl is-active (fallback) - use D-Bus cache from ListUnits
- systemctl show --property=LoadState,ActiveState,SubState - use D-Bus cache
- systemctl show --property=WorkingDirectory - use D-Bus Properties.Get
- systemctl show --property=MemoryCurrent - use D-Bus Properties.Get
- systemctl show nginx --property=ExecStart - use D-Bus Properties.Get
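For illustration, a zbus-based property lookup along these lines; the GetUnit resolution and proxy setup are assumptions, not the agent's exact code:

```rust
use zbus::zvariant::OwnedObjectPath;
use zbus::{Connection, Proxy};

// Illustrative D-Bus property read against systemd (Properties.Get under the hood).
async fn unit_memory_current(unit: &str) -> zbus::Result<u64> {
    let conn = Connection::system().await?;

    // Resolve the unit's object path via the systemd Manager interface.
    let manager = Proxy::new(
        &conn,
        "org.freedesktop.systemd1",
        "/org/freedesktop/systemd1",
        "org.freedesktop.systemd1.Manager",
    )
    .await?;
    let path: OwnedObjectPath = manager.call("GetUnit", &(unit,)).await?;

    // Read MemoryCurrent from the unit's Service interface.
    let service = Proxy::new(
        &conn,
        "org.freedesktop.systemd1",
        path.as_str(),
        "org.freedesktop.systemd1.Service",
    )
    .await?;
    service.get_property::<u64>("MemoryCurrent").await
}
```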
**Implementation details:**
- Added get_unit_property() helper for D-Bus property access
- Made get_nginx_site_metrics() async to support D-Bus calls
- Made get_nginx_sites_internal() async
- Made discover_nginx_sites() async
- Made get_nginx_config_from_systemd() async
- Fixed RwLock guard Send issues by using scoped locks
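The scoped-lock fix, roughly; this assumes a std::sync::RwLock, whose guards are not Send and therefore must be dropped before the next await:

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone, Default)]
struct AgentData; // stand-in for the real cache type

async fn publish(_snapshot: AgentData) { /* ZMQ send elided */ }

// Clone what is needed inside a block so the guard is dropped before the await,
// keeping the surrounding future Send (and thus spawnable).
async fn send_snapshot(cache: Arc<RwLock<AgentData>>) {
    let snapshot = {
        let guard = cache.read().expect("cache lock poisoned");
        guard.clone()
    }; // guard dropped here, before any await point
    publish(snapshot).await;
}
```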
**Remaining external commands:**
- smartctl (disk.rs) - No Rust alternative for SMART data
- sudo du (systemd.rs) - Directory size measurement
- nginx -T (systemd.rs) - Nginx config fallback
- timeout hostname (nixos.rs) - Rare fallback only
Version bump: 0.1.197 → 0.1.198
CRITICAL FIX: The previous cached collector architecture still had ZMQ sending
in the main event loop, where it could block waiting on the RwLock while collectors
were writing. This caused the observed 3-8 second send delays.
Changes:
- Move ZMQ publisher to dedicated std::thread (ZMQ sockets aren't thread-safe)
- Use try_read() instead of read() to avoid blocking on write locks
- Send previous data if cache is locked by collector
- ZMQ now sends every 2s regardless of collector timing
- Remove publisher from ZmqHandler (now only handles commands)
Architecture:
- Collectors: Independent tokio tasks updating shared cache
- ZMQ Sender: Dedicated OS thread with its own publisher socket
- Main Loop: Only handles commands and notifications
This ensures ZMQ transmission is NEVER blocked by slow collectors.
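A rough sketch of the sender thread, assuming a std::sync::RwLock cache, the zmq crate, and an illustrative port:

```rust
use std::{sync::{Arc, RwLock}, thread, time::Duration};

#[derive(Clone, Default)]
struct AgentData; // stand-in for the shared cache type

fn serialize(_data: &AgentData) -> Vec<u8> { Vec::new() } // placeholder

// The PUB socket is created and used on this thread only (ZMQ sockets aren't thread-safe).
fn spawn_zmq_sender(cache: Arc<RwLock<AgentData>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let ctx = zmq::Context::new();
        let publisher = ctx.socket(zmq::PUB).expect("create PUB socket");
        publisher.bind("tcp://0.0.0.0:5556").expect("bind publisher"); // port is illustrative

        let mut last_sent = AgentData::default();
        loop {
            // try_read: if a collector holds the write lock, re-send the previous
            // snapshot instead of blocking the 2s send cadence.
            if let Ok(guard) = cache.try_read() {
                last_sent = guard.clone();
            }
            let payload = serialize(&last_sent);
            let _ = publisher.send(&payload[..], 0);
            thread::sleep(Duration::from_secs(2));
        }
    })
}
```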
Bump version to v0.1.195
Critical bug fix: Collectors were appending to Vecs instead of replacing them,
causing duplicate entries with each collection cycle.
Fixed by adding .clear() calls before populating:
- Memory collector: tmpfs Vec (was showing 11+ duplicates)
- Disk collector: drives and pools Vecs
- Systemd collector: services Vec
- Network collector: Already correct (assigns new Vec)
This prevents the exponential growth of duplicate entries in the dashboard UI.
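The fix in miniature (types and names are stand-ins):

```rust
#[derive(Clone)]
struct ServiceData {
    name: String,
    state: String,
}

fn update_services(cached: &mut Vec<ServiceData>, fresh: Vec<ServiceData>) {
    cached.clear();       // without this, each cycle re-appended last cycle's entries
    cached.extend(fresh); // equivalently: *cached = fresh; (what the network collector does)
}
```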
Major architectural refactor to eliminate false "host offline" alerts:
- Replace sequential blocking collectors with independent async tasks
- Each collector runs at configurable interval and updates shared cache
- ZMQ sender reads cache every 1-2s regardless of collector speed
- Collector intervals: CPU/Memory (1-10s), Backup/NixOS (30-60s), Disk/Systemd (60-300s)
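One collector task might look roughly like this, assuming a briefly held std::sync::RwLock cache; the field and sampling function are illustrative:

```rust
use std::{sync::{Arc, RwLock}, time::Duration};

#[derive(Default)]
struct AgentData {
    cpu_load: f32,
}

async fn sample_cpu() -> f32 { 0.0 } // placeholder for the real sampling

// One independent collector task on its own configurable interval.
fn spawn_cpu_collector(cache: Arc<RwLock<AgentData>>, interval_secs: u64) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(interval_secs));
        loop {
            ticker.tick().await;
            let load = sample_cpu().await; // slow work happens outside the lock
            {
                let mut guard = cache.write().expect("cache lock poisoned");
                guard.cpu_load = load;     // short critical section
            } // guard dropped before the next tick().await
        }
    });
}
```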
All intervals now configurable via NixOS config:
- collectors.*.interval_seconds (collection frequency per collector)
- collectors.*.command_timeout_seconds (timeout for shell commands)
- notifications.check_interval_seconds (status change detection rate)
Command timeouts increased from hardcoded 2-3s to configurable 10-30s:
- Disk collector: 30s (SMART operations, lsblk)
- Systemd collector: 15s (systemctl, docker, du commands)
- Network collector: 10s (ip route, ip addr)
Benefits:
- No false "offline" alerts when slow collectors take >10s
- Different update rates for different metric types
- Better resource management with longer timeouts
- Full NixOS configuration control
Bump version to v0.1.193
Add architectural plan for separating ZMQ sending from data collection to prevent false 'host offline' alerts caused by slow collectors.
Key concepts:
- Shared cache (Arc<RwLock<AgentData>>)
- Independent async collector tasks with different update rates
- ZMQ sender always sends every 1s from cache
- Fast collectors (1s), medium (5s), slow (60s)
- No blocking regardless of collector speed
Emoji rendering in terminals can be very slow, especially in the hot path (once per frame for every Docker image). The whale emoji 🐋 was causing significant rendering delays.
Temporarily changed the icon to an ASCII 'D' to test whether the emoji was the performance issue.
Docker images now display with a distinctive 🐋 whale icon in blue (highlight color) instead of status icons. This provides clear visual identification that these are Docker images without implying operational status.
Changes:
- Rename docker images from 'image_node:18...' to 'I node:18...' for conciseness
- Change image status from 'active' to 'inactive' for neutral informational display
- Images now show with gray empty circle ○ instead of green filled circle ●
Docker images are static artifacts without meaningful operational status, so using the inactive status provides a neutral gray display that won't trigger alerts or affect service status aggregation.
Fixes random host disconnections caused by blocking operations that prevented timely ZMQ packet transmission.
Changes:
- Add run_command_with_timeout() wrapper using tokio for async command execution
- Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives)
- Apply 5s timeout to du, lsblk, systemctl list commands
- Apply 3s timeout to systemctl show/is-active, df, ip commands
- Apply 2s timeout to hostname command
- Use the system 'timeout' command for sync operations where async isn't needed
Critical fixes:
- smartctl: Failing drives could block for 30+ seconds per drive
- du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds
- systemctl/docker: Commands could block indefinitely during system issues
With a 1-second collection interval and a 10-second heartbeat timeout, any blocking operation longer than 10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.
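The wrapper described above might look roughly like this; the signature and error handling are assumptions:

```rust
use std::time::Duration;
use tokio::{process::Command, time::timeout};

// Run an external command with a hard time budget; None means no usable output.
async fn run_command_with_timeout(program: &str, args: &[&str], secs: u64) -> Option<String> {
    let mut cmd = Command::new(program);
    cmd.args(args).kill_on_drop(true); // kill the child if we give up on it
    match timeout(Duration::from_secs(secs), cmd.output()).await {
        Ok(Ok(out)) if out.status.success() => {
            Some(String::from_utf8_lossy(&out.stdout).into_owned())
        }
        Ok(_) => None,  // failed to spawn or exited non-zero
        Err(_) => None, // timed out: dropping the future drops (and kills) the child
    }
}
```

Callers then pass the per-command budgets listed above, e.g. 10s for smartctl and 3s for systemctl show.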
Agent changes:
- Changed docker ps and docker images commands to run without sudo
- cm-agent user is already in docker group, so sudo is not needed
- Fixes "unable to change to root gid: Operation not permitted" error
- Systemd security restrictions were blocking sudo gid changes
This fixes Docker container and image collection on systems with
systemd security hardening enabled.
Updated to version 0.1.178
Agent changes:
- Log stderr output when docker images command fails
- This will show the actual error message (e.g., permission denied, docker not found)
- Helps diagnose why docker images collection is failing
Updated to version 0.1.177
Agent changes:
- Changed debug!() to info!() for Docker collection logs
- This allows logs to show with default RUST_LOG=info setting
- Added info import to tracing use statement
Now logs will be visible in journalctl without needing to change the log level:
- "Collecting Docker sub-services for service: docker"
- "Found X Docker containers"
- "Found X Docker images"
- "Total Docker sub-services added: X"
Updated to version 0.1.176
Agent changes:
- Added debug logging to Docker images collection function
- Log when Docker sub-services are being collected for a service
- Log count of containers and images found
- Log total sub-services added
- Show command failure details instead of silently returning empty vec
This will help diagnose why Docker images aren't showing up as sub-services
on some hosts. The logs will show if the docker commands are failing or if
the collection is working but data isn't being transmitted properly.
Updated to version 0.1.175
Agent changes:
- Added get_docker_images() function to list all Docker images
- Use docker images to show stored images with repository:tag and size
- Display images as sub-services under docker service with size in parentheses
- Skip dangling images (<none>:<none>)
- Images shown with active status (always present when listed)
Example display:
● docker active 139M 1MB
├─ ● docker_gitea active
├─ ○ docker_old-app inactive
├─ ● image_nginx:latest (142MB)
├─ ● image_postgres:15 (379MB)
└─ ● image_gitea:latest (256MB)
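A hedged sketch of get_docker_images(); the --format template and image_ prefix handling are assumptions about the implementation:

```rust
use tokio::process::Command;

// Returns (display_name, size) pairs for stored Docker images.
async fn get_docker_images() -> Vec<(String, String)> {
    let mut cmd = Command::new("docker");
    cmd.args(["images", "--format", "{{.Repository}}:{{.Tag}} {{.Size}}"]);
    let Ok(output) = cmd.output().await else {
        return Vec::new();
    };
    String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter_map(|line| {
            let (name, size) = line.split_once(' ')?;
            if name.starts_with("<none>") {
                return None; // skip dangling images (<none>:<none>)
            }
            Some((format!("image_{name}"), size.to_string()))
        })
        .collect()
}
```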
Updated to version 0.1.174
Agent changes:
- Use docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as sub-services under the docker service
- Each container shown with proper status indicator
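The status mapping in miniature; the returned state strings mirror the bullets above:

```rust
// Map `docker ps -a` STATUS text to the agent's service states.
fn map_container_status(status: &str) -> &'static str {
    if status.starts_with("Up") {
        "active"
    } else if status.starts_with("Exited") || status.starts_with("Created") {
        "inactive"
    } else {
        "failed"
    }
}
```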
Example display:
● docker active 139M 1MB
├─ ● docker_gitea active
├─ ○ docker_old-app inactive
└─ ● docker_immich active
Updated to version 0.1.173
Agent changes:
- Changed docker ps to docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as individual top-level services instead of sub-services
- Each container shown as "docker_{container_name}" in service list
This provides better visibility of all containers and their status directly in the
services panel, making it easier to see stopped containers at a glance.
Updated to version 0.1.172
Dashboard changes:
- Sort child interfaces under physical NICs with VLANs first (by VLAN ID ascending)
- Non-VLAN virtual interfaces sorted alphabetically by name
- Applied same sorting to both nested children and standalone virtual interfaces
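A sketch of the ordering rule; the ChildInterface fields are assumptions about the dashboard's data model:

```rust
use std::cmp::Ordering;

struct ChildInterface {
    name: String,
    vlan_id: Option<u16>,
}

fn sort_children(children: &mut [ChildInterface]) {
    children.sort_by(|a, b| match (a.vlan_id, b.vlan_id) {
        (Some(x), Some(y)) => x.cmp(&y), // VLANs first, by VLAN ID ascending
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => a.name.cmp(&b.name), // non-VLAN virtual interfaces alphabetically
    });
}
```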
Example output order:
- wan (vlan 5)
- lan (vlan 30)
- isolan (vlan 32)
- seclan (vlan 35)
- br-48df2d79b46f
- docker0
- tailscale0
Updated to version 0.1.171
Agent changes:
- Parse /proc/net/vlan/config to extract VLAN IDs for interfaces
- Detect primary physical interface via default route
- Auto-assign primary interface as parent for virtual interfaces without explicit parent
- Added vlan_id field to NetworkInterfaceData
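A hedged sketch of the /proc/net/vlan/config parsing; the file has two header lines followed by "device | vlan-id | parent" rows, and the details here are assumptions:

```rust
use std::collections::HashMap;
use std::fs;

// Returns device -> (vlan_id, parent_interface).
fn parse_vlan_config() -> HashMap<String, (u16, String)> {
    let mut vlans = HashMap::new();
    let Ok(contents) = fs::read_to_string("/proc/net/vlan/config") else {
        return vlans; // no VLANs configured or no 8021q support
    };
    // Skip the two header lines, then parse rows like: "lan  | 30  | enp0s31f6"
    for line in contents.lines().skip(2) {
        let mut parts = line.split('|').map(str::trim);
        if let (Some(dev), Some(id), Some(parent)) = (parts.next(), parts.next(), parts.next()) {
            if let Ok(id) = id.parse::<u16>() {
                vlans.insert(dev.to_string(), (id, parent.to_string()));
            }
        }
    }
    vlans
}
```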
Dashboard changes:
- Display VLAN ID in format "interface (vlan X): IP"
- Show VLAN IDs for both nested and standalone virtual interfaces
This ensures virtual interfaces (docker0, tailscale0, etc.) are properly nested
under the primary physical NIC, and VLAN interfaces show their IDs.
Updated to version 0.1.170
Agent changes:
- Filter out ifb* interfaces from network display
- Parse @parent notation for VLAN interfaces (e.g., lan@enp0s31f6)
- Show physical interfaces even without IP addresses
- Only filter virtual interfaces that have no IPs
- Extract parent interface relationships for proper nesting
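Splitting the @parent notation itself is simple; a minimal sketch (function name is hypothetical):

```rust
// Split an "lan@enp0s31f6"-style label into the interface name and its parent.
fn split_parent(label: &str) -> (&str, Option<&str>) {
    match label.split_once('@') {
        Some((name, parent)) => (name, Some(parent)),
        None => (label, None),
    }
}
```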
Dashboard changes:
- Nest VLAN/child interfaces under their physical parent
- Show physical NICs with status icons even when down
- Display child interfaces grouped under parent interface
- Keep standalone virtual interfaces at root level
Updated to version 0.1.169
Reordered display sections in system widget:
- Network section now appears after RAM and tmpfs mounts
- Improves logical grouping by placing network info between memory and storage
- Updated to version 0.1.168
Nest IP addresses under physical interface names. Show physical interfaces with a status icon on the header line. Virtual interfaces are shown inline with compressed IPs.
Format:
● eno1:
├─ ip: 192.168.30.105
└─ tailscale0: 100.125.108.16
Version bump to 0.1.166