The service_status_cache populated during discovery only contains active_state,
with all detailed metrics set to None. During collection, get_service_status()
returned this cached data instead of fetching fresh systemctl show output.
It now always fetches fresh data so that memory_bytes, restart_count,
and uptime_seconds are populated correctly.
Shared:
- Add memory_bytes, restart_count, uptime_seconds to ServiceData
Agent:
- Add new fields to ServiceStatusInfo struct
- Fetch MemoryCurrent, NRestarts, ExecMainStartTimestamp from systemctl show
- Calculate uptime from start timestamp
- Parse and populate new fields in ServiceData
- Remove unused load_state and sub_state fields
Dashboard:
- Add memory_bytes, restart_count, uptime_seconds to ServiceInfo
- Update header: Service, Status, RAM, Uptime, ↻ (restarts)
- Format memory as MB/GB
- Format uptime as Xd Xh, Xh Xm, or Xm
- Show restart count with ! prefix if > 0 to indicate instability
All metrics come from a single systemctl show call per service, so the new fields add no extra process invocations.
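Roughly, the fresh fetch looks like this (a sketch assuming tokio::process; the struct and function names are illustrative, and uptime is derived here from the monotonic start timestamp plus /proc/uptime instead of parsing the human-readable ExecMainStartTimestamp):

```rust
use tokio::process::Command;

/// Illustrative container for the freshly fetched values; the agent's own
/// ServiceStatusInfo/ServiceData structs carry equivalent fields.
#[derive(Debug, Default)]
struct FreshServiceMetrics {
    active_state: String,
    memory_bytes: Option<u64>,
    restart_count: Option<u64>,
    uptime_seconds: Option<u64>,
}

/// Fetch everything for one unit with a single `systemctl show` invocation.
async fn fetch_service_metrics(unit: &str) -> Option<FreshServiceMetrics> {
    let output = Command::new("systemctl")
        .args([
            "show",
            unit,
            "--property=ActiveState,MemoryCurrent,NRestarts,ExecMainStartTimestampMonotonic",
        ])
        .output()
        .await
        .ok()?;

    let mut m = FreshServiceMetrics::default();
    let mut start_monotonic_us: Option<u64> = None;

    for line in String::from_utf8_lossy(&output.stdout).lines() {
        match line.split_once('=') {
            Some(("ActiveState", v)) => m.active_state = v.to_string(),
            // "[not set]" (or a u64::MAX sentinel) means memory accounting is off.
            Some(("MemoryCurrent", v)) => {
                m.memory_bytes = v.parse().ok().filter(|b| *b != u64::MAX)
            }
            Some(("NRestarts", v)) => m.restart_count = v.parse().ok(),
            Some(("ExecMainStartTimestampMonotonic", v)) => start_monotonic_us = v.parse().ok(),
            _ => {}
        }
    }

    // /proc/uptime's first field is seconds since boot; close enough to the
    // monotonic clock on servers that never suspend.
    if let (Some(start_us), Ok(uptime)) =
        (start_monotonic_us, std::fs::read_to_string("/proc/uptime"))
    {
        if let Some(boot_secs) = uptime
            .split_whitespace()
            .next()
            .and_then(|s| s.parse::<f64>().ok())
        {
            m.uptime_seconds = Some((boot_secs - start_us as f64 / 1_000_000.0).max(0.0) as u64);
        }
    }

    Some(m)
}
```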
Changed header from 4 columns to 2 columns:
- Before: Service, Status, RAM, Disk
- After: Service, Status
Matches the removal of memory_mb and disk_gb fields.
Complete removal of service resource metrics:
Agent:
- Remove memory_mb and disk_gb fields from ServiceData struct
- Remove get_service_memory_usage() method
- Remove get_service_disk_usage() method
- Remove get_directory_size() method
- Remove unused warn import
Dashboard:
- Remove memory_mb and disk_gb from ServiceInfo struct
- Remove memory/disk display from format_parent_service_line
- Remove memory/disk parsing in legacy metric path
- Remove unused format_disk_size() function
Service resource metrics were slow, unreliable, and never worked
properly after the structured-data migration. They will be handled
differently in the future.
Service control has been migrated to SSH, so the command receiver is no longer needed.
- Remove command_receiver Socket from ZmqHandler
- Remove try_receive_command method
- Remove AgentCommand enum
- Remove command_port from ZmqConfig
- Remove zbus dependency from agent
- Replace D-Bus Connection calls with systemctl show commands
- Fix agent hang by eliminating blocking D-Bus operations
- get_unit_property now uses systemctl show with --property flags
- Memory, disk usage, and nginx config queries use systemctl
- Simpler, more reliable service monitoring
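A sketch of the systemctl-backed helper (illustrative names and error handling, assuming the --value flag of systemctl show):

```rust
use tokio::process::Command;

/// Query a single unit property via `systemctl show --property=... --value`
/// instead of a blocking D-Bus call. Returns None if the command fails or
/// the property is unset.
async fn get_unit_property(unit: &str, property: &str) -> Option<String> {
    let prop_flag = format!("--property={property}");
    let output = Command::new("systemctl")
        .args(["show", unit, prop_flag.as_str(), "--value"])
        .output()
        .await
        .ok()?;

    if !output.status.success() {
        return None;
    }

    let value = String::from_utf8_lossy(&output.stdout).trim().to_string();
    // systemd prints "[not set]" (or nothing at all) for unset properties.
    if value.is_empty() || value == "[not set]" {
        None
    } else {
        Some(value)
    }
}
```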
The D-Bus ListUnits call in discover_services_internal() was causing
the agent to hang on startup.
**Root cause:**
- D-Bus ListUnits call with complex tuple destructuring hung indefinitely
- Agent never completed first collection cycle
- No collector output in logs
**Fix:**
- Revert discover_services_internal() to use systemctl list-units/list-unit-files
- Keep D-Bus-based property queries (WorkingDirectory, MemoryCurrent, ExecStart)
- Hybrid approach: systemctl for discovery, D-Bus for individual queries
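The reverted discovery path, roughly (a sketch; the real discover_services_internal() applies more filtering and also consults list-unit-files):

```rust
use tokio::process::Command;

/// Discover service units by shelling out to systemctl rather than calling
/// the D-Bus ListUnits method, which hung during startup.
/// Returns (unit name, active state) pairs.
async fn list_service_units() -> Vec<(String, String)> {
    let output = match Command::new("systemctl")
        .args([
            "list-units",
            "--type=service",
            "--all",
            "--no-legend",
            "--no-pager",
            "--plain",
        ])
        .output()
        .await
    {
        Ok(o) if o.status.success() => o,
        _ => return Vec::new(),
    };

    String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter_map(|line| {
            // Columns: UNIT LOAD ACTIVE SUB DESCRIPTION...
            let mut cols = line.split_whitespace();
            let unit = cols.next()?.to_string();
            let _load = cols.next()?;
            let active = cols.next()?.to_string();
            Some((unit, active))
        })
        .collect()
}
```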
**External commands still used:**
- systemctl list-units, list-unit-files (service discovery)
- smartctl (SMART data)
- sudo du (directory sizes)
- nginx -T (config fallback)
Version bump: 0.1.198 → 0.1.199
Complete migration from systemctl subprocess calls to native D-Bus communication:
**Removed systemctl commands:**
- systemctl is-active (fallback) - use D-Bus cache from ListUnits
- systemctl show --property=LoadState,ActiveState,SubState - use D-Bus cache
- systemctl show --property=WorkingDirectory - use D-Bus Properties.Get
- systemctl show --property=MemoryCurrent - use D-Bus Properties.Get
- systemctl show nginx --property=ExecStart - use D-Bus Properties.Get
**Implementation details:**
- Added get_unit_property() helper for D-Bus property access (sketched after this list)
- Made get_nginx_site_metrics() async to support D-Bus calls
- Made get_nginx_sites_internal() async
- Made discover_nginx_sites() async
- Made get_nginx_config_from_systemd() async
- Fixed RwLock guard Send issues by using scoped locks
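A minimal sketch of the helper, assuming zbus 3.x and that the unit's object path has already been resolved (e.g. via the manager's GetUnit call); the agent's actual signature and error handling may differ:

```rust
use zbus::{Connection, Proxy};

/// Read one numeric property of a systemd unit over D-Bus instead of
/// spawning `systemctl show`. `unit_path` is the unit's object path,
/// e.g. the one returned by org.freedesktop.systemd1.Manager.GetUnit.
async fn get_unit_property_u64(
    connection: &Connection,
    unit_path: &str,
    interface: &str, // e.g. "org.freedesktop.systemd1.Service"
    property: &str,  // e.g. "MemoryCurrent"
) -> zbus::Result<u64> {
    let proxy = Proxy::new(
        connection,
        "org.freedesktop.systemd1",
        unit_path,
        interface,
    )
    .await?;

    // get_property() issues org.freedesktop.DBus.Properties.Get under the hood.
    proxy.get_property(property).await
}
```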
**Remaining external commands:**
- smartctl (disk.rs) - No Rust alternative for SMART data
- sudo du (systemd.rs) - Directory size measurement
- nginx -T (systemd.rs) - Nginx config fallback
- timeout hostname (nixos.rs) - Rare fallback only
Version bump: 0.1.197 → 0.1.198
CRITICAL FIX: The previous cached-collector architecture still did ZMQ sending
in the main event loop, where it could block waiting for the RwLock while collectors
were writing. This caused the observed 3-8 second delays.
Changes:
- Move ZMQ publisher to dedicated std::thread (ZMQ sockets aren't thread-safe)
- Use try_read() instead of read() to avoid blocking on write locks
- Send previous data if cache is locked by collector
- ZMQ now sends every 2s regardless of collector timing
- Remove publisher from ZmqHandler (now only handles commands)
Architecture:
- Collectors: Independent tokio tasks updating shared cache
- ZMQ Sender: Dedicated OS thread with its own publisher socket
- Main Loop: Only handles commands and notifications
This ensures ZMQ transmission is NEVER blocked by slow collectors.
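In outline, the sender thread looks like this (a sketch assuming the zmq and serde_json crates and a tokio RwLock cache; the real socket setup and payload format differ):

```rust
use std::{sync::Arc, thread, time::Duration};
use tokio::sync::RwLock;

// Illustrative stand-in for the agent's real data struct.
#[derive(Clone, Default, serde::Serialize)]
struct AgentData { /* cpu, memory, disks, services, ... */ }

/// Spawn the dedicated ZMQ publisher thread. ZMQ sockets are not thread-safe,
/// so the socket is created and used only inside this one thread.
fn spawn_zmq_sender(cache: Arc<RwLock<AgentData>>, endpoint: String) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let ctx = zmq::Context::new();
        let publisher = ctx.socket(zmq::PUB).expect("create PUB socket");
        publisher.bind(&endpoint).expect("bind publisher");

        // Last successfully read snapshot, re-sent if the cache is write-locked.
        let mut last_snapshot = AgentData::default();

        loop {
            // try_read() never blocks: if a collector holds the write lock,
            // fall back to the previous snapshot instead of waiting.
            if let Ok(guard) = cache.try_read() {
                last_snapshot = guard.clone();
            }

            if let Ok(payload) = serde_json::to_vec(&last_snapshot) {
                let _ = publisher.send(payload, 0);
            }

            thread::sleep(Duration::from_secs(2));
        }
    })
}
```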
Bump version to v0.1.195
Critical bug fix: Collectors were appending to Vecs instead of replacing them,
causing duplicate entries with each collection cycle.
Fixed by adding .clear() calls before populating:
- Memory collector: tmpfs Vec (was showing 11+ duplicates)
- Disk collector: drives and pools Vecs
- Systemd collector: services Vec
- Network collector: Already correct (assigns new Vec)
This prevents the exponential growth of duplicate entries in the dashboard UI.
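The fix is essentially one clear() per collector before repopulating, in the spirit of (illustrative types):

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

// Illustrative stand-ins for the agent's real structs.
#[derive(Default)]
struct TmpfsMount { mount_point: String, used_bytes: u64 }
#[derive(Default)]
struct MemoryData { tmpfs: Vec<TmpfsMount> }

/// Publish a freshly collected tmpfs list into the shared cache.
async fn store_tmpfs(cache: &Arc<RwLock<MemoryData>>, fresh: Vec<TmpfsMount>) {
    let mut data = cache.write().await;
    // Replace, don't append: without the clear() every collection cycle
    // re-added the same mounts and the Vec grew each iteration.
    data.tmpfs.clear();
    data.tmpfs.extend(fresh);
}
```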
Major architectural refactor to eliminate false "host offline" alerts:
- Replace sequential blocking collectors with independent async tasks
- Each collector runs at configurable interval and updates shared cache
- ZMQ sender reads cache every 1-2s regardless of collector speed
- Collector intervals: CPU/Memory (1-10s), Backup/NixOS (30-60s), Disk/Systemd (60-300s)
All intervals now configurable via NixOS config:
- collectors.*.interval_seconds (collection frequency per collector)
- collectors.*.command_timeout_seconds (timeout for shell commands)
- notifications.check_interval_seconds (status change detection rate)
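On the agent side these keys map to a serde config along these lines (a sketch; the actual struct names and collector list may differ):

```rust
use serde::Deserialize;

/// Per-collector settings, mirroring collectors.<name>.interval_seconds and
/// collectors.<name>.command_timeout_seconds from the NixOS module.
#[derive(Debug, Deserialize)]
struct CollectorConfig {
    /// How often this collector runs.
    interval_seconds: u64,
    /// Timeout applied to the shell commands this collector spawns.
    command_timeout_seconds: u64,
}

/// One entry per collector named above; illustrative only.
#[derive(Debug, Deserialize)]
struct CollectorsConfig {
    cpu: CollectorConfig,
    memory: CollectorConfig,
    backup: CollectorConfig,
    nixos: CollectorConfig,
    disk: CollectorConfig,
    systemd: CollectorConfig,
    network: CollectorConfig,
}

#[derive(Debug, Deserialize)]
struct NotificationsConfig {
    /// How often status changes are evaluated for notifications.
    check_interval_seconds: u64,
}
```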
Command timeouts increased from hardcoded 2-3s to configurable 10-30s:
- Disk collector: 30s (SMART operations, lsblk)
- Systemd collector: 15s (systemctl, docker, du commands)
- Network collector: 10s (ip route, ip addr)
Benefits:
- No false "offline" alerts when slow collectors take >10s
- Different update rates for different metric types
- Better resource management with longer timeouts
- Full NixOS configuration control
Bump version to v0.1.193
Add architectural plan for separating ZMQ sending from data collection to prevent false 'host offline' alerts caused by slow collectors.
Key concepts:
- Shared cache (Arc<RwLock<AgentData>>)
- Independent async collector tasks with different update rates
- ZMQ sender always sends every 1s from cache
- Fast collectors (1s), medium (5s), slow (60s)
- No blocking regardless of collector speed
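In code, the plan boils down to a pattern like this (an illustrative sketch, not the agent's final implementation):

```rust
use std::{sync::Arc, time::Duration};
use tokio::{sync::RwLock, time};

// Illustrative stand-in for the real shared data struct.
#[derive(Default)]
struct AgentData { /* cpu, memory, disks, services, ... */ }

/// Spawn one independent collector task. A slow collector only delays its
/// own next run; the ZMQ sender keeps reading the cache on its own schedule.
fn spawn_collector<F, Fut>(cache: Arc<RwLock<AgentData>>, interval_secs: u64, mut collect: F)
where
    F: FnMut(Arc<RwLock<AgentData>>) -> Fut + Send + 'static,
    Fut: std::future::Future<Output = ()> + Send + 'static,
{
    tokio::spawn(async move {
        let mut ticker = time::interval(Duration::from_secs(interval_secs));
        loop {
            ticker.tick().await;
            collect(cache.clone()).await;
        }
    });
}

// Usage (inside main, after building the cache):
// let cache = Arc::new(RwLock::new(AgentData::default()));
// spawn_collector(cache.clone(), 1,  |c| async move { /* fast: CPU, memory */ });
// spawn_collector(cache.clone(), 60, |c| async move { /* slow: disks, systemd */ });
```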
Emoji rendering in terminals can be very slow, especially in the hot path (every frame for every docker image). The whale emoji 🐋 was causing significant rendering delays.
Temporarily change the icon to ASCII 'D' to test whether the emoji was the performance issue.
Docker images now display with a distinctive 🐋 whale icon in blue (highlight color) instead of status icons. This provides clear visual identification that these are docker images without implying operational status.
Changes:
- Rename docker images from 'image_node:18...' to 'I node:18...' for conciseness
- Change image status from 'active' to 'inactive' for neutral informational display
- Images now show with gray empty circle ○ instead of green filled circle ●
Docker images are static artifacts with no meaningful operational status, so the inactive status gives them a neutral gray display that won't trigger alerts or affect service status aggregation.
Fixes random host disconnections caused by blocking operations preventing timely ZMQ packet transmission.
Changes:
- Add run_command_with_timeout() wrapper using tokio for async command execution (sketched after this list)
- Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives)
- Apply 5s timeout to du, lsblk, systemctl list commands
- Apply 3s timeout to systemctl show/is-active, df, ip commands
- Apply 2s timeout to hostname command
- Use the system 'timeout' command for sync operations where async is not needed
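A sketch of the wrapper (the name matches the commit; the exact signature in the agent may differ):

```rust
use std::{process::Output, time::Duration};
use tokio::{process::Command, time::timeout};

/// Run an external command but give up after `timeout_secs`, so a hung
/// binary (e.g. smartctl on a failing drive) cannot stall the whole
/// collection cycle past the 10-second heartbeat window.
async fn run_command_with_timeout(
    program: &str,
    args: &[&str],
    timeout_secs: u64,
) -> Option<Output> {
    let mut cmd = Command::new(program);
    cmd.args(args);

    match timeout(Duration::from_secs(timeout_secs), cmd.output()).await {
        Ok(Ok(output)) => Some(output),
        // Outer Err: timeout elapsed. Inner Err: spawn or IO failure.
        _ => None,
    }
}

// e.g. run_command_with_timeout("smartctl", &["-a", "/dev/sda"], 10).await
```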
Critical fixes:
- smartctl: Failing drives could block for 30+ seconds per drive
- du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds
- systemctl/docker: Commands could block indefinitely during system issues
With a 1-second collection interval and a 10-second heartbeat timeout, any blocking operation longer than 10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.
Agent changes:
- Changed docker ps and docker images commands to run without sudo
- The cm-agent user is already in the docker group, so sudo is not needed
- Fixes the "unable to change to root gid: Operation not permitted" error
- Systemd security restrictions were blocking sudo's gid change
This fixes Docker container and image collection on systems with
systemd security hardening enabled.
Updated to version 0.1.178
Agent changes:
- Log stderr output when docker images command fails
- This will show the actual error message (e.g., permission denied, docker not found)
- Helps diagnose why docker images collection is failing
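The added logging amounts to something like this (a sketch; function name and message wording are illustrative):

```rust
use tokio::process::Command;
use tracing::warn;

/// Run `docker images` and surface the real error instead of silently
/// returning an empty list when the command fails.
async fn docker_images_raw() -> Vec<String> {
    let mut cmd = Command::new("docker");
    cmd.args(["images", "--format", "{{.Repository}}:{{.Tag}}"]);

    match cmd.output().await {
        Ok(output) if output.status.success() => String::from_utf8_lossy(&output.stdout)
            .lines()
            .map(|s| s.to_string())
            .collect(),
        Ok(output) => {
            // Previously this branch returned an empty Vec with no log line.
            warn!(
                "docker images failed ({}): {}",
                output.status,
                String::from_utf8_lossy(&output.stderr).trim()
            );
            Vec::new()
        }
        Err(e) => {
            warn!("failed to run docker images: {e}");
            Vec::new()
        }
    }
}
```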
Updated to version 0.1.177
Agent changes:
- Changed debug!() to info!() for Docker collection logs
- This allows the logs to appear with the default RUST_LOG=info setting
- Added info to the tracing use statement
Now the logs are visible in journalctl without needing to change the log level:
- "Collecting Docker sub-services for service: docker"
- "Found X Docker containers"
- "Found X Docker images"
- "Total Docker sub-services added: X"
Updated to version 0.1.176
Agent changes:
- Added debug logging to Docker images collection function
- Log when Docker sub-services are being collected for a service
- Log count of containers and images found
- Log total sub-services added
- Show command failure details instead of silently returning empty vec
This will help diagnose why Docker images aren't showing up as sub-services
on some hosts. The logs will show if the docker commands are failing or if
the collection is working but data isn't being transmitted properly.
Updated to version 0.1.175