- Remove zbus dependency from agent
- Replace D-Bus Connection calls with systemctl show commands
- Fix agent hang by eliminating blocking D-Bus operations
- get_unit_property() now shells out to systemctl show with property flags (see the sketch after this list)
- Memory, disk usage, and nginx config queries use systemctl
- Simpler, more reliable service monitoring
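A minimal sketch of what the systemctl-backed lookup could look like, assuming a tokio-based agent; the function name matches the commit, but the exact signature and error handling are illustrative:

```rust
use tokio::process::Command;

// Illustrative only: query a single unit property via `systemctl show`.
// `--value` prints just the property value, without the `Name=` prefix.
async fn get_unit_property(unit: &str, property: &str) -> Option<String> {
    let output = Command::new("systemctl")
        .arg("show")
        .arg(unit)
        .arg(format!("--property={property}"))
        .arg("--value")
        .output()
        .await
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let value = String::from_utf8_lossy(&output.stdout).trim().to_string();
    if value.is_empty() { None } else { Some(value) }
}
```

Because systemctl exits on its own, there is no long-lived D-Bus connection for the agent to hang on.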
The D-Bus ListUnits call in discover_services_internal() was causing
the agent to hang on startup.
**Root cause:**
- D-Bus ListUnits call with complex tuple destructuring hung indefinitely
- Agent never completed first collection cycle
- No collector output in logs
**Fix:**
- Revert discover_services_internal() to use systemctl list-units/list-unit-files (see the sketch after this list)
- Keep D-Bus-based property queries (WorkingDirectory, MemoryCurrent, ExecStart)
- Hybrid approach: systemctl for discovery, D-Bus for individual queries
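A rough sketch of the reverted discovery path, assuming tokio::process; the flags and parsing are illustrative rather than the agent's exact implementation:

```rust
use tokio::process::Command;

// Illustrative only: list running service units by shelling out to systemctl,
// avoiding the D-Bus ListUnits call that hung during startup.
async fn discover_service_names() -> Vec<String> {
    let output = match Command::new("systemctl")
        .args(["list-units", "--type=service", "--no-legend", "--plain", "--no-pager"])
        .output()
        .await
    {
        Ok(out) if out.status.success() => out,
        _ => return Vec::new(),
    };
    String::from_utf8_lossy(&output.stdout)
        .lines()
        // The first whitespace-separated column is the unit name, e.g. "nginx.service".
        .filter_map(|line| line.split_whitespace().next())
        .map(|name| name.to_string())
        .collect()
}
```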
**External commands still used:**
- systemctl list-units, list-unit-files (service discovery)
- smartctl (SMART data)
- sudo du (directory sizes)
- nginx -T (config fallback)
Version bump: 0.1.198 → 0.1.199
Complete migration from systemctl subprocess calls to native D-Bus communication:
**Removed systemctl commands:**
- systemctl is-active (fallback) - use D-Bus cache from ListUnits
- systemctl show --property=LoadState,ActiveState,SubState - use D-Bus cache
- systemctl show --property=WorkingDirectory - use D-Bus Properties.Get
- systemctl show --property=MemoryCurrent - use D-Bus Properties.Get
- systemctl show nginx --property=ExecStart - use D-Bus Properties.Get
**Implementation details:**
- Added a get_unit_property() helper for D-Bus property access (see the sketch after this list)
- Made get_nginx_site_metrics() async to support D-Bus calls
- Made get_nginx_sites_internal() async
- Made discover_nginx_sites() async
- Made get_nginx_config_from_systemd() async
- Fixed RwLock guard Send issues by scoping lock guards so they drop before .await points
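A rough sketch of what a zbus-based get_unit_property() could look like. The zbus calls shown are an assumption (builder and error-conversion details differ between zbus versions), and the unit object path is expected to come from the Manager's GetUnit/ListUnits results:

```rust
use zbus::{Connection, Proxy};

// Illustrative only: read one property from a systemd unit object via the
// org.freedesktop.DBus.Properties machinery that zbus's Proxy wraps.
// `unit_path` is an object path such as /org/freedesktop/systemd1/unit/nginx_2eservice.
async fn get_unit_property(
    conn: &Connection,
    unit_path: &str,
    interface: &str, // e.g. "org.freedesktop.systemd1.Service" for WorkingDirectory/ExecStart
    property: &str,
) -> zbus::Result<String> {
    let proxy = Proxy::new(
        conn,
        "org.freedesktop.systemd1",
        unit_path.to_owned(),
        interface.to_owned(),
    )
    .await?;
    // Numeric properties such as MemoryCurrent would be fetched as u64 instead of String.
    proxy.get_property::<String>(property).await
}
```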
**Remaining external commands:**
- smartctl (disk.rs) - No Rust alternative for SMART data
- sudo du (systemd.rs) - Directory size measurement
- nginx -T (systemd.rs) - Nginx config fallback
- timeout hostname (nixos.rs) - Rare fallback only
Version bump: 0.1.197 → 0.1.198
CRITICAL FIX: The previous cached collector architecture still performed ZMQ sending
in the main event loop, where it could block waiting for the RwLock while collectors
were writing. This caused the observed 3-8 second delays.
Changes:
- Move ZMQ publisher to dedicated std::thread (ZMQ sockets aren't thread-safe)
- Use try_read() instead of read() to avoid blocking on write locks
- Send previous data if cache is locked by collector
- ZMQ now sends every 2s regardless of collector timing
- Remove publisher from ZmqHandler (now only handles commands)
Architecture:
- Collectors: Independent tokio tasks updating shared cache
- ZMQ Sender: Dedicated OS thread with its own publisher socket
- Main Loop: Only handles commands and notifications
This ensures ZMQ transmission is NEVER blocked by slow collectors (see the sender-thread sketch below).
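A sketch of the dedicated sender thread under these constraints; the bind address, payload format, and AgentData shape are placeholders, not the agent's real wiring:

```rust
use std::sync::Arc;
use std::thread;
use std::time::Duration;
use tokio::sync::RwLock;

#[derive(serde::Serialize, Default)]
struct AgentData { /* collector fields elided */ }

// Illustrative only: the PUB socket is created and used on this thread alone,
// because ZMQ sockets are not thread-safe.
fn spawn_zmq_sender(cache: Arc<RwLock<AgentData>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let ctx = zmq::Context::new();
        let publisher = ctx.socket(zmq::PUB).expect("create PUB socket");
        publisher.bind("tcp://0.0.0.0:5556").expect("bind publisher"); // placeholder endpoint

        let mut last_payload: Option<Vec<u8>> = None;
        loop {
            // try_read() never waits on a collector holding the write lock;
            // if the cache is busy, the previous snapshot is resent instead.
            if let Ok(data) = cache.try_read() {
                if let Ok(bytes) = serde_json::to_vec(&*data) {
                    last_payload = Some(bytes);
                }
            }
            if let Some(bytes) = &last_payload {
                let _ = publisher.send(bytes.as_slice(), 0);
            }
            thread::sleep(Duration::from_secs(2));
        }
    })
}
```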
Bump version to v0.1.195
Critical bug fix: Collectors were appending to Vecs instead of replacing them,
causing duplicate entries with each collection cycle.
Fixed by adding .clear() calls before populating (see the sketch below):
- Memory collector: tmpfs Vec (was showing 11+ duplicates)
- Disk collector: drives and pools Vecs
- Systemd collector: services Vec
- Network collector: Already correct (assigns new Vec)
This prevents the unbounded growth of duplicate entries in the dashboard UI.
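A before/after illustration of the bug; the type and field names are hypothetical:

```rust
#[derive(Clone)]
struct TmpfsMount { mount_point: String, used_bytes: u64 }

struct MemoryMetrics { tmpfs: Vec<TmpfsMount> }

fn update_tmpfs(metrics: &mut MemoryMetrics, mounts: &[TmpfsMount]) {
    // Before: entries accumulated every cycle because the Vec was never reset.
    // metrics.tmpfs.extend(mounts.iter().cloned());

    // After: clear first, so each collection cycle replaces the data instead of appending.
    metrics.tmpfs.clear();
    metrics.tmpfs.extend(mounts.iter().cloned());
}
```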
Major architectural refactor to eliminate false "host offline" alerts:
- Replace sequential blocking collectors with independent async tasks
- Each collector runs at a configurable interval and updates a shared cache (sketched below)
- ZMQ sender reads cache every 1-2s regardless of collector speed
- Collector intervals: CPU/Memory (1-10s), Backup/NixOS (30-60s), Disk/Systemd (60-300s)
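A sketch of one such independent collector task, assuming a tokio runtime and an Arc<RwLock<AgentData>> cache; the metric and field names are placeholders:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::RwLock;

#[derive(Default)]
struct AgentData {
    cpu_load: f64, // other collector fields elided
}

fn spawn_cpu_collector(cache: Arc<RwLock<AgentData>>, interval_secs: u64) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(interval_secs));
        loop {
            ticker.tick().await;
            let load = read_cpu_load().await;
            // Hold the write lock only long enough to publish the new value,
            // so the ZMQ sender is almost never contended.
            let mut data = cache.write().await;
            data.cpu_load = load;
        }
    });
}

async fn read_cpu_load() -> f64 {
    // Placeholder for the real measurement (e.g. parsing /proc/loadavg).
    0.0
}
```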
All intervals are now configurable via the NixOS config (see the settings sketch after the timeout list):
- collectors.*.interval_seconds (collection frequency per collector)
- collectors.*.command_timeout_seconds (timeout for shell commands)
- notifications.check_interval_seconds (status change detection rate)
Command timeouts increased from hardcoded 2-3s to configurable 10-30s:
- Disk collector: 30s (SMART operations, lsblk)
- Systemd collector: 15s (systemctl, docker, du commands)
- Network collector: 10s (ip route, ip addr)
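A hypothetical Rust-side shape for these settings, shown only to make the option structure concrete; the struct and field names, and the use of serde, are assumptions rather than the agent's real config schema:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct CollectorSettings {
    interval_seconds: u64,
    command_timeout_seconds: u64,
}

#[derive(Deserialize)]
struct CollectorsConfig {
    cpu: CollectorSettings,      // e.g. interval 1-10s
    memory: CollectorSettings,   // e.g. interval 1-10s
    disk: CollectorSettings,     // e.g. interval 60-300s, timeout 30s
    systemd: CollectorSettings,  // e.g. interval 60-300s, timeout 15s
    network: CollectorSettings,  // e.g. timeout 10s
}

#[derive(Deserialize)]
struct NotificationsConfig {
    check_interval_seconds: u64, // status change detection rate
}
```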
Benefits:
- No false "offline" alerts when slow collectors take >10s
- Different update rates for different metric types
- Better resource management with longer timeouts
- Full NixOS configuration control
Bump version to v0.1.193
Add architectural plan for separating ZMQ sending from data collection to prevent false 'host offline' alerts caused by slow collectors.
Key concepts:
- Shared cache (Arc<RwLock<AgentData>>)
- Independent async collector tasks with different update rates
- ZMQ sender always sends every 1s from cache
- Fast collectors (1s), medium (5s), slow (60s)
- No blocking regardless of collector speed