299 Commits

Author SHA1 Message Date
3c2955376d Revert "Fix ZMQ sender blocking - move to independent thread with try_read"
This reverts commit 01e1f33b66f3f33a448b95be73aabcfdd3a8c6d0.
2025-11-27 23:10:55 +01:00
f09ccabc7f Revert "Fix data duplication in cached collector architecture"
This reverts commit 14618c59c61b2f8d731697f01b8388ace825a809.
2025-11-27 23:09:40 +01:00
01e1f33b66 Fix ZMQ sender blocking - move to independent thread with try_read
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
CRITICAL FIX: The previous cached collector architecture still had ZMQ sending
in the main event loop, where it could block waiting for RwLock when collectors
were writing. This caused the 3-8 second delays you observed.

Changes:
- Move ZMQ publisher to dedicated std::thread (ZMQ sockets aren't thread-safe)
- Use try_read() instead of read() to avoid blocking on write locks
- Send previous data if cache is locked by collector
- ZMQ now sends every 2s regardless of collector timing
- Remove publisher from ZmqHandler (now only handles commands)

Architecture:
- Collectors: Independent tokio tasks updating shared cache
- ZMQ Sender: Dedicated OS thread with its own publisher socket
- Main Loop: Only handles commands and notifications

This ensures ZMQ transmission is NEVER blocked by slow collectors.

Bump version to v0.1.195
2025-11-27 22:56:58 +01:00
ed6399b914 Bump version to v0.1.194
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
2025-11-27 22:46:17 +01:00
2740de9b54 Implement cached collector architecture with configurable timeouts
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Major architectural refactor to eliminate false "host offline" alerts:

- Replace sequential blocking collectors with independent async tasks
- Each collector runs at configurable interval and updates shared cache
- ZMQ sender reads cache every 1-2s regardless of collector speed
- Collector intervals: CPU/Memory (1-10s), Backup/NixOS (30-60s), Disk/Systemd (60-300s)

All intervals now configurable via NixOS config:
- collectors.*.interval_seconds (collection frequency per collector)
- collectors.*.command_timeout_seconds (timeout for shell commands)
- notifications.check_interval_seconds (status change detection rate)

Command timeouts increased from hardcoded 2-3s to configurable 10-30s:
- Disk collector: 30s (SMART operations, lsblk)
- Systemd collector: 15s (systemctl, docker, du commands)
- Network collector: 10s (ip route, ip addr)

Benefits:
- No false "offline" alerts when slow collectors take >10s
- Different update rates for different metric types
- Better resource management with longer timeouts
- Full NixOS configuration control

Bump version to v0.1.193
2025-11-27 22:37:20 +01:00
833010e270 Bump version to v0.1.192
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-27 18:34:53 +01:00
549d9d1c72 Replace whale emoji with ASCII 'D' for performance
Emoji rendering in terminals can be very slow, especially when rendered in the hot path (every frame for every docker image). The whale emoji 🐋 was causing significant rendering delays.

Temporary change to ASCII 'D' to test if emoji was the performance issue.
2025-11-27 18:34:27 +01:00
9b84b70581 Bump version to v0.1.191
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-27 18:16:49 +01:00
92c3ee3f2a Add Docker whale icon for docker images
Docker images now display with distinctive 🐋 whale icon in blue (highlight color) instead of status icons. This provides clear visual identification that these are docker images while not implying operational status.
2025-11-27 18:16:33 +01:00
1be55f765d Bump version to v0.1.190
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
2025-11-27 18:09:49 +01:00
2f94a4b853 Add service_type field to separate data from presentation
Changes:
- Add service_type field to SubServiceData: 'nginx_site', 'container', 'image'
- Agent sends pure data without display formatting
- Dashboard checks service_type to decide presentation
- Docker images now display without status icon (service_type='image')
- Remove unused image_size_str from docker images tuple

Clean separation: agent provides data, dashboard handles display logic.
2025-11-27 18:09:20 +01:00
ff2b43827a Bump version to v0.1.189
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 17:57:38 +01:00
6bb350f016 Bump version to v0.1.188
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-27 16:39:46 +01:00
76c04633b5 Bump version to v0.1.187
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 16:34:42 +01:00
9a2df906ea Add ZMQ communication statistics tracking and display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
2025-11-27 16:14:45 +01:00
6d6beb207d Parse Docker image sizes to MB and sort services alphabetically
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
2025-11-27 15:57:38 +01:00
7a68da01f5 Remove debug logging for NVMe SMART collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
2025-11-27 15:40:16 +01:00
5be67fed64 Add debug logging for NVMe SMART data collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 15:00:48 +01:00
cac836601b Add NVMe device type flag for SMART data collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 13:34:30 +01:00
bd22ce265b Use direct smartctl with CAP_SYS_RAWIO instead of sudo
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
2025-11-27 13:22:13 +01:00
bbc8b7b1cb Add info-level logging for SMART data collection debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 13:15:53 +01:00
5dd8cadef3 Remove debug logging from Docker collection code
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 12:50:20 +01:00
fefe30ec51 Remove sudo from docker commands - use docker group membership instead
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Agent changes:
- Changed docker ps and docker images commands to run without sudo
- cm-agent user is already in docker group, so sudo is not needed
- Fixes "unable to change to root gid: Operation not permitted" error
- Systemd security restrictions were blocking sudo gid changes

This fixes Docker container and image collection on systems with
systemd security hardening enabled.

Updated to version 0.1.178
2025-11-27 12:35:38 +01:00
fb40cce748 Add stderr logging for Docker images command failure
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent changes:
- Log stderr output when docker images command fails
- This will show the actual error message (e.g., permission denied, docker not found)
- Helps diagnose why docker images collection is failing

Updated to version 0.1.177
2025-11-27 12:28:55 +01:00
eaa057b284 Change Docker collection logging from debug to info level
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Agent changes:
- Changed debug!() to info!() for Docker collection logs
- This allows logs to show with default RUST_LOG=info setting
- Added info import to tracing use statement

Now logs will be visible in journalctl without needing to change log level:
- "Collecting Docker sub-services for service: docker"
- "Found X Docker containers"
- "Found X Docker images"
- "Total Docker sub-services added: X"

Updated to version 0.1.176
2025-11-27 12:18:17 +01:00
f23a1b5cec Add debug logging for Docker container and image collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Agent changes:
- Added debug logging to Docker images collection function
- Log when Docker sub-services are being collected for a service
- Log count of containers and images found
- Log total sub-services added
- Show command failure details instead of silently returning empty vec

This will help diagnose why Docker images aren't showing up as sub-services
on some hosts. The logs will show if the docker commands are failing or if
the collection is working but data isn't being transmitted properly.

Updated to version 0.1.175
2025-11-27 12:04:51 +01:00
3f98f68b51 Show Docker images as sub-services under docker service
All checks were successful
Build and Release / build-and-release (push) Successful in 1m23s
Agent changes:
- Added get_docker_images() function to list all Docker images
- Use docker images to show stored images with repository:tag and size
- Display images as sub-services under docker service with size in parentheses
- Skip dangling images (<none>:<none>)
- Images shown with active status (always present when listed)

Example display:
● docker                      active     139M     1MB
  ├─ ● docker_gitea           active
  ├─ ○ docker_old-app         inactive
  ├─ ● image_nginx:latest     (142MB)
  ├─ ● image_postgres:15      (379MB)
  └─ ● image_gitea:latest     (256MB)

Updated to version 0.1.174
2025-11-27 11:43:35 +01:00
3d38a7a984 Show all Docker containers as sub-services with active/inactive status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent changes:
- Use docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as sub-services under the docker service
- Each container shown with proper status indicator

Example display:
● docker                 active     139M     1MB
  ├─ ● docker_gitea      active
  ├─ ○ docker_old-app    inactive
  └─ ● docker_immich     active

Updated to version 0.1.173
2025-11-27 10:56:15 +01:00
b0ee0242bd Show all Docker containers as top-level services with active/inactive status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Agent changes:
- Changed docker ps to docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as individual top-level services instead of sub-services
- Each container shown as "docker_{container_name}" in service list

This provides better visibility of all containers and their status directly in the
services panel, making it easier to see stopped containers at a glance.

Updated to version 0.1.172
2025-11-27 10:51:47 +01:00
8f9e9eabca Sort virtual interfaces: VLANs first by ID, then alphabetically
All checks were successful
Build and Release / build-and-release (push) Successful in 1m32s
Dashboard changes:
- Sort child interfaces under physical NICs with VLANs first (by VLAN ID ascending)
- Non-VLAN virtual interfaces sorted alphabetically by name
- Applied same sorting to both nested children and standalone virtual interfaces

Example output order:
- wan (vlan 5)
- lan (vlan 30)
- isolan (vlan 32)
- seclan (vlan 35)
- br-48df2d79b46f
- docker0
- tailscale0

Updated to version 0.1.171
2025-11-27 10:12:59 +01:00
937f4ad427 Add VLAN ID display and smart parent assignment for virtual interfaces
All checks were successful
Build and Release / build-and-release (push) Successful in 1m43s
Agent changes:
- Parse /proc/net/vlan/config to extract VLAN IDs for interfaces
- Detect primary physical interface via default route
- Auto-assign primary interface as parent for virtual interfaces without explicit parent
- Added vlan_id field to NetworkInterfaceData

Dashboard changes:
- Display VLAN ID in format "interface (vlan X): IP"
- Show VLAN IDs for both nested and standalone virtual interfaces

This ensures virtual interfaces (docker0, tailscale0, etc.) are properly nested
under the primary physical NIC, and VLAN interfaces show their IDs.

Updated to version 0.1.170
2025-11-27 09:52:45 +01:00
8aefab83ae Fix network interface display for VLANs and physical NICs
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Agent changes:
- Filter out ifb* interfaces from network display
- Parse @parent notation for VLAN interfaces (e.g., lan@enp0s31f6)
- Show physical interfaces even without IP addresses
- Only filter virtual interfaces that have no IPs
- Extract parent interface relationships for proper nesting

Dashboard changes:
- Nest VLAN/child interfaces under their physical parent
- Show physical NICs with status icons even when down
- Display child interfaces grouped under parent interface
- Keep standalone virtual interfaces at root level

Updated to version 0.1.169
2025-11-26 23:47:16 +01:00
748a9f3a3b Move Network section below RAM in system widget
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Reordered display sections in system widget:
- Network section now appears after RAM and tmpfs mounts
- Improves logical grouping by placing network info between memory and storage
- Updated to version 0.1.168
2025-11-26 23:23:56 +01:00
5c6b11c794 Filter out network interfaces without IP addresses
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Remove interfaces like ifb0, dummy devices that have no IPs. Only show interfaces with at least one IPv4 or IPv6 address.

Version bump to 0.1.167
2025-11-26 19:19:21 +01:00
9f0aa5f806 Update network display format to match CLAUDE.md specification
All checks were successful
Build and Release / build-and-release (push) Successful in 1m38s
Nest IP addresses under physical interface names. Show physical interfaces with status icon on header line. Virtual interfaces show inline with compressed IPs.

Format:
● eno1:
  ├─ ip: 192.168.30.105
  └─ tailscale0: 100.125.108.16

Version bump to 0.1.166
2025-11-26 19:13:28 +01:00
fc247bd0ad Create dedicated network collector with physical/virtual interface grouping
All checks were successful
Build and Release / build-and-release (push) Successful in 1m43s
Move network collection from NixOS collector to dedicated NetworkCollector. Add link status detection for physical interfaces (up/down). Group interfaces by physical/virtual, show status icons for physical NICs only. Down interfaces show as Inactive instead of Critical.

Version bump to 0.1.165
2025-11-26 19:02:50 +01:00
00fe8c28ab Remove status icon from network interface display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Network interfaces now display without status icons since there's no meaningful status to show. Just shows interface name and IP addresses with subnet compression.

Version bump to 0.1.164
2025-11-26 18:15:01 +01:00
fbbb4a4cfb Add subnet compression for IP address display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Compress IPv4 addresses from same subnet to save space. Shows first IP in full (192.168.30.1) and subsequent IPs in same subnet with only last octet (100, 142).

Version bump to 0.1.163
2025-11-26 18:10:08 +01:00
53e1d8bbce Version bump to 0.1.162
All checks were successful
Build and Release / build-and-release (push) Successful in 1m44s
2025-11-26 18:01:31 +01:00
b7ffeaced5 Add network interface collection and display
Some checks failed
Build and Release / build-and-release (push) Failing after 1m32s
Extend NixOS collector to gather network interfaces using ip command JSON output. Display all interfaces with IPv4 and IPv6 addresses in Network section above CPU metrics. Filters out loopback and link-local addresses.

Version bump to 0.1.161
2025-11-26 17:41:35 +01:00
3858309a5d Fix Docker container detection with sudo permissions
Some checks failed
Build and Release / build-and-release (push) Failing after 1m19s
Update systemd collector to use sudo for docker ps command to resolve
permission issues when cm-agent user lacks docker group membership.
This ensures Docker containers are properly discovered and displayed
as sub-services under the docker service.

Version: 0.1.160
2025-11-25 12:40:27 +01:00
df104bf940 Remove debug prints and unused code
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Remove all debug println statements
- Remove unused service_tracker module
- Remove unused struct fields and methods
- Remove empty placeholder files (cpu.rs, memory.rs, defaults.rs)
- Fix all compiler warnings
- Clean build with zero warnings

Version bump to 0.1.159
2025-11-25 12:19:04 +01:00
d5ce36ee18 Add support for additional SMART attributes
All checks were successful
Build and Release / build-and-release (push) Successful in 1m30s
- Support Temperature_Case attribute for Intel SSDs
- Support Media_Wearout_Indicator attribute for wear percentage
- Parse wear value from column 3 (VALUE) for Media_Wearout_Indicator
- Fixes temperature and wear display for Intel PHLA847000FL512DGN drives
2025-11-25 11:53:08 +01:00
4f80701671 Fix NVMe serial display and improve pool health logic
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
- Fix physical drive serial number display in dashboard
- Improve pool health calculation for arrays with multiple disks
- Support proper tree symbols for multiple parity drives
- Read git commit hash from /var/lib/cm-dashboard/git-commit for Build display
2025-11-25 11:44:20 +01:00
267654fda4 Improve NVMe serial parsing and restructure MergerFS display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m25s
- Fix NVMe serial number parsing to handle whitespace variations
- Move mount point to MergerFS header, remove drive count
- Restructure data drives to same level as parity with Data_1, Data_2 labels
- Remove "Total:" label from pool usage line
- Update parity to use closing tree symbol as last item
2025-11-25 11:28:54 +01:00
dc1105eefe Display disk serial numbers instead of device names
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
- Add serial_number field to DriveData structure
- Collect serial numbers from SMART data for all drives
- Display truncated serial numbers (last 8 chars) in dashboard
- Fix parity drive label to show status icon before "Parity:"
- Fix mount point label styling to match other labels
2025-11-25 11:06:54 +01:00
c9d12793ef Replace device names with serial numbers in MergerFS pool display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Updates disk collector and dashboard to show drive serial numbers
instead of device names (sdX) for MergerFS data/parity drives.
Agent extracts serial numbers from SMART data and dashboard
displays them when available, falling back to device names.
2025-11-25 10:30:37 +01:00
8f80015273 Fix dashboard storage pool label styling
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Replace non-existent Typography::primary() with Typography::secondary() for
MergerFS pool labels following existing UI patterns.
2025-11-25 10:16:26 +01:00
7a95a9d762 Add MergerFS pool display to dashboard matching CLAUDE.md format
All checks were successful
Build and Release / build-and-release (push) Successful in 2m32s
Updated the dashboard system widget to properly display MergerFS storage
pools in the exact format described in CLAUDE.md:

- Pool header showing "mergerfs (2+1):" format
- Total usage line: "├─ Total: ● 63% 2355.2GB/3686.4GB"
- Data Disks section with tree structure
- Individual drive entries: "│  ├─ ● sdb T: 24°C W: 5%"
- Parity drives section: "├─ Parity: ● sdc T: 24°C W: 5%"
- Mount point footer: "└─ Mount: /srv/media"

The dashboard now processes both data_drives and parity_drives arrays from
the agent data correctly and renders the complete MergerFS pool hierarchy
with proper status indicators, temperatures, and wear levels.

Storage display now matches the enhanced tree structure format specified
in documentation with correct Unicode tree characters and spacing.
2025-11-25 09:12:13 +01:00
7b11db990c Restore complete MergerFS and SnapRAID functionality to disk collector
All checks were successful
Build and Release / build-and-release (push) Successful in 1m17s
Updated the disk collector to include all missing functionality from the
previous string-based implementation while working with the new structured
JSON data architecture:

- MergerFS pool discovery from /proc/mounts parsing
- SnapRAID parity drive detection via mount path heuristics
- Drive categorization (data vs parity) based on path analysis
- Numeric mergerfs reference resolution (1:2 -> /mnt/disk paths)
- Pool health calculation based on member drive SMART status
- Complete SMART data integration for temperatures and wear levels
- Proper exclusion of pool member drives from physical drive grouping

The implementation replicates the exact logic from the old code while
adapting to structured AgentData output format. All mergerfs and snapraid
monitoring capabilities are fully restored.
2025-11-25 08:37:32 +01:00