213 Commits

Author SHA1 Message Date
98ed17947d Add nftables WAN open ports as sub-services
All checks were successful
Build and Release / build-and-release (push) Successful in 1m53s
Display open external ports from nftables firewall rules as sub-services
grouped by protocol. Only shows WAN incoming ports by filtering input chain
rules and excluding private network sources.

- Parse nftables ruleset for accept rules with dport in input chain
- Filter out internal network traffic (192.168.x, 10.x, 172.16.x, loopback)
- Extract single ports and port sets from rules
- Group and display as "TCP: 22, 80, 443" and "UDP: 53, 123"
- Update version to v0.1.247
2025-12-04 12:50:10 +01:00
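
The parsing steps listed in this commit could look roughly like the sketch below. It is not the agent's code: `has_private_source`, `parse_dports` and `open_wan_ports` are hypothetical names, the caller is assumed to have already isolated the rule lines of the input chain, and source-address sets, interface matches and port ranges are not handled.

```rust
use std::collections::BTreeMap;

/// True if the rule restricts the source to a private or loopback prefix.
/// Assumption: a single address after "saddr", not a set.
fn has_private_source(rule: &str) -> bool {
    match rule.split("saddr ").nth(1) {
        Some(rest) => {
            let addr = rest.split_whitespace().next().unwrap_or("");
            ["192.168.", "10.", "172.16.", "127."]
                .iter()
                .any(|prefix| addr.starts_with(*prefix))
        }
        None => false,
    }
}

/// Extracts ports after "dport": either a single port or a "{ a, b }" set.
/// Port ranges such as "1000-2000" do not parse as u16 and are skipped.
fn parse_dports(rule: &str) -> Vec<u16> {
    let rest = match rule.split("dport ").nth(1) {
        Some(r) => r.trim_start(),
        None => return Vec::new(),
    };
    let spec = if let Some(set) = rest.strip_prefix('{') {
        set.split('}').next().unwrap_or("")
    } else {
        rest.split_whitespace().next().unwrap_or("")
    };
    spec.split(',').filter_map(|p| p.trim().parse().ok()).collect()
}

/// Groups accepted, non-private ports by protocol, e.g. "TCP: 22, 80, 443".
fn open_wan_ports(input_chain_rules: &[&str]) -> Vec<String> {
    let mut by_proto: BTreeMap<&str, Vec<u16>> = BTreeMap::new();
    for rule in input_chain_rules {
        if !rule.contains("accept") || has_private_source(rule) {
            continue;
        }
        let proto = if rule.contains("tcp dport") {
            "TCP"
        } else if rule.contains("udp dport") {
            "UDP"
        } else {
            continue;
        };
        by_proto.entry(proto).or_default().extend(parse_dports(rule));
    }
    by_proto
        .into_iter()
        .filter(|(_, ports)| !ports.is_empty())
        .map(|(proto, mut ports)| {
            ports.sort_unstable();
            ports.dedup();
            let list: Vec<String> = ports.iter().map(|p| p.to_string()).collect();
            format!("{}: {}", proto, list.join(", "))
        })
        .collect()
}
```

Using a `BTreeMap` keeps the protocol order stable between refreshes, which matters for a display that is redrawn frequently.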
1cb6abf58a Replace Transmission with qBittorrent for torrent statistics
All checks were successful
Build and Release / build-and-release (push) Successful in 1m27s
Update collector to use qBittorrent Web API instead of Transmission RPC.
Query qBittorrent through VPN namespace using existing passwordless sudo
permissions for ip netns exec commands.

- Change service name from transmission-vpn to openvpn-vpn-download
- Replace get_transmission_stats() with get_qbittorrent_stats()
- Use curl through VPN namespace to access qBittorrent API at localhost:8080
- Parse qBittorrent JSON response for state, dlspeed, upspeed
- Count active torrents (downloading, uploading, stalledDL, stalledUP)
- Update version to v0.1.246
2025-12-02 23:31:56 +01:00
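
As an illustration of the aggregation step only, the following sketch counts active torrents and sums speeds. The `Torrent` struct and `summarize` function are hypothetical, the JSON decoding and the `ip netns exec` / `curl` call are left out, and 1 MB is assumed to mean 1,000,000 bytes.

```rust
/// Minimal per-torrent record; field names mirror the JSON keys named in the
/// commit (state, dlspeed, upspeed). Speeds are assumed to be bytes per second.
struct Torrent<'a> {
    state: &'a str,
    dlspeed: u64,
    upspeed: u64,
}

/// Builds the aggregate line, e.g. "17 active, ↓ 2.5 MB/s, ↑ 1.2 MB/s".
fn summarize(torrents: &[Torrent]) -> String {
    // States counted as active, as listed in the commit message.
    const ACTIVE: [&str; 4] = ["downloading", "uploading", "stalledDL", "stalledUP"];
    let active = torrents
        .iter()
        .filter(|t| ACTIVE.iter().any(|s| *s == t.state))
        .count();
    let down: u64 = torrents.iter().map(|t| t.dlspeed).sum();
    let up: u64 = torrents.iter().map(|t| t.upspeed).sum();
    // Assumption: decimal megabytes.
    let mb = |bytes: u64| bytes as f64 / 1_000_000.0;
    format!("{} active, ↓ {:.1} MB/s, ↑ {:.1} MB/s", active, mb(down), mb(up))
}
```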
477724b4f4 Unify sub-service display formatting for Info status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m39s
Change docker images to use name field for all data instead of metrics,
matching the pattern used by torrent stats and VPN routes. Increase display
width for Status::Info sub-services from 18 to 50 characters to accommodate
longer informational text without truncation.

- Docker images now show: "image-name size: 994.0 MB" in name field
- Torrent stats show: "17 active, ↓ 2.5 MB/s, ↑ 1.2 MB/s" in name field
- Remove fixed-width padding for Info status sub-services
- Update version to v0.1.245
2025-12-02 11:36:27 +01:00
7a3ed17952 Add torrent statistics to transmission-vpn service
All checks were successful
Build and Release / build-and-release (push) Successful in 1m47s
Implement aggregate torrent statistics display for transmission-vpn service
via Transmission RPC API. Shows active torrent count and total download/upload
speeds. Change VPN route label from "ip:" to "route:" for clarity.

- Add get_transmission_stats() method to query Transmission RPC
- Display format: "X active, ↓ MB/s, ↑ MB/s"
- Update version to v0.1.244
2025-12-02 11:12:14 +01:00
7e1962a168 Remove ZMQ debug packet counter from display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m23s
- Remove ZMQ stats display from system widget
- Remove update_zmq_stats method
- Remove zmq_packets_received and zmq_last_packet_age fields
- Clean up display to only show essential information
2025-12-01 19:42:05 +01:00
5bb7d6cf57 Fix CPU model extraction for newer Intel generations
All checks were successful
Build and Release / build-and-release (push) Successful in 1m24s
- Handle 12th/13th Gen Intel format (e.g., "12th Gen Intel(R) Core(TM) i7-12700K")
- Extract full model including suffix (i7-12700K instead of truncated name)
- Simplify pattern matching logic
- Reduce fallback truncation to 15 chars
2025-12-01 19:35:03 +01:00
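
A minimal sketch of the Intel extraction described above, assuming the model is the whitespace-separated token that starts with `i3-`, `i5-`, `i7-` or `i9-`. The function name `short_cpu_model` is hypothetical, and AMD handling from the earlier commit is omitted.

```rust
/// Returns a short model such as "i7-12700K" from a full CPU name.
fn short_cpu_model(full_name: &str) -> String {
    // Scan every token, so a leading "12th Gen" prefix does not matter.
    for token in full_name.split_whitespace() {
        if ["i3-", "i5-", "i7-", "i9-"].iter().any(|p| token.starts_with(*p)) {
            return token.to_string();
        }
    }
    // Fallback: truncate to 15 characters, as in the commit message.
    let truncated: String = full_name.chars().take(15).collect();
    truncated.trim_end().to_string()
}
```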
7a0dc27846 Extract CPU model number only to save display space
All checks were successful
Build and Release / build-and-release (push) Successful in 1m35s
- Parse Intel models (i3/i5/i7/i9-XXXX) from full name
- Parse AMD Ryzen models (Ryzen X XXXX) from full name
- Display format: "i7-9700 (8 cores)" instead of full CPU name
- Reduces CPU section width significantly
2025-12-01 19:23:26 +01:00
5c3ac8b15e Remove Docker image icon and use Status::Info
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Docker images now use Status::Info like VPN IP.
No "D" prefix, no status icon - just name and metrics.
All informational sub-services handled consistently.

Version: v0.1.239
2025-12-01 18:45:28 +01:00
bdfff942f7 Remove VPN external IP logging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Clean up debug logging for production.

Version: v0.1.238
2025-12-01 15:16:34 +01:00
47ab1e387d Add Status::Info for informational sub-services
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent uses Status enum to control display:
- Status::Info: no icon, no status text (VPN IP)
- Other statuses: icon + text (containers, nginx sites)

Dashboard checks status, no hardcoded service_type exceptions.

Version: v0.1.237
2025-12-01 15:11:16 +01:00
966ba27b1e Remove status icon from VPN IP and change to lowercase
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Change display from "IP: X.X.X.X" to "ip: X.X.X.X".
Remove status icon for vpn_route service type.

Version: v0.1.236
2025-12-01 14:40:21 +01:00
6c6c9144bd Add info-level logging for VPN external IP debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Change debug to info/warn logging to diagnose VPN IP query issues.
Use exact service name match instead of contains.

Version: v0.1.235
2025-12-01 14:30:25 +01:00
3fdcec8047 Add sudo for VPN namespace access
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Use sudo to access vpn namespace for external IP query.
Requires corresponding sudo permission in NixOS config.

Version: v0.1.234
2025-12-01 14:14:41 +01:00
1fcaf4a670 Fix VPN namespace name for external IP query
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Use correct namespace name 'vpn' instead of 'openvpn-namespace'.

Version: v0.1.233
2025-12-01 14:09:07 +01:00
885e19f7fd Add external IP display for OpenVPN connections
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Display VPN external IP as sub-service under openvpn-vpn-connection.
Query external IP through openvpn-namespace using curl ifconfig.me.

Version: v0.1.232
2025-12-01 13:49:54 +01:00
a7b69b8ae7 Fix duplicate data by clearing vectors before collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Collectors now clear their target vectors (tmpfs, drives, pools, services)
before populating to prevent duplicates when updating cached AgentData.

- Clear tmpfs list in memory collector
- Clear drives and pools in disk collector
- Clear services in systemd collector
- Bump version to v0.1.231
2025-12-01 13:21:26 +01:00
2d290f40b2 Fix data caching to prevent empty broadcasts
All checks were successful
Build and Release / build-and-release (push) Successful in 1m33s
CRITICAL FIX: Collectors now update cached AgentData instead of
creating new empty data each cycle. This prevents the dashboard
from seeing flashing/disappearing data.

- Add cached_agent_data field to Agent struct
- Update cached data when collectors run
- Always broadcast the full cached data every 2s
- Individual collectors still run only when their own interval has elapsed
- Bump version to v0.1.230
2025-12-01 13:14:53 +01:00
ad1fcaa27b Fix collector interval timing to prevent excessive SMART checks
All checks were successful
Build and Release / build-and-release (push) Successful in 1m46s
Collectors now respect their configured intervals instead of running
every transmission cycle (2s). This prevents disk SMART checks from
running every 2 seconds, which was causing constant disk activity.

- Add TimedCollector wrapper with interval tracking
- Only collect from collectors whose interval has elapsed
- Disk collector now properly runs every 300s instead of every 2s
- Bump version to v0.1.229
2025-12-01 13:03:45 +01:00
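
The interval tracking could be sketched as below. Only the name `TimedCollector` comes from the commit; the fields and the `should_run` method are assumptions, and the current time is passed in so the logic can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Tracks when a collector last ran and whether it is due again.
struct TimedCollector {
    interval: Duration,
    last_run: Option<Instant>,
}

impl TimedCollector {
    fn new(interval: Duration) -> Self {
        TimedCollector { interval, last_run: None }
    }

    /// Returns true if the collector is due at `now` and records the run.
    /// A collector that has never run is always due.
    fn should_run(&mut self, now: Instant) -> bool {
        let due = match self.last_run {
            None => true,
            Some(last) => now.duration_since(last) >= self.interval,
        };
        if due {
            self.last_run = Some(now);
        }
        due
    }
}
```

With a 300 s interval, a disk collector polled on every 2 s transmission cycle would report due on the first cycle and then stay idle until 300 s have passed.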
60ab4d4f9e Fix service panel column width calculation
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Replace hardcoded terminal width thresholds with dynamic calculation
based on actual column requirements. Column visibility now adapts
correctly at 58, 52, 43, and 34 character widths instead of the
previous arbitrary 80, 60, 45 thresholds.

- Add width constants for each column (NAME=23, STATUS=10, etc)
- Calculate cumulative widths dynamically for each layout tier
- Ensure header and data formatting use consistent width values
- Fix service name truncation to respect calculated column width
2025-11-30 12:09:44 +01:00
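
One way to derive the tiers from column widths is sketched here. `NAME_WIDTH` and `STATUS_WIDTH` use the values from the commit; the margin and the RAM, Uptime and Restarts widths are hypothetical, picked only so that the cumulative sums land on the 34, 43, 52 and 58 thresholds mentioned above.

```rust
const NAME_WIDTH: u16 = 23;
const STATUS_WIDTH: u16 = 10;
// Hypothetical values, not taken from the source.
const MARGIN_WIDTH: u16 = 1;
const RAM_WIDTH: u16 = 9;
const UPTIME_WIDTH: u16 = 9;
const RESTARTS_WIDTH: u16 = 6;

/// Returns the columns that fit in `terminal_width`.
/// Name and Status are always shown; optional columns are added left to
/// right until the next one would no longer fit.
fn visible_columns(terminal_width: u16) -> Vec<&'static str> {
    let optional = [
        ("RAM", RAM_WIDTH),
        ("Uptime", UPTIME_WIDTH),
        ("Restarts", RESTARTS_WIDTH),
    ];
    let mut used = MARGIN_WIDTH + NAME_WIDTH + STATUS_WIDTH;
    let mut columns = vec!["Name", "Status"];
    for (label, width) in optional {
        if used + width > terminal_width {
            break;
        }
        used += width;
        columns.push(label);
    }
    columns
}
```

Because the thresholds are computed from the same constants used for formatting, changing one width moves every tier consistently.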
67034c84b9 Add responsive column visibility to service panel
All checks were successful
Build and Release / build-and-release (push) Successful in 1m47s
Service panel now dynamically shows/hides columns based on terminal width:
- ≥80 chars: All columns (Name, Status, RAM, Uptime, Restarts)
- ≥60 chars: Hide Restarts only
- ≥45 chars: Hide Uptime and Restarts
- <45 chars: Minimal (Name and Status only)

Improves dashboard usability on smaller terminal sizes.
2025-11-30 10:50:08 +01:00
c62c7fa698 Remove debug logging from disk collector
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Removed all debug! statements from disk collector to reduce log noise.

Bump version to v0.1.226
2025-11-30 00:44:38 +01:00
0b1d8c0a73 Fix Data_3 showing as unknown by handling smartctl warning exit codes
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Root cause: sda's temperature exceeded threshold in the past, causing
smartctl to return exit code 32 (warning: "Attributes have been <= threshold
in the past"). The agent checked output.status.success() and rejected the
entire output as failed, even though the data (serial, temperature, health)
was perfectly valid.

Smartctl exit codes are bit flags for informational warnings:
- Exit 0: No warnings
- Exit 32 (bit 5): Attributes were at/below threshold in past
- Exit 64 (bit 6): Error log has entries
- etc.

The output data is valid regardless of these warning flags.

Solution: Parse output as long as it's not empty, ignore exit code.
Only return UNKNOWN if output is actually empty (command truly failed).

Result: Data_3 will now show "ZDZ4VE0B T: 31°C" instead of "? Data_3: sda"

Bump version to v0.1.225
2025-11-30 00:35:19 +01:00
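
The decision described in this commit can be illustrated as follows: treat the exit code as a set of bit flags and reject the result only when the output is empty. The type and function names are hypothetical, and only the two bits named in the commit are decoded; smartctl defines further bits.

```rust
// Warning bits named in the commit message.
const ATTR_BELOW_THRESHOLD_IN_PAST: i32 = 1 << 5; // 32
const ERROR_LOG_HAS_ENTRIES: i32 = 1 << 6; // 64

#[derive(Debug, PartialEq)]
enum SmartResult<'a> {
    /// The command produced no output at all.
    Unknown,
    /// Output is usable; warnings are informational only.
    Parsed { output: &'a str, warnings: Vec<&'static str> },
}

/// Accepts any non-empty output regardless of the exit code.
fn interpret_smartctl<'a>(exit_code: i32, stdout: &'a str) -> SmartResult<'a> {
    if stdout.trim().is_empty() {
        return SmartResult::Unknown;
    }
    let mut warnings = Vec::new();
    if (exit_code & ATTR_BELOW_THRESHOLD_IN_PAST) != 0 {
        warnings.push("attributes at/below threshold in the past");
    }
    if (exit_code & ERROR_LOG_HAS_ENTRIES) != 0 {
        warnings.push("error log has entries");
    }
    SmartResult::Parsed { output: stdout, warnings }
}
```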
c77aa6eaaa Fix Data_3 timeout by removing sequential SMART during pool detection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m34s
Root cause: SMART data was collected TWICE:
1. Sequential collection during pool detection in get_drive_info_for_path()
   using problematic tokio::task::block_in_place() nesting
2. Parallel collection in get_smart_data_for_drives() (v0.1.223)

The sequential collection happened FIRST during pool detection, causing
sda (Data_3) to timeout due to:
- Bad async nesting: block_in_place() wrapping block_on()
- Sequential execution causing runtime issues
- sda being third in sequence, runtime degraded by then

Solution: Remove SMART collection from get_drive_info_for_path().
Pool drive temperatures are populated later from the parallel SMART
collection which properly uses futures::join_all.

Benefits:
- Eliminates problematic async nesting
- All SMART queries happen once in parallel only
- sda/Data_3 should now show serial (ZDZ4VE0B) and temperature

Bump version to v0.1.224
2025-11-30 00:14:25 +01:00
8a0e68f0e3 Fix Data_3 timeout by parallelizing SMART collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Root cause: SMART data was collected sequentially, one drive at a time.
With 5 drives taking ~500ms each, total collection time was 2.5+ seconds.
When disk collector runs every 1 second, this caused overlapping
collections creating resource contention. The last drive (sda/Data_3)
would timeout due to the drive being accessed by the previous collection.

Solution: Query all drives in parallel using futures::join_all. Now all
drives get their SMART data collected simultaneously with independent
3-second timeouts, eliminating contention and reducing total collection
time from 2.5+ seconds to ~500ms (the slowest single drive).

Benefits:
- All drives complete in ~500ms instead of 2.5+ seconds
- No overlapping collections causing resource contention
- Each drive gets full 3-second timeout window
- sda/Data_3 should now show temperature and serial number

Bump version to v0.1.223
2025-11-29 23:51:43 +01:00
2d653fe9ae Fix empty Storage section by configuring stdio pipes
All checks were successful
Build and Release / build-and-release (push) Successful in 1m15s
Root cause: run_command_with_timeout() was calling cmd.spawn() without
configuring stdout/stderr pipes. This caused command output to go to
journald instead of being captured by wait_with_output(). The disk
collector received empty output and failed silently.

Solution: Configure stdout(Stdio::piped()) and stderr(Stdio::piped())
before spawning commands. This ensures wait_with_output() can properly
capture command output.

Fixes: Empty Storage section, lsblk output appearing in journald
Bump version to v0.1.222
2025-11-29 23:25:17 +01:00
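
The pipe configuration can be shown with a small sketch. It uses `std::process::Command` rather than the tokio command type the agent uses, and `capture_stdout` is a hypothetical helper, not the agent's `run_command_with_timeout()`.

```rust
use std::io;
use std::process::{Command, Stdio};

/// Spawns a command with piped stdout/stderr and returns the captured stdout.
/// Without the two `Stdio::piped()` calls the child inherits the parent's
/// stdio, and `wait_with_output()` returns empty buffers.
fn capture_stdout(program: &str, args: &[&str]) -> io::Result<String> {
    let child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

For example, `capture_stdout("lsblk", &["-J"])` would return the command's output to the caller instead of letting it reach the service journal.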
caba78004e Fix empty Storage section by properly aliasing command types
All checks were successful
Build and Release / build-and-release (push) Successful in 2m6s
v0.1.220 broke disk collector by changing the import from
std::process::Command to tokio::process::Command, but lines 193 and
767 explicitly used std::process::Command::new() which silently failed.

Solution: Import both as aliases (TokioCommand/StdCommand) and use
appropriate type for each operation - async commands use TokioCommand
with run_command_with_timeout, sync commands use StdCommand with
system timeout wrapper.

Fixes: Empty Storage section after v0.1.220 deployment
Bump version to v0.1.221
2025-11-29 21:29:33 +01:00
77bf08a978 Fix blocking smartctl commands with proper async/timeout handling
All checks were successful
Build and Release / build-and-release (push) Successful in 2m2s
- Changed disk collector to use tokio::process::Command instead of std::process::Command
- Updated run_command_with_timeout to properly kill processes on timeout
- Fixes issue where smartctl hangs on problematic drives (/dev/sda) freezing entire agent
- Timeout now force-kills hung processes using kill -9, preventing orphaned smartctl processes

This resolves the issue where Data_3 showed unknown status because smartctl was hanging
indefinitely trying to read from a problematic drive, blocking the entire collector.

Bump version to v0.1.220

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 21:09:04 +01:00
40f3ff66d8 Show archive count range to detect inconsistencies
- Display single number if all services have same count
- Display min-max range if counts differ (indicates problem)
2025-11-29 17:59:24 +01:00
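
A minimal sketch of the single-number versus range rule, assuming the per-service counts arrive as a slice. `archive_count_label` is a hypothetical name, and returning `None` for an empty slice is an assumption.

```rust
/// Returns "7" when all counts agree, "5-7" when they differ,
/// and None when there are no services.
fn archive_count_label(counts: &[u32]) -> Option<String> {
    let min = counts.iter().min()?;
    let max = counts.iter().max()?;
    if min == max {
        Some(min.to_string())
    } else {
        Some(format!("{}-{}", min, max))
    }
}
```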
620d1f10b6 Show archive count per service instead of total sum
2025-11-29 17:51:01 +01:00
977200fff3 Move archive count to Usage line in backup display
2025-11-29 17:44:05 +01:00
f5913dbd43 Add archive count to backup disk display
2025-11-29 17:41:11 +01:00
faa30a7839 Sort backup repositories and disks for stable display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
- Sort repositories alphabetically before rendering
- Sort backup disks by serial number
- Prevents display jumping between different orderings on updates
- Consistent display order across refreshes

Bump version to v0.1.214

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:15:17 +01:00
afb8d68e03 Implement multi-disk backup support
- Update BackupData structure to support multiple backup disks
- Scan /var/lib/backup/status/ directory for all status files
- Calculate status icons for backup and disk usage
- Aggregate repository status from all disks
- Update dashboard to display all backup disks with per-disk status
- Display repository list with count and aggregated status
2025-11-29 16:44:50 +01:00
5e08b34280 Move C-state name cleaning to agent for smaller JSON
All checks were successful
Build and Release / build-and-release (push) Successful in 1m32s
- Agent now extracts "C" + digits pattern (C3, C10) using char parsing
- Removes suffixes like "_ACPI", "_MWAIT" at source
- Reduces JSON payload size over ZMQ
- No regex dependency - uses fast char iteration (~1μs overhead)
- Robust fallback to original name if pattern not found
- Dashboard simplified to use clean names directly

Bump version to v0.1.212

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 14:05:55 +01:00
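
The character-based cleaning might look like this sketch, which assumes the pattern must appear at the start of the name. `clean_cstate_name` is a hypothetical name.

```rust
/// Reduces names like "C3_ACPI" or "C10_MWAIT" to "C3" / "C10".
/// Falls back to the original name when it does not start with 'C'
/// followed by at least one digit.
fn clean_cstate_name(raw: &str) -> String {
    let mut chars = raw.chars();
    if chars.next() == Some('C') {
        let digits: String = chars.take_while(|c| c.is_ascii_digit()).collect();
        if !digits.is_empty() {
            return format!("C{}", digits);
        }
    }
    raw.to_string()
}
```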
0d8284b69c Clean C-state display to show only CX format
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
- Strip suffixes like "_ACPI" from C-state names
- Display changes from "C3_ACPI:51%" to "C3:51%"
- Cleaner, more concise presentation

Bump version to v0.1.211

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 13:34:01 +01:00
d84690cb3b Move transmission interval to ZMQ config section
All checks were successful
Build and Release / build-and-release (push) Successful in 1m43s
- Changed code to use zmq.transmission_interval_seconds instead of top-level collection_interval_seconds
- Removed collection_interval_seconds from AgentConfig
- Updated validation to check zmq.transmission_interval_seconds
- Improves config organization by grouping all ZMQ settings together

Bump version to v0.1.210

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 13:31:39 +01:00
7c030b33d6 Show top 3 C-states with usage percentages
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
- Changed CpuData.cstate from String to Vec<CStateInfo>
- Added CStateInfo struct with name and percent fields
- Collector calculates percentage for each C-state based on accumulated time
- Sorts and returns top 3 C-states by usage
- Dashboard displays: "C10:79% C8:10% C6:8%"

Provides better visibility into CPU idle state distribution.

Bump version to v0.1.209

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 23:45:46 +01:00
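
The percentage and top-3 calculation could be sketched as follows. `CStateInfo` and its two fields come from the commit, but the `u32` type for `percent`, the rounding, and the function names are assumptions; reading the accumulated times from sysfs is omitted.

```rust
#[derive(Debug, PartialEq)]
struct CStateInfo {
    name: String,
    percent: u32,
}

/// Converts accumulated time per C-state into percentages of the total
/// and keeps the three largest.
fn top_cstates(times: &[(&str, u64)]) -> Vec<CStateInfo> {
    let total: u64 = times.iter().map(|(_, t)| t).sum();
    if total == 0 {
        return Vec::new();
    }
    let mut states: Vec<CStateInfo> = times
        .iter()
        .map(|(name, t)| CStateInfo {
            name: name.to_string(),
            percent: ((*t as f64 / total as f64) * 100.0).round() as u32,
        })
        .collect();
    // Highest share first; the sort is stable, so ties keep input order.
    states.sort_by(|a, b| b.percent.cmp(&a.percent));
    states.truncate(3);
    states
}

/// Renders the dashboard form, e.g. "C10:79% C8:10% C6:8%".
fn format_cstates(states: &[CStateInfo]) -> String {
    states
        .iter()
        .map(|s| format!("{}:{}%", s.name, s.percent))
        .collect::<Vec<_>>()
        .join(" ")
}
```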
c6817537a8 Replace CPU frequency with C-state monitoring
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
- Changed CpuData.frequency_mhz to CpuData.cstate (String)
- Implemented collect_cstate() to read CPU idle depth from sysfs
- Finds deepest C-state with most accumulated time (C0-C10)
- Updated dashboard to display C-state instead of frequency
- More accurate indicator of CPU activity vs power management

Bump version to v0.1.208

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 23:30:14 +01:00
28cfd5758f Fix service metrics not showing - remove cache check
The service_status_cache from discovery only has active_state with
all detailed metrics set to None. During collection, get_service_status()
was returning cached data instead of fetching fresh systemctl show data.

Now always fetch fresh data to populate memory_bytes, restart_count,
and uptime_seconds properly.
2025-11-28 23:15:51 +01:00
0e01813ff5 Add service metrics from systemctl (memory, uptime, restarts)
Shared:
- Add memory_bytes, restart_count, uptime_seconds to ServiceData

Agent:
- Add new fields to ServiceStatusInfo struct
- Fetch MemoryCurrent, NRestarts, ExecMainStartTimestamp from systemctl show
- Calculate uptime from start timestamp
- Parse and populate new fields in ServiceData
- Remove unused load_state and sub_state fields

Dashboard:
- Add memory_bytes, restart_count, uptime_seconds to ServiceInfo
- Update header: Service, Status, RAM, Uptime, ↻ (restarts)
- Format memory as MB/GB
- Format uptime as Xd Xh, Xh Xm, or Xm
- Show restart count with ! prefix if > 0 to indicate instability

All metrics come from a single systemctl show call, so no additional commands are spawned.
2025-11-28 23:06:13 +01:00
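
The dashboard formatting rules listed above could be sketched like this. The function names are hypothetical, memory units are assumed to be binary (1 MB = 1024 * 1024 bytes), and the exact precision and spacing are assumptions.

```rust
/// Formats memory as MB below 1 GB and as GB with one decimal above.
fn format_memory(bytes: u64) -> String {
    const MB: f64 = 1024.0 * 1024.0;
    const GB: f64 = MB * 1024.0;
    let b = bytes as f64;
    if b >= GB {
        format!("{:.1}GB", b / GB)
    } else {
        format!("{:.0}MB", b / MB)
    }
}

/// Formats uptime as "Xd Xh", "Xh Xm" or "Xm", using the two largest units.
fn format_uptime(seconds: u64) -> String {
    let days = seconds / 86_400;
    let hours = (seconds % 86_400) / 3_600;
    let minutes = (seconds % 3_600) / 60;
    if days > 0 {
        format!("{}d {}h", days, hours)
    } else if hours > 0 {
        format!("{}h {}m", hours, minutes)
    } else {
        format!("{}m", minutes)
    }
}

/// Prefixes a non-zero restart count with '!' to flag instability.
fn format_restarts(count: u32) -> String {
    if count > 0 {
        format!("!{}", count)
    } else {
        "0".to_string()
    }
}
```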
4d77ffe17e Remove RAM and Disk columns from services widget header
Changed header from 4 columns to 2 columns:
- Before: Service, Status, RAM, Disk
- After: Service, Status

Matches the removal of memory_mb and disk_gb fields.
2025-11-28 22:37:14 +01:00
e3996fdb84 Fix compilation errors from command receiver removal
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
- Remove AgentCommand import from agent.rs
- Remove handle_commands() method
- Remove command handling from main loop
- Remove command_port validation checks
2025-11-28 13:01:36 +01:00
549d9d1c72 Replace whale emoji with ASCII 'D' for performance
Emoji rendering in terminals can be very slow, especially when rendered in the hot path (every frame for every docker image). The whale emoji 🐋 was causing significant rendering delays.

Temporary change to ASCII 'D' to test if emoji was the performance issue.
2025-11-27 18:34:27 +01:00
92c3ee3f2a Add Docker whale icon for docker images
Docker images now display with distinctive 🐋 whale icon in blue (highlight color) instead of status icons. This provides clear visual identification that these are docker images while not implying operational status.
2025-11-27 18:16:33 +01:00
2f94a4b853 Add service_type field to separate data from presentation
Changes:
- Add service_type field to SubServiceData: 'nginx_site', 'container', 'image'
- Agent sends pure data without display formatting
- Dashboard checks service_type to decide presentation
- Docker images now display without status icon (service_type='image')
- Remove unused image_size_str from docker images tuple

Clean separation: agent provides data, dashboard handles display logic.
2025-11-27 18:09:20 +01:00
fac0188c6f Change docker image display format and status
Changes:
- Rename docker images from 'image_node:18...' to 'I node:18...' for conciseness
- Change image status from 'active' to 'inactive' for neutral informational display
- Images now show with gray empty circle ○ instead of green filled circle ●

Docker images are static artifacts without meaningful operational status, so using inactive status provides neutral gray display that won't trigger alerts or affect service status aggregation.
2025-11-27 17:57:24 +01:00
374b126446 Reduce all command timeouts to 2-3 seconds max
With 10-second host heartbeat timeout, all command timeouts must be significantly lower to ensure total collection time stays under 10 seconds.

Changed timeouts:
- smartctl: 10s → 3s (critical: multiple drives queried sequentially)
- du: 5s → 2s
- lsblk: 5s → 2s
- systemctl list commands: 5s → 3s
- systemctl show/is-active: 3s → 2s
- docker commands: 5s → 3s
- df, ip commands: 3s → 2s

Total worst-case collection time now capped at more reasonable levels, preventing false host offline alerts from blocking operations.
2025-11-27 16:38:54 +01:00
1e0510be81 Add comprehensive timeouts to all blocking system commands
Fixes random host disconnections caused by blocking operations preventing timely ZMQ packet transmission.

Changes:
- Add run_command_with_timeout() wrapper using tokio for async command execution
- Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives)
- Apply 5s timeout to du, lsblk, systemctl list commands
- Apply 3s timeout to systemctl show/is-active, df, ip commands
- Apply 2s timeout to hostname command
- Use system 'timeout' command for sync operations where async not needed

Critical fixes:
- smartctl: Failing drives could block for 30+ seconds per drive
- du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds
- systemctl/docker: Commands could block indefinitely during system issues

With 1-second collection interval and 10-second heartbeat timeout, any blocking operation >10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.
2025-11-27 16:34:08 +01:00
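
The commit implements `run_command_with_timeout()` on top of tokio. As a standard-library-only illustration of the same idea, the sketch below polls the child process and kills it once a deadline passes. The name `run_with_timeout` and the 10 ms poll interval are assumptions, and output larger than the pipe buffer would stall the child until the deadline.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `cmd` and returns Some(stdout) if it exits within `timeout`,
/// or None after killing it when the deadline passes.
fn run_with_timeout(mut cmd: Command, timeout: Duration) -> io::Result<Option<String>> {
    let mut child = cmd
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait() does not block; it reports whether the child has exited.
        if child.try_wait()?.is_some() {
            let output = child.wait_with_output()?;
            return Ok(Some(String::from_utf8_lossy(&output.stdout).into_owned()));
        }
        if Instant::now() >= deadline {
            // Kill and reap the child so no orphaned process is left behind.
            child.kill()?;
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A caller would treat `None` as "no data this cycle" and move on, so a hung smartctl cannot delay the next ZMQ transmission.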
9a2df906ea Add ZMQ communication statistics tracking and display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
2025-11-27 16:14:45 +01:00
6d6beb207d Parse Docker image sizes to MB and sort services alphabetically
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
2025-11-27 15:57:38 +01:00