Compare commits

..

111 Commits

Author SHA1 Message Date
5c3ac8b15e Remove Docker image icon and use Status::Info
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Docker images now use Status::Info like VPN IP.
No "D" prefix, no status icon - just name and metrics.
All informational sub-services handled consistently.

Version: v0.1.239
2025-12-01 18:45:28 +01:00
bdfff942f7 Remove VPN external IP logging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Clean up debug logging for production.

Version: v0.1.238
2025-12-01 15:16:34 +01:00
47ab1e387d Add Status::Info for informational sub-services
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent uses Status enum to control display:
- Status::Info: no icon, no status text (VPN IP)
- Other statuses: icon + text (containers, nginx sites)

Dashboard checks status, no hardcoded service_type exceptions.

Version: v0.1.237
2025-12-01 15:11:16 +01:00
966ba27b1e Remove status icon from VPN IP and change to lowercase
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Change display from "IP: X.X.X.X" to "ip: X.X.X.X".
Remove status icon for vpn_route service type.

Version: v0.1.236
2025-12-01 14:40:21 +01:00
6c6c9144bd Add info-level logging for VPN external IP debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Change debug to info/warn logging to diagnose VPN IP query issues.
Use exact service name match instead of contains.

Version: v0.1.235
2025-12-01 14:30:25 +01:00
3fdcec8047 Add sudo for VPN namespace access
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Use sudo to access vpn namespace for external IP query.
Requires corresponding sudo permission in NixOS config.

Version: v0.1.234
2025-12-01 14:14:41 +01:00
1fcaf4a670 Fix VPN namespace name for external IP query
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Use correct namespace name 'vpn' instead of 'openvpn-namespace'.

Version: v0.1.233
2025-12-01 14:09:07 +01:00
885e19f7fd Add external IP display for OpenVPN connections
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Display VPN external IP as sub-service under openvpn-vpn-connection.
Query external IP through openvpn-namespace using curl ifconfig.me.

Version: v0.1.232
2025-12-01 13:49:54 +01:00
a7b69b8ae7 Fix duplicate data by clearing vectors before collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Collectors now clear their target vectors (tmpfs, drives, pools, services)
before populating to prevent duplicates when updating cached AgentData.

- Clear tmpfs list in memory collector
- Clear drives and pools in disk collector
- Clear services in systemd collector
- Bump version to v0.1.231
2025-12-01 13:21:26 +01:00
2d290f40b2 Fix data caching to prevent empty broadcasts
All checks were successful
Build and Release / build-and-release (push) Successful in 1m33s
CRITICAL FIX: Collectors now update cached AgentData instead of
creating new empty data each cycle. This prevents the dashboard
from seeing flashing/disappearing data.

- Add cached_agent_data field to Agent struct
- Update cached data when collectors run
- Always broadcast the full cached data every 2s
- Only individual collectors respect their intervals
- Bump version to v0.1.230
2025-12-01 13:14:53 +01:00
ad1fcaa27b Fix collector interval timing to prevent excessive SMART checks
All checks were successful
Build and Release / build-and-release (push) Successful in 1m46s
Collectors now respect their configured intervals instead of running
every transmission cycle (2s). This prevents disk SMART checks from
running every 2 seconds, which was causing constant disk activity.

- Add TimedCollector wrapper with interval tracking
- Only collect from collectors whose interval has elapsed
- Disk collector now properly runs every 300s instead of every 2s
- Bump version to v0.1.229
2025-12-01 13:03:45 +01:00
60ab4d4f9e Fix service panel column width calculation
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Replace hardcoded terminal width thresholds with dynamic calculation
based on actual column requirements. Column visibility now adapts
correctly at 58, 52, 43, and 34 character widths instead of the
previous arbitrary 80, 60, 45 thresholds.

- Add width constants for each column (NAME=23, STATUS=10, etc)
- Calculate cumulative widths dynamically for each layout tier
- Ensure header and data formatting use consistent width values
- Fix service name truncation to respect calculated column width
2025-11-30 12:09:44 +01:00
67034c84b9 Add responsive column visibility to service panel
All checks were successful
Build and Release / build-and-release (push) Successful in 1m47s
Service panel now dynamically shows/hides columns based on terminal width:
- ≥80 chars: All columns (Name, Status, RAM, Uptime, Restarts)
- ≥60 chars: Hide Restarts only
- ≥45 chars: Hide Uptime and Restarts
- <45 chars: Minimal (Name and Status only)

Improves dashboard usability on smaller terminal sizes.
2025-11-30 10:50:08 +01:00
c62c7fa698 Remove debug logging from disk collector
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Removed all debug! statements from disk collector to reduce log noise.

Bump version to v0.1.226
2025-11-30 00:44:38 +01:00
0b1d8c0a73 Fix Data_3 showing as unknown by handling smartctl warning exit codes
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Root cause: sda's temperature exceeded threshold in the past, causing
smartctl to return exit code 32 (warning: "Attributes have been <= threshold
in the past"). The agent checked output.status.success() and rejected the
entire output as failed, even though the data (serial, temperature, health)
was perfectly valid.

Smartctl exit codes are bit flags for informational warnings:
- Exit 0: No warnings
- Exit 32 (bit 5): Attributes were at/below threshold in past
- Exit 64 (bit 6): Error log has entries
- etc.

The output data is valid regardless of these warning flags.

Solution: Parse output as long as it's not empty, ignore exit code.
Only return UNKNOWN if output is actually empty (command truly failed).

Result: Data_3 will now show "ZDZ4VE0B T: 31°C" instead of "? Data_3: sda"

Bump version to v0.1.225
2025-11-30 00:35:19 +01:00
c77aa6eaaa Fix Data_3 timeout by removing sequential SMART during pool detection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m34s
Root cause: SMART data was collected TWICE:
1. Sequential collection during pool detection in get_drive_info_for_path()
   using problematic tokio::task::block_in_place() nesting
2. Parallel collection in get_smart_data_for_drives() (v0.1.223)

The sequential collection happened FIRST during pool detection, causing
sda (Data_3) to timeout due to:
- Bad async nesting: block_in_place() wrapping block_on()
- Sequential execution causing runtime issues
- sda being third in sequence, runtime degraded by then

Solution: Remove SMART collection from get_drive_info_for_path().
Pool drive temperatures are populated later from the parallel SMART
collection which properly uses futures::join_all.

Benefits:
- Eliminates problematic async nesting
- All SMART queries happen once in parallel only
- sda/Data_3 should now show serial (ZDZ4VE0B) and temperature

Bump version to v0.1.224
2025-11-30 00:14:25 +01:00
8a0e68f0e3 Fix Data_3 timeout by parallelizing SMART collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Root cause: SMART data was collected sequentially, one drive at a time.
With 5 drives taking ~500ms each, total collection time was 2.5+ seconds.
When disk collector runs every 1 second, this caused overlapping
collections creating resource contention. The last drive (sda/Data_3)
would timeout due to the drive being accessed by the previous collection.

Solution: Query all drives in parallel using futures::join_all. Now all
drives get their SMART data collected simultaneously with independent
3-second timeouts, eliminating contention and reducing total collection
time from 2.5+ seconds to ~500ms (the slowest single drive).

Benefits:
- All drives complete in ~500ms instead of 2.5+ seconds
- No overlapping collections causing resource contention
- Each drive gets full 3-second timeout window
- sda/Data_3 should now show temperature and serial number

Bump version to v0.1.223
2025-11-29 23:51:43 +01:00
2d653fe9ae Fix empty Storage section by configuring stdio pipes
All checks were successful
Build and Release / build-and-release (push) Successful in 1m15s
Root cause: run_command_with_timeout() was calling cmd.spawn() without
configuring stdout/stderr pipes. This caused command output to go to
journald instead of being captured by wait_with_output(). The disk
collector received empty output and failed silently.

Solution: Configure stdout(Stdio::piped()) and stderr(Stdio::piped())
before spawning commands. This ensures wait_with_output() can properly
capture command output.

Fixes: Empty Storage section, lsblk output appearing in journald
Bump version to v0.1.222
2025-11-29 23:25:17 +01:00
caba78004e Fix empty Storage section by properly aliasing command types
All checks were successful
Build and Release / build-and-release (push) Successful in 2m6s
v0.1.220 broke disk collector by changing the import from
std::process::Command to tokio::process::Command, but lines 193 and
767 explicitly used std::process::Command::new() which silently failed.

Solution: Import both as aliases (TokioCommand/StdCommand) and use
appropriate type for each operation - async commands use TokioCommand
with run_command_with_timeout, sync commands use StdCommand with
system timeout wrapper.

Fixes: Empty Storage section after v0.1.220 deployment
Bump version to v0.1.221
2025-11-29 21:29:33 +01:00
77bf08a978 Fix blocking smartctl commands with proper async/timeout handling
All checks were successful
Build and Release / build-and-release (push) Successful in 2m2s
- Changed disk collector to use tokio::process::Command instead of std::process::Command
- Updated run_command_with_timeout to properly kill processes on timeout
- Fixes issue where smartctl hangs on problematic drives (/dev/sda) freezing entire agent
- Timeout now force-kills hung processes using kill -9, preventing orphaned smartctl processes

This resolves the issue where Data_3 showed unknown status because smartctl was hanging
indefinitely trying to read from a problematic drive, blocking the entire collector.

Bump version to v0.1.220

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 21:09:04 +01:00
929870f8b6 Bump version to v0.1.219
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
2025-11-29 18:35:14 +01:00
7aae852b7b Bump version to v0.1.218
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-29 17:59:33 +01:00
40f3ff66d8 Show archive count range to detect inconsistencies
- Display single number if all services have same count
- Display min-max range if counts differ (indicates problem)
2025-11-29 17:59:24 +01:00
1c1beddb55 Bump version to v0.1.217
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
2025-11-29 17:51:13 +01:00
620d1f10b6 Show archive count per service instead of total sum 2025-11-29 17:51:01 +01:00
a0d571a40e Bump version to v0.1.216
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-29 17:44:12 +01:00
977200fff3 Move archive count to Usage line in backup display 2025-11-29 17:44:05 +01:00
d692de5f83 Bump version to v0.1.215
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
2025-11-29 17:41:49 +01:00
f5913dbd43 Add archive count to backup disk display 2025-11-29 17:41:11 +01:00
faa30a7839 Sort backup repositories and disks for stable display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
- Sort repositories alphabetically before rendering
- Sort backup disks by serial number
- Prevents display jumping between different orderings on updates
- Consistent display order across refreshes

Bump version to v0.1.214

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:15:17 +01:00
6e4a42799f Bump version to v0.1.213
All checks were successful
Build and Release / build-and-release (push) Successful in 1m22s
2025-11-29 16:46:16 +01:00
afb8d68e03 Implement multi-disk backup support
- Update BackupData structure to support multiple backup disks
- Scan /var/lib/backup/status/ directory for all status files
- Calculate status icons for backup and disk usage
- Aggregate repository status from all disks
- Update dashboard to display all backup disks with per-disk status
- Display repository list with count and aggregated status
2025-11-29 16:44:50 +01:00
5e08b34280 Move C-state name cleaning to agent for smaller JSON
All checks were successful
Build and Release / build-and-release (push) Successful in 1m32s
- Agent now extracts "C" + digits pattern (C3, C10) using char parsing
- Removes suffixes like "_ACPI", "_MWAIT" at source
- Reduces JSON payload size over ZMQ
- No regex dependency - uses fast char iteration (~1μs overhead)
- Robust fallback to original name if pattern not found
- Dashboard simplified to use clean names directly

Bump version to v0.1.212

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 14:05:55 +01:00
0d8284b69c Clean C-state display to show only CX format
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
- Strip suffixes like "_ACPI" from C-state names
- Display changes from "C3_ACPI:51%" to "C3:51%"
- Cleaner, more concise presentation

Bump version to v0.1.211

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 13:34:01 +01:00
d84690cb3b Move transmission interval to ZMQ config section
All checks were successful
Build and Release / build-and-release (push) Successful in 1m43s
- Changed code to use zmq.transmission_interval_seconds instead of top-level collection_interval_seconds
- Removed collection_interval_seconds from AgentConfig
- Updated validation to check zmq.transmission_interval_seconds
- Improves config organization by grouping all ZMQ settings together

Bump version to v0.1.210

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 13:31:39 +01:00
7c030b33d6 Show top 3 C-states with usage percentages
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
- Changed CpuData.cstate from String to Vec<CStateInfo>
- Added CStateInfo struct with name and percent fields
- Collector calculates percentage for each C-state based on accumulated time
- Sorts and returns top 3 C-states by usage
- Dashboard displays: "C10:79% C8:10% C6:8%"

Provides better visibility into CPU idle state distribution.

Bump version to v0.1.209

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 23:45:46 +01:00
c6817537a8 Replace CPU frequency with C-state monitoring
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
- Changed CpuData.frequency_mhz to CpuData.cstate (String)
- Implemented collect_cstate() to read CPU idle depth from sysfs
- Finds deepest C-state with most accumulated time (C0-C10)
- Updated dashboard to display C-state instead of frequency
- More accurate indicator of CPU activity vs power management

Bump version to v0.1.208

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 23:30:14 +01:00
2189d34b16 Bump version to v0.1.207
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
2025-11-28 23:16:33 +01:00
28cfd5758f Fix service metrics not showing - remove cache check
The service_status_cache from discovery only has active_state with
all detailed metrics set to None. During collection, get_service_status()
was returning cached data instead of fetching fresh systemctl show data.

Now always fetch fresh data to populate memory_bytes, restart_count,
and uptime_seconds properly.
2025-11-28 23:15:51 +01:00
5deb8cf8d8 Bump version to v0.1.206
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
2025-11-28 23:07:20 +01:00
0e01813ff5 Add service metrics from systemctl (memory, uptime, restarts)
Shared:
- Add memory_bytes, restart_count, uptime_seconds to ServiceData

Agent:
- Add new fields to ServiceStatusInfo struct
- Fetch MemoryCurrent, NRestarts, ExecMainStartTimestamp from systemctl show
- Calculate uptime from start timestamp
- Parse and populate new fields in ServiceData
- Remove unused load_state and sub_state fields

Dashboard:
- Add memory_bytes, restart_count, uptime_seconds to ServiceInfo
- Update header: Service, Status, RAM, Uptime, ↻ (restarts)
- Format memory as MB/GB
- Format uptime as Xd Xh, Xh Xm, or Xm
- Show restart count with ! prefix if > 0 to indicate instability

All metrics obtained from single systemctl show call - zero overhead.
2025-11-28 23:06:13 +01:00
c3c9507a42 Bump version to v0.1.205
All checks were successful
Build and Release / build-and-release (push) Successful in 1m22s
2025-11-28 22:37:28 +01:00
4d77ffe17e Remove RAM and Disk columns from services widget header
Changed header from 4 columns to 2 columns:
- Before: Service, Status, RAM, Disk
- After: Service, Status

Matches the removal of memory_mb and disk_gb fields.
2025-11-28 22:37:14 +01:00
14f74b4cac Bump version to v0.1.204
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
2025-11-28 14:33:19 +01:00
67b686f8c7 Remove RAM and disk collection for services
Complete removal of service resource metrics:

Agent:
- Remove memory_mb and disk_gb fields from ServiceData struct
- Remove get_service_memory_usage() method
- Remove get_service_disk_usage() method
- Remove get_directory_size() method
- Remove unused warn import

Dashboard:
- Remove memory_mb and disk_gb from ServiceInfo struct
- Remove memory/disk display from format_parent_service_line
- Remove memory/disk parsing in legacy metric path
- Remove unused format_disk_size() function

Service resource metrics were slow, unreliable, and never worked
properly since structured data migration. Will be handled differently
in the future.
2025-11-28 14:25:12 +01:00
e3996fdb84 Fix compilation errors from command receiver removal
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
- Remove AgentCommand import from agent.rs
- Remove handle_commands() method
- Remove command handling from main loop
- Remove command_port validation checks
2025-11-28 13:01:36 +01:00
f94ca60e69 Bump version to v0.1.203
Some checks failed
Build and Release / build-and-release (push) Failing after 1m36s
2025-11-28 12:53:56 +01:00
c19ff56df8 Remove unused ZMQ command receiver (port 6131)
Service control migrated to SSH, command receiver no longer needed.
- Remove command_receiver Socket from ZmqHandler
- Remove try_receive_command method
- Remove AgentCommand enum
- Remove command_port from ZmqConfig
2025-11-28 12:52:43 +01:00
fe2f604703 Bump version to v0.1.202
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-28 12:45:25 +01:00
8bfd416327 Revert to v0.1.192 - fix agent hang issue
Some checks failed
Build and Release / build-and-release (push) Failing after 1m8s
2025-11-28 12:42:24 +01:00
85c6c624fb Revert D-Bus usage, use systemctl commands only
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
- Remove zbus dependency from agent
- Replace D-Bus Connection calls with systemctl show commands
- Fix agent hang by eliminating blocking D-Bus operations
- get_unit_property now uses systemctl show with property flags
- Memory, disk usage, and nginx config queries use systemctl
- Simpler, more reliable service monitoring
2025-11-28 12:15:04 +01:00
eab3f17428 Fix agent hang by reverting service discovery to systemctl
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
The D-Bus ListUnits call in discover_services_internal() was causing
the agent to hang on startup.

**Root cause:**
- D-Bus ListUnits call with complex tuple destructuring hung indefinitely
- Agent never completed first collection cycle
- No collector output in logs

**Fix:**
- Revert discover_services_internal() to use systemctl list-units/list-unit-files
- Keep D-Bus-based property queries (WorkingDirectory, MemoryCurrent, ExecStart)
- Hybrid approach: systemctl for discovery, D-Bus for individual queries

**External commands still used:**
- systemctl list-units, list-unit-files (service discovery)
- smartctl (SMART data)
- sudo du (directory sizes)
- nginx -T (config fallback)

Version bump: 0.1.198 → 0.1.199
2025-11-28 11:57:31 +01:00
7ad149bbe4 Replace all systemctl commands with zbus D-Bus API
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
Complete migration from systemctl subprocess calls to native D-Bus communication:

**Removed systemctl commands:**
- systemctl is-active (fallback) - use D-Bus cache from ListUnits
- systemctl show --property=LoadState,ActiveState,SubState - use D-Bus cache
- systemctl show --property=WorkingDirectory - use D-Bus Properties.Get
- systemctl show --property=MemoryCurrent - use D-Bus Properties.Get
- systemctl show nginx --property=ExecStart - use D-Bus Properties.Get

**Implementation details:**
- Added get_unit_property() helper for D-Bus property access
- Made get_nginx_site_metrics() async to support D-Bus calls
- Made get_nginx_sites_internal() async
- Made discover_nginx_sites() async
- Made get_nginx_config_from_systemd() async
- Fixed RwLock guard Send issues by using scoped locks

**Remaining external commands:**
- smartctl (disk.rs) - No Rust alternative for SMART data
- sudo du (systemd.rs) - Directory size measurement
- nginx -T (systemd.rs) - Nginx config fallback
- timeout hostname (nixos.rs) - Rare fallback only

Version bump: 0.1.197 → 0.1.198
2025-11-28 11:46:28 +01:00
b444c88ea0 Replace external commands with native Rust APIs
All checks were successful
Build and Release / build-and-release (push) Successful in 1m54s
Significant performance improvements by eliminating subprocess spawning:

- Replace 'ip' commands with rtnetlink for network interface discovery
- Replace 'docker ps/images' with bollard Docker API client
- Replace 'systemctl list-units' with zbus D-Bus for systemd interaction
- Replace 'df' with statvfs() syscall for filesystem statistics
- Replace 'lsblk' with /proc/mounts parsing

Add interval-based caching to collectors:
- DiskCollector now respects interval_seconds configuration
- SystemdCollector now respects interval_seconds configuration
- CpuCollector now respects interval_seconds configuration

Remove unused command communication infrastructure:
- Remove port 6131 ZMQ command receiver
- Clean up unused AgentCommand types

Dependencies added:
- rtnetlink = "0.14"
- netlink-packet-route = "0.19"
- bollard = "0.17"
- zbus = "4.0"
- nix (fs features for statvfs)
2025-11-28 11:27:33 +01:00
317cf76bd1 Bump version to v0.1.196
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 23:16:40 +01:00
0db1a165b9 Revert "Implement cached collector architecture with configurable timeouts"
This reverts commit 2740de9b54.
2025-11-27 23:12:08 +01:00
3c2955376d Revert "Fix ZMQ sender blocking - move to independent thread with try_read"
This reverts commit 01e1f33b66.
2025-11-27 23:10:55 +01:00
f09ccabc7f Revert "Fix data duplication in cached collector architecture"
This reverts commit 14618c59c6.
2025-11-27 23:09:40 +01:00
43dd5a901a Update CLAUDE.md with correct ZMQ sender architecture 2025-11-27 22:59:38 +01:00
01e1f33b66 Fix ZMQ sender blocking - move to independent thread with try_read
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
CRITICAL FIX: The previous cached collector architecture still had ZMQ sending
in the main event loop, where it could block waiting for RwLock when collectors
were writing. This caused the 3-8 second delays you observed.

Changes:
- Move ZMQ publisher to dedicated std::thread (ZMQ sockets aren't thread-safe)
- Use try_read() instead of read() to avoid blocking on write locks
- Send previous data if cache is locked by collector
- ZMQ now sends every 2s regardless of collector timing
- Remove publisher from ZmqHandler (now only handles commands)

Architecture:
- Collectors: Independent tokio tasks updating shared cache
- ZMQ Sender: Dedicated OS thread with its own publisher socket
- Main Loop: Only handles commands and notifications

This ensures ZMQ transmission is NEVER blocked by slow collectors.

Bump version to v0.1.195
2025-11-27 22:56:58 +01:00
ed6399b914 Bump version to v0.1.194
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
2025-11-27 22:46:17 +01:00
14618c59c6 Fix data duplication in cached collector architecture
Critical bug fix: Collectors were appending to Vecs instead of replacing them,
causing duplicate entries with each collection cycle.

Fixed by adding .clear() calls before populating:
- Memory collector: tmpfs Vec (was showing 11+ duplicates)
- Disk collector: drives and pools Vecs
- Systemd collector: services Vec
- Network collector: Already correct (assigns new Vec)

This prevents the exponential growth of duplicate entries in the dashboard UI.
2025-11-27 22:45:44 +01:00
2740de9b54 Implement cached collector architecture with configurable timeouts
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Major architectural refactor to eliminate false "host offline" alerts:

- Replace sequential blocking collectors with independent async tasks
- Each collector runs at configurable interval and updates shared cache
- ZMQ sender reads cache every 1-2s regardless of collector speed
- Collector intervals: CPU/Memory (1-10s), Backup/NixOS (30-60s), Disk/Systemd (60-300s)

All intervals now configurable via NixOS config:
- collectors.*.interval_seconds (collection frequency per collector)
- collectors.*.command_timeout_seconds (timeout for shell commands)
- notifications.check_interval_seconds (status change detection rate)

Command timeouts increased from hardcoded 2-3s to configurable 10-30s:
- Disk collector: 30s (SMART operations, lsblk)
- Systemd collector: 15s (systemctl, docker, du commands)
- Network collector: 10s (ip route, ip addr)

Benefits:
- No false "offline" alerts when slow collectors take >10s
- Different update rates for different metric types
- Better resource management with longer timeouts
- Full NixOS configuration control

Bump version to v0.1.193
2025-11-27 22:37:20 +01:00
37f2650200 Document cached collector architecture plan
Add architectural plan for separating ZMQ sending from data collection to prevent false 'host offline' alerts caused by slow collectors.

Key concepts:
- Shared cache (Arc<RwLock<AgentData>>)
- Independent async collector tasks with different update rates
- ZMQ sender always sends every 1s from cache
- Fast collectors (1s), medium (5s), slow (60s)
- No blocking regardless of collector speed
2025-11-27 21:49:44 +01:00
833010e270 Bump version to v0.1.192
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-27 18:34:53 +01:00
549d9d1c72 Replace whale emoji with ASCII 'D' for performance
Emoji rendering in terminals can be very slow, especially when rendered in the hot path (every frame for every docker image). The whale emoji 🐋 was causing significant rendering delays.

Temporary change to ASCII 'D' to test if emoji was the performance issue.
2025-11-27 18:34:27 +01:00
9b84b70581 Bump version to v0.1.191
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-27 18:16:49 +01:00
92c3ee3f2a Add Docker whale icon for docker images
Docker images now display with distinctive 🐋 whale icon in blue (highlight color) instead of status icons. This provides clear visual identification that these are docker images while not implying operational status.
2025-11-27 18:16:33 +01:00
1be55f765d Bump version to v0.1.190
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
2025-11-27 18:09:49 +01:00
2f94a4b853 Add service_type field to separate data from presentation
Changes:
- Add service_type field to SubServiceData: 'nginx_site', 'container', 'image'
- Agent sends pure data without display formatting
- Dashboard checks service_type to decide presentation
- Docker images now display without status icon (service_type='image')
- Remove unused image_size_str from docker images tuple

Clean separation: agent provides data, dashboard handles display logic.
2025-11-27 18:09:20 +01:00
ff2b43827a Bump version to v0.1.189
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 17:57:38 +01:00
fac0188c6f Change docker image display format and status
Changes:
- Rename docker images from 'image_node:18...' to 'I node:18...' for conciseness
- Change image status from 'active' to 'inactive' for neutral informational display
- Images now show with gray empty circle ○ instead of green filled circle ●

Docker images are static artifacts without meaningful operational status, so using inactive status provides neutral gray display that won't trigger alerts or affect service status aggregation.
2025-11-27 17:57:24 +01:00
6bb350f016 Bump version to v0.1.188
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-27 16:39:46 +01:00
374b126446 Reduce all command timeouts to 2-3 seconds max
With 10-second host heartbeat timeout, all command timeouts must be significantly lower to ensure total collection time stays under 10 seconds.

Changed timeouts:
- smartctl: 10s → 3s (critical: multiple drives queried sequentially)
- du: 5s → 2s
- lsblk: 5s → 2s
- systemctl list commands: 5s → 3s
- systemctl show/is-active: 3s → 2s
- docker commands: 5s → 3s
- df, ip commands: 3s → 2s

Total worst-case collection time now capped at more reasonable levels, preventing false host offline alerts from blocking operations.
2025-11-27 16:38:54 +01:00
76c04633b5 Bump version to v0.1.187
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 16:34:42 +01:00
1e0510be81 Add comprehensive timeouts to all blocking system commands
Fixes random host disconnections caused by blocking operations preventing timely ZMQ packet transmission.

Changes:
- Add run_command_with_timeout() wrapper using tokio for async command execution
- Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives)
- Apply 5s timeout to du, lsblk, systemctl list commands
- Apply 3s timeout to systemctl show/is-active, df, ip commands
- Apply 2s timeout to hostname command
- Use system 'timeout' command for sync operations where async not needed

Critical fixes:
- smartctl: Failing drives could block for 30+ seconds per drive
- du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds
- systemctl/docker: Commands could block indefinitely during system issues

With 1-second collection interval and 10-second heartbeat timeout, any blocking operation >10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.
2025-11-27 16:34:08 +01:00
9a2df906ea Add ZMQ communication statistics tracking and display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
2025-11-27 16:14:45 +01:00
6d6beb207d Parse Docker image sizes to MB and sort services alphabetically
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
2025-11-27 15:57:38 +01:00
7a68da01f5 Remove debug logging for NVMe SMART collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
2025-11-27 15:40:16 +01:00
5be67fed64 Add debug logging for NVMe SMART data collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 15:00:48 +01:00
cac836601b Add NVMe device type flag for SMART data collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 13:34:30 +01:00
bd22ce265b Use direct smartctl with CAP_SYS_RAWIO instead of sudo
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
2025-11-27 13:22:13 +01:00
bbc8b7b1cb Add info-level logging for SMART data collection debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 13:15:53 +01:00
5dd8cadef3 Remove debug logging from Docker collection code
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 12:50:20 +01:00
fefe30ec51 Remove sudo from docker commands - use docker group membership instead
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Agent changes:
- Changed docker ps and docker images commands to run without sudo
- cm-agent user is already in docker group, so sudo is not needed
- Fixes "unable to change to root gid: Operation not permitted" error
- Systemd security restrictions were blocking sudo gid changes

This fixes Docker container and image collection on systems with
systemd security hardening enabled.

Updated to version 0.1.178
2025-11-27 12:35:38 +01:00
fb40cce748 Add stderr logging for Docker images command failure
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent changes:
- Log stderr output when docker images command fails
- This will show the actual error message (e.g., permission denied, docker not found)
- Helps diagnose why docker images collection is failing

Updated to version 0.1.177
2025-11-27 12:28:55 +01:00
eaa057b284 Change Docker collection logging from debug to info level
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Agent changes:
- Changed debug!() to info!() for Docker collection logs
- This allows logs to show with default RUST_LOG=info setting
- Added info import to tracing use statement

Now logs will be visible in journalctl without needing to change log level:
- "Collecting Docker sub-services for service: docker"
- "Found X Docker containers"
- "Found X Docker images"
- "Total Docker sub-services added: X"

Updated to version 0.1.176
2025-11-27 12:18:17 +01:00
f23a1b5cec Add debug logging for Docker container and image collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Agent changes:
- Added debug logging to Docker images collection function
- Log when Docker sub-services are being collected for a service
- Log count of containers and images found
- Log total sub-services added
- Show command failure details instead of silently returning empty vec

This will help diagnose why Docker images aren't showing up as sub-services
on some hosts. The logs will show if the docker commands are failing or if
the collection is working but data isn't being transmitted properly.

Updated to version 0.1.175
2025-11-27 12:04:51 +01:00
3f98f68b51 Show Docker images as sub-services under docker service
All checks were successful
Build and Release / build-and-release (push) Successful in 1m23s
Agent changes:
- Added get_docker_images() function to list all Docker images
- Use docker images to show stored images with repository:tag and size
- Display images as sub-services under docker service with size in parentheses
- Skip dangling images (<none>:<none>)
- Images shown with active status (always present when listed)

Example display:
● docker                      active     139M     1MB
  ├─ ● docker_gitea           active
  ├─ ○ docker_old-app         inactive
  ├─ ● image_nginx:latest     (142MB)
  ├─ ● image_postgres:15      (379MB)
  └─ ● image_gitea:latest     (256MB)

Updated to version 0.1.174
2025-11-27 11:43:35 +01:00
3d38a7a984 Show all Docker containers as sub-services with active/inactive status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent changes:
- Use docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as sub-services under the docker service
- Each container shown with proper status indicator

Example display:
● docker                 active     139M     1MB
  ├─ ● docker_gitea      active
  ├─ ○ docker_old-app    inactive
  └─ ● docker_immich     active

Updated to version 0.1.173
2025-11-27 10:56:15 +01:00
b0ee0242bd Show all Docker containers as top-level services with active/inactive status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Agent changes:
- Changed docker ps to docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as individual top-level services instead of sub-services
- Each container shown as "docker_{container_name}" in service list

This provides better visibility of all containers and their status directly in the
services panel, making it easier to see stopped containers at a glance.

Updated to version 0.1.172
2025-11-27 10:51:47 +01:00
8f9e9eabca Sort virtual interfaces: VLANs first by ID, then alphabetically
All checks were successful
Build and Release / build-and-release (push) Successful in 1m32s
Dashboard changes:
- Sort child interfaces under physical NICs with VLANs first (by VLAN ID ascending)
- Non-VLAN virtual interfaces sorted alphabetically by name
- Applied same sorting to both nested children and standalone virtual interfaces

Example output order:
- wan (vlan 5)
- lan (vlan 30)
- isolan (vlan 32)
- seclan (vlan 35)
- br-48df2d79b46f
- docker0
- tailscale0

Updated to version 0.1.171
2025-11-27 10:12:59 +01:00
937f4ad427 Add VLAN ID display and smart parent assignment for virtual interfaces
All checks were successful
Build and Release / build-and-release (push) Successful in 1m43s
Agent changes:
- Parse /proc/net/vlan/config to extract VLAN IDs for interfaces
- Detect primary physical interface via default route
- Auto-assign primary interface as parent for virtual interfaces without explicit parent
- Added vlan_id field to NetworkInterfaceData

Dashboard changes:
- Display VLAN ID in format "interface (vlan X): IP"
- Show VLAN IDs for both nested and standalone virtual interfaces

This ensures virtual interfaces (docker0, tailscale0, etc.) are properly nested
under the primary physical NIC, and VLAN interfaces show their IDs.

Updated to version 0.1.170
2025-11-27 09:52:45 +01:00
8aefab83ae Fix network interface display for VLANs and physical NICs
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Agent changes:
- Filter out ifb* interfaces from network display
- Parse @parent notation for VLAN interfaces (e.g., lan@enp0s31f6)
- Show physical interfaces even without IP addresses
- Only filter virtual interfaces that have no IPs
- Extract parent interface relationships for proper nesting

Dashboard changes:
- Nest VLAN/child interfaces under their physical parent
- Show physical NICs with status icons even when down
- Display child interfaces grouped under parent interface
- Keep standalone virtual interfaces at root level

Updated to version 0.1.169
2025-11-26 23:47:16 +01:00
748a9f3a3b Move Network section below RAM in system widget
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
Reordered display sections in system widget:
- Network section now appears after RAM and tmpfs mounts
- Improves logical grouping by placing network info between memory and storage
- Updated to version 0.1.168
2025-11-26 23:23:56 +01:00
5c6b11c794 Filter out network interfaces without IP addresses
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Remove interfaces like ifb0, dummy devices that have no IPs. Only show interfaces with at least one IPv4 or IPv6 address.

Version bump to 0.1.167
2025-11-26 19:19:21 +01:00
9f0aa5f806 Update network display format to match CLAUDE.md specification
All checks were successful
Build and Release / build-and-release (push) Successful in 1m38s
Nest IP addresses under physical interface names. Show physical interfaces with status icon on header line. Virtual interfaces show inline with compressed IPs.

Format:
● eno1:
  ├─ ip: 192.168.30.105
  └─ tailscale0: 100.125.108.16

Version bump to 0.1.166
2025-11-26 19:13:28 +01:00
fc247bd0ad Create dedicated network collector with physical/virtual interface grouping
All checks were successful
Build and Release / build-and-release (push) Successful in 1m43s
Move network collection from NixOS collector to dedicated NetworkCollector. Add link status detection for physical interfaces (up/down). Group interfaces by physical/virtual, show status icons for physical NICs only. Down interfaces show as Inactive instead of Critical.

Version bump to 0.1.165
2025-11-26 19:02:50 +01:00
00fe8c28ab Remove status icon from network interface display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Network interfaces now display without status icons since there's no meaningful status to show. Just shows interface name and IP addresses with subnet compression.

Version bump to 0.1.164
2025-11-26 18:15:01 +01:00
fbbb4a4cfb Add subnet compression for IP address display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Compress IPv4 addresses from same subnet to save space. Shows first IP in full (192.168.30.1) and subsequent IPs in same subnet with only last octet (100, 142).

Version bump to 0.1.163
2025-11-26 18:10:08 +01:00
53e1d8bbce Version bump to 0.1.162
All checks were successful
Build and Release / build-and-release (push) Successful in 1m44s
2025-11-26 18:01:31 +01:00
1b9fecea98 Fix nixosbox file path in release workflow
Some checks failed
Build and Release / build-and-release (push) Has been cancelled
Correct path from hosts/services/cm-dashboard.nix to services/cm-dashboard.nix
2025-11-26 17:55:28 +01:00
b7ffeaced5 Add network interface collection and display
Some checks failed
Build and Release / build-and-release (push) Failing after 1m32s
Extend NixOS collector to gather network interfaces using ip command JSON output. Display all interfaces with IPv4 and IPv6 addresses in Network section above CPU metrics. Filters out loopback and link-local addresses.

Version bump to 0.1.161
2025-11-26 17:41:35 +01:00
3858309a5d Fix Docker container detection with sudo permissions
Some checks failed
Build and Release / build-and-release (push) Failing after 1m19s
Update systemd collector to use sudo for docker ps command to resolve
permission issues when cm-agent user lacks docker group membership.
This ensures Docker containers are properly discovered and displayed
as sub-services under the docker service.

Version: 0.1.160
2025-11-25 12:40:27 +01:00
df104bf940 Remove debug prints and unused code
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Remove all debug println statements
- Remove unused service_tracker module
- Remove unused struct fields and methods
- Remove empty placeholder files (cpu.rs, memory.rs, defaults.rs)
- Fix all compiler warnings
- Clean build with zero warnings

Version bump to 0.1.159
2025-11-25 12:19:04 +01:00
d5ce36ee18 Add support for additional SMART attributes
All checks were successful
Build and Release / build-and-release (push) Successful in 1m30s
- Support Temperature_Case attribute for Intel SSDs
- Support Media_Wearout_Indicator attribute for wear percentage
- Parse wear value from column 3 (VALUE) for Media_Wearout_Indicator
- Fixes temperature and wear display for Intel PHLA847000FL512DGN drives
2025-11-25 11:53:08 +01:00
4f80701671 Fix NVMe serial display and improve pool health logic
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
- Fix physical drive serial number display in dashboard
- Improve pool health calculation for arrays with multiple disks
- Support proper tree symbols for multiple parity drives
- Read git commit hash from /var/lib/cm-dashboard/git-commit for Build display
2025-11-25 11:44:20 +01:00
267654fda4 Improve NVMe serial parsing and restructure MergerFS display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m25s
- Fix NVMe serial number parsing to handle whitespace variations
- Move mount point to MergerFS header, remove drive count
- Restructure data drives to same level as parity with Data_1, Data_2 labels
- Remove "Total:" label from pool usage line
- Update parity to use closing tree symbol as last item
2025-11-25 11:28:54 +01:00
dc1105eefe Display disk serial numbers instead of device names
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
- Add serial_number field to DriveData structure
- Collect serial numbers from SMART data for all drives
- Display truncated serial numbers (last 8 chars) in dashboard
- Fix parity drive label to show status icon before "Parity:"
- Fix mount point label styling to match other labels
2025-11-25 11:06:54 +01:00
c9d12793ef Replace device names with serial numbers in MergerFS pool display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Updates disk collector and dashboard to show drive serial numbers
instead of device names (sdX) for MergerFS data/parity drives.
Agent extracts serial numbers from SMART data and dashboard
displays them when available, falling back to device names.
2025-11-25 10:30:37 +01:00
8f80015273 Fix dashboard storage pool label styling
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Replace non-existent Typography::primary() with Typography::secondary() for
MergerFS pool labels following existing UI patterns.
2025-11-25 10:16:26 +01:00
34 changed files with 1886 additions and 2186 deletions

View File

@@ -113,13 +113,13 @@ jobs:
NIX_HASH="sha256-$(python3 -c "import base64, binascii; print(base64.b64encode(binascii.unhexlify('$NEW_HASH')).decode())")" NIX_HASH="sha256-$(python3 -c "import base64, binascii; print(base64.b64encode(binascii.unhexlify('$NEW_HASH')).decode())")"
# Update the NixOS configuration # Update the NixOS configuration
sed -i "s|version = \"v[^\"]*\"|version = \"$VERSION\"|" hosts/services/cm-dashboard.nix sed -i "s|version = \"v[^\"]*\"|version = \"$VERSION\"|" services/cm-dashboard.nix
sed -i "s|sha256 = \"sha256-[^\"]*\"|sha256 = \"$NIX_HASH\"|" hosts/services/cm-dashboard.nix sed -i "s|sha256 = \"sha256-[^\"]*\"|sha256 = \"$NIX_HASH\"|" services/cm-dashboard.nix
# Commit and push changes # Commit and push changes
git config user.name "Gitea Actions" git config user.name "Gitea Actions"
git config user.email "actions@gitea.cmtec.se" git config user.email "actions@gitea.cmtec.se"
git add hosts/services/cm-dashboard.nix git add services/cm-dashboard.nix
git commit -m "Auto-update cm-dashboard to $VERSION git commit -m "Auto-update cm-dashboard to $VERSION
- Update version to $VERSION with automated release - Update version to $VERSION with automated release

127
CLAUDE.md
View File

@@ -304,32 +304,39 @@ exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
### Display Format ### Display Format
``` ```
Network:
● eno1:
├─ ip: 192.168.30.105
└─ tailscale0: 100.125.108.16
● eno2:
└─ ip: 192.168.32.105
CPU: CPU:
● Load: 0.23 0.21 0.13 ● Load: 0.23 0.21 0.13
└─ Freq: 1048 MHz └─ Freq: 1048 MHz
RAM: RAM:
● Usage: 25% 5.8GB/23.3GB ● Usage: 25% 5.8GB/23.3GB
├─ ● /tmp: 2% 0.5GB/2GB ├─ ● /tmp: 2% 0.5GB/2GB
└─ ● /var/tmp: 0% 0GB/1.0GB └─ ● /var/tmp: 0% 0GB/1.0GB
Storage: Storage:
mergerfs (2+1): 844B9A25 T: 25C W: 4%
├─ Total: ● 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C W: 5%
│ └─ ● sdd T: 27°C W: 5%
├─ Parity: ● sdc T: 24°C W: 5%
└─ Mount: /srv/media
● nvme0n1 T: 25C W: 4%
├─ ● /: 55% 250.5GB/456.4GB ├─ ● /: 55% 250.5GB/456.4GB
└─ ● /boot: 26% 0.3GB/1.0GB └─ ● /boot: 26% 0.3GB/1.0GB
● mergerfs /srv/media:
├─ ● 63% 2355.2GB/3686.4GB
├─ ● Data_1: WDZQ8H8D T: 28°C
├─ ● Data_2: GGA04461 T: 28°C
└─ ● Parity: WDZS8RY0 T: 29°C
Backup: Backup:
● Repo: 4
├─ getea
├─ vaultwarden
├─ mysql
└─ immich
● W800639Y W: 2%
├─ ● Backup: 2025-11-29T04:00:01.324623
└─ ● Usage: 8% 70GB/916GB
● WD-WCC7K1234567 T: 32°C W: 12% ● WD-WCC7K1234567 T: 32°C W: 12%
├─ Last: 2h ago (12.3GB) ├─ ● Backup: 2025-11-29T04:00:01.324623
├─ Next: in 22h
└─ ● Usage: 45% 678GB/1.5TB └─ ● Usage: 45% 678GB/1.5TB
``` ```
@@ -361,98 +368,6 @@ Keep responses concise and focused. Avoid extensive implementation summaries unl
- ✅ "Restructure storage widget with improved layout" - ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values" - ✅ "Update CPU thresholds to production values"
## Completed Architecture Migration (v0.1.131)
## ✅ COMPLETE MONITORING SYSTEM RESTORATION (v0.1.141)
**🎉 SUCCESS: All Issues Fixed - Complete Functional Monitoring System**
### ✅ Completed Implementation (v0.1.141)
**All Major Issues Resolved:**
```
✅ Data Collection: Agent collects structured data correctly
✅ Storage Display: Perfect format with correct mount points and temperature/wear
✅ Status Evaluation: All metrics properly evaluated against thresholds
✅ Notifications: Working email alerts on status changes
✅ Thresholds: All collectors using configured thresholds for status calculation
✅ Build Information: NixOS version displayed correctly
✅ Mount Point Consistency: Stable, sorted display order
```
### ✅ All Phases Completed Successfully
#### ✅ Phase 1: Storage Display - COMPLETED
- ✅ Use `lsblk` instead of `findmnt` (eliminated `/nix/store` bind mount issue)
- ✅ Add `sudo smartctl` for permissions (SMART data collection working)
- ✅ Fix NVMe SMART parsing (`Temperature:` and `Percentage Used:` fields)
- ✅ Consistent filesystem/tmpfs sorting (no more random order swapping)
-**VERIFIED**: Dashboard shows `● nvme0n1 T: 28°C W: 1%` correctly
#### ✅ Phase 2: Status Evaluation System - COMPLETED
-**CPU Status**: Load averages and temperature evaluated against `HysteresisThresholds`
-**Memory Status**: Usage percentage evaluated against thresholds
-**Storage Status**: Drive temperature, health, and filesystem usage evaluated
-**Service Status**: Service states properly tracked and evaluated
-**Status Fields**: All AgentData structures include status information
-**Threshold Integration**: All collectors use their configured thresholds
#### ✅ Phase 3: Notification System - COMPLETED
-**Status Change Detection**: Agent tracks status between collection cycles
-**Email Notifications**: Alerts sent on degradation (OK→Warning/Critical, Warning→Critical)
-**Notification Content**: Detailed alerts with metric values and timestamps
-**NotificationManager Integration**: Fully restored and operational
-**Maintenance Mode**: `/tmp/cm-maintenance` file support maintained
#### ✅ Phase 4: Integration & Testing - COMPLETED
-**AgentData Status Fields**: All structured data includes status evaluation
-**Status Processing**: Agent applies thresholds at collection time
-**End-to-End Flow**: Collection → Evaluation → Notification → Display
-**Dynamic Versioning**: Agent version from `CARGO_PKG_VERSION`
-**Build Information**: NixOS generation display restored
### ✅ Final Architecture - WORKING
**Complete Operational Flow:**
```
Collectors → AgentData (with Status) → NotificationManager → Email Alerts
↘ ↗
ZMQ → Dashboard → Perfect Display
```
**Operational Components:**
1.**Collectors**: Populate AgentData with metrics AND status evaluation
2.**Status Evaluation**: `HysteresisThresholds.evaluate()` applied per collector
3.**Notifications**: Email alerts on status change detection
4.**Display**: Correct mount points, temperature, wear, and build information
### ✅ Success Criteria - ALL MET
**Display Requirements:**
- ✅ Dashboard shows `● nvme0n1 T: 28°C W: 1%` format perfectly
- ✅ Mount points show `/` and `/boot` (not `root`/`boot`)
- ✅ Build information shows actual NixOS version (not "unknown")
- ✅ Consistent sorting eliminates random order changes
**Monitoring Requirements:**
- ✅ High CPU load triggers Warning/Critical status and email alert
- ✅ High memory usage triggers Warning/Critical status and email alert
- ✅ High disk temperature triggers Warning/Critical status and email alert
- ✅ Failed services trigger Warning/Critical status and email alert
- ✅ Maintenance mode suppresses notifications as expected
### 🚀 Production Ready
**CM Dashboard v0.1.141 is a complete, functional infrastructure monitoring system:**
- **Real-time Monitoring**: All system components with 1-second intervals
- **Intelligent Alerting**: Email notifications on threshold violations
- **Perfect Display**: Accurate mount points, temperatures, and system information
- **Status-Aware**: All metrics evaluated against configurable thresholds
- **Production Ready**: Full monitoring capabilities restored
**The monitoring system is fully operational and ready for production use.**
## Implementation Rules ## Implementation Rules
1. **Agent Status Authority**: Agent calculates status for each metric using thresholds 1. **Agent Status Authority**: Agent calculates status for each metric using thresholds

48
Cargo.lock generated
View File

@@ -279,7 +279,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
[[package]] [[package]]
name = "cm-dashboard" name = "cm-dashboard"
version = "0.1.151" version = "0.1.239"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"chrono", "chrono",
@@ -301,7 +301,7 @@ dependencies = [
[[package]] [[package]]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.151" version = "0.1.239"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"async-trait", "async-trait",
@@ -309,6 +309,7 @@ dependencies = [
"chrono-tz", "chrono-tz",
"clap", "clap",
"cm-dashboard-shared", "cm-dashboard-shared",
"futures",
"gethostname", "gethostname",
"lettre", "lettre",
"reqwest", "reqwest",
@@ -324,7 +325,7 @@ dependencies = [
[[package]] [[package]]
name = "cm-dashboard-shared" name = "cm-dashboard-shared"
version = "0.1.151" version = "0.1.239"
dependencies = [ dependencies = [
"chrono", "chrono",
"serde", "serde",
@@ -552,6 +553,21 @@ dependencies = [
"percent-encoding", "percent-encoding",
] ]
[[package]]
name = "futures"
version = "0.3.31"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "65bc07b1a8bc7c85c5f2e110c476c7389b4554ba72af57d8445ea63a576b0876"
dependencies = [
"futures-channel",
"futures-core",
"futures-executor",
"futures-io",
"futures-sink",
"futures-task",
"futures-util",
]
[[package]] [[package]]
name = "futures-channel" name = "futures-channel"
version = "0.3.31" version = "0.3.31"
@@ -559,6 +575,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2dff15bf788c671c1934e366d07e30c1814a8ef514e1af724a602e8a2fbe1b10" checksum = "2dff15bf788c671c1934e366d07e30c1814a8ef514e1af724a602e8a2fbe1b10"
dependencies = [ dependencies = [
"futures-core", "futures-core",
"futures-sink",
] ]
[[package]] [[package]]
@@ -567,12 +584,34 @@ version = "0.3.31"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "05f29059c0c2090612e8d742178b0580d2dc940c837851ad723096f87af6663e" checksum = "05f29059c0c2090612e8d742178b0580d2dc940c837851ad723096f87af6663e"
[[package]]
name = "futures-executor"
version = "0.3.31"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e28d1d997f585e54aebc3f97d39e72338912123a67330d723fdbb564d646c9f"
dependencies = [
"futures-core",
"futures-task",
"futures-util",
]
[[package]] [[package]]
name = "futures-io" name = "futures-io"
version = "0.3.31" version = "0.3.31"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9e5c1b78ca4aae1ac06c48a526a655760685149f0d465d21f37abfe57ce075c6" checksum = "9e5c1b78ca4aae1ac06c48a526a655760685149f0d465d21f37abfe57ce075c6"
[[package]]
name = "futures-macro"
version = "0.3.31"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "162ee34ebcb7c64a8abebc059ce0fee27c2262618d7b60ed8faf72fef13c3650"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]] [[package]]
name = "futures-sink" name = "futures-sink"
version = "0.3.31" version = "0.3.31"
@@ -591,8 +630,11 @@ version = "0.3.31"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9fa08315bb612088cc391249efdc3bc77536f16c91f6cf495e6fbe85b20a4a81" checksum = "9fa08315bb612088cc391249efdc3bc77536f16c91f6cf495e6fbe85b20a4a81"
dependencies = [ dependencies = [
"futures-channel",
"futures-core", "futures-core",
"futures-io", "futures-io",
"futures-macro",
"futures-sink",
"futures-task", "futures-task",
"memchr", "memchr",
"pin-project-lite", "pin-project-lite",

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.152" version = "0.1.239"
edition = "2021" edition = "2021"
[dependencies] [dependencies]
@@ -20,4 +20,5 @@ gethostname = { workspace = true }
chrono-tz = "0.8" chrono-tz = "0.8"
toml = { workspace = true } toml = { workspace = true }
async-trait = "0.1" async-trait = "0.1"
reqwest = { version = "0.11", features = ["json", "blocking"] } reqwest = { version = "0.11", features = ["json", "blocking"] }
futures = "0.3"

View File

@@ -1,10 +1,10 @@
use anyhow::Result; use anyhow::Result;
use gethostname::gethostname; use gethostname::gethostname;
use std::time::Duration; use std::time::{Duration, Instant};
use tokio::time::interval; use tokio::time::interval;
use tracing::{debug, error, info}; use tracing::{debug, error, info};
use crate::communication::{AgentCommand, ZmqHandler}; use crate::communication::ZmqHandler;
use crate::config::AgentConfig; use crate::config::AgentConfig;
use crate::collectors::{ use crate::collectors::{
Collector, Collector,
@@ -12,21 +12,29 @@ use crate::collectors::{
cpu::CpuCollector, cpu::CpuCollector,
disk::DiskCollector, disk::DiskCollector,
memory::MemoryCollector, memory::MemoryCollector,
network::NetworkCollector,
nixos::NixOSCollector, nixos::NixOSCollector,
systemd::SystemdCollector, systemd::SystemdCollector,
}; };
use crate::notifications::NotificationManager; use crate::notifications::NotificationManager;
use crate::service_tracker::UserStoppedServiceTracker;
use cm_dashboard_shared::AgentData; use cm_dashboard_shared::AgentData;
/// Wrapper for collectors with timing information
struct TimedCollector {
collector: Box<dyn Collector>,
interval: Duration,
last_collection: Option<Instant>,
name: String,
}
pub struct Agent { pub struct Agent {
hostname: String, hostname: String,
config: AgentConfig, config: AgentConfig,
zmq_handler: ZmqHandler, zmq_handler: ZmqHandler,
collectors: Vec<Box<dyn Collector>>, collectors: Vec<TimedCollector>,
notification_manager: NotificationManager, notification_manager: NotificationManager,
service_tracker: UserStoppedServiceTracker,
previous_status: Option<SystemStatus>, previous_status: Option<SystemStatus>,
cached_agent_data: AgentData,
} }
/// Track system component status for change detection /// Track system component status for change detection
@@ -56,32 +64,78 @@ impl Agent {
config.zmq.publisher_port config.zmq.publisher_port
); );
// Initialize collectors // Initialize collectors with timing information
let mut collectors: Vec<Box<dyn Collector>> = Vec::new(); let mut collectors: Vec<TimedCollector> = Vec::new();
// Add enabled collectors // Add enabled collectors
if config.collectors.cpu.enabled { if config.collectors.cpu.enabled {
collectors.push(Box::new(CpuCollector::new(config.collectors.cpu.clone()))); collectors.push(TimedCollector {
collector: Box::new(CpuCollector::new(config.collectors.cpu.clone())),
interval: Duration::from_secs(config.collectors.cpu.interval_seconds),
last_collection: None,
name: "CPU".to_string(),
});
info!("CPU collector initialized with {}s interval", config.collectors.cpu.interval_seconds);
} }
if config.collectors.memory.enabled { if config.collectors.memory.enabled {
collectors.push(Box::new(MemoryCollector::new(config.collectors.memory.clone()))); collectors.push(TimedCollector {
collector: Box::new(MemoryCollector::new(config.collectors.memory.clone())),
interval: Duration::from_secs(config.collectors.memory.interval_seconds),
last_collection: None,
name: "Memory".to_string(),
});
info!("Memory collector initialized with {}s interval", config.collectors.memory.interval_seconds);
} }
if config.collectors.disk.enabled { if config.collectors.disk.enabled {
collectors.push(Box::new(DiskCollector::new(config.collectors.disk.clone()))); collectors.push(TimedCollector {
collector: Box::new(DiskCollector::new(config.collectors.disk.clone())),
interval: Duration::from_secs(config.collectors.disk.interval_seconds),
last_collection: None,
name: "Disk".to_string(),
});
info!("Disk collector initialized with {}s interval", config.collectors.disk.interval_seconds);
} }
if config.collectors.systemd.enabled { if config.collectors.systemd.enabled {
collectors.push(Box::new(SystemdCollector::new(config.collectors.systemd.clone()))); collectors.push(TimedCollector {
collector: Box::new(SystemdCollector::new(config.collectors.systemd.clone())),
interval: Duration::from_secs(config.collectors.systemd.interval_seconds),
last_collection: None,
name: "Systemd".to_string(),
});
info!("Systemd collector initialized with {}s interval", config.collectors.systemd.interval_seconds);
} }
if config.collectors.backup.enabled { if config.collectors.backup.enabled {
collectors.push(Box::new(BackupCollector::new())); collectors.push(TimedCollector {
collector: Box::new(BackupCollector::new()),
interval: Duration::from_secs(config.collectors.backup.interval_seconds),
last_collection: None,
name: "Backup".to_string(),
});
info!("Backup collector initialized with {}s interval", config.collectors.backup.interval_seconds);
} }
if config.collectors.network.enabled {
collectors.push(TimedCollector {
collector: Box::new(NetworkCollector::new(config.collectors.network.clone())),
interval: Duration::from_secs(config.collectors.network.interval_seconds),
last_collection: None,
name: "Network".to_string(),
});
info!("Network collector initialized with {}s interval", config.collectors.network.interval_seconds);
}
if config.collectors.nixos.enabled { if config.collectors.nixos.enabled {
collectors.push(Box::new(NixOSCollector::new(config.collectors.nixos.clone()))); collectors.push(TimedCollector {
collector: Box::new(NixOSCollector::new(config.collectors.nixos.clone())),
interval: Duration::from_secs(config.collectors.nixos.interval_seconds),
last_collection: None,
name: "NixOS".to_string(),
});
info!("NixOS collector initialized with {}s interval", config.collectors.nixos.interval_seconds);
} }
info!("Initialized {} collectors", collectors.len()); info!("Initialized {} collectors", collectors.len());
@@ -90,9 +144,8 @@ impl Agent {
let notification_manager = NotificationManager::new(&config.notifications, &hostname)?; let notification_manager = NotificationManager::new(&config.notifications, &hostname)?;
info!("Notification manager initialized"); info!("Notification manager initialized");
// Initialize service tracker // Initialize cached agent data
let service_tracker = UserStoppedServiceTracker::new(); let cached_agent_data = AgentData::new(hostname.clone(), env!("CARGO_PKG_VERSION").to_string());
info!("Service tracker initialized");
Ok(Self { Ok(Self {
hostname, hostname,
@@ -100,8 +153,8 @@ impl Agent {
zmq_handler, zmq_handler,
collectors, collectors,
notification_manager, notification_manager,
service_tracker,
previous_status: None, previous_status: None,
cached_agent_data,
}) })
} }
@@ -116,7 +169,7 @@ impl Agent {
// Set up intervals // Set up intervals
let mut transmission_interval = interval(Duration::from_secs( let mut transmission_interval = interval(Duration::from_secs(
self.config.collection_interval_seconds, self.config.zmq.transmission_interval_seconds,
)); ));
let mut notification_interval = interval(Duration::from_secs(30)); // Check notifications every 30s let mut notification_interval = interval(Duration::from_secs(30)); // Check notifications every 30s
@@ -136,12 +189,6 @@ impl Agent {
// NOTE: With structured data, we might need to implement status tracking differently // NOTE: With structured data, we might need to implement status tracking differently
// For now, we skip this until status evaluation is migrated // For now, we skip this until status evaluation is migrated
} }
// Handle incoming commands (check periodically)
_ = tokio::time::sleep(Duration::from_millis(100)) => {
if let Err(e) = self.handle_commands().await {
error!("Error handling commands: {}", e);
}
}
_ = &mut shutdown_rx => { _ = &mut shutdown_rx => {
info!("Shutdown signal received, stopping agent loop"); info!("Shutdown signal received, stopping agent loop");
break; break;
@@ -157,24 +204,47 @@ impl Agent {
async fn collect_and_broadcast(&mut self) -> Result<()> { async fn collect_and_broadcast(&mut self) -> Result<()> {
debug!("Starting structured data collection"); debug!("Starting structured data collection");
// Initialize empty AgentData // Collect data from collectors whose intervals have elapsed
let mut agent_data = AgentData::new(self.hostname.clone(), env!("CARGO_PKG_VERSION").to_string()); // Update cached_agent_data with new data
let now = Instant::now();
for timed_collector in &mut self.collectors {
let should_collect = match timed_collector.last_collection {
None => true, // First collection
Some(last_time) => now.duration_since(last_time) >= timed_collector.interval,
};
// Collect data from all collectors if should_collect {
for collector in &self.collectors { if let Err(e) = timed_collector.collector.collect_structured(&mut self.cached_agent_data).await {
if let Err(e) = collector.collect_structured(&mut agent_data).await { error!("Collector {} failed: {}", timed_collector.name, e);
error!("Collector failed: {}", e); // Update last_collection time even on failure to prevent immediate retries
// Continue with other collectors even if one fails timed_collector.last_collection = Some(now);
} else {
timed_collector.last_collection = Some(now);
debug!(
"Collected from {} ({}s interval)",
timed_collector.name,
timed_collector.interval.as_secs()
);
}
} }
} }
// Update timestamp on cached data
self.cached_agent_data.timestamp = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.as_secs();
// Clone for notification check (to avoid borrow issues)
let agent_data_snapshot = self.cached_agent_data.clone();
// Check for status changes and send notifications // Check for status changes and send notifications
if let Err(e) = self.check_status_changes_and_notify(&agent_data).await { if let Err(e) = self.check_status_changes_and_notify(&agent_data_snapshot).await {
error!("Failed to check status changes: {}", e); error!("Failed to check status changes: {}", e);
} }
// Broadcast the structured data via ZMQ // Broadcast the cached structured data via ZMQ
if let Err(e) = self.zmq_handler.publish_agent_data(&agent_data).await { if let Err(e) = self.zmq_handler.publish_agent_data(&agent_data_snapshot).await {
error!("Failed to broadcast agent data: {}", e); error!("Failed to broadcast agent data: {}", e);
} else { } else {
debug!("Successfully broadcast structured agent data"); debug!("Successfully broadcast structured agent data");
@@ -261,36 +331,4 @@ impl Agent {
Ok(()) Ok(())
} }
/// Handle incoming commands from dashboard
async fn handle_commands(&mut self) -> Result<()> {
// Try to receive a command (non-blocking)
if let Ok(Some(command)) = self.zmq_handler.try_receive_command() {
info!("Received command: {:?}", command);
match command {
AgentCommand::CollectNow => {
info!("Received immediate collection request");
if let Err(e) = self.collect_and_broadcast().await {
error!("Failed to collect on demand: {}", e);
}
}
AgentCommand::SetInterval { seconds } => {
info!("Received interval change request: {}s", seconds);
// Note: This would require more complex handling to update the interval
// For now, just acknowledge
}
AgentCommand::ToggleCollector { name, enabled } => {
info!("Received collector toggle request: {} -> {}", name, enabled);
// Note: This would require more complex handling to enable/disable collectors
// For now, just acknowledge
}
AgentCommand::Ping => {
info!("Received ping command");
// Maybe send back a pong or status
}
}
}
Ok(())
}
} }

View File

@@ -1,37 +1,66 @@
use async_trait::async_trait; use async_trait::async_trait;
use chrono::{NaiveDateTime, DateTime}; use cm_dashboard_shared::{AgentData, BackupData, BackupDiskData, Status};
use cm_dashboard_shared::{AgentData, BackupData, BackupDiskData};
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use std::collections::HashMap; use std::collections::{HashMap, HashSet};
use std::fs; use std::fs;
use std::path::Path; use std::path::{Path, PathBuf};
use tracing::debug; use tracing::{debug, warn};
use super::{Collector, CollectorError}; use super::{Collector, CollectorError};
/// Backup collector that reads backup status from TOML files with structured data output /// Backup collector that reads backup status from TOML files with structured data output
pub struct BackupCollector { pub struct BackupCollector {
/// Path to backup status file /// Directory containing backup status files
status_file_path: String, status_dir: String,
} }
impl BackupCollector { impl BackupCollector {
pub fn new() -> Self { pub fn new() -> Self {
Self { Self {
status_file_path: "/var/lib/backup/backup-status.toml".to_string(), status_dir: "/var/lib/backup/status".to_string(),
} }
} }
/// Read backup status from TOML file /// Scan directory for all backup status files
async fn read_backup_status(&self) -> Result<Option<BackupStatusToml>, CollectorError> { async fn scan_status_files(&self) -> Result<Vec<PathBuf>, CollectorError> {
if !Path::new(&self.status_file_path).exists() { let status_path = Path::new(&self.status_dir);
debug!("Backup status file not found: {}", self.status_file_path);
return Ok(None); if !status_path.exists() {
debug!("Backup status directory not found: {}", self.status_dir);
return Ok(Vec::new());
} }
let content = fs::read_to_string(&self.status_file_path) let mut status_files = Vec::new();
match fs::read_dir(status_path) {
Ok(entries) => {
for entry in entries {
if let Ok(entry) = entry {
let path = entry.path();
if path.is_file() {
if let Some(filename) = path.file_name().and_then(|n| n.to_str()) {
if filename.starts_with("backup-status-") && filename.ends_with(".toml") {
status_files.push(path);
}
}
}
}
}
}
Err(e) => {
warn!("Failed to read backup status directory: {}", e);
return Ok(Vec::new());
}
}
Ok(status_files)
}
/// Read a single backup status file
async fn read_status_file(&self, path: &Path) -> Result<BackupStatusToml, CollectorError> {
let content = fs::read_to_string(path)
.map_err(|e| CollectorError::SystemRead { .map_err(|e| CollectorError::SystemRead {
path: self.status_file_path.clone(), path: path.to_string_lossy().to_string(),
error: e.to_string(), error: e.to_string(),
})?; })?;
@@ -41,66 +70,122 @@ impl BackupCollector {
error: format!("Failed to parse backup status TOML: {}", e), error: format!("Failed to parse backup status TOML: {}", e),
})?; })?;
Ok(Some(status)) Ok(status)
}
/// Calculate backup status from TOML status field
fn calculate_backup_status(status_str: &str) -> Status {
match status_str.to_lowercase().as_str() {
"success" => Status::Ok,
"warning" => Status::Warning,
"failed" | "error" => Status::Critical,
_ => Status::Unknown,
}
}
/// Calculate usage status from disk usage percentage
fn calculate_usage_status(usage_percent: f32) -> Status {
if usage_percent < 80.0 {
Status::Ok
} else if usage_percent < 90.0 {
Status::Warning
} else {
Status::Critical
}
} }
/// Convert BackupStatusToml to BackupData and populate AgentData /// Convert BackupStatusToml to BackupData and populate AgentData
async fn populate_backup_data(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> { async fn populate_backup_data(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> {
if let Some(backup_status) = self.read_backup_status().await? { let status_files = self.scan_status_files().await?;
// Use raw start_time string from TOML
// Extract disk information if status_files.is_empty() {
let repository_disk = if let Some(disk_space) = &backup_status.disk_space { debug!("No backup status files found");
Some(BackupDiskData {
serial: backup_status.disk_serial_number.clone().unwrap_or_else(|| "Unknown".to_string()),
usage_percent: disk_space.usage_percent as f32,
used_gb: disk_space.used_gb as f32,
total_gb: disk_space.total_gb as f32,
wear_percent: backup_status.disk_wear_percent,
temperature_celsius: None, // Not available in current TOML
})
} else if let Some(serial) = &backup_status.disk_serial_number {
// Fallback: create minimal disk info if we have serial but no disk_space
Some(BackupDiskData {
serial: serial.clone(),
usage_percent: 0.0,
used_gb: 0.0,
total_gb: 0.0,
wear_percent: backup_status.disk_wear_percent,
temperature_celsius: None,
})
} else {
None
};
// Calculate total repository size from services
let total_size_gb = backup_status.services
.values()
.map(|service| service.repo_size_bytes as f32 / (1024.0 * 1024.0 * 1024.0))
.sum::<f32>();
let backup_data = BackupData {
status: backup_status.status,
total_size_gb: Some(total_size_gb),
repository_health: Some("ok".to_string()), // Derive from status if needed
repository_disk,
last_backup_size_gb: None, // Not available in current TOML format
start_time_raw: Some(backup_status.start_time),
};
agent_data.backup = backup_data;
} else {
// No backup status available - set default values
agent_data.backup = BackupData { agent_data.backup = BackupData {
status: "unavailable".to_string(), repositories: Vec::new(),
total_size_gb: None, repository_status: Status::Unknown,
repository_health: None, disks: Vec::new(),
repository_disk: None,
last_backup_size_gb: None,
start_time_raw: None,
}; };
return Ok(());
} }
let mut all_repositories = HashSet::new();
let mut disks = Vec::new();
let mut worst_status = Status::Ok;
for status_file in status_files {
match self.read_status_file(&status_file).await {
Ok(backup_status) => {
// Collect all service names
for service_name in backup_status.services.keys() {
all_repositories.insert(service_name.clone());
}
// Calculate backup status
let backup_status_enum = Self::calculate_backup_status(&backup_status.status);
// Calculate usage status from disk space
let (usage_percent, used_gb, total_gb, usage_status) = if let Some(disk_space) = &backup_status.disk_space {
let usage_pct = disk_space.usage_percent as f32;
(
usage_pct,
disk_space.used_gb as f32,
disk_space.total_gb as f32,
Self::calculate_usage_status(usage_pct),
)
} else {
(0.0, 0.0, 0.0, Status::Unknown)
};
// Update worst status
worst_status = worst_status.max(backup_status_enum).max(usage_status);
// Build service list for this disk
let services: Vec<String> = backup_status.services.keys().cloned().collect();
// Get min and max archive counts to detect inconsistencies
let archives_min: i64 = backup_status.services.values()
.map(|service| service.archive_count)
.min()
.unwrap_or(0);
let archives_max: i64 = backup_status.services.values()
.map(|service| service.archive_count)
.max()
.unwrap_or(0);
// Create disk data
let disk_data = BackupDiskData {
serial: backup_status.disk_serial_number.unwrap_or_else(|| "Unknown".to_string()),
product_name: backup_status.disk_product_name,
wear_percent: backup_status.disk_wear_percent,
temperature_celsius: None, // Not available in current TOML
last_backup_time: Some(backup_status.start_time),
backup_status: backup_status_enum,
disk_usage_percent: usage_percent,
disk_used_gb: used_gb,
disk_total_gb: total_gb,
usage_status,
services,
archives_min,
archives_max,
};
disks.push(disk_data);
}
Err(e) => {
warn!("Failed to read backup status file {:?}: {}", status_file, e);
}
}
}
let repositories: Vec<String> = all_repositories.into_iter().collect();
agent_data.backup = BackupData {
repositories,
repository_status: worst_status,
disks,
};
Ok(()) Ok(())
} }
} }

View File

@@ -119,36 +119,69 @@ impl CpuCollector {
utils::parse_u64(content.trim()) utils::parse_u64(content.trim())
} }
/// Collect CPU frequency and populate AgentData /// Collect CPU C-state (idle depth) and populate AgentData with top 3 C-states by usage
async fn collect_frequency(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> { async fn collect_cstate(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> {
// Try scaling frequency first (more accurate for current frequency) // Read C-state usage from first CPU (representative of overall system)
if let Ok(freq) = // C-states indicate CPU idle depth: C1=light sleep, C6=deep sleep, C10=deepest
utils::read_proc_file("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")
{
if let Ok(freq_khz) = utils::parse_u64(freq.trim()) {
let freq_mhz = freq_khz as f32 / 1000.0;
agent_data.system.cpu.frequency_mhz = freq_mhz;
return Ok(());
}
}
// Fallback: parse /proc/cpuinfo for base frequency let mut cstate_times: Vec<(String, u64)> = Vec::new();
if let Ok(content) = utils::read_proc_file("/proc/cpuinfo") { let mut total_time: u64 = 0;
for line in content.lines() {
if line.starts_with("cpu MHz") { // Collect all C-state times from CPU0
if let Some(freq_str) = line.split(':').nth(1) { for state_num in 0..=10 {
if let Ok(freq_mhz) = utils::parse_f32(freq_str) { let time_path = format!("/sys/devices/system/cpu/cpu0/cpuidle/state{}/time", state_num);
agent_data.system.cpu.frequency_mhz = freq_mhz; let name_path = format!("/sys/devices/system/cpu/cpu0/cpuidle/state{}/name", state_num);
return Ok(());
if let Ok(time_str) = utils::read_proc_file(&time_path) {
if let Ok(time) = utils::parse_u64(time_str.trim()) {
if let Ok(name) = utils::read_proc_file(&name_path) {
let state_name = name.trim();
// Skip POLL state (not real idle)
if state_name != "POLL" && time > 0 {
// Extract "C" + digits pattern (C3, C10, etc.) to reduce JSON size
// Handles formats like "C3_ACPI", "C10_MWAIT", etc.
let clean_name = if let Some(c_pos) = state_name.find('C') {
let rest = &state_name[c_pos + 1..];
let digit_count = rest.chars().take_while(|c| c.is_ascii_digit()).count();
if digit_count > 0 {
state_name[c_pos..c_pos + 1 + digit_count].to_string()
} else {
state_name.to_string()
}
} else {
state_name.to_string()
};
cstate_times.push((clean_name, time));
total_time += time;
} }
} }
break; // Only need first CPU entry
} }
} else {
// No more states available
break;
} }
} }
debug!("CPU frequency not available"); // Sort by time descending to get top 3
// Leave frequency as 0.0 if not available cstate_times.sort_by(|a, b| b.1.cmp(&a.1));
// Calculate percentages for top 3 and populate AgentData
agent_data.system.cpu.cstates = cstate_times
.iter()
.take(3)
.map(|(name, time)| {
let percent = if total_time > 0 {
(*time as f32 / total_time as f32) * 100.0
} else {
0.0
};
cm_dashboard_shared::CStateInfo {
name: name.clone(),
percent,
}
})
.collect();
Ok(()) Ok(())
} }
} }
@@ -165,8 +198,8 @@ impl Collector for CpuCollector {
// Collect temperature (optional) // Collect temperature (optional)
self.collect_temperature(agent_data).await?; self.collect_temperature(agent_data).await?;
// Collect frequency (optional) // Collect C-state (CPU idle depth)
self.collect_frequency(agent_data).await?; self.collect_cstate(agent_data).await?;
let duration = start.elapsed(); let duration = start.elapsed();
debug!("CPU collection completed in {:?}", duration); debug!("CPU collection completed in {:?}", duration);

View File

@@ -3,10 +3,9 @@ use async_trait::async_trait;
use cm_dashboard_shared::{AgentData, DriveData, FilesystemData, PoolData, HysteresisThresholds, Status}; use cm_dashboard_shared::{AgentData, DriveData, FilesystemData, PoolData, HysteresisThresholds, Status};
use crate::config::DiskConfig; use crate::config::DiskConfig;
use std::process::Command; use tokio::process::Command as TokioCommand;
use std::time::Instant; use std::process::Command as StdCommand;
use std::collections::HashMap; use std::collections::HashMap;
use tracing::debug;
use super::{Collector, CollectorError}; use super::{Collector, CollectorError};
@@ -19,10 +18,8 @@ pub struct DiskCollector {
/// A physical drive with its filesystems /// A physical drive with its filesystems
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
struct PhysicalDrive { struct PhysicalDrive {
name: String, // e.g., "nvme0n1", "sda" name: String, // e.g., "nvme0n1", "sda"
health: String, // SMART health status health: String, // SMART health status
temperature_celsius: Option<f32>, // Drive temperature
wear_percent: Option<f32>, // SSD wear level
filesystems: Vec<Filesystem>, // mounted filesystems on this drive filesystems: Vec<Filesystem>, // mounted filesystems on this drive
} }
@@ -69,18 +66,25 @@ impl DiskCollector {
/// Collect all storage data and populate AgentData /// Collect all storage data and populate AgentData
async fn collect_storage_data(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> { async fn collect_storage_data(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> {
let start_time = Instant::now(); // Clear drives and pools to prevent duplicates when updating cached data
debug!("Starting clean storage collection"); agent_data.system.storage.drives.clear();
agent_data.system.storage.pools.clear();
// Step 1: Get mount points and their backing devices // Step 1: Get mount points and their backing devices
let mount_devices = self.get_mount_devices().await?; let mount_devices = self.get_mount_devices().await?;
// Step 2: Get filesystem usage for each mount point using df // Step 2: Get filesystem usage for each mount point using df
let filesystem_usage = self.get_filesystem_usage(&mount_devices).map_err(|e| CollectorError::Parse { let mut filesystem_usage = self.get_filesystem_usage(&mount_devices).map_err(|e| CollectorError::Parse {
value: "filesystem usage".to_string(), value: "filesystem usage".to_string(),
error: format!("Failed to get filesystem usage: {}", e), error: format!("Failed to get filesystem usage: {}", e),
})?; })?;
// Step 2.5: Add MergerFS mount points that weren't in lsblk output
self.add_mergerfs_filesystem_usage(&mut filesystem_usage).map_err(|e| CollectorError::Parse {
value: "mergerfs filesystem usage".to_string(),
error: format!("Failed to get mergerfs filesystem usage: {}", e),
})?;
// Step 3: Detect MergerFS pools // Step 3: Detect MergerFS pools
let mergerfs_pools = self.detect_mergerfs_pools(&filesystem_usage).map_err(|e| CollectorError::Parse { let mergerfs_pools = self.detect_mergerfs_pools(&filesystem_usage).map_err(|e| CollectorError::Parse {
value: "mergerfs pools".to_string(), value: "mergerfs pools".to_string(),
@@ -100,17 +104,17 @@ impl DiskCollector {
self.populate_drives_data(&physical_drives, &smart_data, agent_data)?; self.populate_drives_data(&physical_drives, &smart_data, agent_data)?;
self.populate_pools_data(&mergerfs_pools, &smart_data, agent_data)?; self.populate_pools_data(&mergerfs_pools, &smart_data, agent_data)?;
let elapsed = start_time.elapsed();
debug!("Storage collection completed in {:?}", elapsed);
Ok(()) Ok(())
} }
/// Get block devices and their mount points using lsblk /// Get block devices and their mount points using lsblk
async fn get_mount_devices(&self) -> Result<HashMap<String, String>, CollectorError> { async fn get_mount_devices(&self) -> Result<HashMap<String, String>, CollectorError> {
let output = Command::new("lsblk") use super::run_command_with_timeout;
.args(&["-rn", "-o", "NAME,MOUNTPOINT"])
.output() let mut cmd = TokioCommand::new("lsblk");
cmd.args(&["-rn", "-o", "NAME,MOUNTPOINT"]);
let output = run_command_with_timeout(cmd, 2).await
.map_err(|e| CollectorError::SystemRead { .map_err(|e| CollectorError::SystemRead {
path: "block devices".to_string(), path: "block devices".to_string(),
error: e.to_string(), error: e.to_string(),
@@ -134,7 +138,6 @@ impl DiskCollector {
} }
} }
debug!("Found {} mounted block devices", mount_devices.len());
Ok(mount_devices) Ok(mount_devices)
} }
@@ -147,8 +150,8 @@ impl DiskCollector {
Ok((total, used)) => { Ok((total, used)) => {
filesystem_usage.insert(mount_point.clone(), (total, used)); filesystem_usage.insert(mount_point.clone(), (total, used));
} }
Err(e) => { Err(_e) => {
debug!("Failed to get filesystem info for {}: {}", mount_point, e); // Silently skip filesystems we can't read
} }
} }
} }
@@ -156,10 +159,32 @@ impl DiskCollector {
Ok(filesystem_usage) Ok(filesystem_usage)
} }
/// Add filesystem usage for MergerFS mount points that aren't in lsblk
fn add_mergerfs_filesystem_usage(&self, filesystem_usage: &mut HashMap<String, (u64, u64)>) -> anyhow::Result<()> {
let mounts_content = std::fs::read_to_string("/proc/mounts")
.map_err(|e| anyhow::anyhow!("Failed to read /proc/mounts: {}", e))?;
for line in mounts_content.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 3 && parts[2] == "fuse.mergerfs" {
let mount_point = parts[1].to_string();
// Only add if we don't already have usage data for this mount point
if !filesystem_usage.contains_key(&mount_point) {
if let Ok((total, used)) = self.get_filesystem_info(&mount_point) {
filesystem_usage.insert(mount_point, (total, used));
}
}
}
}
Ok(())
}
/// Get filesystem info for a single mount point /// Get filesystem info for a single mount point
fn get_filesystem_info(&self, mount_point: &str) -> Result<(u64, u64), CollectorError> { fn get_filesystem_info(&self, mount_point: &str) -> Result<(u64, u64), CollectorError> {
let output = Command::new("df") let output = StdCommand::new("timeout")
.args(&["--block-size=1", mount_point]) .args(&["2", "df", "--block-size=1", mount_point])
.output() .output()
.map_err(|e| CollectorError::SystemRead { .map_err(|e| CollectorError::SystemRead {
path: format!("df {}", mount_point), path: format!("df {}", mount_point),
@@ -221,9 +246,8 @@ impl DiskCollector {
} else { } else {
mount_point.trim_start_matches('/').replace('/', "_") mount_point.trim_start_matches('/').replace('/', "_")
}; };
if pool_name.is_empty() { if pool_name.is_empty() {
debug!("Skipping mergerfs pool with empty name: {}", mount_point);
continue; continue;
} }
@@ -251,8 +275,7 @@ impl DiskCollector {
// Categorize as data vs parity drives // Categorize as data vs parity drives
let (data_drives, parity_drives) = match self.categorize_pool_drives(&all_member_paths) { let (data_drives, parity_drives) = match self.categorize_pool_drives(&all_member_paths) {
Ok(drives) => drives, Ok(drives) => drives,
Err(e) => { Err(_e) => {
debug!("Failed to categorize drives for pool {}: {}. Skipping.", mount_point, e);
continue; continue;
} }
}; };
@@ -267,8 +290,7 @@ impl DiskCollector {
}); });
} }
} }
debug!("Found {} mergerfs pools", pools.len());
Ok(pools) Ok(pools)
} }
@@ -321,8 +343,6 @@ impl DiskCollector {
let physical_drive = PhysicalDrive { let physical_drive = PhysicalDrive {
name: drive_name, name: drive_name,
health: "UNKNOWN".to_string(), // Will be updated with SMART data health: "UNKNOWN".to_string(), // Will be updated with SMART data
temperature_celsius: None,
wear_percent: None,
filesystems, filesystems,
}; };
physical_drives.push(physical_drive); physical_drives.push(physical_drive);
@@ -357,10 +377,10 @@ impl DiskCollector {
device.to_string() device.to_string()
} }
/// Get SMART data for drives /// Get SMART data for drives in parallel
async fn get_smart_data_for_drives(&self, physical_drives: &[PhysicalDrive], mergerfs_pools: &[MergerfsPool]) -> HashMap<String, SmartData> { async fn get_smart_data_for_drives(&self, physical_drives: &[PhysicalDrive], mergerfs_pools: &[MergerfsPool]) -> HashMap<String, SmartData> {
let mut smart_data = HashMap::new(); use futures::future::join_all;
// Collect all drive names // Collect all drive names
let mut all_drives = std::collections::HashSet::new(); let mut all_drives = std::collections::HashSet::new();
for drive in physical_drives { for drive in physical_drives {
@@ -375,9 +395,24 @@ impl DiskCollector {
} }
} }
// Get SMART data for each drive // Collect SMART data for all drives in parallel
for drive_name in all_drives { let futures: Vec<_> = all_drives
if let Ok(data) = self.get_smart_data(&drive_name).await { .iter()
.map(|drive_name| {
let drive = drive_name.clone();
async move {
let result = self.get_smart_data(&drive).await;
(drive, result)
}
})
.collect();
let results = join_all(futures).await;
// Build HashMap from results
let mut smart_data = HashMap::new();
for (drive_name, result) in results {
if let Ok(data) = result {
smart_data.insert(drive_name, data); smart_data.insert(drive_name, data);
} }
} }
@@ -387,32 +422,39 @@ impl DiskCollector {
/// Get SMART data for a single drive /// Get SMART data for a single drive
async fn get_smart_data(&self, drive_name: &str) -> Result<SmartData, CollectorError> { async fn get_smart_data(&self, drive_name: &str) -> Result<SmartData, CollectorError> {
let output = Command::new("sudo") use super::run_command_with_timeout;
.args(&["smartctl", "-a", &format!("/dev/{}", drive_name)])
.output() // Use direct smartctl (no sudo) - service has CAP_SYS_RAWIO and CAP_SYS_ADMIN capabilities
// For NVMe drives, specify device type explicitly
let mut cmd = TokioCommand::new("smartctl");
if drive_name.starts_with("nvme") {
cmd.args(&["-d", "nvme", "-a", &format!("/dev/{}", drive_name)]);
} else {
cmd.args(&["-a", &format!("/dev/{}", drive_name)]);
}
let output = run_command_with_timeout(cmd, 3).await
.map_err(|e| CollectorError::SystemRead { .map_err(|e| CollectorError::SystemRead {
path: format!("SMART data for {}", drive_name), path: format!("SMART data for {}", drive_name),
error: e.to_string(), error: e.to_string(),
})?; })?;
let output_str = String::from_utf8_lossy(&output.stdout); let output_str = String::from_utf8_lossy(&output.stdout);
let error_str = String::from_utf8_lossy(&output.stderr);
// Note: smartctl returns non-zero exit codes for warnings (like exit code 32
// Debug logging for SMART command results // for "temperature was high in the past"), but the output data is still valid.
debug!("SMART output for {}: status={}, stdout_len={}, stderr={}", // Only check if we got any output at all, don't reject based on exit code.
drive_name, output.status, output_str.len(), error_str); if output_str.is_empty() {
if !output.status.success() {
debug!("SMART command failed for {}: {}", drive_name, error_str);
// Return unknown data rather than failing completely
return Ok(SmartData { return Ok(SmartData {
health: "UNKNOWN".to_string(), health: "UNKNOWN".to_string(),
serial_number: None,
temperature_celsius: None, temperature_celsius: None,
wear_percent: None, wear_percent: None,
}); });
} }
let mut health = "UNKNOWN".to_string(); let mut health = "UNKNOWN".to_string();
let mut serial_number = None;
let mut temperature = None; let mut temperature = None;
let mut wear_percent = None; let mut wear_percent = None;
@@ -425,8 +467,21 @@ impl DiskCollector {
} }
} }
// Serial number parsing (both SATA and NVMe)
if line.contains("Serial Number:") {
if let Some(serial_part) = line.split("Serial Number:").nth(1) {
let serial_str = serial_part.trim();
if !serial_str.is_empty() {
// Take first whitespace-separated token
if let Some(serial) = serial_str.split_whitespace().next() {
serial_number = Some(serial.to_string());
}
}
}
}
// Temperature parsing for different drive types // Temperature parsing for different drive types
if line.contains("Temperature_Celsius") || line.contains("Airflow_Temperature_Cel") { if line.contains("Temperature_Celsius") || line.contains("Airflow_Temperature_Cel") || line.contains("Temperature_Case") {
// Traditional SATA drives: attribute table format // Traditional SATA drives: attribute table format
if let Some(temp_str) = line.split_whitespace().nth(9) { if let Some(temp_str) = line.split_whitespace().nth(9) {
if let Ok(temp) = temp_str.parse::<f32>() { if let Ok(temp) = temp_str.parse::<f32>() {
@@ -444,7 +499,15 @@ impl DiskCollector {
} }
// Wear level parsing for SSDs // Wear level parsing for SSDs
if line.contains("Wear_Leveling_Count") || line.contains("SSD_Life_Left") { if line.contains("Media_Wearout_Indicator") {
// Media_Wearout_Indicator stores remaining life % in column 3 (VALUE)
if let Some(wear_str) = line.split_whitespace().nth(3) {
if let Ok(remaining) = wear_str.parse::<f32>() {
wear_percent = Some(100.0 - remaining); // Convert remaining life to wear
}
}
} else if line.contains("Wear_Leveling_Count") || line.contains("SSD_Life_Left") {
// Other wear attributes store value in column 9 (RAW_VALUE)
if let Some(wear_str) = line.split_whitespace().nth(9) { if let Some(wear_str) = line.split_whitespace().nth(9) {
if let Ok(wear) = wear_str.parse::<f32>() { if let Ok(wear) = wear_str.parse::<f32>() {
wear_percent = Some(100.0 - wear); // Convert remaining life to wear wear_percent = Some(100.0 - wear); // Convert remaining life to wear
@@ -467,6 +530,7 @@ impl DiskCollector {
Ok(SmartData { Ok(SmartData {
health, health,
serial_number,
temperature_celsius: temperature, temperature_celsius: temperature,
wear_percent, wear_percent,
}) })
@@ -492,6 +556,7 @@ impl DiskCollector {
agent_data.system.storage.drives.push(DriveData { agent_data.system.storage.drives.push(DriveData {
name: drive.name.clone(), name: drive.name.clone(),
serial_number: smart.and_then(|s| s.serial_number.clone()),
health: smart.map(|s| s.health.clone()).unwrap_or_else(|| drive.health.clone()), health: smart.map(|s| s.health.clone()).unwrap_or_else(|| drive.health.clone()),
temperature_celsius: smart.and_then(|s| s.temperature_celsius), temperature_celsius: smart.and_then(|s| s.temperature_celsius),
wear_percent: smart.and_then(|s| s.wear_percent), wear_percent: smart.and_then(|s| s.wear_percent),
@@ -511,8 +576,8 @@ impl DiskCollector {
/// Populate pools data into AgentData /// Populate pools data into AgentData
fn populate_pools_data(&self, mergerfs_pools: &[MergerfsPool], smart_data: &HashMap<String, SmartData>, agent_data: &mut AgentData) -> Result<(), CollectorError> { fn populate_pools_data(&self, mergerfs_pools: &[MergerfsPool], smart_data: &HashMap<String, SmartData>, agent_data: &mut AgentData) -> Result<(), CollectorError> {
for pool in mergerfs_pools { for pool in mergerfs_pools {
// Calculate pool health based on member drive health // Calculate pool health and statuses based on member drive health
let (pool_health, data_drive_data, parity_drive_data) = self.calculate_pool_health(pool, smart_data); let (pool_health, health_status, usage_status, data_drive_data, parity_drive_data) = self.calculate_pool_health(pool, smart_data);
let pool_data = PoolData { let pool_data = PoolData {
name: pool.name.clone(), name: pool.name.clone(),
@@ -526,6 +591,8 @@ impl DiskCollector {
total_gb: pool.total_bytes as f32 / (1024.0 * 1024.0 * 1024.0), total_gb: pool.total_bytes as f32 / (1024.0 * 1024.0 * 1024.0),
data_drives: data_drive_data, data_drives: data_drive_data,
parity_drives: parity_drive_data, parity_drives: parity_drive_data,
health_status,
usage_status,
}; };
agent_data.system.storage.pools.push(pool_data); agent_data.system.storage.pools.push(pool_data);
@@ -535,7 +602,7 @@ impl DiskCollector {
} }
/// Calculate pool health based on member drive status /// Calculate pool health based on member drive status
fn calculate_pool_health(&self, pool: &MergerfsPool, smart_data: &HashMap<String, SmartData>) -> (String, Vec<cm_dashboard_shared::PoolDriveData>, Vec<cm_dashboard_shared::PoolDriveData>) { fn calculate_pool_health(&self, pool: &MergerfsPool, smart_data: &HashMap<String, SmartData>) -> (String, cm_dashboard_shared::Status, cm_dashboard_shared::Status, Vec<cm_dashboard_shared::PoolDriveData>, Vec<cm_dashboard_shared::PoolDriveData>) {
let mut failed_data = 0; let mut failed_data = 0;
let mut failed_parity = 0; let mut failed_parity = 0;
@@ -543,16 +610,24 @@ impl DiskCollector {
let data_drive_data: Vec<cm_dashboard_shared::PoolDriveData> = pool.data_drives.iter().map(|d| { let data_drive_data: Vec<cm_dashboard_shared::PoolDriveData> = pool.data_drives.iter().map(|d| {
let smart = smart_data.get(&d.name); let smart = smart_data.get(&d.name);
let health = smart.map(|s| s.health.clone()).unwrap_or_else(|| "UNKNOWN".to_string()); let health = smart.map(|s| s.health.clone()).unwrap_or_else(|| "UNKNOWN".to_string());
let temperature = smart.and_then(|s| s.temperature_celsius).or(d.temperature_celsius);
if health == "FAILED" { if health == "FAILED" {
failed_data += 1; failed_data += 1;
} }
// Calculate drive statuses using config thresholds
let health_status = self.calculate_health_status(&health);
let temperature_status = temperature.map(|t| self.temperature_thresholds.evaluate(t)).unwrap_or(cm_dashboard_shared::Status::Unknown);
cm_dashboard_shared::PoolDriveData { cm_dashboard_shared::PoolDriveData {
name: d.name.clone(), name: d.name.clone(),
temperature_celsius: smart.and_then(|s| s.temperature_celsius).or(d.temperature_celsius), serial_number: smart.and_then(|s| s.serial_number.clone()),
temperature_celsius: temperature,
health, health,
wear_percent: smart.and_then(|s| s.wear_percent), wear_percent: smart.and_then(|s| s.wear_percent),
health_status,
temperature_status,
} }
}).collect(); }).collect();
@@ -560,27 +635,57 @@ impl DiskCollector {
let parity_drive_data: Vec<cm_dashboard_shared::PoolDriveData> = pool.parity_drives.iter().map(|d| { let parity_drive_data: Vec<cm_dashboard_shared::PoolDriveData> = pool.parity_drives.iter().map(|d| {
let smart = smart_data.get(&d.name); let smart = smart_data.get(&d.name);
let health = smart.map(|s| s.health.clone()).unwrap_or_else(|| "UNKNOWN".to_string()); let health = smart.map(|s| s.health.clone()).unwrap_or_else(|| "UNKNOWN".to_string());
let temperature = smart.and_then(|s| s.temperature_celsius).or(d.temperature_celsius);
if health == "FAILED" { if health == "FAILED" {
failed_parity += 1; failed_parity += 1;
} }
// Calculate drive statuses using config thresholds
let health_status = self.calculate_health_status(&health);
let temperature_status = temperature.map(|t| self.temperature_thresholds.evaluate(t)).unwrap_or(cm_dashboard_shared::Status::Unknown);
cm_dashboard_shared::PoolDriveData { cm_dashboard_shared::PoolDriveData {
name: d.name.clone(), name: d.name.clone(),
temperature_celsius: smart.and_then(|s| s.temperature_celsius).or(d.temperature_celsius), serial_number: smart.and_then(|s| s.serial_number.clone()),
temperature_celsius: temperature,
health, health,
wear_percent: smart.and_then(|s| s.wear_percent), wear_percent: smart.and_then(|s| s.wear_percent),
health_status,
temperature_status,
} }
}).collect(); }).collect();
// Calculate overall pool health // Calculate overall pool health string and status
let pool_health = match (failed_data, failed_parity) { // SnapRAID logic: can tolerate up to N parity drive failures (where N = number of parity drives)
(0, 0) => "healthy".to_string(), // If data drives fail AND we've lost parity protection, that's critical
(1, 0) | (0, 1) => "degraded".to_string(), // One failure is degraded but recoverable let (pool_health, health_status) = if failed_data == 0 && failed_parity == 0 {
_ => "critical".to_string(), // Multiple failures are critical ("healthy".to_string(), cm_dashboard_shared::Status::Ok)
} else if failed_data == 0 && failed_parity > 0 {
// Parity failed but no data loss - degraded (reduced protection)
("degraded".to_string(), cm_dashboard_shared::Status::Warning)
} else if failed_data == 1 && failed_parity == 0 {
// One data drive failed, parity intact - degraded (recoverable)
("degraded".to_string(), cm_dashboard_shared::Status::Warning)
} else {
// Multiple data drives failed OR data+parity failed = data loss risk
("critical".to_string(), cm_dashboard_shared::Status::Critical)
}; };
(pool_health, data_drive_data, parity_drive_data) // Calculate pool usage status using config thresholds
let usage_percent = if pool.total_bytes > 0 {
(pool.used_bytes as f32 / pool.total_bytes as f32) * 100.0
} else { 0.0 };
let usage_status = if usage_percent >= self.config.usage_critical_percent {
cm_dashboard_shared::Status::Critical
} else if usage_percent >= self.config.usage_warning_percent {
cm_dashboard_shared::Status::Warning
} else {
cm_dashboard_shared::Status::Ok
};
(pool_health, health_status, usage_status, data_drive_data, parity_drive_data)
} }
/// Calculate filesystem usage status /// Calculate filesystem usage status
@@ -665,9 +770,9 @@ impl DiskCollector {
/// Get drive information for a mount path /// Get drive information for a mount path
fn get_drive_info_for_path(&self, path: &str) -> anyhow::Result<PoolDrive> { fn get_drive_info_for_path(&self, path: &str) -> anyhow::Result<PoolDrive> {
// Use lsblk to find the backing device // Use lsblk to find the backing device with timeout
let output = Command::new("lsblk") let output = StdCommand::new("timeout")
.args(&["-rn", "-o", "NAME,MOUNTPOINT"]) .args(&["2", "lsblk", "-rn", "-o", "NAME,MOUNTPOINT"])
.output() .output()
.map_err(|e| anyhow::anyhow!("Failed to run lsblk: {}", e))?; .map_err(|e| anyhow::anyhow!("Failed to run lsblk: {}", e))?;
@@ -688,20 +793,13 @@ impl DiskCollector {
// Extract base device name (e.g., "sda1" -> "sda") // Extract base device name (e.g., "sda1" -> "sda")
let base_device = self.extract_base_device(&format!("/dev/{}", device)); let base_device = self.extract_base_device(&format!("/dev/{}", device));
// Get temperature from SMART data if available // Temperature will be filled in later from parallel SMART collection
let temperature = if let Ok(smart_data) = tokio::task::block_in_place(|| { // Don't collect it here to avoid sequential blocking with problematic async nesting
tokio::runtime::Handle::current().block_on(self.get_smart_data(&base_device))
}) {
smart_data.temperature_celsius
} else {
None
};
Ok(PoolDrive { Ok(PoolDrive {
name: base_device, name: base_device,
mount_point: path.to_string(), mount_point: path.to_string(),
temperature_celsius: temperature, temperature_celsius: None,
}) })
} }
@@ -749,6 +847,7 @@ impl Collector for DiskCollector {
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
struct SmartData { struct SmartData {
health: String, health: String,
serial_number: Option<String>,
temperature_celsius: Option<f32>, temperature_celsius: Option<f32>,
wear_percent: Option<f32>, wear_percent: Option<f32>,
} }

View File

@@ -105,12 +105,12 @@ impl MemoryCollector {
return Ok(()); return Ok(());
} }
// Get usage data for all tmpfs mounts at once using df // Get usage data for all tmpfs mounts at once using df (with 2 second timeout)
let mut df_args = vec!["df", "--output=target,size,used", "--block-size=1"]; let mut df_args = vec!["2", "df", "--output=target,size,used", "--block-size=1"];
df_args.extend(tmpfs_mounts.iter().map(|s| s.as_str())); df_args.extend(tmpfs_mounts.iter().map(|s| s.as_str()));
let df_output = std::process::Command::new(df_args[0]) let df_output = std::process::Command::new("timeout")
.args(&df_args[1..]) .args(&df_args[..])
.output() .output()
.map_err(|e| CollectorError::SystemRead { .map_err(|e| CollectorError::SystemRead {
path: "tmpfs mounts".to_string(), path: "tmpfs mounts".to_string(),
@@ -200,13 +200,16 @@ impl Collector for MemoryCollector {
debug!("Collecting memory metrics"); debug!("Collecting memory metrics");
let start = std::time::Instant::now(); let start = std::time::Instant::now();
// Clear tmpfs list to prevent duplicates when updating cached data
agent_data.system.memory.tmpfs.clear();
// Parse memory info from /proc/meminfo // Parse memory info from /proc/meminfo
let info = self.parse_meminfo().await?; let info = self.parse_meminfo().await?;
// Populate memory data directly // Populate memory data directly
self.populate_memory_data(&info, agent_data).await?; self.populate_memory_data(&info, agent_data).await?;
// Collect tmpfs data // Collect tmpfs data
self.populate_tmpfs_data(agent_data).await?; self.populate_tmpfs_data(agent_data).await?;
let duration = start.elapsed(); let duration = start.elapsed();

View File

@@ -1,17 +1,51 @@
use async_trait::async_trait; use async_trait::async_trait;
use cm_dashboard_shared::{AgentData}; use cm_dashboard_shared::{AgentData};
use std::process::Output;
use std::time::Duration;
pub mod backup; pub mod backup;
pub mod cpu; pub mod cpu;
pub mod disk; pub mod disk;
pub mod error; pub mod error;
pub mod memory; pub mod memory;
pub mod network;
pub mod nixos; pub mod nixos;
pub mod systemd; pub mod systemd;
pub use error::CollectorError; pub use error::CollectorError;
/// Run a command with a timeout to prevent blocking
/// Properly kills the process if timeout is exceeded
pub async fn run_command_with_timeout(mut cmd: tokio::process::Command, timeout_secs: u64) -> std::io::Result<Output> {
use tokio::time::timeout;
use std::process::Stdio;
let timeout_duration = Duration::from_secs(timeout_secs);
// Configure stdio to capture output
cmd.stdout(Stdio::piped());
cmd.stderr(Stdio::piped());
let child = cmd.spawn()?;
let pid = child.id();
match timeout(timeout_duration, child.wait_with_output()).await {
Ok(result) => result,
Err(_) => {
// Timeout - force kill the process using system kill command
if let Some(process_id) = pid {
let _ = tokio::process::Command::new("kill")
.args(&["-9", &process_id.to_string()])
.output()
.await;
}
Err(std::io::Error::new(
std::io::ErrorKind::TimedOut,
format!("Command timed out after {} seconds", timeout_secs)
))
}
}
}
/// Base trait for all collectors with direct structured data output /// Base trait for all collectors with direct structured data output
#[async_trait] #[async_trait]

View File

@@ -0,0 +1,224 @@
use async_trait::async_trait;
use cm_dashboard_shared::{AgentData, NetworkInterfaceData, Status};
use std::process::Command;
use tracing::debug;
use super::{Collector, CollectorError};
use crate::config::NetworkConfig;
/// Network interface collector with physical/virtual classification and link status
pub struct NetworkCollector {
_config: NetworkConfig,
}
impl NetworkCollector {
pub fn new(config: NetworkConfig) -> Self {
Self { _config: config }
}
/// Check if interface is physical (not virtual)
fn is_physical_interface(name: &str) -> bool {
// Physical interface patterns
matches!(
&name[..],
s if s.starts_with("eth")
|| s.starts_with("ens")
|| s.starts_with("enp")
|| s.starts_with("wlan")
|| s.starts_with("wlp")
|| s.starts_with("eno")
|| s.starts_with("enx")
)
}
/// Get link status for an interface
fn get_link_status(interface: &str) -> Status {
let operstate_path = format!("/sys/class/net/{}/operstate", interface);
match std::fs::read_to_string(&operstate_path) {
Ok(state) => {
let state = state.trim();
match state {
"up" => Status::Ok,
"down" => Status::Inactive,
"unknown" => Status::Warning,
_ => Status::Unknown,
}
}
Err(_) => Status::Unknown,
}
}
/// Get the primary physical interface (the one with default route)
fn get_primary_physical_interface() -> Option<String> {
match Command::new("timeout").args(["2", "ip", "route", "show", "default"]).output() {
Ok(output) if output.status.success() => {
let output_str = String::from_utf8_lossy(&output.stdout);
// Parse: "default via 192.168.1.1 dev eno1 ..."
for line in output_str.lines() {
if line.starts_with("default") {
if let Some(dev_pos) = line.find(" dev ") {
let after_dev = &line[dev_pos + 5..];
if let Some(space_pos) = after_dev.find(' ') {
let interface = &after_dev[..space_pos];
// Only return if it's a physical interface
if Self::is_physical_interface(interface) {
return Some(interface.to_string());
}
} else {
// No space after interface name (end of line)
let interface = after_dev.trim();
if Self::is_physical_interface(interface) {
return Some(interface.to_string());
}
}
}
}
}
None
}
_ => None,
}
}
/// Parse VLAN configuration from /proc/net/vlan/config
/// Returns a map of interface name -> VLAN ID
fn parse_vlan_config() -> std::collections::HashMap<String, u16> {
let mut vlan_map = std::collections::HashMap::new();
if let Ok(contents) = std::fs::read_to_string("/proc/net/vlan/config") {
for line in contents.lines().skip(2) { // Skip header lines
let parts: Vec<&str> = line.split('|').collect();
if parts.len() >= 2 {
let interface_name = parts[0].trim();
let vlan_id_str = parts[1].trim();
if let Ok(vlan_id) = vlan_id_str.parse::<u16>() {
vlan_map.insert(interface_name.to_string(), vlan_id);
}
}
}
}
vlan_map
}
/// Collect network interfaces using ip command
async fn collect_interfaces(&self) -> Vec<NetworkInterfaceData> {
let mut interfaces = Vec::new();
// Parse VLAN configuration
let vlan_map = Self::parse_vlan_config();
match Command::new("timeout").args(["2", "ip", "-j", "addr"]).output() {
Ok(output) if output.status.success() => {
let json_str = String::from_utf8_lossy(&output.stdout);
if let Ok(json_data) = serde_json::from_str::<serde_json::Value>(&json_str) {
if let Some(ifaces) = json_data.as_array() {
for iface in ifaces {
let name = iface["ifname"].as_str().unwrap_or("").to_string();
// Skip loopback, empty names, and ifb* interfaces
if name.is_empty() || name == "lo" || name.starts_with("ifb") {
continue;
}
// Parse parent interface from @parent notation (e.g., lan@enp0s31f6)
let (interface_name, parent_interface) = if let Some(at_pos) = name.find('@') {
let (child, parent) = name.split_at(at_pos);
(child.to_string(), Some(parent[1..].to_string()))
} else {
(name.clone(), None)
};
let mut ipv4_addresses = Vec::new();
let mut ipv6_addresses = Vec::new();
// Extract IP addresses
if let Some(addr_info) = iface["addr_info"].as_array() {
for addr in addr_info {
if let Some(family) = addr["family"].as_str() {
if let Some(local) = addr["local"].as_str() {
match family {
"inet" => ipv4_addresses.push(local.to_string()),
"inet6" => {
// Skip link-local IPv6 addresses (fe80::)
if !local.starts_with("fe80:") {
ipv6_addresses.push(local.to_string());
}
}
_ => {}
}
}
}
}
}
// Determine if physical and get status
let is_physical = Self::is_physical_interface(&interface_name);
// Only filter out virtual interfaces without IPs
// Physical interfaces should always be shown even if down/no IPs
if !is_physical && ipv4_addresses.is_empty() && ipv6_addresses.is_empty() {
continue;
}
let link_status = if is_physical {
Self::get_link_status(&name)
} else {
Status::Unknown // Virtual interfaces don't have meaningful link status
};
// Look up VLAN ID from the map (use original name before @ parsing)
let vlan_id = vlan_map.get(&name).copied();
interfaces.push(NetworkInterfaceData {
name: interface_name,
ipv4_addresses,
ipv6_addresses,
is_physical,
link_status,
parent_interface,
vlan_id,
});
}
}
}
}
Err(e) => {
debug!("Failed to execute ip command: {}", e);
}
Ok(output) => {
debug!("ip command failed with status: {}", output.status);
}
}
// Assign primary physical interface as parent to virtual interfaces without explicit parent
let primary_interface = Self::get_primary_physical_interface();
if let Some(primary) = primary_interface {
for interface in interfaces.iter_mut() {
// Only assign parent to virtual interfaces that don't already have one
if !interface.is_physical && interface.parent_interface.is_none() {
interface.parent_interface = Some(primary.clone());
}
}
}
interfaces
}
}
#[async_trait]
impl Collector for NetworkCollector {
async fn collect_structured(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> {
debug!("Collecting network interface data");
// Collect all network interfaces
let interfaces = self.collect_interfaces().await;
agent_data.system.network.interfaces = interfaces;
Ok(())
}
}

View File

@@ -5,21 +5,18 @@ use std::process::Command;
use tracing::debug; use tracing::debug;
use super::{Collector, CollectorError}; use super::{Collector, CollectorError};
use crate::config::NixOSConfig;
/// NixOS system information collector with structured data output /// NixOS system information collector with structured data output
/// ///
/// This collector gathers NixOS-specific information like: /// This collector gathers NixOS-specific information like:
/// - System generation/build information /// - System generation/build information
/// - Version information /// - Version information
/// - Agent version from Nix store path /// - Agent version from Nix store path
pub struct NixOSCollector { pub struct NixOSCollector;
config: NixOSConfig,
}
impl NixOSCollector { impl NixOSCollector {
pub fn new(config: NixOSConfig) -> Self { pub fn new(_config: crate::config::NixOSConfig) -> Self {
Self { config } Self
} }
/// Collect NixOS system information and populate AgentData /// Collect NixOS system information and populate AgentData
@@ -46,8 +43,8 @@ impl NixOSCollector {
match fs::read_to_string("/etc/hostname") { match fs::read_to_string("/etc/hostname") {
Ok(hostname) => Some(hostname.trim().to_string()), Ok(hostname) => Some(hostname.trim().to_string()),
Err(_) => { Err(_) => {
// Fallback to hostname command // Fallback to hostname command (with 2 second timeout)
match Command::new("hostname").output() { match Command::new("timeout").args(["2", "hostname"]).output() {
Ok(output) => Some(String::from_utf8_lossy(&output.stdout).trim().to_string()), Ok(output) => Some(String::from_utf8_lossy(&output.stdout).trim().to_string()),
Err(_) => None, Err(_) => None,
} }
@@ -83,14 +80,25 @@ impl NixOSCollector {
std::env::var("CM_DASHBOARD_VERSION").unwrap_or_else(|_| "unknown".to_string()) std::env::var("CM_DASHBOARD_VERSION").unwrap_or_else(|_| "unknown".to_string())
} }
/// Get NixOS system generation (build) information /// Get NixOS system generation (build) information from git commit
async fn get_nixos_generation(&self) -> Option<String> { async fn get_nixos_generation(&self) -> Option<String> {
match Command::new("nixos-version").output() { // Try to read git commit hash from file written during rebuild
Ok(output) => { let commit_file = "/var/lib/cm-dashboard/git-commit";
let version_str = String::from_utf8_lossy(&output.stdout); match fs::read_to_string(commit_file) {
Some(version_str.trim().to_string()) Ok(content) => {
let commit_hash = content.trim();
if commit_hash.len() >= 7 {
debug!("Found git commit hash: {}", commit_hash);
Some(commit_hash.to_string())
} else {
debug!("Git commit hash too short: {}", commit_hash);
None
}
}
Err(e) => {
debug!("Failed to read git commit file {}: {}", commit_file, e);
None
} }
Err(_) => None,
} }
} }
} }

View File

@@ -22,8 +22,6 @@ pub struct SystemdCollector {
struct ServiceCacheState { struct ServiceCacheState {
/// Last collection time for performance tracking /// Last collection time for performance tracking
last_collection: Option<Instant>, last_collection: Option<Instant>,
/// Cached service data
services: Vec<ServiceInfo>,
/// Cached complete service data with sub-services /// Cached complete service data with sub-services
cached_service_data: Vec<ServiceData>, cached_service_data: Vec<ServiceData>,
/// Interesting services to monitor (cached after discovery) /// Interesting services to monitor (cached after discovery)
@@ -45,25 +43,16 @@ struct ServiceCacheState {
/// Cached service status information from systemctl list-units /// Cached service status information from systemctl list-units
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
struct ServiceStatusInfo { struct ServiceStatusInfo {
load_state: String,
active_state: String, active_state: String,
sub_state: String, memory_bytes: Option<u64>,
} restart_count: Option<u32>,
start_timestamp: Option<u64>,
/// Internal service information
#[derive(Debug, Clone)]
struct ServiceInfo {
name: String,
status: String, // "active", "inactive", "failed", etc.
memory_mb: f32, // Memory usage in MB
disk_gb: f32, // Disk usage in GB
} }
impl SystemdCollector { impl SystemdCollector {
pub fn new(config: SystemdConfig) -> Self { pub fn new(config: SystemdConfig) -> Self {
let state = ServiceCacheState { let state = ServiceCacheState {
last_collection: None, last_collection: None,
services: Vec::new(),
cached_service_data: Vec::new(), cached_service_data: Vec::new(),
monitored_services: Vec::new(), monitored_services: Vec::new(),
service_status_cache: std::collections::HashMap::new(), service_status_cache: std::collections::HashMap::new(),
@@ -73,7 +62,7 @@ impl SystemdCollector {
last_nginx_check_time: None, last_nginx_check_time: None,
nginx_check_interval_seconds: config.nginx_check_interval_seconds, nginx_check_interval_seconds: config.nginx_check_interval_seconds,
}; };
Self { Self {
state: RwLock::new(state), state: RwLock::new(state),
config, config,
@@ -95,18 +84,23 @@ impl SystemdCollector {
}; };
// Collect service data for each monitored service // Collect service data for each monitored service
let mut services = Vec::new();
let mut complete_service_data = Vec::new(); let mut complete_service_data = Vec::new();
for service_name in &monitored_services { for service_name in &monitored_services {
match self.get_service_status(service_name) { match self.get_service_status(service_name) {
Ok((active_status, _detailed_info)) => { Ok(status_info) => {
let memory_mb = self.get_service_memory_usage(service_name).await.unwrap_or(0.0);
let disk_gb = self.get_service_disk_usage(service_name).await.unwrap_or(0.0);
let mut sub_services = Vec::new(); let mut sub_services = Vec::new();
// Calculate uptime if we have start timestamp
let uptime_seconds = status_info.start_timestamp.and_then(|start| {
let now = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.ok()?
.as_secs();
Some(now.saturating_sub(start))
});
// Sub-service metrics for specific services (always include cached results) // Sub-service metrics for specific services (always include cached results)
if service_name.contains("nginx") && active_status == "active" { if service_name.contains("nginx") && status_info.active_state == "active" {
let nginx_sites = self.get_nginx_site_metrics(); let nginx_sites = self.get_nginx_site_metrics();
for (site_name, latency_ms) in nginx_sites { for (site_name, latency_ms) in nginx_sites {
let site_status = if latency_ms >= 0.0 && latency_ms < self.config.nginx_latency_critical_ms { let site_status = if latency_ms >= 0.0 && latency_ms < self.config.nginx_latency_critical_ms {
@@ -126,41 +120,67 @@ impl SystemdCollector {
name: site_name.clone(), name: site_name.clone(),
service_status: self.calculate_service_status(&site_name, &site_status), service_status: self.calculate_service_status(&site_name, &site_status),
metrics, metrics,
service_type: "nginx_site".to_string(),
}); });
} }
} }
if service_name.contains("docker") && active_status == "active" { if service_name.contains("docker") && status_info.active_state == "active" {
let docker_containers = self.get_docker_containers(); let docker_containers = self.get_docker_containers();
for (container_name, container_status) in docker_containers { for (container_name, container_status) in docker_containers {
// For now, docker containers have no additional metrics // For now, docker containers have no additional metrics
// Future: could add memory_mb, cpu_percent, restart_count, etc. // Future: could add memory_mb, cpu_percent, restart_count, etc.
let metrics = Vec::new(); let metrics = Vec::new();
sub_services.push(SubServiceData { sub_services.push(SubServiceData {
name: container_name.clone(), name: container_name.clone(),
service_status: self.calculate_service_status(&container_name, &container_status), service_status: self.calculate_service_status(&container_name, &container_status),
metrics, metrics,
service_type: "container".to_string(),
});
}
// Add Docker images
let docker_images = self.get_docker_images();
for (image_name, image_status, image_size_mb) in docker_images {
let mut metrics = Vec::new();
metrics.push(SubServiceMetric {
label: "size".to_string(),
value: image_size_mb,
unit: Some("MB".to_string()),
});
sub_services.push(SubServiceData {
name: image_name.to_string(),
service_status: Status::Info, // Informational only, no status icon
metrics,
service_type: "image".to_string(),
}); });
} }
} }
let service_info = ServiceInfo { if service_name == "openvpn-vpn-connection" && status_info.active_state == "active" {
name: service_name.clone(), if let Some(external_ip) = self.get_vpn_external_ip() {
status: active_status.clone(), let metrics = Vec::new();
memory_mb,
disk_gb, sub_services.push(SubServiceData {
}; name: format!("ip: {}", external_ip),
services.push(service_info); service_status: Status::Info,
metrics,
service_type: "vpn_route".to_string(),
});
}
}
// Create complete service data // Create complete service data
let service_data = ServiceData { let service_data = ServiceData {
name: service_name.clone(), name: service_name.clone(),
memory_mb,
disk_gb,
user_stopped: false, // TODO: Integrate with service tracker user_stopped: false, // TODO: Integrate with service tracker
service_status: self.calculate_service_status(service_name, &active_status), service_status: self.calculate_service_status(service_name, &status_info.active_state),
sub_services, sub_services,
memory_bytes: status_info.memory_bytes,
restart_count: status_info.restart_count,
uptime_seconds,
}; };
// Add to AgentData and cache // Add to AgentData and cache
@@ -172,12 +192,15 @@ impl SystemdCollector {
} }
} }
} }
// Sort services alphabetically by name
agent_data.services.sort_by(|a, b| a.name.cmp(&b.name));
complete_service_data.sort_by(|a, b| a.name.cmp(&b.name));
// Update cached state // Update cached state
{ {
let mut state = self.state.write().unwrap(); let mut state = self.state.write().unwrap();
state.last_collection = Some(start_time); state.last_collection = Some(start_time);
state.services = services;
state.cached_service_data = complete_service_data; state.cached_service_data = complete_service_data;
} }
@@ -252,18 +275,18 @@ impl SystemdCollector {
/// Auto-discover interesting services to monitor /// Auto-discover interesting services to monitor
fn discover_services_internal(&self) -> Result<(Vec<String>, std::collections::HashMap<String, ServiceStatusInfo>)> { fn discover_services_internal(&self) -> Result<(Vec<String>, std::collections::HashMap<String, ServiceStatusInfo>)> {
// First: Get all service unit files // First: Get all service unit files (with 3 second timeout)
let unit_files_output = Command::new("systemctl") let unit_files_output = Command::new("timeout")
.args(&["list-unit-files", "--type=service", "--no-pager", "--plain"]) .args(&["3", "systemctl", "list-unit-files", "--type=service", "--no-pager", "--plain"])
.output()?; .output()?;
if !unit_files_output.status.success() { if !unit_files_output.status.success() {
return Err(anyhow::anyhow!("systemctl list-unit-files command failed")); return Err(anyhow::anyhow!("systemctl list-unit-files command failed"));
} }
// Second: Get runtime status of all units // Second: Get runtime status of all units (with 3 second timeout)
let units_status_output = Command::new("systemctl") let units_status_output = Command::new("timeout")
.args(&["list-units", "--type=service", "--all", "--no-pager", "--plain"]) .args(&["3", "systemctl", "list-units", "--type=service", "--all", "--no-pager", "--plain"])
.output()?; .output()?;
if !units_status_output.status.success() { if !units_status_output.status.success() {
@@ -293,14 +316,13 @@ impl SystemdCollector {
let fields: Vec<&str> = line.split_whitespace().collect(); let fields: Vec<&str> = line.split_whitespace().collect();
if fields.len() >= 4 && fields[0].ends_with(".service") { if fields.len() >= 4 && fields[0].ends_with(".service") {
let service_name = fields[0].trim_end_matches(".service"); let service_name = fields[0].trim_end_matches(".service");
let load_state = fields.get(1).unwrap_or(&"unknown").to_string();
let active_state = fields.get(2).unwrap_or(&"unknown").to_string(); let active_state = fields.get(2).unwrap_or(&"unknown").to_string();
let sub_state = fields.get(3).unwrap_or(&"unknown").to_string();
status_cache.insert(service_name.to_string(), ServiceStatusInfo { status_cache.insert(service_name.to_string(), ServiceStatusInfo {
load_state,
active_state, active_state,
sub_state, memory_bytes: None,
restart_count: None,
start_timestamp: None,
}); });
} }
} }
@@ -309,9 +331,10 @@ impl SystemdCollector {
for service_name in &all_service_names { for service_name in &all_service_names {
if !status_cache.contains_key(service_name) { if !status_cache.contains_key(service_name) {
status_cache.insert(service_name.to_string(), ServiceStatusInfo { status_cache.insert(service_name.to_string(), ServiceStatusInfo {
load_state: "not-loaded".to_string(),
active_state: "inactive".to_string(), active_state: "inactive".to_string(),
sub_state: "dead".to_string(), memory_bytes: None,
restart_count: None,
start_timestamp: None,
}); });
} }
} }
@@ -343,36 +366,60 @@ impl SystemdCollector {
Ok((services, status_cache)) Ok((services, status_cache))
} }
/// Get service status from cache (if available) or fallback to systemctl /// Get service status with detailed metrics from systemctl
fn get_service_status(&self, service: &str) -> Result<(String, String)> { fn get_service_status(&self, service: &str) -> Result<ServiceStatusInfo> {
// Try to get status from cache first // Always fetch fresh data to get detailed metrics (memory, restarts, uptime)
if let Ok(state) = self.state.read() { // Note: Cache in service_status_cache only has basic active_state from discovery,
if let Some(cached_info) = state.service_status_cache.get(service) { // with all detailed metrics set to None. We need fresh systemctl show data.
let active_status = cached_info.active_state.clone();
let detailed_info = format!( let output = Command::new("timeout")
"LoadState={}\nActiveState={}\nSubState={}", .args(&[
cached_info.load_state, "2",
cached_info.active_state, "systemctl",
cached_info.sub_state "show",
); &format!("{}.service", service),
return Ok((active_status, detailed_info)); "--property=LoadState,ActiveState,SubState,MemoryCurrent,NRestarts,ExecMainStartTimestamp"
])
.output()?;
let output_str = String::from_utf8(output.stdout)?;
// Parse properties
let mut active_state = String::new();
let mut memory_bytes = None;
let mut restart_count = None;
let mut start_timestamp = None;
for line in output_str.lines() {
if let Some(value) = line.strip_prefix("ActiveState=") {
active_state = value.to_string();
} else if let Some(value) = line.strip_prefix("MemoryCurrent=") {
if value != "[not set]" {
memory_bytes = value.parse().ok();
}
} else if let Some(value) = line.strip_prefix("NRestarts=") {
restart_count = value.parse().ok();
} else if let Some(value) = line.strip_prefix("ExecMainStartTimestamp=") {
if value != "[not set]" && !value.is_empty() {
// Parse timestamp to seconds since epoch
if let Ok(output) = Command::new("date")
.args(&["+%s", "-d", value])
.output()
{
if let Ok(timestamp_str) = String::from_utf8(output.stdout) {
start_timestamp = timestamp_str.trim().parse().ok();
}
}
}
} }
} }
// Fallback to systemctl if not in cache Ok(ServiceStatusInfo {
let output = Command::new("systemctl") active_state,
.args(&["is-active", &format!("{}.service", service)]) memory_bytes,
.output()?; restart_count,
start_timestamp,
let active_status = String::from_utf8(output.stdout)?.trim().to_string(); })
// Get more detailed info
let output = Command::new("systemctl")
.args(&["show", &format!("{}.service", service), "--property=LoadState,ActiveState,SubState"])
.output()?;
let detailed_info = String::from_utf8(output.stdout)?;
Ok((active_status, detailed_info))
} }
/// Check if service name matches pattern (supports wildcards like nginx*) /// Check if service name matches pattern (supports wildcards like nginx*)
@@ -414,94 +461,6 @@ impl SystemdCollector {
true true
} }
/// Get disk usage for a specific service
async fn get_service_disk_usage(&self, service_name: &str) -> Result<f32, CollectorError> {
// Check if this service has configured directory paths
if let Some(dirs) = self.config.service_directories.get(service_name) {
// Service has configured paths - use the first accessible one
for dir in dirs {
if let Some(size) = self.get_directory_size(dir) {
return Ok(size);
}
}
// If configured paths failed, return 0
return Ok(0.0);
}
// No configured path - try to get WorkingDirectory from systemctl
let output = Command::new("systemctl")
.args(&["show", &format!("{}.service", service_name), "--property=WorkingDirectory"])
.output()
.map_err(|e| CollectorError::SystemRead {
path: format!("WorkingDirectory for {}", service_name),
error: e.to_string(),
})?;
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
if line.starts_with("WorkingDirectory=") && !line.contains("[not set]") {
let dir = line.strip_prefix("WorkingDirectory=").unwrap_or("");
if !dir.is_empty() && dir != "/" {
return Ok(self.get_directory_size(dir).unwrap_or(0.0));
}
}
}
Ok(0.0)
}
/// Get size of a directory in GB
fn get_directory_size(&self, path: &str) -> Option<f32> {
let output = Command::new("sudo")
.args(&["du", "-sb", path])
.output()
.ok()?;
if !output.status.success() {
// Log permission errors for debugging but don't spam logs
let stderr = String::from_utf8_lossy(&output.stderr);
if stderr.contains("Permission denied") {
debug!("Permission denied accessing directory: {}", path);
} else {
debug!("Failed to get size for directory {}: {}", path, stderr);
}
return None;
}
let output_str = String::from_utf8(output.stdout).ok()?;
let size_str = output_str.split_whitespace().next()?;
if let Ok(size_bytes) = size_str.parse::<u64>() {
let size_gb = size_bytes as f32 / (1024.0 * 1024.0 * 1024.0);
// Return size even if very small (minimum 0.001 GB = 1MB for visibility)
if size_gb > 0.0 {
Some(size_gb.max(0.001))
} else {
None
}
} else {
None
}
}
/// Get service memory usage (if available)
fn get_service_memory(&self, service: &str) -> Option<f32> {
let output = Command::new("systemctl")
.args(&["show", &format!("{}.service", service), "--property=MemoryCurrent"])
.output()
.ok()?;
let output_str = String::from_utf8(output.stdout).ok()?;
for line in output_str.lines() {
if line.starts_with("MemoryCurrent=") {
let memory_str = line.strip_prefix("MemoryCurrent=")?;
if let Ok(memory_bytes) = memory_str.parse::<u64>() {
return Some(memory_bytes as f32 / (1024.0 * 1024.0)); // Convert to MB
}
}
}
None
}
/// Calculate service status, taking user-stopped services into account /// Calculate service status, taking user-stopped services into account
fn calculate_service_status(&self, service_name: &str, active_status: &str) -> Status { fn calculate_service_status(&self, service_name: &str, active_status: &str) -> Status {
match active_status.to_lowercase().as_str() { match active_status.to_lowercase().as_str() {
@@ -519,37 +478,10 @@ impl SystemdCollector {
} }
} }
/// Get memory usage for a specific service
async fn get_service_memory_usage(&self, service_name: &str) -> Result<f32, CollectorError> {
let output = Command::new("systemctl")
.args(&["show", &format!("{}.service", service_name), "--property=MemoryCurrent"])
.output()
.map_err(|e| CollectorError::SystemRead {
path: format!("memory usage for {}", service_name),
error: e.to_string(),
})?;
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
if line.starts_with("MemoryCurrent=") {
if let Some(mem_str) = line.strip_prefix("MemoryCurrent=") {
if mem_str != "[not set]" {
if let Ok(memory_bytes) = mem_str.parse::<u64>() {
return Ok(memory_bytes as f32 / (1024.0 * 1024.0)); // Convert to MB
}
}
}
}
}
Ok(0.0)
}
/// Check if service collection cache should be updated /// Check if service collection cache should be updated
fn should_update_cache(&self) -> bool { fn should_update_cache(&self) -> bool {
let state = self.state.read().unwrap(); let state = self.state.read().unwrap();
match state.last_collection { match state.last_collection {
None => true, None => true,
Some(last) => { Some(last) => {
@@ -559,16 +491,6 @@ impl SystemdCollector {
} }
} }
/// Get cached service data if available and fresh
fn get_cached_services(&self) -> Option<Vec<ServiceInfo>> {
if !self.should_update_cache() {
let state = self.state.read().unwrap();
Some(state.services.clone())
} else {
None
}
}
/// Get cached complete service data with sub-services if available and fresh /// Get cached complete service data with sub-services if available and fresh
fn get_cached_complete_services(&self) -> Option<Vec<ServiceData>> { fn get_cached_complete_services(&self) -> Option<Vec<ServiceData>> {
if !self.should_update_cache() { if !self.should_update_cache() {
@@ -807,9 +729,10 @@ impl SystemdCollector {
fn get_docker_containers(&self) -> Vec<(String, String)> { fn get_docker_containers(&self) -> Vec<(String, String)> {
let mut containers = Vec::new(); let mut containers = Vec::new();
// Check if docker is available // Check if docker is available (cm-agent user is in docker group)
let output = Command::new("docker") // Use -a to show ALL containers (running and stopped) with 3 second timeout
.args(&["ps", "--format", "{{.Names}},{{.Status}}"]) let output = Command::new("timeout")
.args(&["3", "docker", "ps", "-a", "--format", "{{.Names}},{{.Status}}"])
.output(); .output();
let output = match output { let output = match output {
@@ -834,10 +757,10 @@ impl SystemdCollector {
let container_status = if status_str.contains("Up") { let container_status = if status_str.contains("Up") {
"active" "active"
} else if status_str.contains("Exited") { } else if status_str.contains("Exited") || status_str.contains("Created") {
"warning" // Match original: Exited → Warning, not inactive "inactive" // Stopped/created containers are inactive
} else { } else {
"failed" // Other states → failed "failed" // Other states (restarting, paused, dead) → failed
}; };
containers.push((format!("docker_{}", container_name), container_status.to_string())); containers.push((format!("docker_{}", container_name), container_status.to_string()));
@@ -846,11 +769,123 @@ impl SystemdCollector {
containers containers
} }
/// Get docker images as sub-services
fn get_docker_images(&self) -> Vec<(String, String, f32)> {
let mut images = Vec::new();
// Check if docker is available (cm-agent user is in docker group) with 3 second timeout
let output = Command::new("timeout")
.args(&["3", "docker", "images", "--format", "{{.Repository}}:{{.Tag}},{{.Size}}"])
.output();
let output = match output {
Ok(out) if out.status.success() => out,
Ok(_) => {
return images;
}
Err(_) => {
return images;
}
};
let output_str = match String::from_utf8(output.stdout) {
Ok(s) => s,
Err(_) => return images,
};
for line in output_str.lines() {
if line.trim().is_empty() {
continue;
}
let parts: Vec<&str> = line.split(',').collect();
if parts.len() >= 2 {
let image_name = parts[0].trim();
let size_str = parts[1].trim();
// Skip <none>:<none> images (dangling images)
if image_name.contains("<none>") {
continue;
}
// Parse size to MB (sizes come as "142MB", "1.5GB", "512kB", etc.)
let size_mb = self.parse_docker_size(size_str);
images.push((
image_name.to_string(),
"inactive".to_string(), // Images are informational - use inactive for neutral display
size_mb
));
}
}
images
}
/// Parse Docker size string to MB
fn parse_docker_size(&self, size_str: &str) -> f32 {
let size_upper = size_str.to_uppercase();
// Extract numeric part and unit
let mut num_str = String::new();
let mut unit = String::new();
for ch in size_upper.chars() {
if ch.is_ascii_digit() || ch == '.' {
num_str.push(ch);
} else if ch.is_alphabetic() {
unit.push(ch);
}
}
let value: f32 = num_str.parse().unwrap_or(0.0);
// Convert to MB
match unit.as_str() {
"KB" | "K" => value / 1024.0,
"MB" | "M" => value,
"GB" | "G" => value * 1024.0,
"TB" | "T" => value * 1024.0 * 1024.0,
_ => value, // Assume bytes if no unit
}
}
/// Get VPN external IP by querying through the vpn namespace
fn get_vpn_external_ip(&self) -> Option<String> {
let output = Command::new("timeout")
.args(&[
"5",
"sudo",
"ip",
"netns",
"exec",
"vpn",
"curl",
"-s",
"--max-time",
"4",
"https://ifconfig.me"
])
.output()
.ok()?;
if output.status.success() {
let ip = String::from_utf8_lossy(&output.stdout).trim().to_string();
if !ip.is_empty() && ip.contains('.') {
return Some(ip);
}
}
None
}
} }
#[async_trait] #[async_trait]
impl Collector for SystemdCollector { impl Collector for SystemdCollector {
async fn collect_structured(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> { async fn collect_structured(&self, agent_data: &mut AgentData) -> Result<(), CollectorError> {
// Clear services to prevent duplicates when updating cached data
agent_data.services.clear();
// Use cached complete data if available and fresh // Use cached complete data if available and fresh
if let Some(cached_complete_services) = self.get_cached_complete_services() { if let Some(cached_complete_services) = self.get_cached_complete_services() {
for service_data in cached_complete_services { for service_data in cached_complete_services {

View File

@@ -5,10 +5,9 @@ use zmq::{Context, Socket, SocketType};
use crate::config::ZmqConfig; use crate::config::ZmqConfig;
/// ZMQ communication handler for publishing metrics and receiving commands /// ZMQ communication handler for publishing metrics
pub struct ZmqHandler { pub struct ZmqHandler {
publisher: Socket, publisher: Socket,
command_receiver: Socket,
} }
impl ZmqHandler { impl ZmqHandler {
@@ -26,20 +25,8 @@ impl ZmqHandler {
publisher.set_sndhwm(1000)?; // High water mark for outbound messages publisher.set_sndhwm(1000)?; // High water mark for outbound messages
publisher.set_linger(1000)?; // Linger time on close publisher.set_linger(1000)?; // Linger time on close
// Create command receiver socket (PULL socket to receive commands from dashboard)
let command_receiver = context.socket(SocketType::PULL)?;
let cmd_bind_address = format!("tcp://{}:{}", config.bind_address, config.command_port);
command_receiver.bind(&cmd_bind_address)?;
info!("ZMQ command receiver bound to {}", cmd_bind_address);
// Set non-blocking mode for command receiver
command_receiver.set_rcvtimeo(0)?; // Non-blocking receive
command_receiver.set_linger(1000)?;
Ok(Self { Ok(Self {
publisher, publisher,
command_receiver,
}) })
} }
@@ -65,36 +52,4 @@ impl ZmqHandler {
Ok(()) Ok(())
} }
/// Try to receive a command (non-blocking)
pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> {
match self.command_receiver.recv_bytes(zmq::DONTWAIT) {
Ok(bytes) => {
debug!("Received command message ({} bytes)", bytes.len());
let command: AgentCommand = serde_json::from_slice(&bytes)
.map_err(|e| anyhow::anyhow!("Failed to deserialize command: {}", e))?;
debug!("Parsed command: {:?}", command);
Ok(Some(command))
}
Err(zmq::Error::EAGAIN) => {
// No message available (non-blocking)
Ok(None)
}
Err(e) => Err(anyhow::anyhow!("ZMQ receive error: {}", e)),
}
}
}
/// Commands that can be sent to the agent
#[derive(Debug, Clone, serde::Deserialize, serde::Serialize)]
pub enum AgentCommand {
/// Request immediate metric collection
CollectNow,
/// Change collection interval
SetInterval { seconds: u64 },
/// Enable/disable a collector
ToggleCollector { name: String, enabled: bool },
/// Request status/health check
Ping,
} }

View File

@@ -1,2 +0,0 @@
// This file is now empty - all configuration values come from config files
// No hardcoded defaults are used

View File

@@ -13,14 +13,12 @@ pub struct AgentConfig {
pub collectors: CollectorConfig, pub collectors: CollectorConfig,
pub cache: CacheConfig, pub cache: CacheConfig,
pub notifications: NotificationConfig, pub notifications: NotificationConfig,
pub collection_interval_seconds: u64,
} }
/// ZMQ communication configuration /// ZMQ communication configuration
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ZmqConfig { pub struct ZmqConfig {
pub publisher_port: u16, pub publisher_port: u16,
pub command_port: u16,
pub bind_address: String, pub bind_address: String,
pub transmission_interval_seconds: u64, pub transmission_interval_seconds: u64,
/// Heartbeat transmission interval in seconds for host connectivity detection /// Heartbeat transmission interval in seconds for host connectivity detection

View File

@@ -7,21 +7,13 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
bail!("ZMQ publisher port cannot be 0"); bail!("ZMQ publisher port cannot be 0");
} }
if config.zmq.command_port == 0 {
bail!("ZMQ command port cannot be 0");
}
if config.zmq.publisher_port == config.zmq.command_port {
bail!("ZMQ publisher and command ports cannot be the same");
}
if config.zmq.bind_address.is_empty() { if config.zmq.bind_address.is_empty() {
bail!("ZMQ bind address cannot be empty"); bail!("ZMQ bind address cannot be empty");
} }
// Validate collection interval // Validate ZMQ transmission interval
if config.collection_interval_seconds == 0 { if config.zmq.transmission_interval_seconds == 0 {
bail!("Collection interval cannot be 0"); bail!("ZMQ transmission interval cannot be 0");
} }
// Validate CPU thresholds // Validate CPU thresholds

View File

@@ -8,7 +8,6 @@ mod collectors;
mod communication; mod communication;
mod config; mod config;
mod notifications; mod notifications;
mod service_tracker;
use agent::Agent; use agent::Agent;

View File

@@ -1,164 +0,0 @@
use anyhow::Result;
use serde::{Deserialize, Serialize};
use std::collections::HashSet;
use std::fs;
use std::path::Path;
use std::sync::{Arc, Mutex, OnceLock};
use tracing::{debug, info, warn};
/// Shared instance for global access
static GLOBAL_TRACKER: OnceLock<Arc<Mutex<UserStoppedServiceTracker>>> = OnceLock::new();
/// Tracks services that have been stopped by user action
/// These services should be treated as OK status instead of Warning
#[derive(Debug)]
pub struct UserStoppedServiceTracker {
/// Set of services stopped by user action
user_stopped_services: HashSet<String>,
/// Path to persistent storage file
storage_path: String,
}
/// Serializable data structure for persistence
#[derive(Debug, Serialize, Deserialize)]
struct UserStoppedData {
services: Vec<String>,
}
impl UserStoppedServiceTracker {
/// Create new tracker with default storage path
pub fn new() -> Self {
Self::with_storage_path("/var/lib/cm-dashboard/user-stopped-services.json")
}
/// Initialize global instance (called by agent)
pub fn init_global() -> Result<Self> {
let tracker = Self::new();
// Set global instance
let global_instance = Arc::new(Mutex::new(tracker));
if GLOBAL_TRACKER.set(global_instance).is_err() {
warn!("Global service tracker was already initialized");
}
// Return a new instance for the agent to use
Ok(Self::new())
}
/// Check if a service is user-stopped (global access for collectors)
pub fn is_service_user_stopped(service_name: &str) -> bool {
if let Some(global) = GLOBAL_TRACKER.get() {
if let Ok(tracker) = global.lock() {
tracker.is_user_stopped(service_name)
} else {
debug!("Failed to lock global service tracker");
false
}
} else {
debug!("Global service tracker not initialized");
false
}
}
/// Update global tracker (called by agent when tracker state changes)
pub fn update_global(updated_tracker: &UserStoppedServiceTracker) {
if let Some(global) = GLOBAL_TRACKER.get() {
if let Ok(mut tracker) = global.lock() {
tracker.user_stopped_services = updated_tracker.user_stopped_services.clone();
} else {
debug!("Failed to lock global service tracker for update");
}
} else {
debug!("Global service tracker not initialized for update");
}
}
/// Create new tracker with custom storage path
pub fn with_storage_path<P: AsRef<Path>>(storage_path: P) -> Self {
let storage_path = storage_path.as_ref().to_string_lossy().to_string();
let mut tracker = Self {
user_stopped_services: HashSet::new(),
storage_path,
};
// Load existing data from storage
if let Err(e) = tracker.load_from_storage() {
warn!("Failed to load user-stopped services from storage: {}", e);
info!("Starting with empty user-stopped services list");
}
tracker
}
/// Clear user-stopped flag for a service (when user starts it)
pub fn clear_user_stopped(&mut self, service_name: &str) -> Result<()> {
if self.user_stopped_services.remove(service_name) {
info!("Cleared user-stopped flag for service '{}'", service_name);
self.save_to_storage()?;
debug!("Service '{}' user-stopped flag cleared and saved to storage", service_name);
} else {
debug!("Service '{}' was not marked as user-stopped", service_name);
}
Ok(())
}
/// Check if a service is marked as user-stopped
pub fn is_user_stopped(&self, service_name: &str) -> bool {
let is_stopped = self.user_stopped_services.contains(service_name);
debug!("Service '{}' user-stopped status: {}", service_name, is_stopped);
is_stopped
}
/// Save current state to persistent storage
fn save_to_storage(&self) -> Result<()> {
// Create parent directory if it doesn't exist
if let Some(parent_dir) = Path::new(&self.storage_path).parent() {
if !parent_dir.exists() {
fs::create_dir_all(parent_dir)?;
debug!("Created parent directory: {}", parent_dir.display());
}
}
let data = UserStoppedData {
services: self.user_stopped_services.iter().cloned().collect(),
};
let json_data = serde_json::to_string_pretty(&data)?;
fs::write(&self.storage_path, json_data)?;
debug!(
"Saved {} user-stopped services to {}",
data.services.len(),
self.storage_path
);
Ok(())
}
/// Load state from persistent storage
fn load_from_storage(&mut self) -> Result<()> {
if !Path::new(&self.storage_path).exists() {
debug!("Storage file {} does not exist, starting fresh", self.storage_path);
return Ok(());
}
let json_data = fs::read_to_string(&self.storage_path)?;
let data: UserStoppedData = serde_json::from_str(&json_data)?;
self.user_stopped_services = data.services.into_iter().collect();
info!(
"Loaded {} user-stopped services from {}",
self.user_stopped_services.len(),
self.storage_path
);
if !self.user_stopped_services.is_empty() {
debug!("User-stopped services: {:?}", self.user_stopped_services);
}
Ok(())
}
}

View File

@@ -1,1001 +0,0 @@
warning: fields `total_services`, `backup_disk_filesystem_label`, `services_completed_count`, `services_failed_count`, and `services_disabled_count` are never read
--> dashboard/src/ui/widgets/backup.rs:22:5
|
14 | pub struct BackupWidget {
| ------------ fields in this struct
...
22 | total_services: Option<i64>,
| ^^^^^^^^^^^^^^
...
36 | backup_disk_filesystem_label: Option<String>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
37 | /// Number of completed services
38 | services_completed_count: Option<i64>,
| ^^^^^^^^^^^^^^^^^^^^^^^^
39 | /// Number of failed services
40 | services_failed_count: Option<i64>,
| ^^^^^^^^^^^^^^^^^^^^^
41 | /// Number of disabled services
42 | services_disabled_count: Option<i64>,
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `BackupWidget` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
= note: `#[warn(dead_code)]` on by default
warning: field `exit_code` is never read
--> dashboard/src/ui/widgets/backup.rs:53:5
|
50 | struct ServiceMetricData {
| ----------------- field in this struct
...
53 | exit_code: Option<i64>,
| ^^^^^^^^^
|
= note: `ServiceMetricData` has derived impls for the traits `Clone` and `Debug`, but these are intentionally ignored during dead code analysis
warning: associated function `extract_service_name` is never used
--> dashboard/src/ui/widgets/backup.rs:115:8
|
58 | impl BackupWidget {
| ----------------- associated function in this implementation
...
115 | fn extract_service_name(metric_name: &str) -> Option<String> {
| ^^^^^^^^^^^^^^^^^^^^
warning: method `update_from_metrics` is never used
--> dashboard/src/ui/widgets/backup.rs:157:8
|
156 | impl BackupWidget {
| ----------------- method in this implementation
157 | fn update_from_metrics(&mut self, metrics: &[&Metric]) {
| ^^^^^^^^^^^^^^^^^^^
warning: associated function `extract_service_info` is never used
--> dashboard/src/ui/widgets/services.rs:50:8
|
38 | impl ServicesWidget {
| ------------------- associated function in this implementation
...
50 | fn extract_service_info(metric_name: &str) -> Option<(String, Option<String>)> {
| ^^^^^^^^^^^^^^^^^^^^
warning: method `update_from_metrics` is never used
--> dashboard/src/ui/widgets/services.rs:285:8
|
284 | impl ServicesWidget {
| ------------------- method in this implementation
285 | fn update_from_metrics(&mut self, metrics: &[&Metric]) {
| ^^^^^^^^^^^^^^^^^^^
warning: field `health_status` is never read
--> dashboard/src/ui/widgets/system.rs:53:5
|
43 | struct StoragePool {
| ----------- field in this struct
...
53 | health_status: Status, // Separate status for pool health vs usage
| ^^^^^^^^^^^^^
|
= note: `StoragePool` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis
warning: `cm-dashboard` (bin "cm-dashboard") generated 7 warnings
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.16s
Running `target/debug/cm-dashboard --headless --raw-data`
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936501,
"system": {
"cpu": {
"load_1min": 1.82,
"load_5min": 2.1,
"load_15min": 2.1,
"frequency_mhz": 3743.09,
"temperature_celsius": 55.0
},
"memory": {
"usage_percent": 27.183601,
"total_gb": 23.339516,
"used_gb": 6.3445206,
"available_gb": 16.994995,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.094376,
"used_gb": 0.3018875,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.582031,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936502,
"system": {
"cpu": {
"load_1min": 1.82,
"load_5min": 2.1,
"load_15min": 2.1,
"frequency_mhz": 3743.09,
"temperature_celsius": 55.0
},
"memory": {
"usage_percent": 27.183601,
"total_gb": 23.339516,
"used_gb": 6.3445206,
"available_gb": 16.994995,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.094376,
"used_gb": 0.3018875,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.582031,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936503,
"system": {
"cpu": {
"load_1min": 1.82,
"load_5min": 2.1,
"load_15min": 2.1,
"frequency_mhz": 3743.09,
"temperature_celsius": 55.0
},
"memory": {
"usage_percent": 27.183601,
"total_gb": 23.339516,
"used_gb": 6.3445206,
"available_gb": 16.994995,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.094376,
"used_gb": 0.3018875,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.582031,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936505,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3600.005,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 26.780334,
"total_gb": 23.339516,
"used_gb": 6.2504005,
"available_gb": 17.089115,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936506,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3600.005,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 26.780334,
"total_gb": 23.339516,
"used_gb": 6.2504005,
"available_gb": 17.089115,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936507,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3600.005,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 26.780334,
"total_gb": 23.339516,
"used_gb": 6.2504005,
"available_gb": 17.089115,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936508,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3600.005,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 26.780334,
"total_gb": 23.339516,
"used_gb": 6.2504005,
"available_gb": 17.089115,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936509,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3638.71,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 27.014532,
"total_gb": 23.339516,
"used_gb": 6.3050613,
"available_gb": 17.034454,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936509,
"system": {
"cpu": {
"load_1min": 0.0,
"load_5min": 0.0,
"load_15min": 0.0,
"frequency_mhz": 0.0,
"temperature_celsius": null
},
"memory": {
"usage_percent": 0.0,
"total_gb": 0.0,
"used_gb": 0.0,
"available_gb": 0.0,
"swap_total_gb": 0.0,
"swap_used_gb": 0.0,
"tmpfs": []
},
"storage": {
"drives": [],
"pools": []
}
},
"services": [],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936510,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3638.71,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 27.014532,
"total_gb": 23.339516,
"used_gb": 6.3050613,
"available_gb": 17.034454,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936511,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3638.71,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 27.014532,
"total_gb": 23.339516,
"used_gb": 6.3050613,
"available_gb": 17.034454,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
RAW AGENT DATA FROM cmbox:
{
"hostname": "cmbox",
"agent_version": "v0.1.133",
"timestamp": 1763936512,
"system": {
"cpu": {
"load_1min": 1.75,
"load_5min": 2.08,
"load_15min": 2.1,
"frequency_mhz": 3638.71,
"temperature_celsius": 56.0
},
"memory": {
"usage_percent": 27.014532,
"total_gb": 23.339516,
"used_gb": 6.3050613,
"available_gb": 17.034454,
"swap_total_gb": 14.634708,
"swap_used_gb": 0.17599106,
"tmpfs": [
{
"mount": "/tmp",
"usage_percent": 15.095139,
"used_gb": 0.30190277,
"total_gb": 2.0
}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 28.0,
"wear_percent": 1.0,
"filesystems": [
{
"mount": "root",
"usage_percent": 24.404377,
"used_gb": 226.51398,
"total_gb": 928.1695
},
{
"mount": "boot",
"usage_percent": 10.666672,
"used_gb": 0.10645676,
"total_gb": 0.9980316
}
]
}
],
"pools": []
}
},
"services": [
{
"name": "tailscaled",
"status": "active",
"memory_mb": 25.59375,
"disk_gb": 0.0,
"user_stopped": false
},
{
"name": "sshd",
"status": "active",
"memory_mb": 4.3085938,
"disk_gb": 0.0,
"user_stopped": false
}
],
"backup": {
"status": "unknown",
"last_run": null,
"next_scheduled": null,
"total_size_gb": null,
"repository_health": null
}
}
────────────────────────────────────────────────────────────────────────────────
Terminated

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard" name = "cm-dashboard"
version = "0.1.152" version = "0.1.239"
edition = "2021" edition = "2021"
[dependencies] [dependencies]

View File

@@ -20,13 +20,12 @@ pub struct Dashboard {
tui_app: Option<TuiApp>, tui_app: Option<TuiApp>,
terminal: Option<Terminal<CrosstermBackend<io::Stdout>>>, terminal: Option<Terminal<CrosstermBackend<io::Stdout>>>,
headless: bool, headless: bool,
raw_data: bool,
initial_commands_sent: std::collections::HashSet<String>, initial_commands_sent: std::collections::HashSet<String>,
config: DashboardConfig, config: DashboardConfig,
} }
impl Dashboard { impl Dashboard {
pub async fn new(config_path: Option<String>, headless: bool, raw_data: bool) -> Result<Self> { pub async fn new(config_path: Option<String>, headless: bool) -> Result<Self> {
info!("Initializing dashboard"); info!("Initializing dashboard");
// Load configuration - try default path if not specified // Load configuration - try default path if not specified
@@ -120,7 +119,6 @@ impl Dashboard {
tui_app, tui_app,
terminal, terminal,
headless, headless,
raw_data,
initial_commands_sent: std::collections::HashSet::new(), initial_commands_sent: std::collections::HashSet::new(),
config, config,
}) })
@@ -205,13 +203,6 @@ impl Dashboard {
.insert(agent_data.hostname.clone()); .insert(agent_data.hostname.clone());
} }
// Show raw data if requested (before processing)
if self.raw_data {
println!("RAW AGENT DATA FROM {}:", agent_data.hostname);
println!("{}", serde_json::to_string_pretty(&agent_data).unwrap_or_else(|e| format!("Serialization error: {}", e)));
println!("{}", "".repeat(80));
}
// Store structured data directly // Store structured data directly
self.metric_store.store_agent_data(agent_data); self.metric_store.store_agent_data(agent_data);
@@ -224,7 +215,7 @@ impl Dashboard {
// Update TUI with new metrics (only if not headless) // Update TUI with new metrics (only if not headless)
if let Some(ref mut tui_app) = self.tui_app { if let Some(ref mut tui_app) = self.tui_app {
tui_app.update_metrics(&self.metric_store); tui_app.update_metrics(&mut self.metric_store);
} }
} }

View File

@@ -51,10 +51,6 @@ struct Cli {
/// Run in headless mode (no TUI, just logging) /// Run in headless mode (no TUI, just logging)
#[arg(long)] #[arg(long)]
headless: bool, headless: bool,
/// Show raw agent data in headless mode
#[arg(long)]
raw_data: bool,
} }
#[tokio::main] #[tokio::main]
@@ -90,7 +86,7 @@ async fn main() -> Result<()> {
} }
// Create and run dashboard // Create and run dashboard
let mut dashboard = Dashboard::new(cli.config, cli.headless, cli.raw_data).await?; let mut dashboard = Dashboard::new(cli.config, cli.headless).await?;
// Setup graceful shutdown // Setup graceful shutdown
let ctrl_c = async { let ctrl_c = async {

View File

@@ -5,6 +5,14 @@ use tracing::{debug, info, warn};
use super::MetricDataPoint; use super::MetricDataPoint;
/// ZMQ communication statistics per host
#[derive(Debug, Clone)]
pub struct ZmqStats {
pub packets_received: u64,
pub last_packet_time: Instant,
pub last_packet_age_secs: f64,
}
/// Central metric storage for the dashboard /// Central metric storage for the dashboard
pub struct MetricStore { pub struct MetricStore {
/// Current structured data: hostname -> AgentData /// Current structured data: hostname -> AgentData
@@ -13,6 +21,8 @@ pub struct MetricStore {
historical_metrics: HashMap<String, Vec<MetricDataPoint>>, historical_metrics: HashMap<String, Vec<MetricDataPoint>>,
/// Last heartbeat timestamp per host /// Last heartbeat timestamp per host
last_heartbeat: HashMap<String, Instant>, last_heartbeat: HashMap<String, Instant>,
/// ZMQ communication statistics per host
zmq_stats: HashMap<String, ZmqStats>,
/// Configuration /// Configuration
max_metrics_per_host: usize, max_metrics_per_host: usize,
history_retention: Duration, history_retention: Duration,
@@ -24,6 +34,7 @@ impl MetricStore {
current_agent_data: HashMap::new(), current_agent_data: HashMap::new(),
historical_metrics: HashMap::new(), historical_metrics: HashMap::new(),
last_heartbeat: HashMap::new(), last_heartbeat: HashMap::new(),
zmq_stats: HashMap::new(),
max_metrics_per_host, max_metrics_per_host,
history_retention: Duration::from_secs(history_retention_hours * 3600), history_retention: Duration::from_secs(history_retention_hours * 3600),
} }
@@ -44,6 +55,16 @@ impl MetricStore {
self.last_heartbeat.insert(hostname.clone(), now); self.last_heartbeat.insert(hostname.clone(), now);
debug!("Updated heartbeat for host {}", hostname); debug!("Updated heartbeat for host {}", hostname);
// Update ZMQ stats
let stats = self.zmq_stats.entry(hostname.clone()).or_insert(ZmqStats {
packets_received: 0,
last_packet_time: now,
last_packet_age_secs: 0.0,
});
stats.packets_received += 1;
stats.last_packet_time = now;
stats.last_packet_age_secs = 0.0; // Just received
// Add to history // Add to history
let host_history = self let host_history = self
.historical_metrics .historical_metrics
@@ -65,6 +86,15 @@ impl MetricStore {
self.current_agent_data.get(hostname) self.current_agent_data.get(hostname)
} }
/// Get ZMQ communication statistics for a host
pub fn get_zmq_stats(&mut self, hostname: &str) -> Option<ZmqStats> {
let now = Instant::now();
self.zmq_stats.get_mut(hostname).map(|stats| {
// Update packet age
stats.last_packet_age_secs = now.duration_since(stats.last_packet_time).as_secs_f64();
stats.clone()
})
}
/// Get connected hosts (hosts with recent heartbeats) /// Get connected hosts (hosts with recent heartbeats)
pub fn get_connected_hosts(&self, timeout: Duration) -> Vec<String> { pub fn get_connected_hosts(&self, timeout: Duration) -> Vec<String> {

View File

@@ -100,7 +100,7 @@ impl TuiApp {
} }
/// Update widgets with structured data from store (only for current host) /// Update widgets with structured data from store (only for current host)
pub fn update_metrics(&mut self, metric_store: &MetricStore) { pub fn update_metrics(&mut self, metric_store: &mut MetricStore) {
if let Some(hostname) = self.current_host.clone() { if let Some(hostname) = self.current_host.clone() {
// Get structured data for this host // Get structured data for this host
if let Some(agent_data) = metric_store.get_agent_data(&hostname) { if let Some(agent_data) = metric_store.get_agent_data(&hostname) {
@@ -110,6 +110,14 @@ impl TuiApp {
host_widgets.system_widget.update_from_agent_data(agent_data); host_widgets.system_widget.update_from_agent_data(agent_data);
host_widgets.services_widget.update_from_agent_data(agent_data); host_widgets.services_widget.update_from_agent_data(agent_data);
// Update ZMQ stats
if let Some(zmq_stats) = metric_store.get_zmq_stats(&hostname) {
host_widgets.system_widget.update_zmq_stats(
zmq_stats.packets_received,
zmq_stats.last_packet_age_secs
);
}
host_widgets.last_update = Some(Instant::now()); host_widgets.last_update = Some(Instant::now());
} }
} }

View File

@@ -142,6 +142,7 @@ impl Theme {
/// Get color for status level /// Get color for status level
pub fn status_color(status: Status) -> Color { pub fn status_color(status: Status) -> Color {
match status { match status {
Status::Info => Self::muted_text(), // Gray for informational data
Status::Ok => Self::success(), Status::Ok => Self::success(),
Status::Inactive => Self::muted_text(), // Gray for inactive services in service list Status::Inactive => Self::muted_text(), // Gray for inactive services in service list
Status::Pending => Self::highlight(), // Blue for pending Status::Pending => Self::highlight(), // Blue for pending
@@ -225,9 +226,6 @@ impl Layout {
pub const LEFT_PANEL_WIDTH: u16 = 45; pub const LEFT_PANEL_WIDTH: u16 = 45;
/// Right panel percentage (services) /// Right panel percentage (services)
pub const RIGHT_PANEL_WIDTH: u16 = 55; pub const RIGHT_PANEL_WIDTH: u16 = 55;
/// System vs backup split (equal)
pub const SYSTEM_PANEL_HEIGHT: u16 = 50;
pub const BACKUP_PANEL_HEIGHT: u16 = 50;
} }
/// Typography system /// Typography system
@@ -243,6 +241,7 @@ impl StatusIcons {
/// Get status icon symbol /// Get status icon symbol
pub fn get_icon(status: Status) -> &'static str { pub fn get_icon(status: Status) -> &'static str {
match status { match status {
Status::Info => "", // No icon for informational data
Status::Ok => "", Status::Ok => "",
Status::Inactive => "", // Empty circle for inactive services Status::Inactive => "", // Empty circle for inactive services
Status::Pending => "", // Hollow circle for pending Status::Pending => "", // Hollow circle for pending
@@ -257,6 +256,7 @@ impl StatusIcons {
pub fn create_status_spans(status: Status, text: &str) -> Vec<ratatui::text::Span<'static>> { pub fn create_status_spans(status: Status, text: &str) -> Vec<ratatui::text::Span<'static>> {
let icon = Self::get_icon(status); let icon = Self::get_icon(status);
let status_color = match status { let status_color = match status {
Status::Info => Theme::muted_text(), // Gray for info
Status::Ok => Theme::success(), // Green Status::Ok => Theme::success(), // Green
Status::Inactive => Theme::muted_text(), // Gray for inactive services Status::Inactive => Theme::muted_text(), // Gray for inactive services
Status::Pending => Theme::highlight(), // Blue Status::Pending => Theme::highlight(), // Blue

View File

@@ -1 +0,0 @@
// This file is intentionally left minimal - CPU functionality is handled by the SystemWidget

View File

@@ -1 +0,0 @@
// This file is intentionally left minimal - Memory functionality is handled by the SystemWidget

View File

@@ -1,7 +1,5 @@
use cm_dashboard_shared::AgentData; use cm_dashboard_shared::AgentData;
pub mod cpu;
pub mod memory;
pub mod services; pub mod services;
pub mod system; pub mod system;

View File

@@ -11,6 +11,74 @@ use tracing::debug;
use crate::ui::theme::{Components, StatusIcons, Theme, Typography}; use crate::ui::theme::{Components, StatusIcons, Theme, Typography};
use ratatui::style::Style; use ratatui::style::Style;
/// Column visibility configuration based on terminal width
#[derive(Debug, Clone, Copy)]
struct ColumnVisibility {
show_name: bool,
show_status: bool,
show_ram: bool,
show_uptime: bool,
show_restarts: bool,
}
impl ColumnVisibility {
/// Calculate actual width needed for all columns
const NAME_WIDTH: u16 = 23;
const STATUS_WIDTH: u16 = 10;
const RAM_WIDTH: u16 = 8;
const UPTIME_WIDTH: u16 = 8;
const RESTARTS_WIDTH: u16 = 5;
const COLUMN_SPACING: u16 = 1; // Space between columns
/// Determine which columns to show based on available width
/// Priority order: Name > Status > RAM > Uptime > Restarts
fn from_width(width: u16) -> Self {
// Calculate cumulative widths for each configuration
let minimal = Self::NAME_WIDTH + Self::COLUMN_SPACING + Self::STATUS_WIDTH; // 34
let with_ram = minimal + Self::COLUMN_SPACING + Self::RAM_WIDTH; // 43
let with_uptime = with_ram + Self::COLUMN_SPACING + Self::UPTIME_WIDTH; // 52
let full = with_uptime + Self::COLUMN_SPACING + Self::RESTARTS_WIDTH; // 58
if width >= full {
// Show all columns
Self {
show_name: true,
show_status: true,
show_ram: true,
show_uptime: true,
show_restarts: true,
}
} else if width >= with_uptime {
// Hide restarts
Self {
show_name: true,
show_status: true,
show_ram: true,
show_uptime: true,
show_restarts: false,
}
} else if width >= with_ram {
// Hide uptime and restarts
Self {
show_name: true,
show_status: true,
show_ram: true,
show_uptime: false,
show_restarts: false,
}
} else {
// Minimal: Name + Status only
Self {
show_name: true,
show_status: true,
show_ram: false,
show_uptime: false,
show_restarts: false,
}
}
}
}
/// Services widget displaying hierarchical systemd service statuses /// Services widget displaying hierarchical systemd service statuses
#[derive(Clone)] #[derive(Clone)]
pub struct ServicesWidget { pub struct ServicesWidget {
@@ -28,10 +96,12 @@ pub struct ServicesWidget {
#[derive(Clone)] #[derive(Clone)]
struct ServiceInfo { struct ServiceInfo {
memory_mb: Option<f32>,
disk_gb: Option<f32>,
metrics: Vec<(String, f32, Option<String>)>, // (label, value, unit) metrics: Vec<(String, f32, Option<String>)>, // (label, value, unit)
widget_status: Status, widget_status: Status,
service_type: String, // "nginx_site", "container", "image", or empty for parent services
memory_bytes: Option<u64>,
restart_count: Option<u32>,
uptime_seconds: Option<u64>,
} }
impl ServicesWidget { impl ServicesWidget {
@@ -51,8 +121,6 @@ impl ServicesWidget {
if metric_name.starts_with("service_") { if metric_name.starts_with("service_") {
if let Some(end_pos) = metric_name if let Some(end_pos) = metric_name
.rfind("_status") .rfind("_status")
.or_else(|| metric_name.rfind("_memory_mb"))
.or_else(|| metric_name.rfind("_disk_gb"))
.or_else(|| metric_name.rfind("_latency_ms")) .or_else(|| metric_name.rfind("_latency_ms"))
{ {
let service_part = &metric_name[8..end_pos]; // Remove "service_" prefix let service_part = &metric_name[8..end_pos]; // Remove "service_" prefix
@@ -75,47 +143,22 @@ impl ServicesWidget {
None None
} }
/// Format disk size with appropriate units (kB/MB/GB)
fn format_disk_size(size_gb: f32) -> String {
let size_mb = size_gb * 1024.0; // Convert GB to MB
if size_mb >= 1024.0 {
// Show as GB
format!("{:.1}GB", size_gb)
} else if size_mb >= 1.0 {
// Show as MB
format!("{:.0}MB", size_mb)
} else if size_mb >= 0.001 {
// Convert to kB
let size_kb = size_mb * 1024.0;
format!("{:.0}kB", size_kb)
} else {
// Show very small sizes as bytes
let size_bytes = size_mb * 1024.0 * 1024.0;
format!("{:.0}B", size_bytes)
}
}
/// Format parent service line - returns text without icon for span formatting /// Format parent service line - returns text without icon for span formatting
fn format_parent_service_line(&self, name: &str, info: &ServiceInfo) -> String { fn format_parent_service_line(&self, name: &str, info: &ServiceInfo, columns: ColumnVisibility) -> String {
let memory_str = info // Truncate long service names to fit layout
.memory_mb // NAME_WIDTH - 3 chars for "..." = max displayable chars
.map_or("0M".to_string(), |m| format!("{:.0}M", m)); let max_name_len = (ColumnVisibility::NAME_WIDTH - 3) as usize;
let disk_str = info let short_name = if name.len() > max_name_len {
.disk_gb format!("{}...", &name[..max_name_len.saturating_sub(3)])
.map_or("0".to_string(), |d| Self::format_disk_size(d));
// Truncate long service names to fit layout (account for icon space)
let short_name = if name.len() > 22 {
format!("{}...", &name[..19])
} else { } else {
name.to_string() name.to_string()
}; };
// Convert Status enum to display text // Convert Status enum to display text
let status_str = match info.widget_status { let status_str = match info.widget_status {
Status::Info => "", // Shouldn't happen for parent services
Status::Ok => "active", Status::Ok => "active",
Status::Inactive => "inactive", Status::Inactive => "inactive",
Status::Critical => "failed", Status::Critical => "failed",
Status::Pending => "pending", Status::Pending => "pending",
Status::Warning => "warning", Status::Warning => "warning",
@@ -123,10 +166,59 @@ impl ServicesWidget {
Status::Offline => "offline", Status::Offline => "offline",
}; };
format!( // Format memory
"{:<23} {:<10} {:<8} {:<8}", let memory_str = info.memory_bytes.map_or("-".to_string(), |bytes| {
short_name, status_str, memory_str, disk_str let mb = bytes as f64 / (1024.0 * 1024.0);
) if mb >= 1000.0 {
format!("{:.1}G", mb / 1024.0)
} else {
format!("{:.0}M", mb)
}
});
// Format uptime
let uptime_str = info.uptime_seconds.map_or("-".to_string(), |secs| {
let days = secs / 86400;
let hours = (secs % 86400) / 3600;
let mins = (secs % 3600) / 60;
if days > 0 {
format!("{}d{}h", days, hours)
} else if hours > 0 {
format!("{}h{}m", hours, mins)
} else {
format!("{}m", mins)
}
});
// Format restarts (show "!" if > 0 to indicate instability)
let restart_str = info.restart_count.map_or("-".to_string(), |count| {
if count > 0 {
format!("!{}", count)
} else {
"0".to_string()
}
});
// Build format string based on column visibility
let mut parts = Vec::new();
if columns.show_name {
parts.push(format!("{:<width$}", short_name, width = ColumnVisibility::NAME_WIDTH as usize));
}
if columns.show_status {
parts.push(format!("{:<width$}", status_str, width = ColumnVisibility::STATUS_WIDTH as usize));
}
if columns.show_ram {
parts.push(format!("{:<width$}", memory_str, width = ColumnVisibility::RAM_WIDTH as usize));
}
if columns.show_uptime {
parts.push(format!("{:<width$}", uptime_str, width = ColumnVisibility::UPTIME_WIDTH as usize));
}
if columns.show_restarts {
parts.push(format!("{:<width$}", restart_str, width = ColumnVisibility::RESTARTS_WIDTH as usize));
}
parts.join(" ")
} }
@@ -148,6 +240,7 @@ impl ServicesWidget {
// Get status icon and text // Get status icon and text
let icon = StatusIcons::get_icon(info.widget_status); let icon = StatusIcons::get_icon(info.widget_status);
let status_color = match info.widget_status { let status_color = match info.widget_status {
Status::Info => Theme::muted_text(),
Status::Ok => Theme::success(), Status::Ok => Theme::success(),
Status::Inactive => Theme::muted_text(), Status::Inactive => Theme::muted_text(),
Status::Pending => Theme::highlight(), Status::Pending => Theme::highlight(),
@@ -168,8 +261,9 @@ impl ServicesWidget {
} else { } else {
// Convert Status enum to display text for sub-services // Convert Status enum to display text for sub-services
match info.widget_status { match info.widget_status {
Status::Info => "",
Status::Ok => "active", Status::Ok => "active",
Status::Inactive => "inactive", Status::Inactive => "inactive",
Status::Critical => "failed", Status::Critical => "failed",
Status::Pending => "pending", Status::Pending => "pending",
Status::Warning => "warning", Status::Warning => "warning",
@@ -179,32 +273,62 @@ impl ServicesWidget {
}; };
let tree_symbol = if is_last { "└─" } else { "├─" }; let tree_symbol = if is_last { "└─" } else { "├─" };
vec![ if info.widget_status == Status::Info {
// Indentation and tree prefix // Informational data - no status icon, show metrics if available
ratatui::text::Span::styled( let mut spans = vec![
format!(" {} ", tree_symbol), // Indentation and tree prefix
Typography::tree(), ratatui::text::Span::styled(
), format!(" {} ", tree_symbol),
// Status icon Typography::tree(),
ratatui::text::Span::styled( ),
format!("{} ", icon), // Service name (no icon)
Style::default().fg(status_color).bg(Theme::background()), ratatui::text::Span::styled(
), format!("{:<18} ", short_name),
// Service name Style::default()
ratatui::text::Span::styled( .fg(Theme::secondary_text())
format!("{:<18} ", short_name), .bg(Theme::background()),
Style::default() ),
.fg(Theme::secondary_text()) ];
.bg(Theme::background()),
), // Add metrics if available (e.g., Docker image size)
// Status/latency text if !status_str.is_empty() {
ratatui::text::Span::styled( spans.push(ratatui::text::Span::styled(
status_str, status_str,
Style::default() Style::default()
.fg(Theme::secondary_text()) .fg(Theme::secondary_text())
.bg(Theme::background()), .bg(Theme::background()),
), ));
] }
spans
} else {
vec![
// Indentation and tree prefix
ratatui::text::Span::styled(
format!(" {} ", tree_symbol),
Typography::tree(),
),
// Status icon
ratatui::text::Span::styled(
format!("{} ", icon),
Style::default().fg(status_color).bg(Theme::background()),
),
// Service name
ratatui::text::Span::styled(
format!("{:<18} ", short_name),
Style::default()
.fg(Theme::secondary_text())
.bg(Theme::background()),
),
// Status/latency text
ratatui::text::Span::styled(
status_str,
Style::default()
.fg(Theme::secondary_text())
.bg(Theme::background()),
),
]
}
} }
/// Move selection up /// Move selection up
@@ -278,13 +402,15 @@ impl Widget for ServicesWidget {
for service in &agent_data.services { for service in &agent_data.services {
// Store parent service // Store parent service
let parent_info = ServiceInfo { let parent_info = ServiceInfo {
memory_mb: Some(service.memory_mb),
disk_gb: Some(service.disk_gb),
metrics: Vec::new(), // Parent services don't have custom metrics metrics: Vec::new(), // Parent services don't have custom metrics
widget_status: service.service_status, widget_status: service.service_status,
service_type: String::new(), // Parent services have no type
memory_bytes: service.memory_bytes,
restart_count: service.restart_count,
uptime_seconds: service.uptime_seconds,
}; };
self.parent_services.insert(service.name.clone(), parent_info); self.parent_services.insert(service.name.clone(), parent_info);
// Process sub-services if any // Process sub-services if any
if !service.sub_services.is_empty() { if !service.sub_services.is_empty() {
let mut sub_list = Vec::new(); let mut sub_list = Vec::new();
@@ -293,12 +419,14 @@ impl Widget for ServicesWidget {
let metrics: Vec<(String, f32, Option<String>)> = sub_service.metrics.iter() let metrics: Vec<(String, f32, Option<String>)> = sub_service.metrics.iter()
.map(|m| (m.label.clone(), m.value, m.unit.clone())) .map(|m| (m.label.clone(), m.value, m.unit.clone()))
.collect(); .collect();
let sub_info = ServiceInfo { let sub_info = ServiceInfo {
memory_mb: None, // Not used for sub-services
disk_gb: None, // Not used for sub-services
metrics, metrics,
widget_status: sub_service.service_status, widget_status: sub_service.service_status,
service_type: sub_service.service_type.clone(),
memory_bytes: None, // Sub-services don't have individual metrics yet
restart_count: None,
uptime_seconds: None,
}; };
sub_list.push((sub_service.name.clone(), sub_info)); sub_list.push((sub_service.name.clone(), sub_info));
} }
@@ -338,22 +466,16 @@ impl ServicesWidget {
self.parent_services self.parent_services
.entry(parent_service) .entry(parent_service)
.or_insert(ServiceInfo { .or_insert(ServiceInfo {
memory_mb: None,
disk_gb: None,
metrics: Vec::new(), metrics: Vec::new(),
widget_status: Status::Unknown, widget_status: Status::Unknown,
service_type: String::new(),
memory_bytes: None,
restart_count: None,
uptime_seconds: None,
}); });
if metric.name.ends_with("_status") { if metric.name.ends_with("_status") {
service_info.widget_status = metric.status; service_info.widget_status = metric.status;
} else if metric.name.ends_with("_memory_mb") {
if let Some(memory) = metric.value.as_f32() {
service_info.memory_mb = Some(memory);
}
} else if metric.name.ends_with("_disk_gb") {
if let Some(disk) = metric.value.as_f32() {
service_info.disk_gb = Some(disk);
}
} }
} }
Some(sub_name) => { Some(sub_name) => {
@@ -373,10 +495,12 @@ impl ServicesWidget {
sub_service_list.push(( sub_service_list.push((
sub_name.clone(), sub_name.clone(),
ServiceInfo { ServiceInfo {
memory_mb: None,
disk_gb: None,
metrics: Vec::new(), metrics: Vec::new(),
widget_status: Status::Unknown, widget_status: Status::Unknown,
service_type: String::new(), // Unknown type in legacy path
memory_bytes: None,
restart_count: None,
uptime_seconds: None,
}, },
)); ));
&mut sub_service_list.last_mut().unwrap().1 &mut sub_service_list.last_mut().unwrap().1
@@ -384,14 +508,6 @@ impl ServicesWidget {
if metric.name.ends_with("_status") { if metric.name.ends_with("_status") {
sub_service_info.widget_status = metric.status; sub_service_info.widget_status = metric.status;
} else if metric.name.ends_with("_memory_mb") {
if let Some(memory) = metric.value.as_f32() {
sub_service_info.memory_mb = Some(memory);
}
} else if metric.name.ends_with("_disk_gb") {
if let Some(disk) = metric.value.as_f32() {
sub_service_info.disk_gb = Some(disk);
}
} }
} }
} }
@@ -448,11 +564,28 @@ impl ServicesWidget {
.constraints([Constraint::Length(1), Constraint::Min(0)]) .constraints([Constraint::Length(1), Constraint::Min(0)])
.split(inner_area); .split(inner_area);
// Header // Determine which columns to show based on available width
let header = format!( let columns = ColumnVisibility::from_width(inner_area.width);
"{:<25} {:<10} {:<8} {:<8}",
"Service:", "Status:", "RAM:", "Disk:" // Build header based on visible columns
); let mut header_parts = Vec::new();
if columns.show_name {
header_parts.push(format!("{:<width$}", "Service:", width = ColumnVisibility::NAME_WIDTH as usize));
}
if columns.show_status {
header_parts.push(format!("{:<width$}", "Status:", width = ColumnVisibility::STATUS_WIDTH as usize));
}
if columns.show_ram {
header_parts.push(format!("{:<width$}", "RAM:", width = ColumnVisibility::RAM_WIDTH as usize));
}
if columns.show_uptime {
header_parts.push(format!("{:<width$}", "Uptime:", width = ColumnVisibility::UPTIME_WIDTH as usize));
}
if columns.show_restarts {
header_parts.push(format!("{:<width$}", "↻:", width = ColumnVisibility::RESTARTS_WIDTH as usize));
}
let header = header_parts.join(" ");
let header_para = Paragraph::new(header).style(Typography::muted()); let header_para = Paragraph::new(header).style(Typography::muted());
frame.render_widget(header_para, content_chunks[0]); frame.render_widget(header_para, content_chunks[0]);
@@ -464,11 +597,11 @@ impl ServicesWidget {
} }
// Render the services list // Render the services list
self.render_services(frame, content_chunks[1], is_focused); self.render_services(frame, content_chunks[1], is_focused, columns);
} }
/// Render services list /// Render services list
fn render_services(&mut self, frame: &mut Frame, area: Rect, is_focused: bool) { fn render_services(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, columns: ColumnVisibility) {
// Build hierarchical service list for display // Build hierarchical service list for display
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>)> = Vec::new(); let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>)> = Vec::new();
@@ -478,7 +611,7 @@ impl ServicesWidget {
for (parent_name, parent_info) in parent_services { for (parent_name, parent_info) in parent_services {
// Add parent service line // Add parent service line
let parent_line = self.format_parent_service_line(parent_name, parent_info); let parent_line = self.format_parent_service_line(parent_name, parent_info, columns);
display_lines.push((parent_line, parent_info.widget_status, false, None)); display_lines.push((parent_line, parent_info.widget_status, false, None));
// Add sub-services for this parent (if any) // Add sub-services for this parent (if any)

View File

@@ -8,18 +8,25 @@ use ratatui::{
use crate::ui::theme::{StatusIcons, Typography}; use crate::ui::theme::{StatusIcons, Typography};
/// System widget displaying NixOS info, CPU, RAM, and Storage in unified layout /// System widget displaying NixOS info, Network, CPU, RAM, and Storage in unified layout
#[derive(Clone)] #[derive(Clone)]
pub struct SystemWidget { pub struct SystemWidget {
// NixOS information // NixOS information
nixos_build: Option<String>, nixos_build: Option<String>,
agent_hash: Option<String>, agent_hash: Option<String>,
// ZMQ communication stats
zmq_packets_received: Option<u64>,
zmq_last_packet_age: Option<f64>,
// Network interfaces
network_interfaces: Vec<cm_dashboard_shared::NetworkInterfaceData>,
// CPU metrics // CPU metrics
cpu_load_1min: Option<f32>, cpu_load_1min: Option<f32>,
cpu_load_5min: Option<f32>, cpu_load_5min: Option<f32>,
cpu_load_15min: Option<f32>, cpu_load_15min: Option<f32>,
cpu_frequency: Option<f32>, cpu_cstates: Vec<cm_dashboard_shared::CStateInfo>,
cpu_status: Status, cpu_status: Status,
// Memory metrics // Memory metrics
@@ -38,15 +45,9 @@ pub struct SystemWidget {
storage_pools: Vec<StoragePool>, storage_pools: Vec<StoragePool>,
// Backup metrics // Backup metrics
backup_status: String, backup_repositories: Vec<String>,
backup_start_time_raw: Option<String>, backup_repository_status: Status,
backup_disk_serial: Option<String>, backup_disks: Vec<cm_dashboard_shared::BackupDiskData>,
backup_disk_usage_percent: Option<f32>,
backup_disk_used_gb: Option<f32>,
backup_disk_total_gb: Option<f32>,
backup_disk_wear_percent: Option<f32>,
backup_disk_temperature: Option<f32>,
backup_last_size_gb: Option<f32>,
// Overall status // Overall status
has_data: bool, has_data: bool,
@@ -89,10 +90,13 @@ impl SystemWidget {
Self { Self {
nixos_build: None, nixos_build: None,
agent_hash: None, agent_hash: None,
zmq_packets_received: None,
zmq_last_packet_age: None,
network_interfaces: Vec::new(),
cpu_load_1min: None, cpu_load_1min: None,
cpu_load_5min: None, cpu_load_5min: None,
cpu_load_15min: None, cpu_load_15min: None,
cpu_frequency: None, cpu_cstates: Vec::new(),
cpu_status: Status::Unknown, cpu_status: Status::Unknown,
memory_usage_percent: None, memory_usage_percent: None,
memory_used_gb: None, memory_used_gb: None,
@@ -104,15 +108,9 @@ impl SystemWidget {
tmp_status: Status::Unknown, tmp_status: Status::Unknown,
tmpfs_mounts: Vec::new(), tmpfs_mounts: Vec::new(),
storage_pools: Vec::new(), storage_pools: Vec::new(),
backup_status: "unknown".to_string(), backup_repositories: Vec::new(),
backup_start_time_raw: None, backup_repository_status: Status::Unknown,
backup_disk_serial: None, backup_disks: Vec::new(),
backup_disk_usage_percent: None,
backup_disk_used_gb: None,
backup_disk_total_gb: None,
backup_disk_wear_percent: None,
backup_disk_temperature: None,
backup_last_size_gb: None,
has_data: false, has_data: false,
} }
} }
@@ -127,12 +125,19 @@ impl SystemWidget {
} }
} }
/// Format CPU frequency /// Format CPU C-states (idle depth) with percentages
fn format_cpu_frequency(&self) -> String { fn format_cpu_cstate(&self) -> String {
match self.cpu_frequency { if self.cpu_cstates.is_empty() {
Some(freq) => format!("{:.0} MHz", freq), return "".to_string();
None => "— MHz".to_string(),
} }
// Format top 3 C-states with percentages: "C10:79% C8:10% C6:8%"
// Agent already sends clean names (C3, C10, etc.)
self.cpu_cstates
.iter()
.map(|cs| format!("{}:{:.0}%", cs.name, cs.percent))
.collect::<Vec<_>>()
.join(" ")
} }
/// Format memory usage /// Format memory usage
@@ -150,6 +155,12 @@ impl SystemWidget {
pub fn _get_agent_hash(&self) -> Option<&String> { pub fn _get_agent_hash(&self) -> Option<&String> {
self.agent_hash.as_ref() self.agent_hash.as_ref()
} }
/// Update ZMQ communication statistics
pub fn update_zmq_stats(&mut self, packets_received: u64, last_packet_age_secs: f64) {
self.zmq_packets_received = Some(packets_received);
self.zmq_last_packet_age = Some(last_packet_age_secs);
}
} }
use super::Widget; use super::Widget;
@@ -164,12 +175,15 @@ impl Widget for SystemWidget {
// Extract build version // Extract build version
self.nixos_build = agent_data.build_version.clone(); self.nixos_build = agent_data.build_version.clone();
// Extract network interfaces
self.network_interfaces = agent_data.system.network.interfaces.clone();
// Extract CPU data directly // Extract CPU data directly
let cpu = &agent_data.system.cpu; let cpu = &agent_data.system.cpu;
self.cpu_load_1min = Some(cpu.load_1min); self.cpu_load_1min = Some(cpu.load_1min);
self.cpu_load_5min = Some(cpu.load_5min); self.cpu_load_5min = Some(cpu.load_5min);
self.cpu_load_15min = Some(cpu.load_15min); self.cpu_load_15min = Some(cpu.load_15min);
self.cpu_frequency = Some(cpu.frequency_mhz); self.cpu_cstates = cpu.cstates.clone();
self.cpu_status = Status::Ok; self.cpu_status = Status::Ok;
// Extract memory data directly // Extract memory data directly
@@ -195,25 +209,9 @@ impl Widget for SystemWidget {
// Extract backup data // Extract backup data
let backup = &agent_data.backup; let backup = &agent_data.backup;
self.backup_status = backup.status.clone(); self.backup_repositories = backup.repositories.clone();
self.backup_start_time_raw = backup.start_time_raw.clone(); self.backup_repository_status = backup.repository_status;
self.backup_last_size_gb = backup.last_backup_size_gb; self.backup_disks = backup.disks.clone();
if let Some(disk) = &backup.repository_disk {
self.backup_disk_serial = Some(disk.serial.clone());
self.backup_disk_usage_percent = Some(disk.usage_percent);
self.backup_disk_used_gb = Some(disk.used_gb);
self.backup_disk_total_gb = Some(disk.total_gb);
self.backup_disk_wear_percent = disk.wear_percent;
self.backup_disk_temperature = disk.temperature_celsius;
} else {
self.backup_disk_serial = None;
self.backup_disk_usage_percent = None;
self.backup_disk_used_gb = None;
self.backup_disk_total_gb = None;
self.backup_disk_wear_percent = None;
self.backup_disk_temperature = None;
}
} }
} }
@@ -239,8 +237,11 @@ impl SystemWidget {
}; };
// Add drive info // Add drive info
let display_name = drive.serial_number.as_ref()
.map(|s| truncate_serial(s))
.unwrap_or(drive.name.clone());
let storage_drive = StorageDrive { let storage_drive = StorageDrive {
name: drive.name.clone(), name: display_name,
temperature: drive.temperature_celsius, temperature: drive.temperature_celsius,
wear_percent: drive.wear_percent, wear_percent: drive.wear_percent,
status: Status::Ok, status: Status::Ok,
@@ -273,6 +274,17 @@ impl SystemWidget {
// Convert pools (MergerFS, RAID, etc.) // Convert pools (MergerFS, RAID, etc.)
for pool in &agent_data.system.storage.pools { for pool in &agent_data.system.storage.pools {
// Use agent-calculated status (combined health and usage status)
let pool_status = if pool.health_status == Status::Critical || pool.usage_status == Status::Critical {
Status::Critical
} else if pool.health_status == Status::Warning || pool.usage_status == Status::Warning {
Status::Warning
} else if pool.health_status == Status::Ok && pool.usage_status == Status::Ok {
Status::Ok
} else {
Status::Unknown
};
let mut storage_pool = StoragePool { let mut storage_pool = StoragePool {
name: pool.name.clone(), name: pool.name.clone(),
mount_point: pool.mount.clone(), mount_point: pool.mount.clone(),
@@ -284,27 +296,55 @@ impl SystemWidget {
usage_percent: Some(pool.usage_percent), usage_percent: Some(pool.usage_percent),
used_gb: Some(pool.used_gb), used_gb: Some(pool.used_gb),
total_gb: Some(pool.total_gb), total_gb: Some(pool.total_gb),
status: Status::Ok, // TODO: map pool health to status status: pool_status,
}; };
// Add data drives // Add data drives - use agent-calculated status
for drive in &pool.data_drives { for drive in &pool.data_drives {
// Use combined health and temperature status
let drive_status = if drive.health_status == Status::Critical || drive.temperature_status == Status::Critical {
Status::Critical
} else if drive.health_status == Status::Warning || drive.temperature_status == Status::Warning {
Status::Warning
} else if drive.health_status == Status::Ok && drive.temperature_status == Status::Ok {
Status::Ok
} else {
Status::Unknown
};
let display_name = drive.serial_number.as_ref()
.map(|s| truncate_serial(s))
.unwrap_or(drive.name.clone());
let storage_drive = StorageDrive { let storage_drive = StorageDrive {
name: drive.name.clone(), name: display_name,
temperature: drive.temperature_celsius, temperature: drive.temperature_celsius,
wear_percent: drive.wear_percent, wear_percent: drive.wear_percent,
status: Status::Ok, // TODO: map drive health to status status: drive_status,
}; };
storage_pool.data_drives.push(storage_drive); storage_pool.data_drives.push(storage_drive);
} }
// Add parity drives // Add parity drives - use agent-calculated status
for drive in &pool.parity_drives { for drive in &pool.parity_drives {
// Use combined health and temperature status
let drive_status = if drive.health_status == Status::Critical || drive.temperature_status == Status::Critical {
Status::Critical
} else if drive.health_status == Status::Warning || drive.temperature_status == Status::Warning {
Status::Warning
} else if drive.health_status == Status::Ok && drive.temperature_status == Status::Ok {
Status::Ok
} else {
Status::Unknown
};
let display_name = drive.serial_number.as_ref()
.map(|s| truncate_serial(s))
.unwrap_or(drive.name.clone());
let storage_drive = StorageDrive { let storage_drive = StorageDrive {
name: drive.name.clone(), name: display_name,
temperature: drive.temperature_celsius, temperature: drive.temperature_celsius,
wear_percent: drive.wear_percent, wear_percent: drive.wear_percent,
status: Status::Ok, // TODO: map drive health to status status: drive_status,
}; };
storage_pool.parity_drives.push(storage_drive); storage_pool.parity_drives.push(storage_drive);
} }
@@ -326,12 +366,8 @@ impl SystemWidget {
// Pool header line with type and health // Pool header line with type and health
let pool_label = if pool.pool_type == "drive" { let pool_label = if pool.pool_type == "drive" {
// For physical drives, show the drive name with temperature and wear percentage if available // For physical drives, show the drive name with temperature and wear percentage if available
// Look for any drive with temp/wear data (physical drives may have drives named after the pool) // Physical drives only have one drive entry
let drive_info = pool.drives.iter() if let Some(drive) = pool.drives.first() {
.find(|d| d.name == pool.name)
.or_else(|| pool.drives.first());
if let Some(drive) = drive_info {
let mut drive_details = Vec::new(); let mut drive_details = Vec::new();
if let Some(temp) = drive.temperature { if let Some(temp) = drive.temperature {
drive_details.push(format!("T: {}°C", temp as i32)); drive_details.push(format!("T: {}°C", temp as i32));
@@ -339,18 +375,18 @@ impl SystemWidget {
if let Some(wear) = drive.wear_percent { if let Some(wear) = drive.wear_percent {
drive_details.push(format!("W: {}%", wear as i32)); drive_details.push(format!("W: {}%", wear as i32));
} }
if !drive_details.is_empty() { if !drive_details.is_empty() {
format!("{} {}", pool.name, drive_details.join(" ")) format!("{} {}", drive.name, drive_details.join(" "))
} else { } else {
pool.name.clone() drive.name.clone()
} }
} else { } else {
pool.name.clone() pool.name.clone()
} }
} else { } else {
// For mergerfs pools, show pool name with format like "mergerfs (2+1):" // For mergerfs pools, show pool type with mount point
format!("{}:", pool.pool_type) format!("mergerfs {}:", pool.mount_point)
}; };
let pool_spans = StatusIcons::create_status_spans(pool.status.clone(), &pool_label); let pool_spans = StatusIcons::create_status_spans(pool.status.clone(), &pool_label);
@@ -389,7 +425,7 @@ impl SystemWidget {
// └─ Mount: /srv/media // └─ Mount: /srv/media
// Pool total usage // Pool total usage
let total_text = format!("Total: {:.0}% {:.1}GB/{:.1}GB", let total_text = format!("{:.0}% {:.1}GB/{:.1}GB",
pool.usage_percent.unwrap_or(0.0), pool.usage_percent.unwrap_or(0.0),
pool.used_gb.unwrap_or(0.0), pool.used_gb.unwrap_or(0.0),
pool.total_gb.unwrap_or(0.0) pool.total_gb.unwrap_or(0.0)
@@ -400,22 +436,37 @@ impl SystemWidget {
total_spans.extend(StatusIcons::create_status_spans(Status::Ok, &total_text)); total_spans.extend(StatusIcons::create_status_spans(Status::Ok, &total_text));
lines.push(Line::from(total_spans)); lines.push(Line::from(total_spans));
// Data Disks section // Data drives - at same level as parity
if !pool.data_drives.is_empty() { let has_parity = !pool.parity_drives.is_empty();
lines.push(Line::from(vec![ for (i, drive) in pool.data_drives.iter().enumerate() {
Span::styled(" ├─ Data Disks:", Typography::secondary()) let is_last_data = i == pool.data_drives.len() - 1;
])); let mut drive_details = Vec::new();
for (i, drive) in pool.data_drives.iter().enumerate() { if let Some(temp) = drive.temperature {
let is_last = i == pool.data_drives.len() - 1; drive_details.push(format!("T: {}°C", temp as i32));
let tree_symbol = if is_last { " │ └─ " } else { " │ ├─ " };
render_mergerfs_drive(drive, tree_symbol, &mut lines);
} }
if let Some(wear) = drive.wear_percent {
drive_details.push(format!("W: {}%", wear as i32));
}
let drive_text = if !drive_details.is_empty() {
format!("Data_{}: {} {}", i + 1, drive.name, drive_details.join(" "))
} else {
format!("Data_{}: {}", i + 1, drive.name)
};
// Last data drive uses └─ if there's no parity, otherwise ├─
let tree_symbol = if is_last_data && !has_parity { " └─ " } else { " ├─ " };
let mut data_spans = vec![
Span::styled(tree_symbol, Typography::tree()),
];
data_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text));
lines.push(Line::from(data_spans));
} }
// Parity section // Parity drives - last item(s)
if !pool.parity_drives.is_empty() { if !pool.parity_drives.is_empty() {
let parity_symbol = " ├─ Parity: "; for (i, drive) in pool.parity_drives.iter().enumerate() {
for drive in &pool.parity_drives { let is_last = i == pool.parity_drives.len() - 1;
let mut drive_details = Vec::new(); let mut drive_details = Vec::new();
if let Some(temp) = drive.temperature { if let Some(temp) = drive.temperature {
drive_details.push(format!("T: {}°C", temp as i32)); drive_details.push(format!("T: {}°C", temp as i32));
@@ -423,26 +474,21 @@ impl SystemWidget {
if let Some(wear) = drive.wear_percent { if let Some(wear) = drive.wear_percent {
drive_details.push(format!("W: {}%", wear as i32)); drive_details.push(format!("W: {}%", wear as i32));
} }
let drive_text = if !drive_details.is_empty() { let drive_text = if !drive_details.is_empty() {
format!("{} {}", drive.name, drive_details.join(" ")) format!("Parity: {} {}", drive.name, drive_details.join(" "))
} else { } else {
drive.name.clone() format!("Parity: {}", drive.name)
}; };
let tree_symbol = if is_last { " └─ " } else { " ├─ " };
let mut parity_spans = vec![ let mut parity_spans = vec![
Span::styled(parity_symbol, Typography::tree()), Span::styled(tree_symbol, Typography::tree()),
]; ];
parity_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text)); parity_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text));
lines.push(Line::from(parity_spans)); lines.push(Line::from(parity_spans));
} }
} }
// Mount point
lines.push(Line::from(vec![
Span::styled(" └─ Mount: ", Typography::tree()),
Span::styled(&pool.mount_point, Typography::secondary())
]));
} }
} }
@@ -450,53 +496,14 @@ impl SystemWidget {
} }
} }
/// Helper function to render a drive in a MergerFS pool /// Truncate serial number to last 8 characters
fn render_mergerfs_drive<'a>(drive: &StorageDrive, tree_symbol: &'a str, lines: &mut Vec<Line<'a>>) { fn truncate_serial(serial: &str) -> String {
let mut drive_details = Vec::new(); let len = serial.len();
if let Some(temp) = drive.temperature { if len > 8 {
drive_details.push(format!("T: {}°C", temp as i32)); serial[len - 8..].to_string()
}
if let Some(wear) = drive.wear_percent {
drive_details.push(format!("W: {}%", wear as i32));
}
let drive_text = if !drive_details.is_empty() {
format!("{} {}", drive.name, drive_details.join(" "))
} else { } else {
drive.name.clone() serial.to_string()
};
let mut drive_spans = vec![
Span::styled(tree_symbol, Typography::tree()),
];
drive_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text));
lines.push(Line::from(drive_spans));
}
/// Helper function to render a drive in a storage pool
fn render_pool_drive(drive: &StorageDrive, is_last: bool, lines: &mut Vec<Line<'_>>) {
let tree_symbol = if is_last { " └─" } else { " ├─" };
let mut drive_details = Vec::new();
if let Some(temp) = drive.temperature {
drive_details.push(format!("T: {}°C", temp as i32));
} }
if let Some(wear) = drive.wear_percent {
drive_details.push(format!("W: {}%", wear as i32));
}
let drive_text = if !drive_details.is_empty() {
format!("{} {}", drive.name, drive_details.join(" "))
} else {
format!("{}", drive.name)
};
let mut drive_spans = vec![
Span::styled(tree_symbol, Typography::tree()),
Span::raw(" "),
];
drive_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text));
lines.push(Line::from(drive_spans));
} }
impl SystemWidget { impl SystemWidget {
@@ -504,104 +511,283 @@ impl SystemWidget {
fn render_backup(&self) -> Vec<Line<'_>> { fn render_backup(&self) -> Vec<Line<'_>> {
let mut lines = Vec::new(); let mut lines = Vec::new();
// First line: serial number with temperature and wear // First section: Repository status and list
if let Some(serial) = &self.backup_disk_serial { if !self.backup_repositories.is_empty() {
let repo_text = format!("Repo: {}", self.backup_repositories.len());
let repo_spans = StatusIcons::create_status_spans(self.backup_repository_status, &repo_text);
lines.push(Line::from(repo_spans));
// List all repositories (sorted for consistent display)
let mut sorted_repos = self.backup_repositories.clone();
sorted_repos.sort();
let repo_count = sorted_repos.len();
for (idx, repo) in sorted_repos.iter().enumerate() {
let tree_char = if idx == repo_count - 1 { "└─" } else { "├─" };
lines.push(Line::from(vec![
Span::styled(format!(" {} ", tree_char), Typography::tree()),
Span::styled(repo.clone(), Typography::secondary()),
]));
}
}
// Second section: Per-disk backup information (sorted by serial for consistent display)
let mut sorted_disks = self.backup_disks.clone();
sorted_disks.sort_by(|a, b| a.serial.cmp(&b.serial));
for disk in &sorted_disks {
let truncated_serial = truncate_serial(&disk.serial);
let mut details = Vec::new(); let mut details = Vec::new();
if let Some(temp) = self.backup_disk_temperature {
if let Some(temp) = disk.temperature_celsius {
details.push(format!("T: {}°C", temp as i32)); details.push(format!("T: {}°C", temp as i32));
} }
if let Some(wear) = self.backup_disk_wear_percent { if let Some(wear) = disk.wear_percent {
details.push(format!("W: {}%", wear as i32)); details.push(format!("W: {}%", wear as i32));
} }
let disk_text = if !details.is_empty() { let disk_text = if !details.is_empty() {
format!("{} {}", serial, details.join(" ")) format!("{} {}", truncated_serial, details.join(" "))
} else { } else {
serial.clone() truncated_serial
}; };
let backup_status = match self.backup_status.as_str() { // Overall disk status (worst of backup and usage)
"completed" | "success" => Status::Ok, let disk_status = disk.backup_status.max(disk.usage_status);
"running" => Status::Pending, let disk_spans = StatusIcons::create_status_spans(disk_status, &disk_text);
"failed" => Status::Critical,
_ => Status::Unknown,
};
let disk_spans = StatusIcons::create_status_spans(backup_status, &disk_text);
lines.push(Line::from(disk_spans)); lines.push(Line::from(disk_spans));
// Show backup time from TOML if available // Show backup time with status
if let Some(start_time) = &self.backup_start_time_raw { if let Some(backup_time) = &disk.last_backup_time {
let time_text = if let Some(size) = self.backup_last_size_gb { let time_text = format!("Backup: {}", backup_time);
format!("Time: {} ({:.1}GB)", start_time, size) let mut time_spans = vec![
} else {
format!("Time: {}", start_time)
};
lines.push(Line::from(vec![
Span::styled(" ├─ ", Typography::tree()), Span::styled(" ├─ ", Typography::tree()),
Span::styled(time_text, Typography::secondary()) ];
])); time_spans.extend(StatusIcons::create_status_spans(disk.backup_status, &time_text));
lines.push(Line::from(time_spans));
} }
// Usage information // Show usage with status and archive count
if let (Some(used), Some(total), Some(usage_percent)) = ( let archive_display = if disk.archives_min == disk.archives_max {
self.backup_disk_used_gb, format!("{}", disk.archives_min)
self.backup_disk_total_gb, } else {
self.backup_disk_usage_percent format!("{}-{}", disk.archives_min, disk.archives_max)
) { };
let usage_text = format!("Usage: {:.0}% {:.0}GB/{:.0}GB", usage_percent, used, total);
let usage_spans = StatusIcons::create_status_spans(Status::Ok, &usage_text); let usage_text = format!(
let mut full_spans = vec![ "Usage: ({}) {:.0}% {:.0}GB/{:.0}GB",
Span::styled(" └─ ", Typography::tree()), archive_display,
]; disk.disk_usage_percent,
full_spans.extend(usage_spans); disk.disk_used_gb,
lines.push(Line::from(full_spans)); disk.disk_total_gb
} );
let mut usage_spans = vec![
Span::styled(" └─ ", Typography::tree()),
];
usage_spans.extend(StatusIcons::create_status_spans(disk.usage_status, &usage_text));
lines.push(Line::from(usage_spans));
} }
lines lines
} }
/// Format time ago from timestamp /// Compress IPv4 addresses from same subnet
fn format_time_ago(&self, timestamp: u64) -> String { /// Example: "192.168.30.1, 192.168.30.100" -> "192.168.30.1, 100"
let now = chrono::Utc::now().timestamp() as u64; fn compress_ipv4_addresses(addresses: &[String]) -> String {
let seconds_ago = now.saturating_sub(timestamp); if addresses.is_empty() {
return String::new();
let hours = seconds_ago / 3600;
let minutes = (seconds_ago % 3600) / 60;
if hours > 0 {
format!("{}h ago", hours)
} else if minutes > 0 {
format!("{}m ago", minutes)
} else {
"now".to_string()
} }
if addresses.len() == 1 {
return addresses[0].clone();
}
let mut result = Vec::new();
let mut last_prefix = String::new();
for addr in addresses {
let parts: Vec<&str> = addr.split('.').collect();
if parts.len() == 4 {
let prefix = format!("{}.{}.{}", parts[0], parts[1], parts[2]);
if prefix == last_prefix {
// Same subnet, show only last octet
result.push(parts[3].to_string());
} else {
// Different subnet, show full IP
result.push(addr.clone());
last_prefix = prefix;
}
} else {
// Invalid IP format, show as-is
result.push(addr.clone());
}
}
result.join(", ")
} }
/// Format time until from future timestamp /// Render network section for display with physical/virtual grouping
fn format_time_until(&self, timestamp: u64) -> String { fn render_network(&self) -> Vec<Line<'_>> {
let now = chrono::Utc::now().timestamp() as u64; let mut lines = Vec::new();
if timestamp <= now {
return "overdue".to_string(); if self.network_interfaces.is_empty() {
return lines;
} }
let seconds_until = timestamp - now; // Separate physical and virtual interfaces
let hours = seconds_until / 3600; let physical: Vec<_> = self.network_interfaces.iter().filter(|i| i.is_physical).collect();
let minutes = (seconds_until % 3600) / 60; let virtual_interfaces: Vec<_> = self.network_interfaces.iter().filter(|i| !i.is_physical).collect();
if hours > 0 { // Find standalone virtual interfaces (those without a parent)
format!("in {}h", hours) let mut standalone_virtual: Vec<_> = virtual_interfaces.iter()
} else if minutes > 0 { .filter(|i| i.parent_interface.is_none())
format!("in {}m", minutes) .collect();
} else {
"soon".to_string() // Sort standalone virtual: VLANs first (by VLAN ID), then others alphabetically
standalone_virtual.sort_by(|a, b| {
match (a.vlan_id, b.vlan_id) {
(Some(vlan_a), Some(vlan_b)) => vlan_a.cmp(&vlan_b),
(Some(_), None) => std::cmp::Ordering::Less,
(None, Some(_)) => std::cmp::Ordering::Greater,
(None, None) => a.name.cmp(&b.name),
}
});
// Render physical interfaces with their children
for (phy_idx, interface) in physical.iter().enumerate() {
let is_last_physical = phy_idx == physical.len() - 1 && standalone_virtual.is_empty();
// Physical interface header with status icon
let mut header_spans = vec![];
header_spans.extend(StatusIcons::create_status_spans(
interface.link_status.clone(),
&format!("{}:", interface.name)
));
lines.push(Line::from(header_spans));
// Find child interfaces for this physical interface
let mut children: Vec<_> = virtual_interfaces.iter()
.filter(|vi| {
if let Some(parent) = &vi.parent_interface {
parent == &interface.name
} else {
false
}
})
.collect();
// Sort children: VLANs first (by VLAN ID), then others alphabetically
children.sort_by(|a, b| {
match (a.vlan_id, b.vlan_id) {
(Some(vlan_a), Some(vlan_b)) => vlan_a.cmp(&vlan_b),
(Some(_), None) => std::cmp::Ordering::Less,
(None, Some(_)) => std::cmp::Ordering::Greater,
(None, None) => a.name.cmp(&b.name),
}
});
// Count total items under this physical interface (IPs + children)
let ip_count = interface.ipv4_addresses.len() + interface.ipv6_addresses.len();
let total_children = ip_count + children.len();
let mut child_index = 0;
// IPv4 addresses on the physical interface itself
for ipv4 in &interface.ipv4_addresses {
child_index += 1;
let is_last = child_index == total_children && is_last_physical;
let tree_symbol = if is_last { " └─ " } else { " ├─ " };
lines.push(Line::from(vec![
Span::styled(tree_symbol, Typography::tree()),
Span::styled(format!("ip: {}", ipv4), Typography::secondary()),
]));
}
// IPv6 addresses on the physical interface itself
for ipv6 in &interface.ipv6_addresses {
child_index += 1;
let is_last = child_index == total_children && is_last_physical;
let tree_symbol = if is_last { " └─ " } else { " ├─ " };
lines.push(Line::from(vec![
Span::styled(tree_symbol, Typography::tree()),
Span::styled(format!("ip: {}", ipv6), Typography::secondary()),
]));
}
// Child virtual interfaces (VLANs, etc.)
for child in children {
child_index += 1;
let is_last = child_index == total_children && is_last_physical;
let tree_symbol = if is_last { " └─ " } else { " ├─ " };
let ip_text = if !child.ipv4_addresses.is_empty() {
Self::compress_ipv4_addresses(&child.ipv4_addresses)
} else if !child.ipv6_addresses.is_empty() {
child.ipv6_addresses.join(", ")
} else {
String::new()
};
// Format: "name (vlan X): IP" or "name: IP"
let child_text = if let Some(vlan_id) = child.vlan_id {
if !ip_text.is_empty() {
format!("{} (vlan {}): {}", child.name, vlan_id, ip_text)
} else {
format!("{} (vlan {}):", child.name, vlan_id)
}
} else {
if !ip_text.is_empty() {
format!("{}: {}", child.name, ip_text)
} else {
format!("{}:", child.name)
}
};
lines.push(Line::from(vec![
Span::styled(tree_symbol, Typography::tree()),
Span::styled(child_text, Typography::secondary()),
]));
}
} }
// Render standalone virtual interfaces (those without a parent)
for (virt_idx, interface) in standalone_virtual.iter().enumerate() {
let is_last = virt_idx == standalone_virtual.len() - 1;
let tree_symbol = if is_last { " └─ " } else { " ├─ " };
// Virtual interface with IPs
let ip_text = if !interface.ipv4_addresses.is_empty() {
Self::compress_ipv4_addresses(&interface.ipv4_addresses)
} else if !interface.ipv6_addresses.is_empty() {
interface.ipv6_addresses.join(", ")
} else {
String::new()
};
// Format: "name (vlan X): IP" or "name: IP"
let interface_text = if let Some(vlan_id) = interface.vlan_id {
if !ip_text.is_empty() {
format!("{} (vlan {}): {}", interface.name, vlan_id, ip_text)
} else {
format!("{} (vlan {}):", interface.name, vlan_id)
}
} else {
if !ip_text.is_empty() {
format!("{}: {}", interface.name, ip_text)
} else {
format!("{}:", interface.name)
}
};
lines.push(Line::from(vec![
Span::styled(tree_symbol, Typography::tree()),
Span::styled(interface_text, Typography::secondary()),
]));
}
lines
} }
/// Render system widget /// Render system widget
pub fn render(&mut self, frame: &mut Frame, area: Rect, hostname: &str, config: Option<&crate::config::DashboardConfig>) { pub fn render(&mut self, frame: &mut Frame, area: Rect, hostname: &str, _config: Option<&crate::config::DashboardConfig>) {
let mut lines = Vec::new(); let mut lines = Vec::new();
// NixOS section // NixOS section
@@ -618,41 +804,42 @@ impl SystemWidget {
lines.push(Line::from(vec![ lines.push(Line::from(vec![
Span::styled(format!("Agent: {}", agent_version_text), Typography::secondary()) Span::styled(format!("Agent: {}", agent_version_text), Typography::secondary())
])); ]));
// Display detected connection IP // ZMQ communication stats
if let Some(config) = config { if let (Some(packets), Some(age)) = (self.zmq_packets_received, self.zmq_last_packet_age) {
if let Some(host_details) = config.hosts.get(hostname) { let age_text = if age < 1.0 {
let detected_ip = host_details.get_connection_ip(hostname); format!("{:.0}ms ago", age * 1000.0)
lines.push(Line::from(vec![ } else {
Span::styled(format!("IP: {}", detected_ip), Typography::secondary()) format!("{:.1}s ago", age)
])); };
} lines.push(Line::from(vec![
Span::styled(format!("ZMQ: {} pkts, last {}", packets, age_text), Typography::secondary())
]));
} }
// CPU section // CPU section
lines.push(Line::from(vec![ lines.push(Line::from(vec![
Span::styled("CPU:", Typography::widget_title()) Span::styled("CPU:", Typography::widget_title())
])); ]));
let load_text = self.format_cpu_load(); let load_text = self.format_cpu_load();
let cpu_spans = StatusIcons::create_status_spans( let cpu_spans = StatusIcons::create_status_spans(
self.cpu_status.clone(), self.cpu_status.clone(),
&format!("Load: {}", load_text) &format!("Load: {}", load_text)
); );
lines.push(Line::from(cpu_spans)); lines.push(Line::from(cpu_spans));
let freq_text = self.format_cpu_frequency(); let cstate_text = self.format_cpu_cstate();
lines.push(Line::from(vec![ lines.push(Line::from(vec![
Span::styled(" └─ ", Typography::tree()), Span::styled(" └─ ", Typography::tree()),
Span::styled(format!("Freq: {}", freq_text), Typography::secondary()) Span::styled(format!("C-state: {}", cstate_text), Typography::secondary())
])); ]));
// RAM section // RAM section
lines.push(Line::from(vec![ lines.push(Line::from(vec![
Span::styled("RAM:", Typography::widget_title()) Span::styled("RAM:", Typography::widget_title())
])); ]));
let memory_text = self.format_memory_usage(); let memory_text = self.format_memory_usage();
let memory_spans = StatusIcons::create_status_spans( let memory_spans = StatusIcons::create_status_spans(
self.memory_status.clone(), self.memory_status.clone(),
@@ -664,16 +851,16 @@ impl SystemWidget {
for (i, tmpfs) in self.tmpfs_mounts.iter().enumerate() { for (i, tmpfs) in self.tmpfs_mounts.iter().enumerate() {
let is_last = i == self.tmpfs_mounts.len() - 1; let is_last = i == self.tmpfs_mounts.len() - 1;
let tree_symbol = if is_last { " └─ " } else { " ├─ " }; let tree_symbol = if is_last { " └─ " } else { " ├─ " };
let usage_text = if tmpfs.total_gb > 0.0 { let usage_text = if tmpfs.total_gb > 0.0 {
format!("{:.0}% {:.1}GB/{:.1}GB", format!("{:.0}% {:.1}GB/{:.1}GB",
tmpfs.usage_percent, tmpfs.usage_percent,
tmpfs.used_gb, tmpfs.used_gb,
tmpfs.total_gb) tmpfs.total_gb)
} else { } else {
"— —/—".to_string() "— —/—".to_string()
}; };
let mut tmpfs_spans = vec![ let mut tmpfs_spans = vec![
Span::styled(tree_symbol, Typography::tree()), Span::styled(tree_symbol, Typography::tree()),
]; ];
@@ -684,6 +871,16 @@ impl SystemWidget {
lines.push(Line::from(tmpfs_spans)); lines.push(Line::from(tmpfs_spans));
} }
// Network section
if !self.network_interfaces.is_empty() {
lines.push(Line::from(vec![
Span::styled("Network:", Typography::widget_title())
]));
let network_lines = self.render_network();
lines.extend(network_lines);
}
// Storage section // Storage section
lines.push(Line::from(vec![ lines.push(Line::from(vec![
Span::styled("Storage:", Typography::widget_title()) Span::styled("Storage:", Typography::widget_title())
@@ -694,7 +891,7 @@ impl SystemWidget {
lines.extend(storage_lines); lines.extend(storage_lines);
// Backup section (if available) // Backup section (if available)
if self.backup_status != "unavailable" && self.backup_status != "unknown" { if !self.backup_repositories.is_empty() || !self.backup_disks.is_empty() {
lines.push(Line::from(vec![ lines.push(Line::from(vec![
Span::styled("Backup:", Typography::widget_title()) Span::styled("Backup:", Typography::widget_title())
])); ]));

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard-shared" name = "cm-dashboard-shared"
version = "0.1.152" version = "0.1.239"
edition = "2021" edition = "2021"
[dependencies] [dependencies]

View File

@@ -16,18 +16,44 @@ pub struct AgentData {
/// System-level monitoring data /// System-level monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SystemData { pub struct SystemData {
pub network: NetworkData,
pub cpu: CpuData, pub cpu: CpuData,
pub memory: MemoryData, pub memory: MemoryData,
pub storage: StorageData, pub storage: StorageData,
} }
/// Network interface monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NetworkData {
pub interfaces: Vec<NetworkInterfaceData>,
}
/// Individual network interface data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NetworkInterfaceData {
pub name: String,
pub ipv4_addresses: Vec<String>,
pub ipv6_addresses: Vec<String>,
pub is_physical: bool,
pub link_status: Status,
pub parent_interface: Option<String>,
pub vlan_id: Option<u16>,
}
/// CPU C-state usage information
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CStateInfo {
pub name: String,
pub percent: f32,
}
/// CPU monitoring data /// CPU monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CpuData { pub struct CpuData {
pub load_1min: f32, pub load_1min: f32,
pub load_5min: f32, pub load_5min: f32,
pub load_15min: f32, pub load_15min: f32,
pub frequency_mhz: f32, pub cstates: Vec<CStateInfo>, // C-state usage percentages (C1, C6, C10, etc.) - indicates CPU idle depth distribution
pub temperature_celsius: Option<f32>, pub temperature_celsius: Option<f32>,
pub load_status: Status, pub load_status: Status,
pub temperature_status: Status, pub temperature_status: Status,
@@ -66,6 +92,7 @@ pub struct StorageData {
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DriveData { pub struct DriveData {
pub name: String, pub name: String,
pub serial_number: Option<String>,
pub health: String, pub health: String,
pub temperature_celsius: Option<f32>, pub temperature_celsius: Option<f32>,
pub wear_percent: Option<f32>, pub wear_percent: Option<f32>,
@@ -96,26 +123,35 @@ pub struct PoolData {
pub total_gb: f32, pub total_gb: f32,
pub data_drives: Vec<PoolDriveData>, pub data_drives: Vec<PoolDriveData>,
pub parity_drives: Vec<PoolDriveData>, pub parity_drives: Vec<PoolDriveData>,
pub health_status: Status,
pub usage_status: Status,
} }
/// Drive in a storage pool /// Drive in a storage pool
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PoolDriveData { pub struct PoolDriveData {
pub name: String, pub name: String,
pub serial_number: Option<String>,
pub temperature_celsius: Option<f32>, pub temperature_celsius: Option<f32>,
pub wear_percent: Option<f32>, pub wear_percent: Option<f32>,
pub health: String, pub health: String,
pub health_status: Status,
pub temperature_status: Status,
} }
/// Service monitoring data /// Service monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceData { pub struct ServiceData {
pub name: String, pub name: String,
pub memory_mb: f32,
pub disk_gb: f32,
pub user_stopped: bool, pub user_stopped: bool,
pub service_status: Status, pub service_status: Status,
pub sub_services: Vec<SubServiceData>, pub sub_services: Vec<SubServiceData>,
/// Memory usage in bytes (from MemoryCurrent)
pub memory_bytes: Option<u64>,
/// Number of service restarts (from NRestarts)
pub restart_count: Option<u32>,
/// Uptime in seconds (calculated from ExecMainStartTimestamp)
pub uptime_seconds: Option<u64>,
} }
/// Sub-service data (nginx sites, docker containers, etc.) /// Sub-service data (nginx sites, docker containers, etc.)
@@ -124,6 +160,9 @@ pub struct SubServiceData {
pub name: String, pub name: String,
pub service_status: Status, pub service_status: Status,
pub metrics: Vec<SubServiceMetric>, pub metrics: Vec<SubServiceMetric>,
/// Type of sub-service: "nginx_site", "container", "image"
#[serde(default)]
pub service_type: String,
} }
/// Individual metric for a sub-service /// Individual metric for a sub-service
@@ -137,23 +176,27 @@ pub struct SubServiceMetric {
/// Backup system data /// Backup system data
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BackupData { pub struct BackupData {
pub status: String, pub repositories: Vec<String>,
pub total_size_gb: Option<f32>, pub repository_status: Status,
pub repository_health: Option<String>, pub disks: Vec<BackupDiskData>,
pub repository_disk: Option<BackupDiskData>,
pub last_backup_size_gb: Option<f32>,
pub start_time_raw: Option<String>,
} }
/// Backup repository disk information /// Backup repository disk information
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BackupDiskData { pub struct BackupDiskData {
pub serial: String, pub serial: String,
pub usage_percent: f32, pub product_name: Option<String>,
pub used_gb: f32,
pub total_gb: f32,
pub wear_percent: Option<f32>, pub wear_percent: Option<f32>,
pub temperature_celsius: Option<f32>, pub temperature_celsius: Option<f32>,
pub last_backup_time: Option<String>,
pub backup_status: Status,
pub disk_usage_percent: f32,
pub disk_used_gb: f32,
pub disk_total_gb: f32,
pub usage_status: Status,
pub services: Vec<String>,
pub archives_min: i64,
pub archives_max: i64,
} }
impl AgentData { impl AgentData {
@@ -165,11 +208,14 @@ impl AgentData {
build_version: None, build_version: None,
timestamp: chrono::Utc::now().timestamp() as u64, timestamp: chrono::Utc::now().timestamp() as u64,
system: SystemData { system: SystemData {
network: NetworkData {
interfaces: Vec::new(),
},
cpu: CpuData { cpu: CpuData {
load_1min: 0.0, load_1min: 0.0,
load_5min: 0.0, load_5min: 0.0,
load_15min: 0.0, load_15min: 0.0,
frequency_mhz: 0.0, cstates: Vec::new(),
temperature_celsius: None, temperature_celsius: None,
load_status: Status::Unknown, load_status: Status::Unknown,
temperature_status: Status::Unknown, temperature_status: Status::Unknown,
@@ -191,12 +237,9 @@ impl AgentData {
}, },
services: Vec::new(), services: Vec::new(),
backup: BackupData { backup: BackupData {
status: "unknown".to_string(), repositories: Vec::new(),
total_size_gb: None, repository_status: Status::Unknown,
repository_health: None, disks: Vec::new(),
repository_disk: None,
last_backup_size_gb: None,
start_time_raw: None,
}, },
} }
} }

View File

@@ -82,12 +82,13 @@ impl MetricValue {
/// Health status for metrics /// Health status for metrics
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)] #[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status { pub enum Status {
Inactive, // Lowest priority Info, // Lowest priority - informational data with no status (no icon)
Unknown, // Inactive, //
Offline, // Unknown, //
Pending, // Offline, //
Ok, // 5th place - good status has higher priority than unknown states Pending, //
Warning, // Ok, // Good status has higher priority than unknown states
Warning, //
Critical, // Highest priority Critical, // Highest priority
} }
@@ -223,6 +224,17 @@ impl HysteresisThresholds {
Status::Ok Status::Ok
} }
} }
Status::Info => {
// Informational data shouldn't be used with hysteresis calculations
// Treat like Unknown if it somehow ends up here
if value >= self.critical_high {
Status::Critical
} else if value >= self.warning_high {
Status::Warning
} else {
Status::Ok
}
}
} }
} }
} }