141 Commits

Author SHA1 Message Date
7a3ee3d5ba Fix physical drive grouping logic for unified pool visualization
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
Updated filesystem grouping to use extract_base_device method for proper
partition-to-drive mapping. This ensures nvme0n1p1 and nvme0n1p2 are
correctly grouped under nvme0n1 drive pool instead of separate pools.
2025-11-23 13:54:33 +01:00
d68ecfbc64 Complete unified pool visualization with filesystem children
All checks were successful
Build and Release / build-and-release (push) Successful in 2m17s
- Implement filesystem children display under physical drive pools
- Agent generates individual filesystem metrics for each mount point
- Dashboard parses filesystem metrics and displays as tree children
- Add filesystem usage, total, and available space metrics
- Support target format: drive info + filesystem children hierarchy
- Fix compilation warnings by properly using available_bytes calculation
2025-11-23 12:48:24 +01:00
d1272a6c13 Implement unified pool visualization for single drives
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Group single disk filesystems by physical drive during auto-discovery
- Create physical drive pools with filesystem children
- Display temperature, wear, and health at drive level
- Provide consistent hierarchical storage visualization
- Fix borrow checker issues in create_physical_drive_pool method
- Add PhysicalDrive case to all StoragePoolType match statements
2025-11-23 12:10:42 +01:00
33b3beb342 Implement storage auto-discovery system
All checks were successful
Build and Release / build-and-release (push) Successful in 1m49s
- Add automatic detection of mergerfs pools by parsing /proc/mounts
- Implement smart heuristics for parity disk identification
- Store discovered topology at agent startup for efficient monitoring
- Eliminate need for manual storage pool configuration
- Support zero-config storage visualization with backward compatibility
- Clean up mount parsing and remove unused fields
2025-11-23 11:44:57 +01:00
f9384d9df6 Implement enhanced storage pool visualization
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
- Add support for mergerfs pool grouping with data and parity disk separation
- Implement pool health monitoring (healthy/degraded/critical status)
- Create hierarchical tree view for multi-disk storage arrays
- Add automatic pool type detection and member disk association
- Maintain backward compatibility for single disk configurations
- Support future extension for RAID and ZFS pool types
2025-11-23 11:18:21 +01:00
dc1a2e3a0f Add disk wear monitoring and fix storage overflow display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m15s
- Add disk wear percentage collection from SMART data in backup script
- Add backup_disk_wear_percent metric to backup collector with thresholds
- Display wear percentage in backup widget disk section
- Fix storage section overflow handling to use consistent "X more below" logic
- Update maintenance mode to return pending status instead of unknown
2025-11-20 20:36:45 +01:00
c0f7a97a6f Remove all scrolling code and user-stopped tracking logic
All checks were successful
Build and Release / build-and-release (push) Successful in 2m36s
- Remove scroll offset fields from HostWidgets struct
- Replace scrolling with simple "X more below" indicators in all widgets
- Remove user-stopped service tracking from agent (now uses SSH control)
- Inactive services now consistently show Status::Inactive with empty circles
- Simplify widget render methods by removing scroll parameters
- Clean up unused imports and legacy scrolling infrastructure
- Fix journalctl command to use -fu for proper log following
2025-11-19 08:32:42 +01:00
d11aa11f99 Add Status::Inactive for inactive services with empty circle display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
- Add new Status::Inactive variant to enum for better service state representation
- Agent now assigns Status::Inactive instead of Status::Warning for inactive services
- Dashboard displays inactive services with empty circle (○) icon in gray color
- User-stopped services still show as Status::Ok with green filled circle
- Inactive services treated as OK for host status aggregation
- Improves visual clarity between active (●), inactive (○), and warning (◐) states
2025-11-18 17:54:51 +01:00
6179bd51a7 Implement WakeOnLAN functionality with simplified configuration
All checks were successful
Build and Release / build-and-release (push) Successful in 2m32s
- Add Status::Offline enum variant for disconnected hosts
- All configured hosts now always visible showing offline status when disconnected
- Add WakeOnLAN support using wake-on-lan Rust crate
- Implement w key binding to wake offline hosts with MAC addresses
- Simplify configuration to single [hosts] section with MAC addresses only
- Change critical status icon from ◯ to ! for better visibility
- Add proper MAC address parsing and error handling
- Silent WakeOnLAN operation with logging for success/failure

Configuration format:
[hosts]
hostname = { mac_address = "AA:BB:CC:DD:EE:FF" }
2025-10-31 09:03:01 +01:00
bd20f0cae1 Fix user-stopped flag timing and service transition handling
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Correct user-stopped service behavior during startup transitions:

User-Stopped Flag Timing Fix:
- Clear user-stopped flag only when service actually becomes active, not when start command succeeds
- Remove premature flag clearing from service control handler
- Add automatic flag clearing when service status metrics show active state
- Services retain user-stopped status during activating/transitioning states

Service Transition Handling:
- User-stopped services in activating state now report Status::OK instead of Status::Pending
- Prevents host warnings during legitimate service startup transitions
- Maintains accurate status reporting throughout service lifecycle
- Failed service starts preserve user-stopped flags correctly

Journalctl Popup Fix:
- Fix terminal corruption when using J key for service logs
- Correct command quoting to prevent tmux popup interference
- Stable popup display without dashboard interface corruption

Result: Clean service startup experience with no false warnings and proper
user-stopped tracking throughout the entire service lifecycle.

Bump version to v0.1.47
2025-10-30 12:05:54 +01:00
c56e9d7be2 Implement user-stopped service tracking system
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
Add comprehensive tracking for services stopped via dashboard to prevent
false alerts when users intentionally stop services.

Features:
- User-stopped services report Status::Ok instead of Warning
- Persistent storage survives agent restarts
- Dashboard sends UserStart/UserStop commands
- Agent tracks and syncs user-stopped state globally
- Systemd collector respects user-stopped flags

Implementation:
- New service_tracker module with persistent JSON storage
- Enhanced ServiceAction enum with UserStart/UserStop variants
- Global singleton tracker accessible by collectors
- Service status logic updated to check user-stopped flag
- Dashboard version now uses CARGO_PKG_VERSION automatically

Bump version to v0.1.43
2025-10-30 10:42:56 +01:00
c8f800a1e5 Implement git commit hash tracking for build display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m24s
- Add get_git_commit() method to read /var/lib/cm-dashboard/git-commit
- Replace NixOS build version with actual git commit hash
- Show deployed commit hash as 'Build:' value for accurate tracking
- Enable verification of which exact commit is deployed per host
- Update version to 0.1.42
2025-10-29 15:29:02 +01:00
6509a2b91a Make nginx site latency thresholds configurable and simplify status logic
All checks were successful
Build and Release / build-and-release (push) Successful in 4m25s
- Replace hardcoded 500ms/2000ms thresholds with configurable nginx_latency_critical_ms
- Simplify status logic to only OK or Critical (no Warning status)
- Add validation for nginx latency threshold configuration
- Re-enable nginx site collection with configurable thresholds
- Resolves issue where sites showed critical at 2000ms despite 30s timeout setting
- Bump version to v0.1.38
2025-10-28 21:24:34 +01:00
e890c5e810 Fix service status detection with combined discovery and status approach
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Enhanced service discovery to properly show status for all services:

Changes:
- Use systemctl list-unit-files for complete service discovery (finds all services)
- Use systemctl list-units --all for batch runtime status fetching
- Combine both datasets to get comprehensive service list with correct status
- Services found in unit-files but not runtime are marked as inactive (Warning status)
- Eliminates 'unknown' status issue while maintaining complete service visibility

Now inactive services show as Warning (yellow ◐) and active services show as Ok (green ●)
instead of all services showing as unknown (? icon).
2025-10-28 15:56:47 +01:00
078c30a592 Fix service discovery to show all configured services regardless of state
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
Changed service discovery from 'systemctl list-units --all' to 'systemctl list-unit-files'
to ensure ALL service unit files are discovered, including services that have never been started.

Changes:
- Updated systemctl command to use list-unit-files instead of list-units --all
- Modified parsing logic to handle unit file format (2 fields vs 4 fields)
- Set placeholder values in discovery cache, actual runtime status fetched during collection
- This ensures all configured services (like inactive ARK servers) appear in dashboard

The issue was that list-units --all only shows services systemd has loaded/attempted to load,
but list-unit-files shows ALL service unit files regardless of their runtime state.
2025-10-28 15:41:58 +01:00
2910b7d875 Update version to 0.1.22 and fix system metric status calculation
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
- Fix /tmp usage status to use proper thresholds instead of hardcoded Ok status
- Fix wear level status to use configurable thresholds instead of hardcoded values
- Add dedicated tmp_status field to SystemWidget for proper /tmp status display
- Remove host-level hourglass icon during service operations
- Implement immediate service status updates after start/stop/restart commands
- Remove active users display and collection from NixOS section
- Fix immediate host status aggregation transmission to dashboard
2025-10-28 13:21:56 +01:00
91f037aa3e Update to v0.1.19 with event-driven status aggregation
All checks were successful
Build and Release / build-and-release (push) Successful in 2m4s
Major architectural improvements:

CORE CHANGES:
- Remove notification_interval_seconds - status aggregation now immediate
- Status calculation moved to collection phase instead of transmission
- Event-driven transmission triggers immediately on status changes
- Dual transmission strategy: immediate on change + periodic backup
- Real-time notifications without batching delays

TECHNICAL IMPROVEMENTS:
- process_metric() now returns bool indicating status change
- Immediate ZMQ broadcast when status changes detected
- Status aggregation happens during metric collection, not later
- Legacy get_nixos_build_info() method removed (unused)
- All compilation warnings fixed

BEHAVIOR CHANGES:
- Critical alerts sent instantly instead of waiting for intervals
- Dashboard receives real-time status updates
- Notifications triggered immediately on status transitions
- Backup periodic transmission every 1s ensures heartbeat

This provides much more responsive monitoring with instant alerting
while maintaining the reliability of periodic transmission as backup.
2025-10-28 10:36:34 +01:00
627c533724 Update to v0.1.18 with per-collector intervals and tmux check
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
- Implement per-collector interval timing respecting NixOS config
- Remove all hardcoded timeout/interval values and make configurable
- Add tmux session requirement check for TUI mode (bypassed for headless)
- Update agent to send config hash in Build field instead of nixos version
- Add nginx check interval, HTTP timeouts, and ZMQ transmission interval configs
- Update NixOS configuration with new configurable values

Breaking changes:
- Build field now shows nix store config hash (8 chars) instead of nixos version
- All intervals now follow individual collector configuration instead of global

New configuration fields:
- systemd.nginx_check_interval_seconds
- systemd.http_timeout_seconds
- systemd.http_connect_timeout_seconds
- zmq.transmission_interval_seconds
2025-10-28 10:08:25 +01:00
8dffe18a23 Improve SATA SSD wear level calculation
Some checks failed
Build and Release / build-and-release (push) Failing after 1m24s
- Support multiple SATA SSD wear attributes (SSD_Life_Left, Media_Wearout_Indicator, etc.)
- Handle manufacturer differences in wear reporting
- Proper parsing of SMART table format with VALUE column
- Covers Samsung, Intel, Crucial and other common SSD types
- NVMe Percentage Used support maintained
2025-10-25 22:32:09 +02:00
0c544753f9 Move SMART configuration into disk config
- Consolidate SMART thresholds into DiskConfig structure
- Remove separate SmartConfig - disk collector handles all drive data
- Update NixOS configuration to use disk.temperature_* settings
- Remove hardcoded temperature thresholds in disk collector
- Logical grouping: disk collector owns all disk/drive configuration
2025-10-25 22:29:26 +02:00
c8e26b9bac Remove redundant smart collector - consolidate SMART into disk collector
- Remove separate smart collector implementation
- Disk collector already handles SMART data for drives
- Eliminates duplicate smartctl calls causing performance issues
- SMART functionality remains in logical place with disk monitoring
- Fixes infinite smartctl loop issue
2025-10-25 22:25:22 +02:00
9160fac80b Fix smart collector compilation errors
- Update to match current Metric structure
- Use correct Status enum and collector interface
- Fix MetricValue types and constructor usage
- Builds successfully with warnings only
2025-10-25 17:13:04 +02:00
83cb43bcf1 Restore missing smart collector implementation
Some checks failed
Build and Release / build-and-release (push) Failing after 1m24s
- Rewrite smart collector to match current architecture
- Add back to mod.rs exports
- Fixes infinite smartctl loop issue
- Uses simple health and temperature monitoring
2025-10-25 16:59:09 +02:00
4b54a59e35 Remove unused code and eliminate compiler warnings
- Remove unused fields from CommandStatus variants
- Clean up unused methods and unused collector fields
- Fix lifetime syntax warning in SystemWidget
- Delete unused cache module completely
- Remove redundant render methods from widgets

All agent and dashboard warnings eliminated while preserving
panel switching and scrolling functionality.
2025-10-25 14:15:52 +02:00
fb6ee6d7ae Fix config hash to show actual deployed nix store hash
- Replace git commit hash with nix store hash extraction
- Read from /run/current-system symlink target
- Extract first 8 characters of nix store hash: d8ivwiar
- Shows actual deployed configuration, not just source
- Enables proper rebuild completion detection
- Accurate deployment verification
2025-10-25 12:22:17 +02:00
2d3844b5dd Add configuration hash display to system panel
- Collect config hash from cloned nixos-config git repository
- Display "Config: xxxxx" after "Build: xxxxx" in NixOS section
- Uses /var/lib/cm-dashboard/nixos-config directory
- Shows actual configuration hash vs nixpkgs build hash
2025-10-25 01:30:46 +02:00
967244064f Fix command execution permissions and eliminate backup error spam
- Add sudo permissions for systemctl and nixos-rebuild commands
- Use sudo in agent command execution for proper privileges
- Fix backup collector to handle missing status files gracefully
- Eliminate backup error spam when no backup system is configured
2025-10-23 23:07:52 +02:00
d193b90ba1 Fix device detection to properly parse lsblk output
- Handle lsblk tree symbols (├─, └─) in device parsing
- Extract base device names from partitions (nvme0n1p2 -> nvme0n1)
- Support both NVMe and traditional device naming schemes
- Fixes missing device lines in storage display
2025-10-23 19:16:33 +02:00
ad298ac70c Fix device detection, tree indentation, and hide Single storage type
- Replace findmnt with lsblk for efficient device name detection
- Fix tree indentation to align consistently with status icon text
- Hide '(Single)' label for single disk storage pools
- Device detection returns actual names (nvme0n1, sda) not UUID paths
2025-10-23 19:06:52 +02:00
9f34c67bfa Fix debug log reference to removed underlying_devices field 2025-10-23 18:56:16 +02:00
5134c5320a Fix disk collector to use dynamic device detection
- Remove underlying_devices field from FilesystemConfig
- Add device detection at startup using findmnt command
- Store detected devices in HashMap for reuse during collection
- Keep all existing functionality (StoragePool, DriveInfo, SMART data)
- Detect devices only once at initialization, not every collection cycle
- Fixes agent startup failure due to missing underlying_devices config
2025-10-23 18:50:40 +02:00
c5ec529210 Add agent hash display to system panel
Implement agent version tracking to diagnose deployment issues:
- Add get_agent_hash() method to extract Nix store hash from executable path
- Collect system_agent_hash metric in NixOS collector
- Display "Agent Hash" in system panel under NixOS section
- Update metric filtering to include agent hash

This helps identify which version of the agent is actually running
when troubleshooting deployment or metric collection issues.
2025-10-23 17:33:45 +02:00
3b1bda741b Remove codename from NixOS build display
- Strip codename part (e.g., '(Warbler)') from nixos-version output
- Display clean version format: '25.05.20251004.3bcc93c'
- Simplify parsing to use raw nixos-version output as requested
2025-10-23 14:55:18 +02:00
64af24dc40 Update NixOS display format to show build hash and timestamp
- Change from showing version to build format: 'hash dd/mm/yy H:M:S'
- Parse nixos-version output to extract short hash and format date
- Update system widget to display 'Build:' instead of 'Version:'
- Remove version/build_date fields in favor of single build string
- Follow TODO.md specification for NixOS section layout
2025-10-23 14:48:25 +02:00
9e80d6b654 Remove hardcoded /tmp autodetection and implement proper tmpfs monitoring
- Remove /tmp autodetection from disk collector (57 lines removed)
- Add tmpfs monitoring to memory collector with get_tmpfs_metrics() method
- Generate memory_tmp_* metrics for proper RAM-based tmpfs monitoring
- Fix type annotations in tmpfs parsing for compilation
- System widget now correctly displays tmpfs usage in RAM section
2025-10-23 14:26:15 +02:00
39fc9cd22f Implement unified system widget with NixOS info, CPU, RAM, and Storage
- Create NixOS collector for version and active users detection
- Add SystemWidget combining all system information in TODO.md layout
- Replace separate CPU/Memory widgets with unified system display
- Add tree structure for storage with drive temperature/wear info
- Support NixOS version, active users, load averages, memory usage
- Follow exact decimal formatting from specification
2025-10-23 14:01:14 +02:00
c99e0bd8ee Remove hardcoded discovery interval in systemd collector
- Use config.interval_seconds instead of hardcoded 300 seconds
- Discovery now happens every 10 seconds (configurable) instead of 5 minutes
- Follows configuration-driven architecture requirements
2025-10-23 13:20:48 +02:00
0f12438ab4 Fix RwLock deadlock in systemd collector Phase 4
- Restructure get_monitored_services to avoid nested write locks
- Split discover_services into discover_services_internal that returns data
- Update state in separate scope to prevent deadlock
- Fix borrow checker errors with clone() for status cache
2025-10-23 13:12:53 +02:00
7607e971b8 Add debug logging to diagnose Phase 4 service discovery issue
Add detailed debug logging to track:
- Service discovery start
- Individual service parsing
- Final service count and list
- Empty results indication

This will help identify why cmbox disappeared from dashboard.
2025-10-23 12:57:10 +02:00
da6f3c3855 Phase 4: Cache service status from discovery to eliminate per-service calls
Major performance optimization:
- Parse and cache service status during discovery from systemctl list-units
- Eliminate per-service systemctl is-active and show calls
- Reduce systemctl calls from 1+2N to just 1 call total
- For 10 services: 21 calls → 1 call (95% reduction)
- Add fallback to systemctl for cache misses

This completes the major systemctl call reduction goal from TODO.md.
2025-10-23 12:51:17 +02:00
174b27f31a Phase 3: Add wildcard support for service pattern matching
Implement glob pattern matching for service filters:
- nginx* matches nginx, nginx-config-reload, etc.
- *backup matches any service ending with 'backup'
- docker*prune matches docker-weekly-prune, etc.
- Exact matches still work as before (backward compatible)

Addresses TODO.md requirement for '*' filtering support.
2025-10-23 12:37:16 +02:00
dc11538ae9 Phase 2b: Optimize to single systemctl command
Reduce from 2 systemctl commands to 1 by using only:
systemctl list-units --type=service --all

This captures all services (active, inactive, failed) in one call,
eliminating the redundant list-unit-files command.
Achieves the TODO.md goal of reducing systemctl calls.
2025-10-23 12:34:54 +02:00
9133e18090 Phase 2: Remove user service collection logic
Remove all sudo -u systemctl commands and user service processing.
Now only collects system services via systemctl list-units/list-unit-files.
Eliminates user service discovery completely as planned in TODO.md.
2025-10-23 12:32:19 +02:00
616fad2c5d Phase 1: Implement exact name filtering for service matching
Change service matching logic from contains-based to exact equality.
Services now match only if service_name == pattern exactly.
This is the first step in the systemd collector optimization plan.
2025-10-23 12:22:26 +02:00
08d3454683 Enhance disk collector with individual drive health monitoring
- Add StoragePool and DriveInfo structures for grouping drives by mount point
- Implement SMART data collection for individual drives (health, temperature, wear)
- Support for ext4, zfs, xfs, mergerfs, btrfs filesystem types
- Generate individual drive metrics: disk_[pool]_[drive]_health/temperature/wear
- Add storage_type and underlying_devices to filesystem configuration
- Move hardcoded service directory mappings to NixOS configuration
- Move hardcoded host-to-user mapping to NixOS configuration
- Remove all unused code and fix compilation warnings
- Clean implementation with zero warnings and no dead code

Individual drives now show health status per storage pool:
Storage root (ext4): nvme0n1 PASSED 42°C 5% wear
Storage steampool (mergerfs): sda/sdb/sdc with individual health data
2025-10-22 19:59:25 +02:00
34822bd835 Fix systemd collector to use Status::Pending for transitional states 2025-10-21 19:08:58 +02:00
41208aa2a0 Implement status aggregation with notification batching 2025-10-21 18:12:42 +02:00
a937032eb1 Remove hardcoded defaults, require configuration file
- Remove all Default implementations from agent configuration structs
- Make configuration file required for agent startup
- Update NixOS module to generate complete agent.toml configuration
- Add comprehensive configuration options to NixOS module including:
  - Service include/exclude patterns for systemd collector
  - All thresholds and intervals
  - ZMQ communication settings
  - Notification and cache configuration
- Agent now fails fast if no configuration provided
- Eliminates configuration drift between defaults and NixOS settings
2025-10-21 00:01:26 +02:00
1e8da8c187 Add user service discovery to systemd collector
- Use systemctl --user commands to discover user-level services
- Include both user unit files and loaded user units
- Gracefully handle cases where user commands fail (no user session)
- Treat user services same as system services in filtering
- Enables monitoring of user-level Docker, development servers, etc.
2025-10-20 23:11:11 +02:00
1cc31ec26a Update service filters for better discovery
- Add ark-permissions to exclusion list (maintenance service)
- Add sunshine to service_name_filters (game streaming server)
- Improves service discovery for game streaming infrastructure
2025-10-20 23:01:03 +02:00