Compare commits

...

62 Commits

Author SHA1 Message Date
c8db463204 Add interactive SSH terminal session functionality
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
- Press 't' to open SSH session to current host in tmux split
- Uses 30% vertical split consistent with logs and rebuild commands
- Auto-closes tmux window when SSH session ends
- Provides direct host administration access from dashboard
- Uses same SSH configuration as rebuild operations

Version 0.1.65
2025-11-09 11:39:43 +01:00
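A minimal sketch of how the 't' binding could launch such a session from the dashboard process; the helper name and exact tmux flags are illustrative assumptions, not the project's actual code:

```rust
use std::process::Command;

/// Open an interactive SSH session to `host` in a 30% vertical tmux split.
/// The pane closes on its own when the SSH session exits, because the pane's
/// command is the ssh process itself.
fn open_ssh_split(ssh_user: &str, host: &str) -> std::io::Result<()> {
    let ssh_cmd = format!("ssh {}@{}", ssh_user, host);
    Command::new("tmux")
        .args(["split-window", "-v", "-p", "30"])
        .arg(ssh_cmd)
        .status()?;
    Ok(())
}
```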
e8e50ef9bb Replace empty panels with offline host message for better UX
All checks were successful
Build and Release / build-and-release (push) Successful in 2m33s
- Hide all system/backup/service panels when host is offline
- Show centered wake-up message with host status
- Display "Press 'w' to wake up host" if MAC address configured
- Provide clear visual indication when hosts are unreachable
- Improve user experience by removing confusing empty panels

Version 0.1.64
2025-11-08 18:28:28 +01:00
0faed9309e Improve host disconnection detection and fix notification exclusions
All checks were successful
Build and Release / build-and-release (push) Successful in 1m34s
- Add dedicated heartbeat transmission every 5 seconds independent of metric collection
- Fix host offline detection by clearing metrics for disconnected hosts
- Move exclude_email_metrics to NotificationConfig for better organization
- Add cleanup_offline_hosts method to remove stale metrics after heartbeat timeout
- Ensure offline hosts show proper status icons and visual indicators

Version 0.1.63
2025-11-08 11:33:32 +01:00
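A sketch of the dashboard-side bookkeeping this implies; `cleanup_offline_hosts` matches the commit, while the struct and field names are assumptions for illustration:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks the last heartbeat per host and reports hosts whose heartbeat
/// is older than the configured timeout.
struct HostTracker {
    last_heartbeat: HashMap<String, Instant>,
    timeout: Duration,
}

impl HostTracker {
    fn record_heartbeat(&mut self, host: &str) {
        self.last_heartbeat.insert(host.to_string(), Instant::now());
    }

    /// Hosts returned here have timed out; the caller clears their cached
    /// metrics and marks them offline.
    fn cleanup_offline_hosts(&self) -> Vec<String> {
        self.last_heartbeat
            .iter()
            .filter(|(_, seen)| seen.elapsed() > self.timeout)
            .map(|(host, _)| host.clone())
            .collect()
    }
}
```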
c980346d05 Fix heartbeat detection to properly detect offline hosts
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
- Add independent heartbeat checking timer (1 second interval) separate from metric reception
- Move get_connected_hosts() call outside metric receive condition to run periodically
- Remove duplicate update_hosts() call from metric processing to avoid redundancy
- Ensure offline host detection works even when no new metrics are received
- Fix issue where hosts going offline were never detected due to conditional heartbeat check
- Heartbeat timeouts now properly detected within configured timeout + 1 second
- Bump version to 0.1.62
2025-11-07 14:27:03 +01:00
3e3d3f0c2b Fix Tab key 1-second delay by reverting ZMQ to non-blocking mode
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
- Change receive_metrics() from blocking to DONTWAIT to prevent main loop freezing
- Eliminate 1-second ZMQ socket timeout that was blocking UI after Tab key press
- Main loop now continues immediately after the input-triggered render instead of waiting
- Maintain heartbeat-based host detection while fixing visual responsiveness
- Fix blocking operation introduced when implementing heartbeat timeout mechanism
- Tab navigation now truly immediate without any network operation delays
- Bump version to 0.1.61
2025-11-06 12:04:49 +01:00
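A sketch of a non-blocking receive using the `zmq` crate's DONTWAIT flag, assuming the metric arrives as a raw byte payload (function name is illustrative):

```rust
/// Poll the metrics socket without blocking; the UI loop keeps running
/// even when no message is queued.
fn receive_metrics(socket: &zmq::Socket) -> Option<Vec<u8>> {
    match socket.recv_bytes(zmq::DONTWAIT) {
        Ok(bytes) => Some(bytes),
        // EAGAIN simply means "nothing to read right now".
        Err(zmq::Error::EAGAIN) => None,
        Err(err) => {
            tracing::warn!("ZMQ receive failed: {}", err);
            None
        }
    }
}
```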
9eb7444d56 Cache localhost hostname to eliminate Tab key sluggishness
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
- Add cached localhost field to TuiApp struct to avoid repeated gethostname() system calls
- Initialize localhost once in constructor instead of calling gethostname() on every navigation
- Replace gethostname() calls in update_hosts() and navigate_host() with cached value
- Eliminate expensive system call bottleneck causing Tab key responsiveness issues
- Reduce Tab navigation from 2+ system calls to zero system calls (memory access only)
- Fix performance regression introduced by immediate UI refresh implementation
- Bump version to 0.1.60
2025-11-06 11:53:49 +01:00
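A sketch of the caching pattern, assuming the `gethostname` crate; the struct and method names are illustrative:

```rust
struct TuiApp {
    /// Resolved once at startup; Tab navigation compares against this string
    /// instead of issuing a gethostname() syscall on every key press.
    localhost: String,
}

impl TuiApp {
    fn new() -> Self {
        let localhost = gethostname::gethostname().to_string_lossy().into_owned();
        Self { localhost }
    }

    fn is_local(&self, host: &str) -> bool {
        self.localhost == host
    }
}
```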
278d1763aa Fix Tab key responsiveness with immediate UI refresh
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
- Add immediate terminal.draw() call after input handling in main loop
- Eliminate delay between Tab key press and visual host switching
- Provide instant visual feedback for all navigation inputs
- Maintain existing metric update render cycle without duplication
- Fix UI update timing issue where changes only appeared on metric intervals
- Bump version to 0.1.59
2025-11-06 11:30:26 +01:00
f874264e13 Optimize dashboard performance for responsive Tab key navigation
All checks were successful
Build and Release / build-and-release (push) Successful in 1m32s
- Replace 6 separate filter operations with single-pass metric categorization in update_metrics
- Reduce CPU overhead from 6x to 1x work per metric update cycle
- Fix Tab key sluggishness caused by competing expensive filtering operations
- Maintain exact same functionality with significantly better performance
- Improve UI responsiveness for host switching and navigation
- Bump version to 0.1.58
2025-11-06 11:18:39 +01:00
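A sketch of the single-pass idea; the metric-name prefixes and bucket names are assumptions for illustration:

```rust
#[derive(Default)]
struct MetricBuckets {
    cpu: Vec<String>,
    ram: Vec<String>,
    storage: Vec<String>,
    services: Vec<String>,
    other: Vec<String>,
}

/// One pass over the incoming metrics instead of six independent
/// `iter().filter(...)` scans per update cycle.
fn categorize(metric_names: &[String]) -> MetricBuckets {
    let mut buckets = MetricBuckets::default();
    for name in metric_names {
        let bucket = if name.starts_with("cpu_") {
            &mut buckets.cpu
        } else if name.starts_with("ram_") {
            &mut buckets.ram
        } else if name.starts_with("disk_") {
            &mut buckets.storage
        } else if name.starts_with("service_") {
            &mut buckets.services
        } else {
            &mut buckets.other
        };
        bucket.push(name.clone());
    }
    buckets
}
```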
5f6e47ece5 Implement heartbeat-based host connectivity detection
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
- Add agent_heartbeat metric to agent transmission for reliable host detection
- Update dashboard to track heartbeat timestamps per host instead of general metrics
- Add configurable heartbeat_timeout_seconds to dashboard ZMQ config (default 10s)
- Remove unused timeout_ms from agent config and revert to non-blocking command reception
- Remove unused heartbeat_interval_ms from agent configuration
- Host disconnect detection now uses dedicated heartbeat metrics for improved reliability
- Bump version to 0.1.57
2025-11-06 11:04:01 +01:00
0e7cf24dbb Add exclude_email_metrics configuration option
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
- Add exclude_email_metrics field to AgentConfig for filtering email notifications
- Metrics matching excluded names skip notification processing but still appear in dashboard
- Optional field with serde(default) for backward compatibility
- Bump version to 0.1.56
2025-11-06 10:31:25 +01:00
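A sketch of the config field and the notification check, assuming a simple exact-name match; the surrounding struct is trimmed to the relevant field:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct AgentConfig {
    /// Optional with serde(default), so existing config files without the
    /// field keep deserializing unchanged.
    #[serde(default)]
    exclude_email_metrics: Vec<String>,
}

/// Excluded metrics still flow to the dashboard; only email notification
/// processing is skipped.
fn should_notify(config: &AgentConfig, metric_name: &str) -> bool {
    !config.exclude_email_metrics.iter().any(|m| m == metric_name)
}
```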
2d080a2f51 Implement WakeOnLAN functionality and offline status handling
All checks were successful
Build and Release / build-and-release (push) Successful in 1m35s
- Add WakeOnLAN support for offline hosts using 'w' key
- Configure MAC addresses for all infrastructure hosts
- Implement Status::Offline for disconnected hosts
- Exclude offline hosts from status aggregation to prevent false alerts
- Update versions to 0.1.55
2025-10-31 09:28:31 +01:00
6179bd51a7 Implement WakeOnLAN functionality with simplified configuration
All checks were successful
Build and Release / build-and-release (push) Successful in 2m32s
- Add Status::Offline enum variant for disconnected hosts
- All configured hosts now always visible showing offline status when disconnected
- Add WakeOnLAN support using wake-on-lan Rust crate
- Implement w key binding to wake offline hosts with MAC addresses
- Simplify configuration to single [hosts] section with MAC addresses only
- Change critical status icon from ◯ to ! for better visibility
- Add proper MAC address parsing and error handling
- Silent WakeOnLAN operation with logging for success/failure

Configuration format:
[hosts]
hostname = { mac_address = "AA:BB:CC:DD:EE:FF" }
2025-10-31 09:03:01 +01:00
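The commits use the `wake-on-lan` crate; the sketch below shows the equivalent magic-packet construction with only the standard library, using the MAC format from the configuration example above (function names are illustrative):

```rust
use std::net::UdpSocket;

/// Parse "AA:BB:CC:DD:EE:FF" into six bytes.
fn parse_mac(s: &str) -> Option<[u8; 6]> {
    let bytes: Vec<u8> = s
        .split(':')
        .filter_map(|part| u8::from_str_radix(part, 16).ok())
        .collect();
    bytes.try_into().ok()
}

/// A Wake-on-LAN magic packet is 6 x 0xFF followed by the MAC repeated
/// 16 times, broadcast over UDP (port 9 by convention).
fn wake(mac: [u8; 6]) -> std::io::Result<()> {
    let mut packet = vec![0xFFu8; 6];
    for _ in 0..16 {
        packet.extend_from_slice(&mac);
    }
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_broadcast(true)?;
    socket.send_to(&packet, "255.255.255.255:9")?;
    Ok(())
}
```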
57de4c366a Bump version to 0.1.53
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
2025-10-30 17:00:39 +01:00
e18778e962 Fix string syntax error in rebuild command
- Replace raw string with escaped string to fix compilation error
- Maintain same functionality with proper string formatting
2025-10-30 16:59:41 +01:00
e4469a0ebf Replace tmux popups with split windows for better log navigation
Some checks failed
Build and Release / build-and-release (push) Failing after 1m9s
- Change J/L log commands from popups to split windows for scrolling support
- Change rebuild command from popup to split window with consistent 30% height
- Add auto-close behavior with bash -c "command; exit" wrapper for logs
- Add "press any key to close" prompt with visual separators for rebuild
- Enable proper tmux copy mode and navigation in all split windows

Users can now scroll through logs, copy text, and resize windows while
maintaining clean auto-close behavior for all operations.
2025-10-30 15:30:58 +01:00
6fedf4c7fc Add sudo support and line count to log viewing commands
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
- Add sudo to journalctl command for proper systemd log access
- Add sudo to tail command for system log file access
- Add -n 50 to tail command to match journalctl behavior
- Both J and L keys now show last 50 lines before following

Ensures consistent behavior and proper permissions for all log viewing.
2025-10-30 13:26:04 +01:00
3f6dffa66e Add custom service log file support with L key
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
- Add ServiceLogConfig structure for per-host service log paths
- Implement L key handler for custom log file viewing via tmux popup
- Update dashboard config to support service_logs HashMap
- Add tail -f command execution over SSH for real-time log streaming
- Update status line to show L: Custom shortcut
- Document configuration format in CLAUDE.md

Each service can now have custom log file paths configured per host,
accessible via L key with same tmux popup interface as journalctl.
2025-10-30 13:12:36 +01:00
1b64fbde3d Fix tmux popup title flag for service logs feature
All checks were successful
Build and Release / build-and-release (push) Successful in 1m47s
Fix journalctl popup that was failing with 'can't find session' error:

Issue Resolution:
- Change tmux display-popup flag from -t to -T for setting popup title
- -t flag was incorrectly trying to target a session named 'Logs: servicename'
- -T flag correctly sets the popup window title

The J key (Shift+j) service logs feature now works properly, opening
an 80% tmux popup with journalctl -f for real-time log viewing.

Bump version to v0.1.49
2025-10-30 12:42:58 +01:00
4f4c3b0d6e Improve notification behavior during startup and recovery
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Fix notification issues for better operational experience:

Startup Notification Suppression:
- Suppress notifications for transitions from Status::Unknown during agent/server startup
- Prevents notification spam when services transition from Unknown to Warning/Critical on restart
- Only real status changes (not initial discovery) trigger notifications
- Maintains alerting for actual service state changes after startup

Recovery Notification Refinement:
- Recovery notifications only sent when ALL services reach OK status
- Individual service recoveries suppressed if other services still have problems
- Ensures recovery notifications indicate complete system health restoration
- Prevents premature celebration when partial recoveries occur

Result: Clean startup experience without false alerts and meaningful recovery
notifications that truly indicate full system health restoration.

Bump version to v0.1.48
2025-10-30 12:35:23 +01:00
bd20f0cae1 Fix user-stopped flag timing and service transition handling
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Correct user-stopped service behavior during startup transitions:

User-Stopped Flag Timing Fix:
- Clear user-stopped flag only when service actually becomes active, not when start command succeeds
- Remove premature flag clearing from service control handler
- Add automatic flag clearing when service status metrics show active state
- Services retain user-stopped status during activating/transitioning states

Service Transition Handling:
- User-stopped services in activating state now report Status::OK instead of Status::Pending
- Prevents host warnings during legitimate service startup transitions
- Maintains accurate status reporting throughout service lifecycle
- Failed service starts preserve user-stopped flags correctly

Journalctl Popup Fix:
- Fix terminal corruption when using J key for service logs
- Correct command quoting to prevent tmux popup interference
- Stable popup display without dashboard interface corruption

Result: Clean service startup experience with no false warnings and proper
user-stopped tracking throughout the entire service lifecycle.

Bump version to v0.1.47
2025-10-30 12:05:54 +01:00
11c9a5f9d2 Add service logs feature and improve tmux popup sizing
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
New Features:
- Add journalctl service logs viewer via Shift+J key
- Opens tmux popup with real-time log streaming using journalctl -f
- Shows last 50 lines and follows new log entries for selected service
- Popup titled 'Logs: service.service' for clear context

Improvements:
- Increase tmux popup size to 80% width and height for better readability
- Applies to both rebuild (R) and logs (J) popups
- Compact status line text to fit new J: Logs shortcut
- Updated documentation with new key binding

Navigation Updates:
- J: Show service logs (journalctl in tmux popup)
- Status line: Tab: Host • ↑↓/jk: Select • r: Rebuild • s/S: Start/Stop • J: Logs • q: Quit

Bump version to v0.1.46
2025-10-30 11:21:14 +01:00
aeae60146d Fix user-stopped service display and flag timing issues
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
Improve user-stopped service tracking behavior:

Service Display Fix:
- Services widget now shows actual systemctl status (active/inactive)
- Use info.status instead of hardcoded text based on widget_status
- User-stopped services correctly display 'inactive' with green OK icon
- Prevents misleading 'active' display for stopped services

User-Stopped Flag Timing Fix:
- Clear user-stopped flag AFTER successful service start, not when command sent
- Prevents warnings during service startup transition period
- Service remains Status::OK during 'activating' state for user-stopped services
- Flag only cleared when systemctl start command actually succeeds
- Failed start attempts preserve user-stopped flag

Result: Clean service state tracking with accurate display and no false alerts
during intentional user operations.

Bump version to v0.1.45
2025-10-30 11:11:39 +01:00
a82c81e8e3 Fix service control by adding .service suffix to systemctl commands
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
Service stop/start operations were failing because systemctl commands
were missing the .service suffix. This caused the new user-stopped
tracking feature to mark services but not actually control them.

Changes:
- Add .service suffix to systemctl commands in service control handler
- Matches pattern used throughout systemd collector
- Fixes service start/stop functionality via dashboard

Clean up legacy documentation:
- Remove outdated TODO.md, AGENTS.md, and test files
- Update CLAUDE.md with current architecture and rules only
- Comprehensive README.md rewrite with technical documentation
- Document user-stopped service tracking feature

Bump version to v0.1.44
2025-10-30 11:00:36 +01:00
c56e9d7be2 Implement user-stopped service tracking system
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
Add comprehensive tracking for services stopped via dashboard to prevent
false alerts when users intentionally stop services.

Features:
- User-stopped services report Status::Ok instead of Warning
- Persistent storage survives agent restarts
- Dashboard sends UserStart/UserStop commands
- Agent tracks and syncs user-stopped state globally
- Systemd collector respects user-stopped flags

Implementation:
- New service_tracker module with persistent JSON storage
- Enhanced ServiceAction enum with UserStart/UserStop variants
- Global singleton tracker accessible by collectors
- Service status logic updated to check user-stopped flag
- Dashboard version now uses CARGO_PKG_VERSION automatically

Bump version to v0.1.43
2025-10-30 10:42:56 +01:00
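A sketch of what a persistent JSON-backed tracker could look like; the `service_tracker` module name comes from the commit, while the structure, storage path, and helper names are assumptions:

```rust
use std::collections::HashSet;
use std::fs;
use std::path::{Path, PathBuf};

/// Persistent set of services the user stopped on purpose; survives agent
/// restarts so intentionally stopped services keep reporting Status::Ok.
struct ServiceTracker {
    user_stopped: HashSet<String>,
    path: PathBuf,
}

impl ServiceTracker {
    fn load(path: &Path) -> Self {
        let user_stopped = fs::read_to_string(path)
            .ok()
            .and_then(|s| serde_json::from_str(&s).ok())
            .unwrap_or_default();
        Self { user_stopped, path: path.to_path_buf() }
    }

    fn mark_stopped(&mut self, service: &str) {
        self.user_stopped.insert(service.to_string());
        if let Ok(json) = serde_json::to_string(&self.user_stopped) {
            let _ = fs::write(&self.path, json);
        }
    }

    fn is_user_stopped(&self, service: &str) -> bool {
        self.user_stopped.contains(service)
    }
}
```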
c8f800a1e5 Implement git commit hash tracking for build display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m24s
- Add get_git_commit() method to read /var/lib/cm-dashboard/git-commit
- Replace NixOS build version with actual git commit hash
- Show deployed commit hash as 'Build:' value for accurate tracking
- Enable verification of which exact commit is deployed per host
- Update version to 0.1.42
2025-10-29 15:29:02 +01:00
fc6b3424cf Add hostname to NixOS title and make dashboard title bold
All checks were successful
Build and Release / build-and-release (push) Successful in 2m46s
- Change system panel title from 'NixOS:' to 'NixOS hostname:'
- Make main dashboard title 'cm-dashboard' bold in top bar
- Remove unused Typography::title() function to fix warnings
- Update SystemWidget::render_with_scroll to accept hostname parameter
- Update version to 0.1.41 in all Cargo.toml files and dashboard code
2025-10-29 14:24:17 +01:00
35e06c6734 Implement clean NixOS rebuild tmux popup
All checks were successful
Build and Release / build-and-release (push) Successful in 1m22s
- Replace complex ASCII logo with simple text header
- Remove extra blank lines for compact display
- Left-align text for clean appearance
- Add spacing after target line for readability
- Simplify heredoc format for better maintainability
2025-10-28 23:59:05 +01:00
783d233319 Add CM Dashboard ASCII logo to rebuild tmux popup
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
- Display branded CM Dashboard ASCII logo in green when rebuild starts
- Shows logo immediately when tmux popup opens for better UX
- Includes rebuild target hostname and visual separator
- Enhances rebuild process with professional branding
- Bump version to v0.1.39
2025-10-28 23:12:09 +01:00
6509a2b91a Make nginx site latency thresholds configurable and simplify status logic
All checks were successful
Build and Release / build-and-release (push) Successful in 4m25s
- Replace hardcoded 500ms/2000ms thresholds with configurable nginx_latency_critical_ms
- Simplify status logic to only OK or Critical (no Warning status)
- Add validation for nginx latency threshold configuration
- Re-enable nginx site collection with configurable thresholds
- Resolves issue where sites showed critical at 2000ms despite 30s timeout setting
- Bump version to v0.1.38
2025-10-28 21:24:34 +01:00
52f8c40b86 Fix title bar layout constraints to prevent text disappearing
All checks were successful
Build and Release / build-and-release (push) Successful in 2m12s
- Set fixed width (15 chars) for left side to prevent chunk collapse
- Resolves issue where "cm-dashboard" text would flash and disappear
- Ensures consistent visibility of title text in dynamic status bar
- Bump version to v0.1.37
2025-10-28 18:56:12 +01:00
a86b5ba8f9 Implement dynamic status-based title bar with infrastructure health indicator
All checks were successful
Build and Release / build-and-release (push) Successful in 1m15s
- Title bar background now dynamically changes based on worst-case status across all hosts
- Green: all OK, Yellow: warnings present, Red: critical issues, Blue: pending, Gray: unknown
- Provides immediate visual feedback of overall infrastructure health
- Added 1-character padding on both sides of title bar
- Maintains dark text for visibility against all status background colors
- Bump version to v0.1.36
2025-10-28 18:47:02 +01:00
1b964545be Fix storage display parsing and improve title bar UI
All checks were successful
Build and Release / build-and-release (push) Successful in 1m14s
- Fix disk drive name extraction for mount points with underscores (e.g., /mnt/steampool)
- Replace confusing "1" and "2" drive names with proper device names like "sda1", "sda2"
- Update title bar with blue background and dark text styling
- Right-align host list in title bar while keeping "cm-dashboard" on left
- Bump version to v0.1.35
2025-10-28 18:32:12 +01:00
97aa1708c2 Improve service selection UI and help text
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
- Fix service icons to use background color when selected for better visibility against blue selection background
- Combine start/stop service help text entries into single "s/S: Start/Stop Service"
- Change help text keys to lowercase (r: Rebuild Host, q: Quit)
- Bump version to v0.1.34
2025-10-28 18:17:15 +01:00
d12689f3b5 Update CLAUDE.md to reflect simplified navigation and current status
Updated documentation to reflect major UI improvements:

- Documented simplified navigation system (no more panel switching)
- Updated current status to October 28, 2025 with v0.1.33
- Described complete service discovery and visibility features
- Added vi-style j/k navigation documentation
- Removed outdated panel-focused navigation descriptions
- Updated visual feedback documentation for transitional icons
- Consolidated service discovery achievements and current working state
2025-10-28 17:00:40 +01:00
f22e3ee95e Simplify navigation and add vi-style keys
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
Major UI simplification and navigation improvements:

Changes:
- Removed panel selection concept entirely (no more Shift+Tab)
- Service selection always visible with blue highlighting
- Up/Down arrows now directly control service selection
- Added j/k vi-style navigation keys as alternatives to arrow keys
- Removed panel focus borders - all panels look uniform
- Service commands (s/S) work without panel focus requirements
- Updated keyboard shortcuts to reflect simplified navigation

Navigation:
- Tab: Switch hosts
- ↑↓/jk: Select service (always works)
- R: Rebuild host
- s: Start service
- S: Stop service
- q: Quit

The interface is now much simpler and more intuitive with direct service control.
2025-10-28 16:31:35 +01:00
e890c5e810 Fix service status detection with combined discovery and status approach
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Enhanced service discovery to properly show status for all services:

Changes:
- Use systemctl list-unit-files for complete service discovery (finds all services)
- Use systemctl list-units --all for batch runtime status fetching
- Combine both datasets to get comprehensive service list with correct status
- Services found in unit-files but not runtime are marked as inactive (Warning status)
- Eliminates 'unknown' status issue while maintaining complete service visibility

Now inactive services show as Warning (yellow ◐) and active services show as Ok (green ●)
instead of all services showing as unknown (? icon).
2025-10-28 15:56:47 +01:00
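A sketch of the combined approach; the systemctl flags are standard, but the parsing and status mapping are simplified assumptions:

```rust
use std::collections::HashMap;
use std::process::Command;

fn run(args: &[&str]) -> String {
    Command::new("systemctl")
        .args(args)
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).into_owned())
        .unwrap_or_default()
}

/// Discover every unit file, then overlay runtime state; units with no
/// runtime entry are treated as inactive rather than unknown.
fn discover_services() -> HashMap<String, String> {
    let unit_files = run(&["list-unit-files", "--type=service", "--no-legend"]);
    let runtime = run(&["list-units", "--all", "--type=service", "--no-legend"]);

    let mut status: HashMap<String, String> = HashMap::new();
    for line in unit_files.lines() {
        if let Some(name) = line.split_whitespace().next() {
            status.insert(name.to_string(), "inactive".to_string());
        }
    }
    for line in runtime.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.len() >= 4 {
            // list-units columns: UNIT LOAD ACTIVE SUB ...
            status.insert(fields[0].to_string(), fields[2].to_string());
        }
    }
    status
}
```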
078c30a592 Fix service discovery to show all configured services regardless of state
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
Changed service discovery from 'systemctl list-units --all' to 'systemctl list-unit-files'
to ensure ALL service unit files are discovered, including services that have never been started.

Changes:
- Updated systemctl command to use list-unit-files instead of list-units --all
- Modified parsing logic to handle unit file format (2 fields vs 4 fields)
- Set placeholder values in discovery cache, actual runtime status fetched during collection
- This ensures all configured services (like inactive ARK servers) appear in dashboard

The issue was that list-units --all only shows services systemd has loaded/attempted to load,
but list-unit-files shows ALL service unit files regardless of their runtime state.
2025-10-28 15:41:58 +01:00
a847674004 Remove service restart functionality and make R always rebuild host
All checks were successful
Build and Release / build-and-release (push) Successful in 2m6s
Simplified keyboard controls by removing service restart functionality:

- Removed 'r' key restart functionality from Services panel
- Made 'R' key always trigger system rebuild regardless of focused panel
- Updated context shortcuts to show 'R: Rebuild Host' globally
- Removed all ServiceRestart enum variants and associated code:
  - UiCommand::ServiceRestart
  - CommandType::ServiceRestart
  - ServiceAction::Restart
- Cleaned up pending transition logic to only handle Start/Stop commands

The 'R' key now consistently rebuilds the current host from any panel,
while 's' and 'S' continue to handle service start/stop in Services panel.
2025-10-28 15:26:15 +01:00
2618f6b62f Fix transitional icons and selection highlighting visibility
All checks were successful
Build and Release / build-and-release (push) Successful in 1m15s
Resolved issues with transitional service icons not being properly visible:

- Removed 3-second timeout that was clearing pending transitions prematurely
- Fixed selection highlighting disappearing when transitional icons appeared
- Implemented conditional coloring for transitional icons:
  - Blue when service is not selected
  - Dark background color when service is selected (for visibility against blue selection)
- Transitions now persist until actual service status changes occur

Both selection highlighting and transitional icons are now visible simultaneously.
2025-10-28 15:14:49 +01:00
c3fc5a181d Fix service name mismatch in pending transitions lookup
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
The root cause of transitional service icons not showing was that service names
were stored as raw names (e.g., "sshd") in pending_transitions but looked up
against formatted display lines (e.g., "sshd                    active     1M     ").

Changes:
- Modified display_lines structure to include both formatted text and raw service names
- Updated rendering loop to use raw service names for pending transition lookups
- Fixed get_selected_service() method to use the new tuple structure
- Transitional icons (↑ ↓ ↻) should now appear correctly when pressing s/S/r keys
2025-10-28 15:00:48 +01:00
3f45a172b3 Add debug rendering to test transitional icon visibility
All checks were successful
Build and Release / build-and-release (push) Successful in 1m14s
- Force sshd service to always show "↑ starting" for debugging
- Test if basic directional arrow rendering works in services widget
- Temporary debug change to isolate rendering vs logic issues
- Will help determine if problem is in pending transitions or rendering

If arrow appears: pending transitions logic issue
If no arrow: basic rendering path issue
2025-10-28 14:49:24 +01:00
5b12c12228 Fix transitional icons by always storing pending transitions for visual feedback
All checks were successful
Build and Release / build-and-release (push) Successful in 1m13s
- Store pending transitions even for redundant commands (start active service)
- Add 3-second timeout for redundant command visual feedback
- Include timestamp in pending transitions to enable timeout clearing
- Show directional arrows immediately regardless of command validation result
- Fix core issue where state validation prevented visual feedback storage

Now pressing s/S/r always shows immediate directional arrows, even for
redundant operations, providing consistent visual feedback to users.
2025-10-28 14:38:33 +01:00
651b801de3 Fix transitional service icons being overridden by selection highlighting
All checks were successful
Build and Release / build-and-release (push) Successful in 1m14s
- Prevent selection highlighting when service has pending transition
- Allow directional arrows (↑ ↓ ↻) to show through on selected services
- Fix core issue where selection styling was overwriting transitional icons
- Transitional icons now properly visible during service command execution

The selection highlighting logic now skips services with pending transitions,
ensuring that directional arrows are visible when executing service commands.
2025-10-28 14:22:40 +01:00
71b9f93d7c Implement immediate transitional service icons with pending state tracking
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
- Replace timeout-based command status with pending service transitions
- Show immediate directional arrows when pressing service commands (↑ ↓ ↻)
- Track original service status and command type for each pending operation
- Automatically clear transitional icons when real status updates arrive
- Remove unused TerminalPopup and CommandStatus infrastructure
- Simplify visual feedback system using state-based approach

Service commands now provide instant visual feedback that persists until
the actual service state changes, eliminating timing issues and improving UX.
2025-10-28 14:11:59 +01:00
ae70946c61 Implement state-aware service command validation with immediate visual feedback
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
- Add service state detection before executing start/stop/restart commands
- Prevent redundant operations (start active services, stop inactive services)
- Show immediate directional arrows for command feedback (↑ starting, ↓ stopping, ↻ restarting)
- Add get_service_status() method to ServicesWidget for state access
- Remove unused TerminalPopup code and dangling methods
- Clean up warnings and unused code throughout codebase

Service commands now validate current state and provide instant UX feedback while
preserving existing status icons and colors during transitions.
2025-10-28 13:48:24 +01:00
2910b7d875 Update version to 0.1.22 and fix system metric status calculation
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
- Fix /tmp usage status to use proper thresholds instead of hardcoded Ok status
- Fix wear level status to use configurable thresholds instead of hardcoded values
- Add dedicated tmp_status field to SystemWidget for proper /tmp status display
- Remove host-level hourglass icon during service operations
- Implement immediate service status updates after start/stop/restart commands
- Remove active users display and collection from NixOS section
- Fix immediate host status aggregation transmission to dashboard
2025-10-28 13:21:56 +01:00
43242debce Update version to 0.1.21 and fix dashboard data caching
All checks were successful
Build and Release / build-and-release (push) Successful in 1m13s
- Separate dashboard updates from email notifications for immediate status aggregation
- Add metric caching to MetricCollectionManager for instant dashboard updates
- Dashboard now receives cached data every 1 second instead of waiting for collection intervals
- Fix transmission to use cached metrics rather than triggering fresh collection
- Email notifications maintain separate 60-second batching interval
- Update configurable email notification aggregation interval
2025-10-28 12:16:31 +01:00
a2519b2814 Update version to 0.1.20 and fix email notification aggregation
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
- Fix email notification aggregation to send batched notifications instead of individual emails
- Fix startup data collection to properly process initial status without triggering change notifications
- Maintain event-driven transmission while preserving aggregated notification batching
- Update version from 0.1.19 to 0.1.20 across all components
2025-10-28 10:48:29 +01:00
91f037aa3e Update to v0.1.19 with event-driven status aggregation
All checks were successful
Build and Release / build-and-release (push) Successful in 2m4s
Major architectural improvements:

CORE CHANGES:
- Remove notification_interval_seconds - status aggregation now immediate
- Status calculation moved to collection phase instead of transmission
- Event-driven transmission triggers immediately on status changes
- Dual transmission strategy: immediate on change + periodic backup
- Real-time notifications without batching delays

TECHNICAL IMPROVEMENTS:
- process_metric() now returns bool indicating status change
- Immediate ZMQ broadcast when status changes detected
- Status aggregation happens during metric collection, not later
- Legacy get_nixos_build_info() method removed (unused)
- All compilation warnings fixed

BEHAVIOR CHANGES:
- Critical alerts sent instantly instead of waiting for intervals
- Dashboard receives real-time status updates
- Notifications triggered immediately on status transitions
- Backup periodic transmission every 1s ensures heartbeat

This provides much more responsive monitoring with instant alerting
while maintaining the reliability of periodic transmission as backup.
2025-10-28 10:36:34 +01:00
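A sketch of `process_metric()` returning a change flag so transmission can happen immediately; the status variants and aggregation rule here are simplified assumptions:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Status { Ok, Warning, Critical }

struct HostState {
    statuses: Vec<Status>,
}

impl HostState {
    /// Worst-case wins: any Critical beats Warning beats Ok.
    fn aggregate(&self) -> Status {
        if self.statuses.contains(&Status::Critical) { Status::Critical }
        else if self.statuses.contains(&Status::Warning) { Status::Warning }
        else { Status::Ok }
    }

    /// Apply one metric status and report whether the host-level status
    /// changed, so the caller can transmit right away instead of waiting
    /// for the periodic interval.
    fn process_metric(&mut self, status: Status) -> bool {
        let before = self.aggregate();
        self.statuses.push(status);
        self.aggregate() != before
    }
}
```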
627c533724 Update to v0.1.18 with per-collector intervals and tmux check
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
- Implement per-collector interval timing respecting NixOS config
- Remove all hardcoded timeout/interval values and make configurable
- Add tmux session requirement check for TUI mode (bypassed for headless)
- Update agent to send config hash in Build field instead of nixos version
- Add nginx check interval, HTTP timeouts, and ZMQ transmission interval configs
- Update NixOS configuration with new configurable values

Breaking changes:
- Build field now shows nix store config hash (8 chars) instead of nixos version
- All intervals now follow individual collector configuration instead of global

New configuration fields:
- systemd.nginx_check_interval_seconds
- systemd.http_timeout_seconds
- systemd.http_connect_timeout_seconds
- zmq.transmission_interval_seconds
2025-10-28 10:08:25 +01:00
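A sketch of per-collector interval bookkeeping; the field and method names are illustrative:

```rust
use std::time::{Duration, Instant};

/// Each collector keeps its own interval from config instead of sharing a
/// global timer; collection only runs when that collector is due.
struct CollectorTimer {
    interval: Duration,
    last_run: Instant,
}

impl CollectorTimer {
    fn due(&mut self) -> bool {
        if self.last_run.elapsed() >= self.interval {
            self.last_run = Instant::now();
            return true;
        }
        false
    }
}
```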
b1bff4857b Update versions to 0.1.17 and fix backup panel visibility
All checks were successful
Build and Release / build-and-release (push) Successful in 1m16s
- Update all Cargo.toml versions to 0.1.17
- Fix backup panel to only show when meaningful data exists
- Hide backup panel when no backup configured
2025-10-27 18:50:20 +01:00
f8a061d496 Fix tmux popup SSH command syntax for interactive shell
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
- Use tmux display-popup instead of popup with incorrect arguments
- Add -tt flag for proper pseudo-terminal allocation
- Use bash -ic to load shell aliases in SSH session
- Enable rebuild_git alias to work through SSH popup
2025-10-27 16:08:38 +01:00
e61a845965 Replace complex SystemRebuild with simple SSH + tmux popup approach
All checks were successful
Build and Release / build-and-release (push) Successful in 2m6s
- Remove all SystemRebuild command infrastructure from agent and dashboard
- Replace with direct tmux popup execution: ssh {user}@{host} {alias}
- Add configurable SSH user and rebuild alias in dashboard config
- Eliminate agent process crashes during rebuilds
- Simplify architecture by removing ZMQ command streaming complexity
- Clean up all related dead code and fix compilation warnings

Benefits:
- Process isolation: rebuild runs independently via SSH
- Crash resilience: agent/dashboard can restart without affecting rebuilds
- Configuration flexibility: SSH user and alias configurable per deployment
- Operational simplicity: standard tmux popup interface
2025-10-27 14:25:45 +01:00
ac5d2d4db5 Fix compilation error in agent service status check
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
2025-10-26 23:42:19 +01:00
69892a2d84 Implement systemd service approach for nixos-rebuild operations
Some checks failed
Build and Release / build-and-release (push) Failing after 1m58s
- Add cm-rebuild systemd service for process isolation
- Add sudo permissions for service control and journal access
- Remove verbose flag for cleaner output
- Ensures reliable rebuild operations without agent crashes
2025-10-26 23:18:09 +01:00
a928d73134 Update Cargo.toml versions to 0.1.11
All checks were successful
Build and Release / build-and-release (push) Successful in 3m4s
- Update agent, dashboard, and shared package versions from 0.1.0 to 0.1.11
- Ensures agent version reporting shows correct v0.1.11 instead of v0.1.0
- Synchronize package versions with git tag for consistent version tracking
2025-10-26 14:12:03 +01:00
af52d49194 Fix system panel layout and switch to version-based agent reporting
All checks were successful
Build and Release / build-and-release (push) Successful in 2m6s
- Remove auto-close behavior from terminal popup for manual review
- Fix system panel to show correct NixOS section layout
- Add missing Active users line after Agent version
- Switch agent version from nix store hash to actual version number (v0.1.11)
- Display full version string without truncation for clear version tracking
2025-10-26 13:34:56 +01:00
bc94f75328 Enable real-time output streaming for nixos-rebuild command
All checks were successful
Build and Release / build-and-release (push) Successful in 1m24s
- Replace simulated progress messages with actual stdout/stderr capture
- Stream all nixos-rebuild output line-by-line to terminal popup
- Show transparent build process including downloads, compilation, and activation
- Maintain real-time visibility into complete rebuild process
2025-10-26 13:00:53 +01:00
b6da71b7e7 Implement real-time terminal popup for system rebuild operations
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
- Add terminal popup UI component with 80% screen coverage and terminal styling
- Extend ZMQ protocol with CommandOutputMessage for streaming output
- Implement real-time output streaming in agent system rebuild handler
- Add keyboard controls (ESC/Q to close, ↑↓ to scroll) for popup interaction
- Fix system panel Build display to show actual NixOS build instead of config hash
- Update service filters in README with wildcard patterns for better matching
- Add periodic progress updates during nixos-rebuild execution
- Integrate command output handling in dashboard main loop
2025-10-26 11:39:03 +01:00
aaf7edfbce Implement cross-host agent version comparison
- MetricStore tracks agent versions from all hosts
- Detects version mismatches using most common version as reference
- Dashboard logs warnings for hosts with outdated agents
- Foundation for visual version mismatch indicators in UI
- Helps identify deployment inconsistencies across infrastructure
2025-10-26 10:42:26 +01:00
bb72c42726 Add agent version reporting and display
- Agent reports version via agent_version metric using nix store hash
- Dashboard displays agent version in system widget
- Foundation for cross-host version comparison
- Both agent -V and dashboard show versions
2025-10-26 10:38:20 +01:00
af5f96ce2f Fix sed command in automated NixOS update workflow
All checks were successful
Build and Release / build-and-release (push) Successful in 1m23s
- Use pipe delimiter instead of forward slash to avoid conflicts
- Should fix 'number option to s command may not be zero' error
- More robust regex pattern matching
2025-10-26 01:13:58 +02:00
34 changed files with 1980 additions and 1563 deletions

View File

@@ -113,8 +113,8 @@ jobs:
NIX_HASH="sha256-$(python3 -c "import base64, binascii; print(base64.b64encode(binascii.unhexlify('$NEW_HASH')).decode())")"
# Update the NixOS configuration
sed -i "s/version = \"v[^\"]*\"/version = \"$VERSION\"/" hosts/common/cm-dashboard.nix
sed -i "s|version = \"v[^\"]*\"|version = \"$VERSION\"|" hosts/common/cm-dashboard.nix
sed -i "s/sha256 = \"sha256-[^\"]*\"/sha256 = \"$NIX_HASH\"/" hosts/common/cm-dashboard.nix
sed -i "s|sha256 = \"sha256-[^\"]*\"|sha256 = \"$NIX_HASH\"|" hosts/common/cm-dashboard.nix
# Commit and push changes
git config user.name "Gitea Actions"

View File

@@ -1,3 +0,0 @@
# Agent Guide
Agents working in this repo must follow the instructions in `CLAUDE.md`.

403
CLAUDE.md
View File

@@ -2,207 +2,76 @@
## Overview
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for our specific monitoring needs and ZMQ-based metric collection.
## Implementation Strategy
### Current Implementation Status
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.
## Current Features
### Core Functionality
- **Real-time Monitoring**: CPU, RAM, Storage, and Service status
- **Service Management**: Start/stop services with user-stopped tracking
- **Multi-host Support**: Monitor multiple servers from single dashboard
- **NixOS Integration**: System rebuild via SSH + tmux popup
- **Backup Monitoring**: Borgbackup status and scheduling
**System Panel Enhancement - COMPLETED**
### User-Stopped Service Tracking
- Services stopped via dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
- Persistent storage survives agent restarts
- Automatic flag clearing when services are restarted via dashboard
All system panel features successfully implemented:
- **NixOS Collector**: Created collector for version and active users
- **System Widget**: Unified widget combining NixOS, CPU, RAM, and Storage
- **Build Display**: Shows NixOS build information without codename
- **Active Users**: Displays currently logged in users
- **Tmpfs Monitoring**: Added /tmp usage to RAM section
- **Agent Deployment**: NixOS collector working in production
**Keyboard Navigation and Service Management - COMPLETED**
All keyboard navigation and service selection features successfully implemented:
- **Panel Navigation**: Shift+Tab cycles through visible panels only (System → Services → Backup)
- **Service Selection**: Up/Down arrows navigate through parent services with visual cursor
- **Focus Management**: Selection highlighting only visible when Services panel focused
- **Status Preservation**: Service health colors maintained during selection (green/red icons)
- **Smart Panel Switching**: Only cycles through panels with data (backup panel conditional)
- **Scroll Support**: All panels support content scrolling with proper overflow indicators
**Current Status - October 25, 2025:**
- All keyboard navigation features working correctly ✅
- Service selection cursor implemented with focus-aware highlighting ✅
- Panel scrolling fixed for System, Services, and Backup panels ✅
- Build display working: "Build: 25.05.20251004.3bcc93c" ✅
- Configuration hash display: Currently shows git hash, needs to be fixed ❌
**Target Layout:**
```
NixOS:
Build: 25.05.20251004.3bcc93c
Config: d8ivwiar # Should show nix store hash (8 chars) from deployed system
Active users: cm, simon
CPU:
● Load: 0.02 0.31 0.86 • 3000MHz
RAM:
● Usage: 33% 2.6GB/7.6GB
● /tmp: 0% 0B/2.0GB
Storage:
● root (Single):
├─ ● nvme0n1 W: 1%
└─ ● 18% 167.4GB/928.2GB
```
### Custom Service Logs
- Configure service-specific log file paths per host in dashboard config
- Press `L` on any service to view custom log files via `tail -f`
- Configuration format in dashboard config:
```toml
[service_logs]
hostname1 = [
  { service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
  { service_name = "app", log_file_path = "/var/log/myapp/app.log" }
]
hostname2 = [
  { service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
]
```
**System panel layout fully implemented with blue tree symbols ✅**
**Tree symbols now use consistent blue theming across all panels ✅**
**Overflow handling restored for all widgets ("... and X more") ✅**
**Agent hash display working correctly ✅**
### Service Management
- **Direct Control**: Arrow keys (↑↓) or vim keys (j/k) navigate services
- **Service Actions**:
  - `s` - Start service (sends UserStart command)
- `S` - Stop service (sends UserStop command)
- `J` - Show service logs (journalctl in tmux popup)
- `L` - Show custom log files (tail -f custom paths in tmux popup)
- `R` - Rebuild current host
- **Visual Status**: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
- **Transitional Icons**: Blue arrows during operations
### Navigation
- **Tab**: Switch between hosts
- **↑↓ or j/k**: Select services
- **J**: Show service logs (journalctl)
- **L**: Show custom log files
- **q**: Quit dashboard
## Core Architecture Principles
### Current Keyboard Navigation Implementation
**Navigation Controls:**
- **Tab**: Switch between hosts (cmbox, srv01, srv02, steambox, etc.)
- **Shift+Tab**: Cycle through visible panels (System → Services → Backup → System)
- **Up/Down (System/Backup)**: Scroll through panel content
- **Up/Down (Services)**: Move service selection cursor between parent services
- **q**: Quit dashboard
**Panel-Specific Features:**
- **System Panel**: Scrollable content with CPU, RAM, Storage details
- **Services Panel**: Service selection cursor for parent services only (docker, nginx, postgresql, etc.)
- **Backup Panel**: Scrollable repository list with proper overflow handling
**Visual Feedback:**
- **Focused Panel**: Blue border and title highlighting
- **Service Selection**: Blue background with preserved status icon colors (green ● for active, red ● for failed)
- **Focus-Aware Selection**: Selection highlighting only visible when Services panel focused
- **Dynamic Statusbar**: Context-aware shortcuts based on focused panel
### Remote Command Execution - WORKING ✅
**All Issues Resolved (as of 2025-10-24):**
- **ZMQ Command Protocol**: Extended with ServiceControl and SystemRebuild variants
- **Agent Handlers**: systemctl and nixos-rebuild execution with maintenance mode
- **Dashboard Integration**: Keyboard shortcuts execute commands
- **Service Control**: Fixed toggle logic - replaced with separate 's' (start) and 'S' (stop)
- **System Rebuild**: Fixed permission issues and sandboxing problems
- **Git Clone Approach**: Implemented for nixos-rebuild to avoid directory permissions
- **Visual Feedback**: Directional arrows for service status (↑ starting, ↓ stopping, ↻ restarting)
**Keyboard Controls Status:**
- **Services Panel**:
- R (restart) ✅ Working
- s (start) ✅ Working
- S (stop) ✅ Working
- **System Panel**: R (nixos-rebuild) ✅ Working with --option sandbox false
- **Backup Panel**: B (trigger backup) ❓ Not implemented
**Visual Feedback Implementation - IN PROGRESS:**
Context-appropriate progress indicators for each panel:
**Services Panel** (Service status transitions):
```
● nginx active → ⏳ nginx restarting → ● nginx active
● docker active → ⏳ docker stopping → ● docker inactive
```
**System Panel** (Build progress in NixOS section):
```
NixOS:
Build: 25.05.20251004.3bcc93c → Build: [████████████ ] 65%
Active users: cm, simon Active users: cm, simon
```
**Backup Panel** (OnGoing status with progress):
```
Latest backup: → Latest backup:
● 2024-10-23 14:32:15 ● OnGoing
└─ Duration: 1.3m └─ [██████ ] 60%
```
**Critical Configuration Hash Fix - HIGH PRIORITY:**
**Problem:** Configuration hash currently shows git commit hash instead of actual deployed system hash.
**Current (incorrect):**
- Shows git hash: `db11f82` (source repository commit)
- Not accurate - doesn't reflect what's actually deployed
**Target (correct):**
- Show nix store hash: `d8ivwiar` (first 8 chars from deployed system)
- Source: `/nix/store/d8ivwiarhwhgqzskj6q2482r58z46qjf-nixos-system-cmbox-25.05.20251004.3bcc93c`
- Pattern: Extract hash from `/nix/store/HASH-nixos-system-HOSTNAME-VERSION`
**Benefits:**
1. **Deployment Verification:** Confirms rebuild actually succeeded
2. **Accurate Status:** Shows what's truly running, not just source
3. **Rebuild Completion Detection:** Hash change = rebuild completed
4. **Rollback Tracking:** Each deployment has unique identifier
**Implementation Required:**
1. Agent extracts nix store hash from `ls -la /run/current-system`
2. Reports this as `system_config_hash` metric instead of git hash
3. Dashboard displays first 8 characters: `Config: d8ivwiar`
**Next Session Priority Tasks:**
**Remaining Features:**
1. **Fix Configuration Hash Display (CRITICAL)**:
- Use nix store hash instead of git commit hash
- Extract from `/run/current-system` -> `/nix/store/HASH-nixos-system-*`
- Enables proper rebuild completion detection
2. **Command Response Protocol**:
- Agent sends command completion/failure back to dashboard via ZMQ
- Dashboard updates UI status from ⏳ to ● when commands complete
- Clear success/failure status after timeout
3. **Backup Panel Features**:
- Implement backup trigger functionality (B key)
- Complete visual feedback for backup operations
- Add backup progress indicators
**Enhancement Tasks:**
- Add confirmation dialogs for destructive actions (stop/restart/rebuild)
- Implement command history/logging
- Add keyboard shortcuts help overlay
**Future Enhanced Navigation:**
- Add Page Up/Down for faster scrolling through long service lists
- Implement search/filter functionality for services
- Add jump-to-service shortcuts (first letter navigation)
**Future Advanced Features:**
- Service dependency visualization
- Historical service status tracking
- Real-time log viewing integration
## Core Architecture Principles - CRITICAL
### Individual Metrics Philosophy
**NEW ARCHITECTURE**: Agent collects individual metrics, dashboard composes widgets from those metrics.
- Agent collects individual metrics, dashboard composes widgets
- Each metric collected, transmitted, and stored individually
- Agent calculates status for each metric using thresholds
- Dashboard aggregates individual metric statuses for widget status
### Maintenance Mode
**Purpose:**
- Suppress email notifications during planned maintenance or backups
- Prevents false alerts when services are intentionally stopped
**Implementation:**
- Agent checks for `/tmp/cm-maintenance` file before sending notifications
- File presence suppresses all email notifications while continuing monitoring
- Dashboard continues to show real status, only notifications are blocked
**Usage:**
```bash
# Enable maintenance mode
touch /tmp/cm-maintenance
# Run maintenance tasks (backups, service restarts, etc.)
systemctl stop service
# ... maintenance work ...
systemctl start service
@@ -211,61 +80,84 @@ systemctl start service
rm /tmp/cm-maintenance
```
**NixOS Integration:**
- Borgbackup script automatically creates/removes maintenance file
- Automatic cleanup via trap ensures maintenance mode doesn't stick
- All configuration shall be done from the NixOS config
**ARCHITECTURE ENFORCEMENT**:
- **ZERO legacy code reuse** - Fresh implementation following ARCHITECT.md exactly
- **Individual metrics only** - NO grouped metric structures
- **Reference-only legacy** - Study old functionality, implement new architecture
- **Clean slate mindset** - Build as if legacy codebase never existed
**Implementation Rules**:
1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
**Testing & Building**:
- **Workspace builds**: `cargo build --workspace` for all testing
- **Clean compilation**: Remove `target/` between architecture changes
- **ZMQ testing**: Test agent-dashboard communication independently
- **Widget testing**: Verify UI layout matches legacy appearance exactly
**NEVER in New Implementation**:
- Copy/paste ANY code from legacy backup
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
# Important Communication Guidelines
NEVER write that you have "successfully implemented" something or generate extensive summary text without first verifying with the user that the implementation is correct. This wastes tokens. Keep responses concise.
NEVER implement code without first getting explicit user agreement on the approach. Always ask for confirmation before proceeding with implementation.
## Development and Deployment Architecture
### Development Path
- **Location:** `~/projects/cm-dashboard`
- **Purpose:** Development workflow only - for committing new code
- **Access:** Only for developers to commit changes
### Deployment Path
- **Location:** `/var/lib/cm-dashboard/nixos-config`
- **Purpose:** Production deployment only - agent clones/pulls from git
- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild
### Git Flow
```
Development: ~/projects/cm-dashboard → git commit → git push
Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
```
## Automated Binary Release System
CM Dashboard uses automated binary releases instead of source builds.
### Creating New Releases
```bash
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
```
This automatically:
- Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
- Creates GitHub-style release with tarball
- Uploads binaries via Gitea API
### NixOS Configuration Updates
Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
```nix
version = "v0.1.X";
src = pkgs.fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-NEW_HASH_HERE";
};
```
### Get Release Hash
```bash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
```
### Building
**Testing & Building:**
- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
- **Clean compilation**: Remove `target/` between major changes
## Important Communication Guidelines
Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
## Commit Message Guidelines
**NEVER mention:**
- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation
**ALWAYS:**
- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer
**Examples:**
- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
@@ -273,83 +165,22 @@ NEVER implement code without first getting explicit user agreement on the approa
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
## Implementation Rules
1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
**NEVER:**
- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
- Create files unless absolutely necessary for achieving goals
- Create documentation files unless explicitly requested
**ALWAYS:**
- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices
## Development and Deployment Architecture
**CRITICAL:** Development and deployment paths are completely separate:
### Development Path
- **Location:** `~/projects/nixosbox`
- **Purpose:** Development workflow only - for committing new cm-dashboard code
- **Access:** Only for developers to commit changes
- **Code Access:** Running cm-dashboard code shall NEVER access this path
### Deployment Path
- **Location:** `/var/lib/cm-dashboard/nixos-config`
- **Purpose:** Production deployment only - agent clones/pulls from git
- **Access:** Only cm-dashboard agent for deployment operations
- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild
### Git Flow
```
Development: ~/projects/nixosbox → git commit → git push
Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
```
## Automated Binary Release System
**IMPLEMENTED:** cm-dashboard now uses automated binary releases instead of source builds.
### Release Workflow
1. **Automated Release Creation**
- Gitea Actions workflow builds static binaries on tag push
- Creates release with `cm-dashboard-linux-x86_64.tar.gz` tarball
- No manual intervention required for binary generation
2. **Creating New Releases**
```bash
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
```
This automatically:
- Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
- Creates GitHub-style release with tarball
- Uploads binaries via Gitea API
3. **NixOS Configuration Updates**
Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
```nix
version = "v0.1.X";
src = pkgs.fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-NEW_HASH_HERE";
};
```
4. **Get Release Hash**
```bash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
```
5. **Commit and Deploy**
```bash
cd ~/projects/nixosbox
git add hosts/common/cm-dashboard.nix
git commit -m "Update cm-dashboard to v0.1.X with static binaries"
git push
```
### Benefits
- **No compilation overhead** on each host
- **Consistent static binaries** across all hosts
- **Faster deployments** - download vs compile
- **No library dependency issues** - static linking
- **Automated pipeline** - tag push triggers everything

Cargo.lock (generated)

@@ -270,7 +270,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
[[package]] [[package]]
name = "cm-dashboard" name = "cm-dashboard"
version = "0.1.0" version = "0.1.64"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"chrono", "chrono",
@@ -286,12 +286,13 @@ dependencies = [
"toml", "toml",
"tracing", "tracing",
"tracing-subscriber", "tracing-subscriber",
"wake-on-lan",
"zmq", "zmq",
] ]
[[package]] [[package]]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.0" version = "0.1.64"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"async-trait", "async-trait",
@@ -314,7 +315,7 @@ dependencies = [
[[package]] [[package]]
name = "cm-dashboard-shared" name = "cm-dashboard-shared"
version = "0.1.0" version = "0.1.64"
dependencies = [ dependencies = [
"chrono", "chrono",
"serde", "serde",
@@ -2064,6 +2065,12 @@ version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a" checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
[[package]]
name = "wake-on-lan"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1ccf60b60ad7e5b1b37372c5134cbcab4db0706c231d212e0c643a077462bc8f"
[[package]] [[package]]
name = "walkdir" name = "walkdir"
version = "2.5.0" version = "2.5.0"

README.md

@@ -1,88 +1,106 @@
# CM Dashboard # CM Dashboard
A real-time infrastructure monitoring system with intelligent status aggregation and email notifications, built with Rust and ZMQ. A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.
## Current Implementation ## Features
This is a complete rewrite implementing an **individual metrics architecture** where: ### Core Monitoring
- **Real-time metrics**: CPU, RAM, Storage, and Service status
- **Multi-host support**: Monitor multiple servers from single dashboard
- **Service management**: Start/stop services with intelligent status tracking
- **NixOS integration**: System rebuild via SSH + tmux popup
- **Backup monitoring**: Borgbackup status and scheduling
- **Email notifications**: Intelligent batching prevents spam
- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status ### User-Stopped Service Tracking
- **Dashboard** subscribes to specific metrics and composes widgets Services stopped via the dashboard are intelligently tracked to prevent false alerts:
- **Status Aggregation** provides intelligent email notifications with batching
- **Persistent Cache** prevents false notifications on restart
## Dashboard Interface - **Smart status reporting**: User-stopped services show as Status::OK instead of Warning
- **Persistent storage**: Tracking survives agent restarts via JSON storage
- **Automatic management**: Flags cleared when services restarted via dashboard
- **Maintenance friendly**: No false alerts during intentional service operations
## Architecture
### Individual Metrics Philosophy
- **Agent**: Collects individual metrics, calculates status using thresholds
- **Dashboard**: Subscribes to specific metrics, composes widgets from individual data
- **ZMQ Communication**: Efficient real-time metric transmission
- **Status Aggregation**: Host-level status calculated from all service metrics
### Components
```
┌─────────────────┐ ZMQ ┌─────────────────┐
│ │◄──────────►│ │
│ Agent │ Metrics │ Dashboard │
│ - Collectors │ │ - TUI │
│ - Status │ │ - Widgets │
│ - Tracking │ │ - Commands │
│ │ │ │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ JSON Storage │ │ SSH + tmux │
│ - User-stopped │ │ - Remote rebuild│
│ - Cache │ │ - Process │
│ - State │ │ isolation │
└─────────────────┘ └─────────────────┘
```
### Service Control Flow
1. **User Action**: Dashboard sends `UserStart`/`UserStop` commands
2. **Agent Processing**:
- Marks service as user-stopped (if stopping)
- Executes `systemctl start/stop service`
- Syncs state to global tracker
3. **Status Calculation**:
- Systemd collector checks user-stopped flag
- Reports Status::OK for user-stopped inactive services
- Normal Warning status for system failures
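For illustration, a minimal sketch of how the dashboard side could issue step 1 of this flow over the agent's command socket (`command_port = 6131`). The REQ/REP socket pattern and the JSON encoding are assumptions, not confirmed details of the real communication module:

```rust
// Sketch only: sending a UserStop command from the dashboard to the agent.
// Socket pattern (REQ/REP) and serde_json encoding are assumptions.
use serde::Serialize;

#[derive(Serialize)]
enum ServiceAction { UserStart, UserStop }

#[derive(Serialize)]
enum AgentCommand {
    ServiceControl { service_name: String, action: ServiceAction },
}

fn main() -> anyhow::Result<()> {
    let ctx = zmq::Context::new();
    let requester = ctx.socket(zmq::REQ)?;
    requester.connect("tcp://srv01:6131")?;

    // Step 1 of the flow above: user stops a service from the dashboard.
    let command = AgentCommand::ServiceControl {
        service_name: "redis-immich".to_string(),
        action: ServiceAction::UserStop,
    };
    requester.send(serde_json::to_vec(&command)?, 0)?;

    // The agent marks the service user-stopped, runs systemctl stop, and replies.
    let _ack = requester.recv_bytes(0)?;
    Ok(())
}
```

The agent-side handling of steps 2 and 3 (marking the service user-stopped before running `systemctl`) appears in the agent.rs changes further down in this diff.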
## Interface
``` ```
cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
┌system──────────────────────────────┐┌services─────────────────────────────────────────┐ ┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
CPU: ││Service: Status: RAM: Disk: │ NixOS: ││Service: Status: RAM: Disk: │
● Load: 0.10 0.52 0.88 • 400.0 MHz ││● docker active 27M 496MB │ Build: 25.05.20251004.3bcc93c ││● docker active 27M 496MB │
RAM: ││● docker-registry active 19M 496MB │ Agent: v0.1.43 ││● gitea active 579M 2.6GB │
● Used: 30% 2.3GB/7.6GB ││● gitea active 579M 2.6GB Active users: cm, simon ││● nginx active 28M 24MB
● tmp: 0.0% 0B/2.0GB ││● gitea-runner-default active 11M 2.6GB CPU: ││ ├─ ● gitea.cmtec.se 51ms
Disk nvme0n1: ││● haasp-core active 9M 1MB ● Load: 0.10 0.52 0.88 • 3000MHz ││ ├─ ● photos.cmtec.se 41ms
● Health: PASSED ││● haasp-mqtt active 3M 1MB RAM: ││● postgresql active 112M 357MB
│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid active 10M 1MB │● Usage: 33% 2.6GB/7.6GB ││● redis-immich user-stopped
│● Usage @boot: 5.9% • 0.1/1.0 GB ││● immich-server active 240M 45.1GB │● /tmp: 0% 0B/2.0GB ││● sshd active 2M 0
││● mosquitto active 1M 1MB Storage: ││● unifi active 594M 495MB
││● mysql active 38M 225MB ● root (Single): ││
││● nginx active 28M 24MB ├─ ● nvme0n1 W: 1% ││
││ ├─ ● gitea.cmtec.se 51ms └─ ● 18% 167.4GB/928.2GB ││
│ ││ ├─ ● haasp.cmtec.se 43ms │
│ ││ ├─ ● haasp.net 43ms │
│ ││ ├─ ● pages.cmtec.se 45ms │
└────────────────────────────────────┘│ ├─ ● photos.cmtec.se 41ms │
┌backup──────────────────────────────┐│ ├─ ● unifi.cmtec.se 46ms │
│Latest backup: ││ ├─ ● vault.cmtec.se 47ms │
│● Status: OK ││ ├─ ● www.kryddorten.se 81ms │
│Duration: 54s • Last: 4h ago ││ ├─ ● www.mariehall2.se 86ms │
│Disk usage: 48.2GB/915.8GB ││● postgresql active 112M 357MB │
│P/N: Samsung SSD 870 QVO 1TB ││● redis-immich active 8M 45.1GB │
│S/N: S5RRNF0W800639Y ││● sshd active 2M 0 │
│● gitea 2 archives 2.7GB ││● unifi active 594M 495MB │
│● immich 2 archives 45.0GB ││● vaultwarden active 12M 1MB │
│● kryddorten 2 archives 67.6MB ││ │
│● mariehall2 2 archives 321.8MB ││ │
│● nixosbox 2 archives 4.5MB ││ │
│● unifi 2 archives 2.9MB ││ │
│● vaultwarden 2 archives 305kB ││ │
└────────────────────────────────────┘└─────────────────────────────────────────────────┘ └────────────────────────────────────┘└─────────────────────────────────────────────────┘
``` ```
**Navigation**: `←→` switch hosts, `r` refresh, `q` quit ### Navigation
- **Tab**: Switch between hosts
- **↑↓ or j/k**: Navigate services
- **s**: Start selected service (UserStart)
- **S**: Stop selected service (UserStop)
- **J**: Show service logs (journalctl in tmux popup)
- **R**: Rebuild current host
- **q**: Quit
## Features ### Status Indicators
- **Green ●**: Active service
- **Real-time monitoring** - Dashboard updates every 1-2 seconds - **Yellow ◐**: Inactive service (system issue)
- **Individual metric collection** - Granular data for flexible dashboard composition - **Red ◯**: Failed service
- **Intelligent status aggregation** - Host-level status calculated from all services - **Blue arrows**: Service transitioning (↑ starting, ↓ stopping, ↻ restarting)
- **Smart email notifications** - Batched, detailed alerts with service groupings - **"user-stopped"**: Service stopped via dashboard (Status::OK)
- **Persistent state** - Prevents false notifications on restarts
- **ZMQ communication** - Efficient agent-to-dashboard messaging
- **Clean TUI** - Terminal-based dashboard with color-coded status indicators
## Architecture
### Core Components
- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
- **Shared** (`cm-dashboard-shared`) - Common types and protocol
- **Status Aggregation** - Intelligent batching and notification management
- **Persistent Cache** - Maintains state across restarts
### Status Levels
- **🟢 Ok** - Service running normally
- **🔵 Pending** - Service starting/stopping/reloading
- **🟡 Warning** - Service issues (high load, memory, disk usage)
- **🔴 Critical** - Service failed or critical thresholds exceeded
- **❓ Unknown** - Service state cannot be determined
## Quick Start ## Quick Start
### Build ### Building
```bash ```bash
# With Nix (recommended) # With Nix (recommended)
@@ -93,21 +111,20 @@ sudo apt install libssl-dev pkg-config # Ubuntu/Debian
cargo build --workspace cargo build --workspace
``` ```
### Run ### Running
```bash ```bash
# Start agent (requires configuration file) # Start agent (requires configuration)
./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml ./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml
# Start dashboard # Start dashboard (inside tmux session)
./target/debug/cm-dashboard --config /path/to/dashboard.toml tmux
./target/debug/cm-dashboard --config /etc/cm-dashboard/dashboard.toml
``` ```
## Configuration ## Configuration
### Agent Configuration (`agent.toml`) ### Agent Configuration
The agent requires a comprehensive TOML configuration file:
```toml ```toml
collection_interval_seconds = 2 collection_interval_seconds = 2
@@ -116,47 +133,27 @@ collection_interval_seconds = 2
publisher_port = 6130 publisher_port = 6130
command_port = 6131 command_port = 6131
bind_address = "0.0.0.0" bind_address = "0.0.0.0"
timeout_ms = 5000 transmission_interval_seconds = 2
heartbeat_interval_ms = 30000
[collectors.cpu] [collectors.cpu]
enabled = true enabled = true
interval_seconds = 2 interval_seconds = 2
load_warning_threshold = 9.0 load_warning_threshold = 5.0
load_critical_threshold = 10.0 load_critical_threshold = 10.0
temperature_warning_threshold = 100.0
temperature_critical_threshold = 110.0
[collectors.memory] [collectors.memory]
enabled = true enabled = true
interval_seconds = 2 interval_seconds = 2
usage_warning_percent = 80.0 usage_warning_percent = 80.0
usage_critical_percent = 95.0
[collectors.disk]
enabled = true
interval_seconds = 300
usage_warning_percent = 80.0
usage_critical_percent = 90.0 usage_critical_percent = 90.0
[[collectors.disk.filesystems]]
name = "root"
uuid = "4cade5ce-85a5-4a03-83c8-dfd1d3888d79"
mount_point = "/"
fs_type = "ext4"
monitor = true
[collectors.systemd] [collectors.systemd]
enabled = true enabled = true
interval_seconds = 10 interval_seconds = 10
memory_warning_mb = 1000.0 service_name_filters = ["nginx*", "postgresql*", "docker*", "sshd*"]
memory_critical_mb = 2000.0 excluded_services = ["nginx-config-reload", "systemd-", "getty@"]
service_name_filters = [ nginx_latency_critical_ms = 1000.0
"nginx", "postgresql", "redis", "docker", "sshd" http_timeout_seconds = 10
]
excluded_services = [
"nginx-config-reload", "sshd-keygen"
]
[notifications] [notifications]
enabled = true enabled = true
@@ -164,251 +161,202 @@ smtp_host = "localhost"
smtp_port = 25 smtp_port = 25
from_email = "{hostname}@example.com" from_email = "{hostname}@example.com"
to_email = "admin@example.com" to_email = "admin@example.com"
rate_limit_minutes = 0 aggregation_interval_seconds = 30
trigger_on_warnings = true
trigger_on_failures = true
recovery_requires_all_ok = true
suppress_individual_recoveries = true
[status_aggregation]
enabled = true
aggregation_method = "worst_case"
notification_interval_seconds = 30
[cache]
persist_path = "/var/lib/cm-dashboard/cache.json"
``` ```
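As a rough illustration of how the agent turns the thresholds above into per-metric statuses, here is a minimal sketch; the type and function names are illustrative, only the warning/critical comparison reflects the documented behaviour, and the variant list mirrors the statuses referenced elsewhere in this diff:

```rust
// Sketch only: mapping a collected value to a status using warning/critical
// thresholds from the agent configuration. Names are illustrative.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
    Offline,
}

struct Thresholds {
    warning: f64,
    critical: f64,
}

fn status_for(value: f64, t: &Thresholds) -> Status {
    if value >= t.critical {
        Status::Critical
    } else if value >= t.warning {
        Status::Warning
    } else {
        Status::Ok
    }
}

fn main() {
    // e.g. load_warning_threshold = 5.0, load_critical_threshold = 10.0
    let cpu_load = Thresholds { warning: 5.0, critical: 10.0 };
    assert_eq!(status_for(0.42, &cpu_load), Status::Ok);
    assert_eq!(status_for(8.5, &cpu_load), Status::Warning);
    assert_eq!(status_for(12.0, &cpu_load), Status::Critical);
}
```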
### Dashboard Configuration (`dashboard.toml`) ### Dashboard Configuration
```toml ```toml
[zmq] [zmq]
hosts = [ subscriber_ports = [6130]
{ name = "server1", address = "192.168.1.100", port = 6130 },
{ name = "server2", address = "192.168.1.101", port = 6130 } [hosts]
] predefined_hosts = ["cmbox", "srv01", "srv02"]
connection_timeout_ms = 5000
reconnect_interval_ms = 10000
[ui] [ui]
refresh_interval_ms = 1000 ssh_user = "cm"
theme = "dark" rebuild_alias = "nixos-rebuild-cmtec"
``` ```
## Collectors ## Technical Implementation
The agent implements several specialized collectors: ### Collectors
### CPU Collector (`cpu.rs`) #### Systemd Collector
- **Service Discovery**: Uses `systemctl list-unit-files` + `list-units --all`
- **Status Calculation**: Checks user-stopped flag before assigning Warning status
- **Memory Tracking**: Per-service memory usage via `systemctl show`
- **Sub-services**: Nginx site latency, Docker containers
- **User-stopped Integration**: `UserStoppedServiceTracker::is_service_user_stopped()`
- Load average (1, 5, 15 minute) #### User-Stopped Service Tracker
- CPU temperature monitoring - **Storage**: `/var/lib/cm-dashboard/user-stopped-services.json`
- Real-time process monitoring (top CPU consumers) - **Thread Safety**: Global singleton with `Arc<Mutex<>>`
- Status calculation with configurable thresholds - **Persistence**: Automatic save on state changes
- **Global Access**: Static methods for collector integration
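A minimal sketch of the shape such a tracker can take, assuming JSON persistence via `serde_json` and a process-wide singleton; the real implementation's method names and file format may differ:

```rust
// Sketch only: user-stopped service tracking with JSON persistence and a
// global handle, as described in the bullets above.
use std::collections::HashSet;
use std::fs;
use std::sync::{Arc, Mutex, OnceLock};

const STATE_PATH: &str = "/var/lib/cm-dashboard/user-stopped-services.json";

#[derive(Default)]
struct UserStoppedTracker {
    services: HashSet<String>,
}

impl UserStoppedTracker {
    fn load() -> Self {
        let services = fs::read_to_string(STATE_PATH)
            .ok()
            .and_then(|s| serde_json::from_str(&s).ok())
            .unwrap_or_default();
        Self { services }
    }

    fn persist(&self) {
        if let Ok(json) = serde_json::to_string_pretty(&self.services) {
            let _ = fs::write(STATE_PATH, json); // save on every state change
        }
    }

    fn mark_user_stopped(&mut self, service: &str) {
        self.services.insert(service.to_string());
        self.persist();
    }

    fn clear_user_stopped(&mut self, service: &str) {
        self.services.remove(service);
        self.persist();
    }

    fn is_user_stopped(&self, service: &str) -> bool {
        self.services.contains(service)
    }
}

// Global singleton so collectors can query the tracker without plumbing it through.
fn global() -> &'static Arc<Mutex<UserStoppedTracker>> {
    static TRACKER: OnceLock<Arc<Mutex<UserStoppedTracker>>> = OnceLock::new();
    TRACKER.get_or_init(|| Arc::new(Mutex::new(UserStoppedTracker::load())))
}

fn main() {
    global().lock().unwrap().mark_user_stopped("redis-immich");
    assert!(global().lock().unwrap().is_user_stopped("redis-immich"));
}
```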
### Memory Collector (`memory.rs`) #### Other Collectors
- **CPU**: Load average, temperature, frequency monitoring
- **Memory**: RAM/swap usage, tmpfs monitoring
- **Disk**: Filesystem usage, SMART health data
- **NixOS**: Build version, active users, agent version
- **Backup**: Borgbackup repository status and metrics
- RAM usage (total, used, available) ### ZMQ Protocol
- Swap monitoring
- Real-time process monitoring (top RAM consumers)
- Memory pressure detection
### Disk Collector (`disk.rs`) ```rust
// Metric Message
#[derive(Serialize, Deserialize)]
pub struct MetricMessage {
pub hostname: String,
pub timestamp: u64,
pub metrics: Vec<Metric>,
}
- Filesystem usage per mount point // Service Commands
- SMART health monitoring pub enum AgentCommand {
- Temperature and wear tracking ServiceControl {
- Configurable filesystem monitoring service_name: String,
action: ServiceAction,
},
SystemRebuild { /* SSH config */ },
CollectNow,
}
### Systemd Collector (`systemd.rs`) pub enum ServiceAction {
Start, // System-initiated
Stop, // System-initiated
UserStart, // User via dashboard (clears user-stopped)
UserStop, // User via dashboard (marks user-stopped)
Status,
}
```
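As a rough sketch of the transmission path, the agent could publish a `MetricMessage` on its PUB socket (`publisher_port = 6130`) along these lines; the wire encoding (JSON via `serde_json`) and the simplified struct fields are assumptions based on the protocol excerpt above:

```rust
// Sketch only: publishing one metric message over the ZMQ PUB socket.
use serde::Serialize;

#[derive(Serialize)]
struct Metric {
    name: String,
    value: f64,
    status: String,
}

#[derive(Serialize)]
struct MetricMessage {
    hostname: String,
    timestamp: u64,
    metrics: Vec<Metric>,
}

fn main() -> anyhow::Result<()> {
    let ctx = zmq::Context::new();
    let publisher = ctx.socket(zmq::PUB)?;
    publisher.bind("tcp://0.0.0.0:6130")?;

    let message = MetricMessage {
        hostname: "srv01".into(),
        timestamp: 1_700_000_000,
        metrics: vec![Metric {
            name: "cpu_load_1min".into(),
            value: 0.42,
            status: "ok".into(),
        }],
    };

    // Dashboards subscribe to this socket and deserialize each frame.
    let payload = serde_json::to_vec(&message)?;
    publisher.send(payload, 0)?;
    Ok(())
}
```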
- Service status monitoring (`active`, `inactive`, `failed`) ### Maintenance Mode
- Memory usage per service
- Service filtering and exclusions
- Handles transitional states (`Status::Pending`)
### Backup Collector (`backup.rs`) Suppress notifications during planned maintenance:
- Reads TOML status files from backup systems ```bash
- Archive age verification # Enable maintenance mode
- Disk usage tracking touch /tmp/cm-maintenance
- Repository health monitoring
# Perform maintenance
systemctl stop service
# ... work ...
systemctl start service
# Disable maintenance mode
rm /tmp/cm-maintenance
```
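A minimal sketch of how the agent can honour this flag before sending batched emails; only the `/tmp/cm-maintenance` path is taken from the steps above, the rest is illustrative:

```rust
// Sketch only: suppressing outgoing notifications while the maintenance
// flag file exists. Status changes are still tracked; only email is skipped.
use std::path::Path;

fn maintenance_mode_active() -> bool {
    Path::new("/tmp/cm-maintenance").exists()
}

fn should_send_notification() -> bool {
    !maintenance_mode_active()
}

fn main() {
    if should_send_notification() {
        println!("sending batched status email");
    } else {
        println!("maintenance mode: notification suppressed");
    }
}
```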
## Email Notifications ## Email Notifications
### Intelligent Batching ### Intelligent Batching
- **Real-time dashboard**: Immediate status updates
- **Batched emails**: Aggregated every 30 seconds
- **Smart grouping**: Services organized by severity
- **Recovery suppression**: Reduces notification spam
The system implements smart notification batching to prevent email spam: ### Example Alert
- **Real-time dashboard updates** - Status changes appear immediately
- **Batched email notifications** - Aggregated every 30 seconds
- **Detailed groupings** - Services organized by severity
### Example Alert Email
``` ```
Subject: Status Alert: 2 critical, 1 warning, 15 started Subject: Status Alert: 1 critical, 2 warnings, 0 recoveries
Status Summary (30s duration) Status Summary (30s duration)
Host Status: Ok → Warning Host Status: Ok → Warning
🔴 CRITICAL ISSUES (2): 🔴 CRITICAL ISSUES (1):
postgresql: Ok → Critical postgresql: Ok → Critical (memory usage 95%)
nginx: Warning → Critical
🟡 WARNINGS (1): 🟡 WARNINGS (2):
redis: Ok → Warning (memory usage 85%) nginx: Ok → Warning (high load 8.5)
redis: user-stopped → Warning (restarted by system)
✅ RECOVERIES (0): ✅ RECOVERIES (0):
🟢 SERVICE STARTUPS (15):
docker: Unknown → Ok
sshd: Unknown → Ok
...
-- --
CM Dashboard Agent CM Dashboard Agent v0.1.43
Generated at 2025-10-21 19:42:42 CET
``` ```
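A minimal sketch of the severity grouping behind such an alert; the types and formatting here are illustrative, not the agent's actual notification code:

```rust
// Sketch only: grouping batched status changes by severity for an alert email.
#[derive(Clone, Copy)]
enum Status { Ok, Warning, Critical }

struct Change {
    service: String,
    from: Status,
    to: Status,
    reason: String,
}

fn format_alert(changes: &[Change]) -> String {
    let label = |s: Status| match s {
        Status::Ok => "Ok",
        Status::Warning => "Warning",
        Status::Critical => "Critical",
    };
    let mut critical = Vec::new();
    let mut warnings = Vec::new();
    let mut recoveries = Vec::new();
    for c in changes {
        match c.to {
            Status::Critical => critical.push(c),
            Status::Warning => warnings.push(c),
            Status::Ok => recoveries.push(c),
        }
    }
    let mut body = format!(
        "Subject: Status Alert: {} critical, {} warnings, {} recoveries\n\n",
        critical.len(), warnings.len(), recoveries.len()
    );
    for (title, group) in [("CRITICAL ISSUES", &critical), ("WARNINGS", &warnings), ("RECOVERIES", &recoveries)] {
        body.push_str(&format!("{} ({}):\n", title, group.len()));
        for c in group.iter() {
            body.push_str(&format!("  {}: {} → {} ({})\n", c.service, label(c.from), label(c.to), c.reason));
        }
    }
    body
}

fn main() {
    let changes = vec![Change {
        service: "postgresql".into(),
        from: Status::Ok,
        to: Status::Critical,
        reason: "memory usage 95%".into(),
    }];
    println!("{}", format_alert(&changes));
}
```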
## Individual Metrics Architecture
The system follows a **metrics-first architecture**:
### Agent Side
```rust
// Agent collects individual metrics
vec![
Metric::new("cpu_load_1min".to_string(), MetricValue::Float(2.5), Status::Ok),
Metric::new("memory_usage_percent".to_string(), MetricValue::Float(78.5), Status::Warning),
Metric::new("service_nginx_status".to_string(), MetricValue::String("active".to_string()), Status::Ok),
]
```
### Dashboard Side
```rust
// Widgets subscribe to specific metrics
impl Widget for CpuWidget {
fn update_from_metrics(&mut self, metrics: &[&Metric]) {
for metric in metrics {
match metric.name.as_str() {
"cpu_load_1min" => self.load_1min = metric.value.as_f32(),
"cpu_load_5min" => self.load_5min = metric.value.as_f32(),
"cpu_temperature_celsius" => self.temperature = metric.value.as_f32(),
_ => {}
}
}
}
}
```
## Persistent Cache
The cache system prevents false notifications:
- **Automatic saving** - Saves when service status changes
- **Persistent storage** - Maintains state across agent restarts
- **Simple design** - No complex TTL or cleanup logic
- **Status preservation** - Prevents duplicate notifications
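A minimal sketch of that idea, assuming a simple JSON map at the configured `persist_path`; the real cache module's structure is not shown here:

```rust
// Sketch only: remembering each metric's last reported status so a restart
// does not re-send notifications for unchanged states.
use std::collections::HashMap;
use std::fs;

const CACHE_PATH: &str = "/var/lib/cm-dashboard/cache.json";

fn load_cache() -> HashMap<String, String> {
    fs::read_to_string(CACHE_PATH)
        .ok()
        .and_then(|s| serde_json::from_str(&s).ok())
        .unwrap_or_default()
}

fn save_cache(cache: &HashMap<String, String>) {
    if let Ok(json) = serde_json::to_string_pretty(cache) {
        let _ = fs::write(CACHE_PATH, json);
    }
}

/// Returns true when the status actually changed (and may therefore notify).
fn record_status(cache: &mut HashMap<String, String>, metric: &str, status: &str) -> bool {
    let changed = cache.get(metric).map(String::as_str) != Some(status);
    if changed {
        cache.insert(metric.to_string(), status.to_string());
        save_cache(cache); // save on every change, no TTL or cleanup logic
    }
    changed
}

fn main() {
    let mut cache = load_cache();
    if record_status(&mut cache, "service_nginx_status", "warning") {
        println!("status changed - queue notification");
    }
}
```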
## Development ## Development
### Project Structure ### Project Structure
``` ```
cm-dashboard/ cm-dashboard/
├── agent/ # Metrics collection agent ├── agent/ # Metrics collection agent
│ ├── src/ │ ├── src/
│ │ ├── collectors/ # CPU, memory, disk, systemd, backup │ │ ├── collectors/ # CPU, memory, disk, systemd, backup, nixos
│ │ ├── status/ # Status aggregation and notifications │ │ ├── service_tracker.rs # User-stopped service tracking
│ │ ├── cache/ # Persistent metric caching │ │ ├── status/ # Status aggregation and notifications
│ │ ├── config/ # TOML configuration loading │ │ ├── config/ # TOML configuration loading
│ │ └── notifications/ # Email notification system │ │ └── communication/ # ZMQ message handling
├── dashboard/ # TUI dashboard application ├── dashboard/ # TUI dashboard application
│ ├── src/ │ ├── src/
│ │ ├── ui/widgets/ # CPU, memory, services, backup widgets │ │ ├── ui/widgets/ # CPU, memory, services, backup, system
│ │ ├── metrics/ # Metric storage and filtering │ │ ├── communication/ # ZMQ consumption and commands
│ │ └── communication/ # ZMQ metric consumption │ │ └── app.rs # Main application loop
├── shared/ # Shared types and utilities ├── shared/ # Shared types and utilities
│ └── src/ │ └── src/
│ ├── metrics.rs # Metric, Status, and Value types │ ├── metrics.rs # Metric, Status, StatusTracker types
│ ├── protocol.rs # ZMQ message format │ ├── protocol.rs # ZMQ message format
│ └── cache.rs # Cache configuration │ └── cache.rs # Cache configuration
└── README.md # This file └── CLAUDE.md # Development guidelines and rules
``` ```
### Building ### Testing
```bash ```bash
# Debug build # Build and test
cargo build --workspace nix-shell -p openssl pkg-config --run "cargo build --workspace"
nix-shell -p openssl pkg-config --run "cargo test --workspace"
# Release build # Code quality
cargo build --workspace --release cargo fmt --all
# Run tests
cargo test --workspace
# Check code formatting
cargo fmt --all -- --check
# Run clippy linter
cargo clippy --workspace -- -D warnings cargo clippy --workspace -- -D warnings
``` ```
### Dependencies ## Deployment
- **tokio** - Async runtime ### Automated Binary Releases
- **zmq** - Message passing between agent and dashboard ```bash
- **ratatui** - Terminal user interface # Create new release
- **serde** - Serialization for metrics and config cd ~/projects/cm-dashboard
- **anyhow/thiserror** - Error handling git tag v0.1.X
- **tracing** - Structured logging git push origin v0.1.X
- **lettre** - SMTP email notifications ```
- **clap** - Command-line argument parsing
- **toml** - Configuration file parsing
## NixOS Integration This triggers automated:
- Static binary compilation with `RUSTFLAGS="-C target-feature=+crt-static"`
- GitHub-style release creation
- Tarball upload to Gitea
This project is designed for declarative deployment via NixOS: ### NixOS Integration
Update `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
### Configuration Generation
The NixOS module automatically generates the agent configuration:
```nix ```nix
# hosts/common/cm-dashboard.nix version = "v0.1.43";
services.cm-dashboard-agent = { src = pkgs.fetchurl {
enable = true; url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
port = 6130; sha256 = "sha256-HASH";
}; };
``` ```
### Deployment Get hash via:
```bash ```bash
# Update NixOS configuration cd ~/projects/nixosbox
git add hosts/common/cm-dashboard.nix nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
git commit -m "Update cm-dashboard configuration" url = "URL_HERE";
git push sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
# Rebuild system (user-performed)
sudo nixos-rebuild switch --flake .
``` ```
## Monitoring Intervals ## Monitoring Intervals
- **CPU/Memory**: 2 seconds (real-time monitoring) - **Metrics Collection**: 2 seconds (CPU, memory, services)
- **Disk usage**: 300 seconds (5 minutes) - **Metric Transmission**: 2 seconds (ZMQ publish)
- **Systemd services**: 10 seconds - **Dashboard Updates**: 1 second (UI refresh)
- **SMART health**: 600 seconds (10 minutes) - **Email Notifications**: 30 seconds (batched)
- **Backup status**: 60 seconds (1 minute) - **Disk Monitoring**: 300 seconds (5 minutes)
- **Email notifications**: 30 seconds (batched) - **Service Discovery**: 300 seconds (5 minutes cache)
- **Dashboard updates**: 1 second (real-time display)
## License ## License
MIT License - see LICENSE file for details MIT License - see LICENSE file for details.

TODO.md

@@ -1,63 +0,0 @@
# TODO
## Systemd filtering (agent)
- remove user systemd collection
- reduce number of systemctl calls
- Change so only services in the include list are detected
- Filter on exact name
- Add support for "\*" in filtering
## System panel (agent/dashboard)
use following layout:
'''
NixOS:
Build: xxxxxx
Agent: xxxxxx
CPU:
● Load: 0.02 0.31 0.86
└─ Freq: 3000MHz
RAM:
● Usage: 33% 2.6GB/7.6GB
└─ ● /tmp: 0% 0B/2.0GB
Storage:
● /:
├─ ● nvme0n1 T: 40C • W: 4%
└─ ● 8% 75.0GB/906.2GB
'''
- Add support to show login/active users
- Add support to show timestamp/version for latest nixos rebuild
## Backup panel (dashboard)
use following layout:
'''
Latest backup:
● <timestamp>
└─ Duration: 1.3m
Disk:
● Samsung SSD 870 QVO 1TB
├─ S/N: S5RRNF0W800639Y
└─ Usage: 50.5GB/915.8GB
Repos:
● gitea (4) 5.1GB
● immich (4) 45.0GB
● kryddorten (4) 67.8MB
● mariehall2 (4) 322.7MB
● nixosbox (4) 5.5MB
● unifi (4) 5.7MB
● vaultwarden (4) 508kB
'''
## Keyboard navigation and scrolling (dashboard)
- Add keyboard navigation between panels ("Shift-Tab")
- Add a lower status bar with dynamically updated shortcuts when switching between panels
## Remote execution (agent/dashboard)
- Add support for sending a command from the dashboard to the agent to trigger a NixOS rebuild
- Add support for navigating services in the dashboard and triggering start/stop/restart
- Add support for triggering a backup


@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.0" version = "0.1.65"
edition = "2021" edition = "2021"
[dependencies] [dependencies]


@@ -8,8 +8,9 @@ use crate::communication::{AgentCommand, ServiceAction, ZmqHandler};
use crate::config::AgentConfig; use crate::config::AgentConfig;
use crate::metrics::MetricCollectionManager; use crate::metrics::MetricCollectionManager;
use crate::notifications::NotificationManager; use crate::notifications::NotificationManager;
use crate::service_tracker::UserStoppedServiceTracker;
use crate::status::HostStatusManager; use crate::status::HostStatusManager;
use cm_dashboard_shared::{Metric, MetricMessage}; use cm_dashboard_shared::{Metric, MetricMessage, MetricValue, Status};
pub struct Agent { pub struct Agent {
hostname: String, hostname: String,
@@ -18,6 +19,7 @@ pub struct Agent {
metric_manager: MetricCollectionManager, metric_manager: MetricCollectionManager,
notification_manager: NotificationManager, notification_manager: NotificationManager,
host_status_manager: HostStatusManager, host_status_manager: HostStatusManager,
service_tracker: UserStoppedServiceTracker,
} }
impl Agent { impl Agent {
@@ -50,6 +52,10 @@ impl Agent {
let host_status_manager = HostStatusManager::new(config.status_aggregation.clone()); let host_status_manager = HostStatusManager::new(config.status_aggregation.clone());
info!("Host status manager initialized"); info!("Host status manager initialized");
// Initialize user-stopped service tracker
let service_tracker = UserStoppedServiceTracker::init_global()?;
info!("User-stopped service tracker initialized");
Ok(Self { Ok(Self {
hostname, hostname,
config, config,
@@ -57,6 +63,7 @@ impl Agent {
metric_manager, metric_manager,
notification_manager, notification_manager,
host_status_manager, host_status_manager,
service_tracker,
}) })
} }
@@ -71,11 +78,12 @@ impl Agent {
info!("Initial metric collection completed - all data cached and ready"); info!("Initial metric collection completed - all data cached and ready");
} }
// Separate intervals for collection and transmission // Separate intervals for collection, transmission, heartbeat, and email notifications
let mut collection_interval = let mut collection_interval =
interval(Duration::from_secs(self.config.collection_interval_seconds)); interval(Duration::from_secs(self.config.collection_interval_seconds));
let mut transmission_interval = interval(Duration::from_secs(1)); // ZMQ broadcast every 1 second let mut transmission_interval = interval(Duration::from_secs(self.config.zmq.transmission_interval_seconds));
let mut notification_interval = interval(Duration::from_secs(self.config.status_aggregation.notification_interval_seconds)); let mut heartbeat_interval = interval(Duration::from_secs(self.config.zmq.heartbeat_interval_seconds));
let mut notification_interval = interval(Duration::from_secs(self.config.notifications.aggregation_interval_seconds));
loop { loop {
tokio::select! { tokio::select! {
@@ -86,13 +94,19 @@ impl Agent {
} }
} }
_ = transmission_interval.tick() => { _ = transmission_interval.tick() => {
// Send all metrics via ZMQ every 1 second // Send all metrics via ZMQ (dashboard updates only)
if let Err(e) = self.broadcast_all_metrics().await { if let Err(e) = self.broadcast_all_metrics().await {
error!("Failed to broadcast metrics: {}", e); error!("Failed to broadcast metrics: {}", e);
} }
} }
_ = heartbeat_interval.tick() => {
// Send standalone heartbeat for host connectivity detection
if let Err(e) = self.send_heartbeat().await {
error!("Failed to send heartbeat: {}", e);
}
}
_ = notification_interval.tick() => { _ = notification_interval.tick() => {
// Process batched notifications // Process batched email notifications (separate from dashboard updates)
if let Err(e) = self.host_status_manager.process_pending_notifications(&mut self.notification_manager).await { if let Err(e) = self.host_status_manager.process_pending_notifications(&mut self.notification_manager).await {
error!("Failed to process pending notifications: {}", e); error!("Failed to process pending notifications: {}", e);
} }
@@ -127,8 +141,8 @@ impl Agent {
info!("Force collected and cached {} metrics", metrics.len()); info!("Force collected and cached {} metrics", metrics.len());
// Process metrics through status manager // Process metrics through status manager (collect status data at startup)
self.process_metrics(&metrics).await; let _status_changed = self.process_metrics(&metrics).await;
Ok(()) Ok(())
} }
@@ -146,28 +160,46 @@ impl Agent {
debug!("Collected and cached {} metrics", metrics.len()); debug!("Collected and cached {} metrics", metrics.len());
// Process metrics through status manager // Process metrics through status manager and trigger immediate transmission if status changed
self.process_metrics(&metrics).await; let status_changed = self.process_metrics(&metrics).await;
if status_changed {
info!("Status change detected - triggering immediate metric transmission");
if let Err(e) = self.broadcast_all_metrics().await {
error!("Failed to broadcast metrics after status change: {}", e);
}
}
Ok(()) Ok(())
} }
async fn broadcast_all_metrics(&mut self) -> Result<()> { async fn broadcast_all_metrics(&mut self) -> Result<()> {
debug!("Broadcasting all metrics via ZMQ"); debug!("Broadcasting cached metrics via ZMQ");
// Get all current metrics from collectors // Get cached metrics (no fresh collection)
let mut metrics = self.metric_manager.collect_all_metrics().await?; let mut metrics = self.metric_manager.get_cached_metrics();
// Add the host status summary metric from status manager // Add the host status summary metric from status manager
let host_status_metric = self.host_status_manager.get_host_status_metric(); let host_status_metric = self.host_status_manager.get_host_status_metric();
metrics.push(host_status_metric); metrics.push(host_status_metric);
// Add agent version metric for cross-host version comparison
let version_metric = self.get_agent_version_metric();
metrics.push(version_metric);
// Add heartbeat metric for host connectivity detection
let heartbeat_metric = self.get_heartbeat_metric();
metrics.push(heartbeat_metric);
// Check for user-stopped services that are now active and clear their flags
self.clear_user_stopped_flags_for_active_services(&metrics);
if metrics.is_empty() { if metrics.is_empty() {
debug!("No metrics to broadcast"); debug!("No metrics to broadcast");
return Ok(()); return Ok(());
} }
debug!("Broadcasting {} metrics (including host status summary)", metrics.len()); debug!("Broadcasting {} cached metrics (including host status summary)", metrics.len());
// Create and send message with all current data // Create and send message with all current data
let message = MetricMessage::new(self.hostname.clone(), metrics); let message = MetricMessage::new(self.hostname.clone(), metrics);
@@ -177,10 +209,67 @@ impl Agent {
Ok(()) Ok(())
} }
async fn process_metrics(&mut self, metrics: &[Metric]) { async fn process_metrics(&mut self, metrics: &[Metric]) -> bool {
let mut status_changed = false;
for metric in metrics { for metric in metrics {
self.host_status_manager.process_metric(metric, &mut self.notification_manager).await; // Filter excluded metrics from email notification processing only
if self.config.notifications.exclude_email_metrics.contains(&metric.name) {
debug!("Excluding metric '{}' from email notification processing", metric.name);
continue;
}
if self.host_status_manager.process_metric(metric, &mut self.notification_manager).await {
status_changed = true;
}
} }
status_changed
}
/// Create agent version metric for cross-host version comparison
fn get_agent_version_metric(&self) -> Metric {
// Get version from executable path (same logic as main.rs get_version)
let version = self.get_agent_version();
Metric::new(
"agent_version".to_string(),
MetricValue::String(version),
Status::Ok,
)
}
/// Get agent version from Cargo package version
fn get_agent_version(&self) -> String {
// Use the version from Cargo.toml (e.g., "0.1.11")
format!("v{}", env!("CARGO_PKG_VERSION"))
}
/// Create heartbeat metric for host connectivity detection
fn get_heartbeat_metric(&self) -> Metric {
use std::time::{SystemTime, UNIX_EPOCH};
let timestamp = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_secs();
Metric::new(
"agent_heartbeat".to_string(),
MetricValue::Integer(timestamp as i64),
Status::Ok,
)
}
/// Send standalone heartbeat for connectivity detection
async fn send_heartbeat(&mut self) -> Result<()> {
let heartbeat_metric = self.get_heartbeat_metric();
let message = MetricMessage::new(
self.hostname.clone(),
vec![heartbeat_metric],
);
self.zmq_handler.publish_metrics(&message).await?;
debug!("Sent standalone heartbeat for connectivity detection");
Ok(())
} }
async fn handle_commands(&mut self) -> Result<()> { async fn handle_commands(&mut self) -> Result<()> {
@@ -232,31 +321,40 @@ impl Agent {
error!("Failed to execute service control: {}", e); error!("Failed to execute service control: {}", e);
} }
} }
AgentCommand::SystemRebuild { git_url, git_branch, working_dir, api_key_file } => {
info!("Processing SystemRebuild command: {} @ {} -> {}", git_url, git_branch, working_dir);
if let Err(e) = self.handle_system_rebuild(&git_url, &git_branch, &working_dir, api_key_file.as_deref()).await {
error!("Failed to execute system rebuild: {}", e);
}
}
} }
Ok(()) Ok(())
} }
/// Handle systemd service control commands /// Handle systemd service control commands
async fn handle_service_control(&self, service_name: &str, action: &ServiceAction) -> Result<()> { async fn handle_service_control(&mut self, service_name: &str, action: &ServiceAction) -> Result<()> {
let action_str = match action { let (action_str, is_user_action) = match action {
ServiceAction::Start => "start", ServiceAction::Start => ("start", false),
ServiceAction::Stop => "stop", ServiceAction::Stop => ("stop", false),
ServiceAction::Restart => "restart", ServiceAction::Status => ("status", false),
ServiceAction::Status => "status", ServiceAction::UserStart => ("start", true),
ServiceAction::UserStop => ("stop", true),
}; };
info!("Executing systemctl {} {}", action_str, service_name); info!("Executing systemctl {} {} (user action: {})", action_str, service_name, is_user_action);
// Handle user-stopped service tracking before systemctl execution (stop only)
match action {
ServiceAction::UserStop => {
info!("Marking service '{}' as user-stopped", service_name);
if let Err(e) = self.service_tracker.mark_user_stopped(service_name) {
error!("Failed to mark service as user-stopped: {}", e);
} else {
// Sync to global tracker
UserStoppedServiceTracker::update_global(&self.service_tracker);
}
}
_ => {}
}
let output = tokio::process::Command::new("sudo") let output = tokio::process::Command::new("sudo")
.arg("systemctl") .arg("systemctl")
.arg(action_str) .arg(action_str)
.arg(service_name) .arg(format!("{}.service", service_name))
.output() .output()
.await?; .await?;
@@ -265,6 +363,9 @@ impl Agent {
if !output.stdout.is_empty() { if !output.stdout.is_empty() {
debug!("stdout: {}", String::from_utf8_lossy(&output.stdout)); debug!("stdout: {}", String::from_utf8_lossy(&output.stdout));
} }
// Note: User-stopped flag will be cleared by systemd collector
// when service actually reaches 'active' state, not here
} else { } else {
let stderr = String::from_utf8_lossy(&output.stderr); let stderr = String::from_utf8_lossy(&output.stderr);
error!("Service {} {} failed: {}", service_name, action_str, stderr); error!("Service {} {} failed: {}", service_name, action_str, stderr);
@@ -272,145 +373,45 @@ impl Agent {
} }
// Force refresh metrics after service control to update service status // Force refresh metrics after service control to update service status
if matches!(action, ServiceAction::Start | ServiceAction::Stop | ServiceAction::Restart) { if matches!(action, ServiceAction::Start | ServiceAction::Stop | ServiceAction::UserStart | ServiceAction::UserStop) {
info!("Triggering metric refresh after service control"); info!("Triggering immediate metric refresh after service control");
// Note: We can't call self.collect_metrics_only() here due to borrowing issues if let Err(e) = self.collect_metrics_only().await {
// The next metric collection cycle will pick up the changes error!("Failed to refresh metrics after service control: {}", e);
} else {
info!("Service status refreshed immediately after {} {}", action_str, service_name);
}
} }
Ok(()) Ok(())
} }
/// Handle NixOS system rebuild commands with git clone approach /// Check metrics for user-stopped services that are now active and clear their flags
async fn handle_system_rebuild(&self, git_url: &str, git_branch: &str, working_dir: &str, api_key_file: Option<&str>) -> Result<()> { fn clear_user_stopped_flags_for_active_services(&mut self, metrics: &[Metric]) {
info!("Starting NixOS system rebuild: {} @ {} -> {}", git_url, git_branch, working_dir); for metric in metrics {
// Look for service status metrics that are active
if metric.name.starts_with("service_") && metric.name.ends_with("_status") {
if let MetricValue::String(status) = &metric.value {
if status == "active" {
// Extract service name from metric name (service_nginx_status -> nginx)
let service_name = metric.name
.strip_prefix("service_")
.and_then(|s| s.strip_suffix("_status"))
.unwrap_or("");
// Enable maintenance mode before rebuild if !service_name.is_empty() && UserStoppedServiceTracker::is_service_user_stopped(service_name) {
let maintenance_file = "/tmp/cm-maintenance"; info!("Service '{}' is now active - clearing user-stopped flag", service_name);
if let Err(e) = tokio::fs::File::create(maintenance_file).await { if let Err(e) = self.service_tracker.clear_user_stopped(service_name) {
error!("Failed to create maintenance mode file: {}", e); error!("Failed to clear user-stopped flag for '{}': {}", service_name, e);
} else { } else {
info!("Maintenance mode enabled"); // Sync to global tracker
} UserStoppedServiceTracker::update_global(&self.service_tracker);
debug!("Cleared user-stopped flag for service '{}'", service_name);
// Clone or update repository }
let git_result = self.ensure_git_repository(git_url, git_branch, working_dir, api_key_file).await;
// Execute nixos-rebuild if git operation succeeded - run detached but log output
let rebuild_result = if git_result.is_ok() {
info!("Git repository ready, executing nixos-rebuild in detached mode");
let log_file = std::fs::OpenOptions::new()
.create(true)
.append(true)
.open("/var/log/cm-dashboard/nixos-rebuild.log")
.map_err(|e| anyhow::anyhow!("Failed to open rebuild log: {}", e))?;
tokio::process::Command::new("nohup")
.arg("sudo")
.arg("/run/current-system/sw/bin/nixos-rebuild")
.arg("switch")
.arg("--option")
.arg("sandbox")
.arg("false")
.arg("--flake")
.arg(".")
.current_dir(working_dir)
.stdin(std::process::Stdio::null())
.stdout(std::process::Stdio::from(log_file.try_clone().unwrap()))
.stderr(std::process::Stdio::from(log_file))
.spawn()
} else {
return git_result.and_then(|_| unreachable!());
};
// Always try to remove maintenance mode file
if let Err(e) = tokio::fs::remove_file(maintenance_file).await {
if e.kind() != std::io::ErrorKind::NotFound {
error!("Failed to remove maintenance mode file: {}", e);
}
} else {
info!("Maintenance mode disabled");
}
// Check rebuild start result
match rebuild_result {
Ok(_child) => {
info!("NixOS rebuild started successfully in background");
// Don't wait for completion to avoid agent being killed during rebuild
}
Err(e) => {
error!("Failed to start nixos-rebuild: {}", e);
return Err(anyhow::anyhow!("Failed to start nixos-rebuild: {}", e));
}
}
info!("System rebuild completed, triggering metric refresh");
Ok(())
}
/// Ensure git repository is cloned and up to date with force clone approach
async fn ensure_git_repository(&self, git_url: &str, git_branch: &str, working_dir: &str, api_key_file: Option<&str>) -> Result<()> {
use std::path::Path;
// Read API key if provided
let auth_url = if let Some(key_file) = api_key_file {
match tokio::fs::read_to_string(key_file).await {
Ok(api_key) => {
let api_key = api_key.trim();
if !api_key.is_empty() {
// Convert https://gitea.cmtec.se/cm/nixosbox.git to https://token@gitea.cmtec.se/cm/nixosbox.git
if git_url.starts_with("https://") {
let url_without_protocol = &git_url[8..]; // Remove "https://"
format!("https://{}@{}", api_key, url_without_protocol)
} else {
info!("API key provided but URL is not HTTPS, using original URL");
git_url.to_string()
} }
} else {
info!("API key file is empty, using original URL");
git_url.to_string()
} }
} }
Err(e) => {
info!("Could not read API key file {}: {}, using original URL", key_file, e);
git_url.to_string()
}
}
} else {
git_url.to_string()
};
// Always remove existing directory and do fresh clone for consistent state
let working_path = Path::new(working_dir);
if working_path.exists() {
info!("Removing existing repository directory: {}", working_dir);
if let Err(e) = tokio::fs::remove_dir_all(working_path).await {
error!("Failed to remove existing directory: {}", e);
return Err(anyhow::anyhow!("Failed to remove existing directory: {}", e));
} }
} }
info!("Force cloning git repository from {} (branch: {})", git_url, git_branch);
// Force clone with depth 1 for efficiency (no history needed for deployment)
let output = tokio::process::Command::new("git")
.arg("clone")
.arg("--depth")
.arg("1")
.arg("--branch")
.arg(git_branch)
.arg(&auth_url)
.arg(working_dir)
.output()
.await?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
error!("Git clone failed: {}", stderr);
return Err(anyhow::anyhow!("Git clone failed: {}", stderr));
}
info!("Git repository cloned successfully with latest state");
Ok(())
} }
} }


@@ -140,6 +140,7 @@ impl Collector for BackupCollector {
Status::Warning => "warning".to_string(), Status::Warning => "warning".to_string(),
Status::Critical => "critical".to_string(), Status::Critical => "critical".to_string(),
Status::Unknown => "unknown".to_string(), Status::Unknown => "unknown".to_string(),
Status::Offline => "offline".to_string(),
}), }),
status: overall_status, status: overall_status,
timestamp, timestamp,
@@ -202,6 +203,7 @@ impl Collector for BackupCollector {
Status::Warning => "warning".to_string(), Status::Warning => "warning".to_string(),
Status::Critical => "critical".to_string(), Status::Critical => "critical".to_string(),
Status::Unknown => "unknown".to_string(), Status::Unknown => "unknown".to_string(),
Status::Offline => "offline".to_string(),
}), }),
status: service_status, status: service_status,
timestamp, timestamp,


@@ -556,8 +556,8 @@ impl Collector for DiskCollector {
// Drive wear level (for SSDs) // Drive wear level (for SSDs)
if let Some(wear) = drive.wear_level { if let Some(wear) = drive.wear_level {
let wear_status = if wear >= 90.0 { Status::Critical } let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical }
else if wear >= 80.0 { Status::Warning } else if wear >= self.config.wear_warning_percent { Status::Warning }
else { Status::Ok }; else { Status::Ok };
metrics.push(Metric { metrics.push(Metric {


@@ -187,7 +187,7 @@ impl MemoryCollector {
} }
// Monitor tmpfs (/tmp) usage // Monitor tmpfs (/tmp) usage
if let Ok(tmpfs_metrics) = self.get_tmpfs_metrics() { if let Ok(tmpfs_metrics) = self.get_tmpfs_metrics(status_tracker) {
metrics.extend(tmpfs_metrics); metrics.extend(tmpfs_metrics);
} }
@@ -195,7 +195,7 @@ impl MemoryCollector {
} }
/// Get tmpfs (/tmp) usage metrics /// Get tmpfs (/tmp) usage metrics
fn get_tmpfs_metrics(&self) -> Result<Vec<Metric>, CollectorError> { fn get_tmpfs_metrics(&self, status_tracker: &mut StatusTracker) -> Result<Vec<Metric>, CollectorError> {
use std::process::Command; use std::process::Command;
let output = Command::new("df") let output = Command::new("df")
@@ -249,12 +249,15 @@ impl MemoryCollector {
let mut metrics = Vec::new(); let mut metrics = Vec::new();
let timestamp = chrono::Utc::now().timestamp() as u64; let timestamp = chrono::Utc::now().timestamp() as u64;
// Calculate status using same thresholds as main memory
let tmp_status = self.calculate_usage_status("memory_tmp_usage_percent", usage_percent, status_tracker);
metrics.push(Metric { metrics.push(Metric {
name: "memory_tmp_usage_percent".to_string(), name: "memory_tmp_usage_percent".to_string(),
value: MetricValue::Float(usage_percent), value: MetricValue::Float(usage_percent),
unit: Some("%".to_string()), unit: Some("%".to_string()),
description: Some("tmpfs /tmp usage percentage".to_string()), description: Some("tmpfs /tmp usage percentage".to_string()),
status: Status::Ok, status: tmp_status,
timestamp, timestamp,
}); });


@@ -10,7 +10,6 @@ use crate::config::NixOSConfig;
/// ///
/// Collects NixOS-specific system information including: /// Collects NixOS-specific system information including:
/// - NixOS version and build information /// - NixOS version and build information
/// - Currently active/logged in users
pub struct NixOSCollector { pub struct NixOSCollector {
} }
@@ -19,31 +18,6 @@ impl NixOSCollector {
Self {} Self {}
} }
/// Get NixOS build information
fn get_nixos_build_info(&self) -> Result<String, Box<dyn std::error::Error>> {
// Get nixos-version output directly
let output = Command::new("nixos-version").output()?;
if !output.status.success() {
return Err("nixos-version command failed".into());
}
let version_line = String::from_utf8_lossy(&output.stdout);
let version = version_line.trim();
if version.is_empty() {
return Err("Empty nixos-version output".into());
}
// Remove codename part (e.g., "(Warbler)")
let clean_version = if let Some(pos) = version.find(" (") {
version[..pos].to_string()
} else {
version.to_string()
};
Ok(clean_version)
}
/// Get agent hash from binary path /// Get agent hash from binary path
fn get_agent_hash(&self) -> Result<String, Box<dyn std::error::Error>> { fn get_agent_hash(&self) -> Result<String, Box<dyn std::error::Error>> {
@@ -63,6 +37,22 @@ impl NixOSCollector {
} }
/// Get configuration hash from deployed nix store system /// Get configuration hash from deployed nix store system
/// Get git commit hash from rebuild process
fn get_git_commit(&self) -> Result<String, Box<dyn std::error::Error>> {
let commit_file = "/var/lib/cm-dashboard/git-commit";
match std::fs::read_to_string(commit_file) {
Ok(content) => {
let commit_hash = content.trim();
if commit_hash.len() >= 7 {
Ok(commit_hash.to_string())
} else {
Err("Git commit hash too short".into())
}
}
Err(e) => Err(format!("Failed to read git commit file: {}", e).into())
}
}
fn get_config_hash(&self) -> Result<String, Box<dyn std::error::Error>> { fn get_config_hash(&self) -> Result<String, Box<dyn std::error::Error>> {
// Read the symlink target of /run/current-system to get nix store path // Read the symlink target of /run/current-system to get nix store path
let output = Command::new("readlink") let output = Command::new("readlink")
@@ -90,27 +80,6 @@ impl NixOSCollector {
Err("Could not extract hash from nix store path".into()) Err("Could not extract hash from nix store path".into())
} }
/// Get currently active users
fn get_active_users(&self) -> Result<Vec<String>, Box<dyn std::error::Error>> {
let output = Command::new("who").output()?;
if !output.status.success() {
return Err("who command failed".into());
}
let who_output = String::from_utf8_lossy(&output.stdout);
let mut users = std::collections::HashSet::new();
for line in who_output.lines() {
if let Some(username) = line.split_whitespace().next() {
if !username.is_empty() {
users.insert(username.to_string());
}
}
}
Ok(users.into_iter().collect())
}
} }
#[async_trait] #[async_trait]
@@ -121,56 +90,31 @@ impl Collector for NixOSCollector {
let mut metrics = Vec::new(); let mut metrics = Vec::new();
let timestamp = chrono::Utc::now().timestamp() as u64; let timestamp = chrono::Utc::now().timestamp() as u64;
// Collect NixOS build information // Collect git commit information (shows what's actually deployed)
match self.get_nixos_build_info() { match self.get_git_commit() {
Ok(build_info) => { Ok(git_commit) => {
metrics.push(Metric { metrics.push(Metric {
name: "system_nixos_build".to_string(), name: "system_nixos_build".to_string(),
value: MetricValue::String(build_info), value: MetricValue::String(git_commit),
unit: None, unit: None,
description: Some("NixOS build information".to_string()), description: Some("Git commit hash of deployed configuration".to_string()),
status: Status::Ok, status: Status::Ok,
timestamp, timestamp,
}); });
} }
Err(e) => { Err(e) => {
debug!("Failed to get NixOS build info: {}", e); debug!("Failed to get git commit: {}", e);
metrics.push(Metric { metrics.push(Metric {
name: "system_nixos_build".to_string(), name: "system_nixos_build".to_string(),
value: MetricValue::String("unknown".to_string()), value: MetricValue::String("unknown".to_string()),
unit: None, unit: None,
description: Some("NixOS build (failed to detect)".to_string()), description: Some("Git commit hash (failed to detect)".to_string()),
status: Status::Unknown, status: Status::Unknown,
timestamp, timestamp,
}); });
} }
} }
// Collect active users
match self.get_active_users() {
Ok(users) => {
let users_str = users.join(", ");
metrics.push(Metric {
name: "system_active_users".to_string(),
value: MetricValue::String(users_str),
unit: None,
description: Some("Currently active users".to_string()),
status: Status::Ok,
timestamp,
});
}
Err(e) => {
debug!("Failed to get active users: {}", e);
metrics.push(Metric {
name: "system_active_users".to_string(),
value: MetricValue::String("unknown".to_string()),
unit: None,
description: Some("Active users (failed to detect)".to_string()),
status: Status::Unknown,
timestamp,
});
}
}
// Collect config hash // Collect config hash
match self.get_config_hash() { match self.get_config_hash() {


@@ -8,6 +8,7 @@ use tracing::debug;
use super::{Collector, CollectorError}; use super::{Collector, CollectorError};
use crate::config::SystemdConfig; use crate::config::SystemdConfig;
use crate::service_tracker::UserStoppedServiceTracker;
/// Systemd collector for monitoring systemd services /// Systemd collector for monitoring systemd services
pub struct SystemdCollector { pub struct SystemdCollector {
@@ -32,7 +33,7 @@ struct ServiceCacheState {
nginx_site_metrics: Vec<Metric>, nginx_site_metrics: Vec<Metric>,
/// Last time nginx sites were checked /// Last time nginx sites were checked
last_nginx_check_time: Option<Instant>, last_nginx_check_time: Option<Instant>,
/// How often to check nginx site latency (30 seconds) /// How often to check nginx site latency (configurable)
nginx_check_interval_seconds: u64, nginx_check_interval_seconds: u64,
} }
@@ -54,7 +55,7 @@ impl SystemdCollector {
discovery_interval_seconds: config.interval_seconds, discovery_interval_seconds: config.interval_seconds,
nginx_site_metrics: Vec::new(), nginx_site_metrics: Vec::new(),
last_nginx_check_time: None, last_nginx_check_time: None,
nginx_check_interval_seconds: 30, // 30 seconds for nginx sites nginx_check_interval_seconds: config.nginx_check_interval_seconds,
}), }),
config, config,
} }
@@ -136,8 +137,21 @@ impl SystemdCollector {
/// Auto-discover interesting services to monitor (internal version that doesn't update state) /// Auto-discover interesting services to monitor (internal version that doesn't update state)
fn discover_services_internal(&self) -> Result<(Vec<String>, std::collections::HashMap<String, ServiceStatusInfo>)> { fn discover_services_internal(&self) -> Result<(Vec<String>, std::collections::HashMap<String, ServiceStatusInfo>)> {
debug!("Starting systemd service discovery with status caching"); debug!("Starting systemd service discovery with status caching");
// Get all services (includes inactive, running, failed - everything)
let units_output = Command::new("systemctl") // First: Get all service unit files (includes services that have never been started)
let unit_files_output = Command::new("systemctl")
.arg("list-unit-files")
.arg("--type=service")
.arg("--no-pager")
.arg("--plain")
.output()?;
if !unit_files_output.status.success() {
return Err(anyhow::anyhow!("systemctl list-unit-files command failed"));
}
// Second: Get runtime status of all units
let units_status_output = Command::new("systemctl")
.arg("list-units") .arg("list-units")
.arg("--type=service") .arg("--type=service")
.arg("--all") .arg("--all")
@@ -145,22 +159,33 @@ impl SystemdCollector {
.arg("--plain") .arg("--plain")
.output()?; .output()?;
if !units_output.status.success() { if !units_status_output.status.success() {
return Err(anyhow::anyhow!("systemctl system command failed")); return Err(anyhow::anyhow!("systemctl list-units command failed"));
} }
let units_str = String::from_utf8(units_output.stdout)?; let unit_files_str = String::from_utf8(unit_files_output.stdout)?;
let units_status_str = String::from_utf8(units_status_output.stdout)?;
let mut services = Vec::new(); let mut services = Vec::new();
// Use configuration instead of hardcoded values // Use configuration instead of hardcoded values
let excluded_services = &self.config.excluded_services; let excluded_services = &self.config.excluded_services;
let service_name_filters = &self.config.service_name_filters; let service_name_filters = &self.config.service_name_filters;
// Parse all services and cache their status information // Parse all service unit files to get complete service list
let mut all_service_names = std::collections::HashSet::new(); let mut all_service_names = std::collections::HashSet::new();
let mut status_cache = std::collections::HashMap::new();
for line in units_str.lines() { for line in unit_files_str.lines() {
let fields: Vec<&str> = line.split_whitespace().collect();
if fields.len() >= 2 && fields[0].ends_with(".service") {
let service_name = fields[0].trim_end_matches(".service");
all_service_names.insert(service_name.to_string());
debug!("Found service unit file: {}", service_name);
}
}
// Parse runtime status for all units
let mut status_cache = std::collections::HashMap::new();
for line in units_status_str.lines() {
let fields: Vec<&str> = line.split_whitespace().collect(); let fields: Vec<&str> = line.split_whitespace().collect();
if fields.len() >= 4 && fields[0].ends_with(".service") { if fields.len() >= 4 && fields[0].ends_with(".service") {
let service_name = fields[0].trim_end_matches(".service"); let service_name = fields[0].trim_end_matches(".service");
@@ -177,8 +202,19 @@ impl SystemdCollector {
sub_state: sub_state.clone(),
});
debug!("Got runtime status for service: {} (load:{}, active:{}, sub:{})", service_name, load_state, active_state, sub_state);
}
}
// For services found in unit files but not in runtime status, set default inactive status
for service_name in &all_service_names {
if !status_cache.contains_key(service_name) {
status_cache.insert(service_name.to_string(), ServiceStatusInfo {
load_state: "not-loaded".to_string(),
active_state: "inactive".to_string(),
sub_state: "dead".to_string(),
});
debug!("Service {} found in unit files but not runtime - marked as inactive", service_name);
}
}
@@ -318,13 +354,37 @@ impl SystemdCollector {
Ok((active_status, detailed_info))
}
/// Calculate service status, taking user-stopped services into account
fn calculate_service_status(&self, service_name: &str, active_status: &str) -> Status {
match active_status.to_lowercase().as_str() {
"active" => {
// If service is now active and was marked as user-stopped, clear the flag
if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
debug!("Service '{}' is now active - clearing user-stopped flag", service_name);
// Note: We can't directly clear here because this is a read-only context
// The agent will need to handle this differently
}
Status::Ok
},
"inactive" | "dead" => {
// Check if this service was stopped by user action
if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
debug!("Service '{}' is inactive but marked as user-stopped - treating as OK", service_name);
Status::Ok
} else {
Status::Warning
}
},
"failed" | "error" => Status::Critical, "failed" | "error" => Status::Critical,
"activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => Status::Pending, "activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => {
// For user-stopped services that are transitioning, keep them as OK during transition
if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
debug!("Service '{}' is transitioning but was user-stopped - treating as OK", service_name);
Status::Ok
} else {
Status::Pending
}
},
_ => Status::Unknown,
}
}
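To make the mapping above concrete, here is a small standalone sketch (not the collector itself; the user-stopped lookup is reduced to a plain boolean and the status type is redeclared locally) that mirrors the decision table:

// Standalone sketch of the decision table implemented by calculate_service_status.
#[derive(Debug, PartialEq)]
enum SketchStatus { Ok, Warning, Critical, Pending, Unknown }

fn status_for(active_state: &str, user_stopped: bool) -> SketchStatus {
    match active_state {
        "active" => SketchStatus::Ok,
        "inactive" | "dead" => if user_stopped { SketchStatus::Ok } else { SketchStatus::Warning },
        "failed" | "error" => SketchStatus::Critical,
        "activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => {
            if user_stopped { SketchStatus::Ok } else { SketchStatus::Pending }
        }
        _ => SketchStatus::Unknown,
    }
}

fn main() {
    assert_eq!(status_for("inactive", true), SketchStatus::Ok);       // stopped on purpose by the user
    assert_eq!(status_for("inactive", false), SketchStatus::Warning); // unexpected stop
    assert_eq!(status_for("failed", true), SketchStatus::Critical);   // failures stay critical regardless
}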
@@ -445,7 +505,7 @@ impl Collector for SystemdCollector {
for service in &monitored_services {
match self.get_service_status(service) {
Ok((active_status, _detailed_info)) => {
let status = self.calculate_service_status(service, &active_status);
// Individual service status metric
metrics.push(Metric {
@@ -520,10 +580,8 @@ impl SystemdCollector {
for (site_name, url) in &sites {
match self.check_site_latency(url) {
Ok(latency_ms) => {
let status = if latency_ms < self.config.nginx_latency_critical_ms {
Status::Ok
} else {
Status::Critical
};
@@ -615,10 +673,10 @@ impl SystemdCollector {
let start = Instant::now();
// Create HTTP client with timeouts from configuration
let client = reqwest::blocking::Client::builder()
.timeout(Duration::from_secs(self.config.http_timeout_seconds))
.connect_timeout(Duration::from_secs(self.config.http_connect_timeout_seconds))
.redirect(reqwest::redirect::Policy::limited(10))
.build()?;

View File

@@ -65,7 +65,6 @@ impl ZmqHandler {
Ok(())
}
/// Send heartbeat (placeholder for future use)
/// Try to receive a command (non-blocking)
pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> {
@@ -104,13 +103,6 @@ pub enum AgentCommand {
service_name: String,
action: ServiceAction,
},
/// Rebuild NixOS system
SystemRebuild {
git_url: String,
git_branch: String,
working_dir: String,
api_key_file: Option<String>,
},
}
/// Service control actions
@@ -118,6 +110,7 @@ pub enum AgentCommand {
pub enum ServiceAction {
Start,
Stop,
Restart,
Status,
UserStart, // User-initiated start (clears user-stopped flag)
UserStop, // User-initiated stop (marks as user-stopped)
}

View File

@@ -25,8 +25,10 @@ pub struct ZmqConfig {
pub publisher_port: u16,
pub command_port: u16,
pub bind_address: String,
pub transmission_interval_seconds: u64,
/// Heartbeat transmission interval in seconds for host connectivity detection
#[serde(default = "default_heartbeat_interval_seconds")]
pub heartbeat_interval_seconds: u64,
}
/// Collector configuration
@@ -104,6 +106,10 @@ pub struct SystemdConfig {
pub memory_critical_mb: f32,
pub service_directories: std::collections::HashMap<String, Vec<String>>,
pub host_user_mapping: String,
pub nginx_check_interval_seconds: u64,
pub http_timeout_seconds: u64,
pub http_connect_timeout_seconds: u64,
pub nginx_latency_critical_ms: f32,
}
@@ -139,6 +145,16 @@ pub struct NotificationConfig {
pub from_email: String,
pub to_email: String,
pub rate_limit_minutes: u64,
/// Email notification batching interval in seconds (default: 60)
pub aggregation_interval_seconds: u64,
/// List of metric names to exclude from email notifications
#[serde(default)]
pub exclude_email_metrics: Vec<String>,
}
fn default_heartbeat_interval_seconds() -> u64 {
5
}
impl AgentConfig {
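Behaviour of the #[serde(default = ...)] attribute above, as a standalone sketch (the struct below only mirrors the heartbeat field, it is not the real ZmqConfig):

use serde::Deserialize;

#[derive(Deserialize)]
struct HeartbeatSketch {
    #[serde(default = "default_heartbeat_interval_seconds")]
    heartbeat_interval_seconds: u64,
}

fn default_heartbeat_interval_seconds() -> u64 {
    5
}

fn main() {
    // Key omitted -> the 5 second default applies; key present -> the configured value wins.
    let defaulted: HeartbeatSketch = toml::from_str("").unwrap();
    let explicit: HeartbeatSketch = toml::from_str("heartbeat_interval_seconds = 2").unwrap();
    assert_eq!(defaulted.heartbeat_interval_seconds, 5);
    assert_eq!(explicit.heartbeat_interval_seconds, 2);
}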

View File

@@ -19,10 +19,6 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
bail!("ZMQ bind address cannot be empty"); bail!("ZMQ bind address cannot be empty");
} }
if config.zmq.timeout_ms == 0 {
bail!("ZMQ timeout cannot be 0");
}
// Validate collection interval
if config.collection_interval_seconds == 0 {
bail!("Collection interval cannot be 0");
@@ -83,6 +79,13 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
}
}
// Validate systemd configuration
if config.collectors.systemd.enabled {
if config.collectors.systemd.nginx_latency_critical_ms <= 0.0 {
bail!("Nginx latency critical threshold must be positive");
}
}
// Validate SMTP configuration
if config.notifications.enabled {
if config.notifications.smtp_host.is_empty() {

View File

@@ -9,14 +9,31 @@ mod communication;
mod config;
mod metrics;
mod notifications;
mod service_tracker;
mod status;
use agent::Agent;
/// Get version showing cm-dashboard-agent package hash for easy deployment verification
fn get_version() -> &'static str {
// Get the path of the current executable
let exe_path = std::env::current_exe().expect("Failed to get executable path");
let exe_str = exe_path.to_string_lossy();
// Extract Nix store hash from path like /nix/store/HASH-cm-dashboard-v0.1.8/bin/cm-dashboard-agent
let hash_part = exe_str.strip_prefix("/nix/store/").expect("Not a nix store path");
let hash = hash_part.split('-').next().expect("Invalid nix store path format");
assert!(hash.len() >= 8, "Hash too short");
// Return first 8 characters of nix store hash
let short_hash = hash[..8].to_string();
Box::leak(short_hash.into_boxed_str())
}
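The hash-extraction step boils down to the following standalone sketch (the store path below is invented for illustration):

fn short_hash_from_store_path(exe_str: &str) -> Option<String> {
    // /nix/store/<hash>-<name>/bin/<binary>  ->  first 8 characters of <hash>
    let hash_part = exe_str.strip_prefix("/nix/store/")?;
    let hash = hash_part.split('-').next()?;
    (hash.len() >= 8).then(|| hash[..8].to_string())
}

fn main() {
    let path = "/nix/store/abcd1234wxyz9876-cm-dashboard-v0.1.65/bin/cm-dashboard-agent";
    assert_eq!(short_hash_from_store_path(path).as_deref(), Some("abcd1234"));
}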
#[derive(Parser)]
#[command(name = "cm-dashboard-agent")]
#[command(about = "CM Dashboard metrics agent with individual metric collection")]
#[command(version = get_version())]
struct Cli {
/// Increase logging verbosity (-v, -vv)
#[arg(short, long, action = clap::ArgAction::Count)]

View File

@@ -1,6 +1,7 @@
use anyhow::Result;
use cm_dashboard_shared::{Metric, StatusTracker};
use std::time::{Duration, Instant};
use tracing::{debug, error, info};
use crate::collectors::{
backup::BackupCollector, cpu::CpuCollector, disk::DiskCollector, memory::MemoryCollector,
@@ -8,15 +9,24 @@ use crate::collectors::{
};
use crate::config::{AgentConfig, CollectorConfig};
/// Collector with timing information
struct TimedCollector {
collector: Box<dyn Collector>,
interval: Duration,
last_collection: Option<Instant>,
name: String,
}
/// Manages all metric collectors with individual intervals
pub struct MetricCollectionManager {
collectors: Vec<TimedCollector>,
status_tracker: StatusTracker,
cached_metrics: Vec<Metric>,
}
impl MetricCollectionManager {
pub async fn new(config: &CollectorConfig, _agent_config: &AgentConfig) -> Result<Self> {
let mut collectors: Vec<TimedCollector> = Vec::new();
// Benchmark mode - only enable specific collector based on env var
let benchmark_mode = std::env::var("BENCHMARK_COLLECTOR").ok();
@@ -26,7 +36,12 @@ impl MetricCollectionManager {
// CPU collector only
if config.cpu.enabled {
let cpu_collector = CpuCollector::new(config.cpu.clone());
collectors.push(TimedCollector {
collector: Box::new(cpu_collector),
interval: Duration::from_secs(config.cpu.interval_seconds),
last_collection: None,
name: "CPU".to_string(),
});
info!("BENCHMARK: CPU collector only"); info!("BENCHMARK: CPU collector only");
} }
} }
@@ -34,20 +49,35 @@ impl MetricCollectionManager {
// Memory collector only
if config.memory.enabled {
let memory_collector = MemoryCollector::new(config.memory.clone());
collectors.push(TimedCollector {
collector: Box::new(memory_collector),
interval: Duration::from_secs(config.memory.interval_seconds),
last_collection: None,
name: "Memory".to_string(),
});
info!("BENCHMARK: Memory collector only"); info!("BENCHMARK: Memory collector only");
} }
} }
Some("disk") => { Some("disk") => {
// Disk collector only // Disk collector only
let disk_collector = DiskCollector::new(config.disk.clone()); let disk_collector = DiskCollector::new(config.disk.clone());
collectors.push(Box::new(disk_collector)); collectors.push(TimedCollector {
collector: Box::new(disk_collector),
interval: Duration::from_secs(config.disk.interval_seconds),
last_collection: None,
name: "Disk".to_string(),
});
info!("BENCHMARK: Disk collector only"); info!("BENCHMARK: Disk collector only");
} }
Some("systemd") => { Some("systemd") => {
// Systemd collector only // Systemd collector only
let systemd_collector = SystemdCollector::new(config.systemd.clone()); let systemd_collector = SystemdCollector::new(config.systemd.clone());
collectors.push(Box::new(systemd_collector)); collectors.push(TimedCollector {
collector: Box::new(systemd_collector),
interval: Duration::from_secs(config.systemd.interval_seconds),
last_collection: None,
name: "Systemd".to_string(),
});
info!("BENCHMARK: Systemd collector only"); info!("BENCHMARK: Systemd collector only");
} }
Some("backup") => { Some("backup") => {
@@ -57,7 +87,12 @@ impl MetricCollectionManager {
config.backup.backup_paths.first().cloned(),
config.backup.max_age_hours,
);
collectors.push(TimedCollector {
collector: Box::new(backup_collector),
interval: Duration::from_secs(config.backup.interval_seconds),
last_collection: None,
name: "Backup".to_string(),
});
info!("BENCHMARK: Backup collector only"); info!("BENCHMARK: Backup collector only");
} }
} }
@@ -69,37 +104,67 @@ impl MetricCollectionManager {
// Normal mode - all collectors
if config.cpu.enabled {
let cpu_collector = CpuCollector::new(config.cpu.clone());
collectors.push(TimedCollector {
collector: Box::new(cpu_collector),
interval: Duration::from_secs(config.cpu.interval_seconds),
last_collection: None,
name: "CPU".to_string(),
});
info!("CPU collector initialized with {}s interval", config.cpu.interval_seconds);
}
if config.memory.enabled {
let memory_collector = MemoryCollector::new(config.memory.clone());
collectors.push(TimedCollector {
collector: Box::new(memory_collector),
interval: Duration::from_secs(config.memory.interval_seconds),
last_collection: None,
name: "Memory".to_string(),
});
info!("Memory collector initialized with {}s interval", config.memory.interval_seconds);
}
let disk_collector = DiskCollector::new(config.disk.clone());
collectors.push(TimedCollector {
collector: Box::new(disk_collector),
interval: Duration::from_secs(config.disk.interval_seconds),
last_collection: None,
name: "Disk".to_string(),
});
info!("Disk collector initialized with {}s interval", config.disk.interval_seconds);
let systemd_collector = SystemdCollector::new(config.systemd.clone());
collectors.push(TimedCollector {
collector: Box::new(systemd_collector),
interval: Duration::from_secs(config.systemd.interval_seconds),
last_collection: None,
name: "Systemd".to_string(),
});
info!("Systemd collector initialized with {}s interval", config.systemd.interval_seconds);
if config.backup.enabled {
let backup_collector = BackupCollector::new(
config.backup.backup_paths.first().cloned(),
config.backup.max_age_hours,
);
collectors.push(TimedCollector {
collector: Box::new(backup_collector),
interval: Duration::from_secs(config.backup.interval_seconds),
last_collection: None,
name: "Backup".to_string(),
});
info!("Backup collector initialized with {}s interval", config.backup.interval_seconds);
}
if config.nixos.enabled {
let nixos_collector = NixOSCollector::new(config.nixos.clone());
collectors.push(TimedCollector {
collector: Box::new(nixos_collector),
interval: Duration::from_secs(config.nixos.interval_seconds),
last_collection: None,
name: "NixOS".to_string(),
});
info!("NixOS collector initialized with {}s interval", config.nixos.interval_seconds);
}
}
@@ -113,29 +178,87 @@ impl MetricCollectionManager {
Ok(Self {
collectors,
status_tracker: StatusTracker::new(),
cached_metrics: Vec::new(),
})
}
/// Force collection from ALL collectors immediately (used at startup)
pub async fn collect_all_metrics_force(&mut self) -> Result<Vec<Metric>> {
let mut all_metrics = Vec::new();
let now = Instant::now();
for timed_collector in &mut self.collectors {
match timed_collector.collector.collect(&mut self.status_tracker).await {
Ok(metrics) => {
let metric_count = metrics.len();
all_metrics.extend(metrics);
timed_collector.last_collection = Some(now);
debug!("Force collected {} metrics from {}", metric_count, timed_collector.name);
}
Err(e) => {
error!("Collector {} failed: {}", timed_collector.name, e);
}
}
}
// Cache the collected metrics
self.cached_metrics = all_metrics.clone();
Ok(all_metrics)
}
/// Collect metrics from collectors whose intervals have elapsed
pub async fn collect_metrics_timed(&mut self) -> Result<Vec<Metric>> {
let mut all_metrics = Vec::new();
let now = Instant::now();
for timed_collector in &mut self.collectors {
let should_collect = match timed_collector.last_collection {
None => true, // First collection
Some(last_time) => now.duration_since(last_time) >= timed_collector.interval,
};
if should_collect {
match timed_collector.collector.collect(&mut self.status_tracker).await {
Ok(metrics) => {
let metric_count = metrics.len();
all_metrics.extend(metrics);
timed_collector.last_collection = Some(now);
debug!(
"Collected {} metrics from {} ({}s interval)",
metric_count,
timed_collector.name,
timed_collector.interval.as_secs()
);
}
Err(e) => {
error!("Collector {} failed: {}", timed_collector.name, e);
}
}
}
}
// Update cache with newly collected metrics
if !all_metrics.is_empty() {
// Merge new metrics with cached metrics (replace by name)
for new_metric in &all_metrics {
// Remove any existing metric with the same name
self.cached_metrics.retain(|cached| cached.name != new_metric.name);
// Add the new metric
self.cached_metrics.push(new_metric.clone());
}
}
Ok(all_metrics)
}
/// Collect metrics from all collectors (legacy method for compatibility)
pub async fn collect_all_metrics(&mut self) -> Result<Vec<Metric>> {
self.collect_metrics_timed().await
}
/// Get cached metrics without triggering fresh collection
pub fn get_cached_metrics(&self) -> Vec<Metric> {
self.cached_metrics.clone()
}
}
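For context, a rough sketch (not part of this diff) of how an agent loop could drive these per-collector intervals: tick frequently, let collect_metrics_timed decide which collectors are actually due, and only transmit when something fresh was produced. The publish function here is a stand-in for the agent's real ZMQ path.

use std::time::Duration;

// Hypothetical driver; MetricCollectionManager and Metric come from the code above.
async fn run_collection_loop(manager: &mut MetricCollectionManager) -> anyhow::Result<()> {
    loop {
        // Only collectors whose interval has elapsed run on this tick.
        let fresh = manager.collect_metrics_timed().await?;
        if !fresh.is_empty() {
            publish(&fresh); // stand-in for serializing and sending over ZMQ
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}

fn publish(metrics: &[Metric]) {
    // Placeholder so the sketch stands on its own.
    println!("would publish {} metrics", metrics.len());
}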

View File

@@ -0,0 +1,172 @@
use anyhow::Result;
use serde::{Deserialize, Serialize};
use std::collections::HashSet;
use std::fs;
use std::path::Path;
use std::sync::{Arc, Mutex, OnceLock};
use tracing::{debug, info, warn};
/// Shared instance for global access
static GLOBAL_TRACKER: OnceLock<Arc<Mutex<UserStoppedServiceTracker>>> = OnceLock::new();
/// Tracks services that have been stopped by user action
/// These services should be treated as OK status instead of Warning
#[derive(Debug)]
pub struct UserStoppedServiceTracker {
/// Set of services stopped by user action
user_stopped_services: HashSet<String>,
/// Path to persistent storage file
storage_path: String,
}
/// Serializable data structure for persistence
#[derive(Debug, Serialize, Deserialize)]
struct UserStoppedData {
services: Vec<String>,
}
impl UserStoppedServiceTracker {
/// Create new tracker with default storage path
pub fn new() -> Self {
Self::with_storage_path("/var/lib/cm-dashboard/user-stopped-services.json")
}
/// Initialize global instance (called by agent)
pub fn init_global() -> Result<Self> {
let tracker = Self::new();
// Set global instance
let global_instance = Arc::new(Mutex::new(tracker));
if GLOBAL_TRACKER.set(global_instance).is_err() {
warn!("Global service tracker was already initialized");
}
// Return a new instance for the agent to use
Ok(Self::new())
}
/// Check if a service is user-stopped (global access for collectors)
pub fn is_service_user_stopped(service_name: &str) -> bool {
if let Some(global) = GLOBAL_TRACKER.get() {
if let Ok(tracker) = global.lock() {
tracker.is_user_stopped(service_name)
} else {
debug!("Failed to lock global service tracker");
false
}
} else {
debug!("Global service tracker not initialized");
false
}
}
/// Update global tracker (called by agent when tracker state changes)
pub fn update_global(updated_tracker: &UserStoppedServiceTracker) {
if let Some(global) = GLOBAL_TRACKER.get() {
if let Ok(mut tracker) = global.lock() {
tracker.user_stopped_services = updated_tracker.user_stopped_services.clone();
} else {
debug!("Failed to lock global service tracker for update");
}
} else {
debug!("Global service tracker not initialized for update");
}
}
/// Create new tracker with custom storage path
pub fn with_storage_path<P: AsRef<Path>>(storage_path: P) -> Self {
let storage_path = storage_path.as_ref().to_string_lossy().to_string();
let mut tracker = Self {
user_stopped_services: HashSet::new(),
storage_path,
};
// Load existing data from storage
if let Err(e) = tracker.load_from_storage() {
warn!("Failed to load user-stopped services from storage: {}", e);
info!("Starting with empty user-stopped services list");
}
tracker
}
/// Mark a service as user-stopped
pub fn mark_user_stopped(&mut self, service_name: &str) -> Result<()> {
info!("Marking service '{}' as user-stopped", service_name);
self.user_stopped_services.insert(service_name.to_string());
self.save_to_storage()?;
debug!("Service '{}' marked as user-stopped and saved to storage", service_name);
Ok(())
}
/// Clear user-stopped flag for a service (when user starts it)
pub fn clear_user_stopped(&mut self, service_name: &str) -> Result<()> {
if self.user_stopped_services.remove(service_name) {
info!("Cleared user-stopped flag for service '{}'", service_name);
self.save_to_storage()?;
debug!("Service '{}' user-stopped flag cleared and saved to storage", service_name);
} else {
debug!("Service '{}' was not marked as user-stopped", service_name);
}
Ok(())
}
/// Check if a service is marked as user-stopped
pub fn is_user_stopped(&self, service_name: &str) -> bool {
let is_stopped = self.user_stopped_services.contains(service_name);
debug!("Service '{}' user-stopped status: {}", service_name, is_stopped);
is_stopped
}
/// Save current state to persistent storage
fn save_to_storage(&self) -> Result<()> {
// Create parent directory if it doesn't exist
if let Some(parent_dir) = Path::new(&self.storage_path).parent() {
if !parent_dir.exists() {
fs::create_dir_all(parent_dir)?;
debug!("Created parent directory: {}", parent_dir.display());
}
}
let data = UserStoppedData {
services: self.user_stopped_services.iter().cloned().collect(),
};
let json_data = serde_json::to_string_pretty(&data)?;
fs::write(&self.storage_path, json_data)?;
debug!(
"Saved {} user-stopped services to {}",
data.services.len(),
self.storage_path
);
Ok(())
}
/// Load state from persistent storage
fn load_from_storage(&mut self) -> Result<()> {
if !Path::new(&self.storage_path).exists() {
debug!("Storage file {} does not exist, starting fresh", self.storage_path);
return Ok(());
}
let json_data = fs::read_to_string(&self.storage_path)?;
let data: UserStoppedData = serde_json::from_str(&json_data)?;
self.user_stopped_services = data.services.into_iter().collect();
info!(
"Loaded {} user-stopped services from {}",
self.user_stopped_services.len(),
self.storage_path
);
if !self.user_stopped_services.is_empty() {
debug!("User-stopped services: {:?}", self.user_stopped_services);
}
Ok(())
}
}
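A short usage sketch (service name invented, error handling trimmed) of how the agent side presumably drives the tracker when it handles UserStop, with collectors reading the flag back through the global accessor:

fn handle_user_stop(tracker: &mut UserStoppedServiceTracker) -> anyhow::Result<()> {
    // The dashboard asked to stop "my-web-app": remember the user's intent...
    tracker.mark_user_stopped("my-web-app")?;
    // ...and push the new state into the global copy that collectors consult.
    UserStoppedServiceTracker::update_global(tracker);

    // Later, inside a collector (read-only access; requires init_global() at startup):
    assert!(UserStoppedServiceTracker::is_service_user_stopped("my-web-app"));
    Ok(())
}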

View File

@@ -9,7 +9,6 @@ use chrono::Utc;
pub struct HostStatusConfig {
pub enabled: bool,
pub aggregation_method: String, // "worst_case"
pub notification_interval_seconds: u64,
}
impl Default for HostStatusConfig {
@@ -17,7 +16,6 @@ impl Default for HostStatusConfig {
Self {
enabled: true,
aggregation_method: "worst_case".to_string(),
notification_interval_seconds: 30,
}
}
}
@@ -160,25 +158,62 @@ impl HostStatusManager {
/// Process a metric - updates status and queues for aggregated notifications if status changed
pub async fn process_metric(&mut self, metric: &Metric, _notification_manager: &mut crate::notifications::NotificationManager) -> bool {
let old_service_status = self.service_statuses.get(&metric.name).copied();
let old_host_status = self.current_host_status;
let new_service_status = metric.status;
// Update status (this recalculates host status internally)
self.update_service_status(metric.name.clone(), new_service_status);
let new_host_status = self.current_host_status;
let mut status_changed = false;
// Check if service status actually changed (ignore first-time status setting)
if let Some(old_service_status) = old_service_status {
if old_service_status != new_service_status {
debug!("Service status change detected for {}: {:?} -> {:?}", metric.name, old_service_status, new_service_status);
// Queue change for aggregated notification (not immediate)
self.queue_status_change(&metric.name, old_service_status, new_service_status);
status_changed = true;
}
} else {
debug!("Initial status set for {}: {:?}", metric.name, new_service_status);
}
// Check if host status changed (this should trigger immediate transmission)
if old_host_status != new_host_status {
debug!("Host status change detected: {:?} -> {:?}", old_host_status, new_host_status);
status_changed = true;
}
status_changed // Return true if either service or host status changed
}
/// Queue status change for aggregated notification
fn queue_status_change(&mut self, metric_name: &str, old_status: Status, new_status: Status) {
// Add to pending changes for aggregated notification
let entry = self.pending_changes.entry(metric_name.to_string()).or_insert((old_status, old_status, 0));
entry.1 = new_status; // Update final status
entry.2 += 1; // Increment change count
// Set batch start time if this is the first change
if self.batch_start_time.is_none() {
self.batch_start_time = Some(Instant::now());
}
}
/// Process pending notifications - legacy method, now rarely used
pub async fn process_pending_notifications(&mut self, notification_manager: &mut crate::notifications::NotificationManager) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
if !self.config.enabled || self.pending_changes.is_empty() {
return Ok(());
}
// Process notifications immediately without interval batching
// Create aggregated status changes
let aggregated = self.create_aggregated_changes();
@@ -237,11 +272,13 @@ impl HostStatusManager {
/// Check if a status change is significant enough for notification
fn is_significant_change(&self, old_status: Status, new_status: Status) -> bool {
match (old_status, new_status) {
// Don't notify on transitions from Unknown (startup/restart scenario)
(Status::Unknown, _) => false,
// Always notify on problems (but not from Unknown)
(_, Status::Warning) | (_, Status::Critical) => true,
// Only notify on recovery if it's from a problem state to OK and all services are OK
(Status::Warning | Status::Critical, Status::Ok) => self.current_host_status == Status::Ok,
// Don't notify on other transitions
_ => false,
}
}
@@ -339,8 +376,8 @@ impl HostStatusManager {
details.push('\n');
}
// Show recoveries only if host status is now OK (all services recovered)
if !recovery_changes.is_empty() && aggregated.host_status_final == Status::Ok {
details.push_str(&format!("✅ RECOVERIES ({}):\n", recovery_changes.len()));
for change in recovery_changes {
details.push_str(&format!(" {}\n", change));

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard"
version = "0.1.65"
edition = "2021"
[dependencies]
@@ -19,3 +19,4 @@ ratatui = { workspace = true }
crossterm = { workspace = true }
toml = { workspace = true }
gethostname = { workspace = true }
wake-on-lan = "0.2"

View File

@@ -67,8 +67,8 @@ impl Dashboard {
}
};
// Connect to configured hosts from configuration
let hosts: Vec<String> = config.hosts.keys().cloned().collect();
// Try to connect to hosts but don't fail if none are available
match zmq_consumer.connect_to_predefined_hosts(&hosts).await {
@@ -91,7 +91,7 @@ impl Dashboard {
(None, None)
} else {
// Initialize TUI app
let tui_app = TuiApp::new(config.clone());
// Setup terminal
if let Err(e) = enable_raw_mode() {
@@ -149,6 +149,8 @@ impl Dashboard {
let mut last_metrics_check = Instant::now();
let metrics_check_interval = Duration::from_millis(100); // Check for metrics every 100ms
let mut last_heartbeat_check = Instant::now();
let heartbeat_check_interval = Duration::from_secs(1); // Check for host connectivity every 1 second
loop {
// Handle terminal events (keyboard input) only if not headless
@@ -191,6 +193,17 @@ impl Dashboard {
break;
}
}
// Render UI immediately after handling input for responsive feedback
if let Some(ref mut terminal) = self.terminal {
if let Some(ref mut tui_app) = self.tui_app {
if let Err(e) = terminal.draw(|frame| {
tui_app.render(frame, &self.metric_store);
}) {
error!("Error rendering TUI after input: {}", e);
}
}
}
}
// Check for new metrics
@@ -236,44 +249,57 @@ impl Dashboard {
self.metric_store
.update_metrics(&metric_message.hostname, metric_message.metrics);
// Check for agent version mismatches across hosts
if let Some((current_version, outdated_hosts)) = self.metric_store.get_version_mismatches() {
for outdated_host in &outdated_hosts {
warn!("Host {} has outdated agent version (current: {})", outdated_host, current_version);
}
}
// Update TUI with new metrics (only if not headless)
if let Some(ref mut tui_app) = self.tui_app {
tui_app.update_metrics(&self.metric_store);
}
}
// Also check for command output messages
if let Ok(Some(cmd_output)) = self.zmq_consumer.receive_command_output().await {
debug!(
"Received command output from {}: {}",
cmd_output.hostname,
cmd_output.output_line
);
// Command output (terminal popup removed - output not displayed)
}
last_metrics_check = Instant::now();
}
// Check for host connectivity changes (heartbeat timeouts) periodically
if last_heartbeat_check.elapsed() >= heartbeat_check_interval {
let timeout = Duration::from_secs(self.config.zmq.heartbeat_timeout_seconds);
// Clean up metrics for offline hosts
self.metric_store.cleanup_offline_hosts(timeout);
if let Some(ref mut tui_app) = self.tui_app {
let connected_hosts = self.metric_store.get_connected_hosts(timeout);
tui_app.update_hosts(connected_hosts);
}
last_heartbeat_check = Instant::now();
}
// Render TUI (only if not headless)
if !self.headless {
if let Some(ref mut terminal) = self.terminal {
if let Some(ref mut tui_app) = self.tui_app {
if let Err(e) = terminal.draw(|frame| {
tui_app.render(frame, &self.metric_store);
}) {
error!("Error rendering TUI: {}", e);
break;
}
}
}
}
@@ -289,37 +315,19 @@ impl Dashboard {
/// Execute a UI command by sending it to the appropriate agent
async fn execute_ui_command(&self, command: UiCommand) -> Result<()> {
match command {
UiCommand::ServiceRestart { hostname, service_name } => {
info!("Sending restart command for service {} on {}", service_name, hostname);
let agent_command = AgentCommand::ServiceControl {
service_name,
action: ServiceAction::Restart,
};
self.zmq_command_sender.send_command(&hostname, agent_command).await?;
}
UiCommand::ServiceStart { hostname, service_name } => {
info!("Sending user start command for service {} on {}", service_name, hostname);
let agent_command = AgentCommand::ServiceControl {
service_name: service_name.clone(),
action: ServiceAction::UserStart,
};
self.zmq_command_sender.send_command(&hostname, agent_command).await?;
}
UiCommand::ServiceStop { hostname, service_name } => {
info!("Sending user stop command for service {} on {}", service_name, hostname);
let agent_command = AgentCommand::ServiceControl {
service_name: service_name.clone(),
action: ServiceAction::UserStop,
};
self.zmq_command_sender.send_command(&hostname, agent_command).await?;
}

View File

@@ -1,5 +1,5 @@
use anyhow::Result;
use cm_dashboard_shared::{CommandOutputMessage, MessageEnvelope, MessageType, MetricMessage};
use tracing::{debug, error, info, warn};
use zmq::{Context, Socket, SocketType};
@@ -35,8 +35,9 @@ pub enum AgentCommand {
pub enum ServiceAction {
Start,
Stop,
Restart,
Status,
UserStart, // User-initiated start (clears user-stopped flag)
UserStop, // User-initiated stop (marks as user-stopped)
}
/// ZMQ consumer for receiving metrics from agents
@@ -103,6 +104,43 @@ impl ZmqConsumer {
Ok(())
}
/// Receive command output from any connected agent (non-blocking)
pub async fn receive_command_output(&mut self) -> Result<Option<CommandOutputMessage>> {
match self.subscriber.recv_bytes(zmq::DONTWAIT) {
Ok(data) => {
// Deserialize envelope
let envelope: MessageEnvelope = serde_json::from_slice(&data)
.map_err(|e| anyhow::anyhow!("Failed to deserialize envelope: {}", e))?;
// Check message type
match envelope.message_type {
MessageType::CommandOutput => {
let cmd_output = envelope
.decode_command_output()
.map_err(|e| anyhow::anyhow!("Failed to decode command output: {}", e))?;
debug!(
"Received command output from {}: {}",
cmd_output.hostname,
cmd_output.output_line
);
Ok(Some(cmd_output))
}
_ => Ok(None), // Not a command output message
}
}
Err(zmq::Error::EAGAIN) => {
// No message available (non-blocking mode)
Ok(None)
}
Err(e) => {
error!("ZMQ receive error: {}", e);
Err(anyhow::anyhow!("ZMQ receive error: {}", e))
}
}
}
/// Receive metrics from any connected agent (non-blocking)
pub async fn receive_metrics(&mut self) -> Result<Option<MetricMessage>> {
match self.subscriber.recv_bytes(zmq::DONTWAIT) {
@@ -132,6 +170,10 @@ impl ZmqConsumer {
debug!("Received heartbeat"); debug!("Received heartbeat");
Ok(None) // Don't return heartbeats as metrics Ok(None) // Don't return heartbeats as metrics
} }
MessageType::CommandOutput => {
debug!("Received command output (will be handled by receive_command_output)");
Ok(None) // Command output handled by separate method
}
_ => {
debug!("Received non-metrics message: {:?}", envelope.message_type);
Ok(None)

View File

@@ -6,20 +6,29 @@ use std::path::Path;
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DashboardConfig {
pub zmq: ZmqConfig,
pub hosts: std::collections::HashMap<String, HostDetails>,
pub system: SystemConfig,
pub ssh: SshConfig,
pub service_logs: std::collections::HashMap<String, Vec<ServiceLogConfig>>,
}
/// ZMQ consumer configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ZmqConfig {
pub subscriber_ports: Vec<u16>,
/// Heartbeat timeout in seconds - hosts considered offline if no heartbeat received within this time
#[serde(default = "default_heartbeat_timeout_seconds")]
pub heartbeat_timeout_seconds: u64,
}
fn default_heartbeat_timeout_seconds() -> u64 {
10 // Default to 10 seconds - allows for multiple missed heartbeats
}
/// Individual host configuration details
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HostDetails {
pub mac_address: Option<String>,
}
/// System configuration
@@ -31,6 +40,20 @@ pub struct SystemConfig {
pub nixos_config_api_key_file: Option<String>,
}
/// SSH configuration for rebuild operations
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SshConfig {
pub rebuild_user: String,
pub rebuild_alias: String,
}
/// Service log file configuration per host
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceLogConfig {
pub service_name: String,
pub log_file_path: String,
}
impl DashboardConfig {
pub fn load_from_file<P: AsRef<Path>>(path: P) -> Result<Self> {
let path = path.as_ref();
@@ -52,8 +75,3 @@ impl Default for ZmqConfig {
}
}
impl Default for HostsConfig {
fn default() -> Self {
panic!("Dashboard configuration must be loaded from file - no hardcoded defaults allowed")
}
}
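For reference, a sketch of how the new hosts table deserializes (hostnames and MAC address invented; in the full dashboard config these entries live under [hosts.<name>], the snippet below parses just the map):

use std::collections::HashMap;
use serde::Deserialize;

// Local mirror of HostDetails so the sketch stands alone.
#[derive(Debug, Deserialize)]
struct HostDetailsSketch {
    mac_address: Option<String>,
}

fn main() {
    let src = r#"
        [server1]
        mac_address = "aa:bb:cc:dd:ee:ff"

        [laptop]
    "#;
    let hosts: HashMap<String, HostDetailsSketch> = toml::from_str(src).unwrap();
    assert!(hosts["server1"].mac_address.is_some()); // 'w' wake-up available
    assert!(hosts["laptop"].mac_address.is_none());  // no MAC configured
}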

View File

@@ -1,5 +1,6 @@
use anyhow::Result;
use clap::Parser;
use std::process;
use tracing::{error, info};
use tracing_subscriber::EnvFilter;
@@ -11,26 +12,33 @@ mod ui;
use app::Dashboard;
/// Check if running inside tmux session
fn check_tmux_session() {
// Check for TMUX environment variable which is set when inside a tmux session
if std::env::var("TMUX").is_err() {
eprintln!("╭─────────────────────────────────────────────────────────────╮");
eprintln!("│ ⚠️ TMUX REQUIRED │");
eprintln!("├─────────────────────────────────────────────────────────────┤");
eprintln!("│ CM Dashboard must be run inside a tmux session for proper │");
eprintln!("│ terminal handling and remote operation functionality. │");
eprintln!("│ │");
eprintln!("│ Please start a tmux session first: │");
eprintln!("│ tmux new-session -d -s dashboard cm-dashboard │");
eprintln!("│ tmux attach-session -t dashboard │");
eprintln!("│ │");
eprintln!("│ Or simply: │");
eprintln!("│ tmux │");
eprintln!("│ cm-dashboard │");
eprintln!("╰─────────────────────────────────────────────────────────────╯");
process::exit(1);
}
}
#[derive(Parser)]
#[command(name = "cm-dashboard")]
#[command(about = "CM Dashboard TUI with individual metric consumption")]
#[command(version)]
struct Cli {
/// Increase logging verbosity (-v, -vv)
#[arg(short, long, action = clap::ArgAction::Count)]
@@ -68,6 +76,11 @@ async fn main() -> Result<()> {
.init();
}
// Check for tmux session requirement (only for TUI mode)
if !cli.headless {
check_tmux_session();
}
if cli.headless || cli.verbose > 0 {
info!("CM Dashboard starting with individual metrics architecture...");
}

View File

@@ -11,8 +11,8 @@ pub struct MetricStore {
current_metrics: HashMap<String, HashMap<String, Metric>>,
/// Historical metrics for trending
historical_metrics: HashMap<String, Vec<MetricDataPoint>>,
/// Last heartbeat timestamp per host
last_heartbeat: HashMap<String, Instant>,
/// Configuration
max_metrics_per_host: usize,
history_retention: Duration,
@@ -23,7 +23,7 @@ impl MetricStore {
Self {
current_metrics: HashMap::new(),
historical_metrics: HashMap::new(),
last_heartbeat: HashMap::new(),
max_metrics_per_host,
history_retention: Duration::from_secs(history_retention_hours * 3600),
}
@@ -56,10 +56,13 @@ impl MetricStore {
// Add to history
host_history.push(MetricDataPoint { received_at: now });
}
// Track heartbeat metrics for connectivity detection
if metric_name == "agent_heartbeat" {
self.last_heartbeat.insert(hostname.to_string(), now);
debug!("Updated heartbeat for host {}", hostname);
}
}
// Get metrics count before cleanup
let metrics_count = host_metrics.len();
@@ -88,22 +91,46 @@ impl MetricStore {
}
}
/// Get connected hosts (hosts with recent heartbeats)
pub fn get_connected_hosts(&self, timeout: Duration) -> Vec<String> {
let now = Instant::now();
self.last_heartbeat
.iter()
.filter_map(|(hostname, &last_heartbeat)| {
if now.duration_since(last_heartbeat) <= timeout {
Some(hostname.clone())
} else {
debug!("Host {} considered offline - last heartbeat was {:?} ago",
hostname, now.duration_since(last_heartbeat));
None
}
})
.collect()
}
/// Clean up data for offline hosts
pub fn cleanup_offline_hosts(&mut self, timeout: Duration) {
let now = Instant::now();
let mut hosts_to_cleanup = Vec::new();
// Find hosts that are offline (no recent heartbeat)
for (hostname, &last_heartbeat) in &self.last_heartbeat {
if now.duration_since(last_heartbeat) > timeout {
hosts_to_cleanup.push(hostname.clone());
}
}
// Clear metrics for offline hosts
for hostname in hosts_to_cleanup {
if let Some(metrics) = self.current_metrics.remove(&hostname) {
info!("Cleared {} metrics for offline host: {}", metrics.len(), hostname);
}
// Keep heartbeat timestamp for reconnection detection
// Don't remove from last_heartbeat to track when host was last seen
}
}
/// Cleanup old data and enforce limits
fn cleanup_host_data(&mut self, hostname: &str) {
let now = Instant::now();
@@ -124,4 +151,52 @@ impl MetricStore {
}
}
}
/// Get agent versions from all hosts for cross-host comparison
pub fn get_agent_versions(&self) -> HashMap<String, String> {
let mut versions = HashMap::new();
for (hostname, metrics) in &self.current_metrics {
if let Some(version_metric) = metrics.get("agent_version") {
if let cm_dashboard_shared::MetricValue::String(version) = &version_metric.value {
versions.insert(hostname.clone(), version.clone());
}
}
}
versions
}
/// Check for agent version mismatches across hosts
pub fn get_version_mismatches(&self) -> Option<(String, Vec<String>)> {
let versions = self.get_agent_versions();
if versions.len() < 2 {
return None; // Need at least 2 hosts to compare
}
// Find the most common version (assume it's the "current" version)
let mut version_counts = HashMap::new();
for version in versions.values() {
*version_counts.entry(version.clone()).or_insert(0) += 1;
}
let most_common_version = version_counts
.iter()
.max_by_key(|(_, count)| *count)
.map(|(version, _)| version.clone())?;
// Find hosts with different versions
let outdated_hosts: Vec<String> = versions
.iter()
.filter(|(_, version)| *version != &most_common_version)
.map(|(hostname, _)| hostname.clone())
.collect();
if outdated_hosts.is_empty() {
None
} else {
Some((most_common_version, outdated_hosts))
}
}
}
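A worked example of the majority-vote behaviour above (hostnames and hashes invented):

// versions = { "host-a": "9f3a1c2b", "host-b": "9f3a1c2b", "backup": "1d4e5f60" }
//   get_version_mismatches() -> Some(("9f3a1c2b", ["backup"]))  // minority host flagged as outdated
// versions = { "host-a": "9f3a1c2b" }                           -> None (needs at least 2 hosts)
// versions = { "host-a": "9f3a1c2b", "host-b": "9f3a1c2b" }     -> None (no mismatch to report)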

View File

@@ -1,5 +1,5 @@
use anyhow::Result;
use crossterm::event::{Event, KeyCode};
use ratatui::{
layout::{Constraint, Direction, Layout, Rect},
style::Style,
@@ -7,12 +7,14 @@ use ratatui::{
Frame,
};
use std::collections::HashMap;
use std::time::Instant;
use tracing::info;
use wake_on_lan::MagicPacket;
pub mod theme;
pub mod widgets;
use crate::config::DashboardConfig;
use crate::metrics::MetricStore;
use cm_dashboard_shared::{Metric, Status};
use theme::{Components, Layout as ThemeLayout, Theme, Typography};
@@ -21,42 +23,21 @@ use widgets::{BackupWidget, ServicesWidget, SystemWidget, Widget};
/// Commands that can be triggered from the UI
#[derive(Debug, Clone)]
pub enum UiCommand {
ServiceRestart { hostname: String, service_name: String },
ServiceStart { hostname: String, service_name: String },
ServiceStop { hostname: String, service_name: String },
SystemRebuild { hostname: String },
TriggerBackup { hostname: String },
}
/// Command execution status for visual feedback
#[derive(Debug, Clone)]
pub enum CommandStatus {
/// Command is executing
InProgress { command_type: CommandType, target: String, start_time: std::time::Instant },
/// Command completed successfully
Success { command_type: CommandType, completed_at: std::time::Instant },
}
/// Types of commands for status tracking
#[derive(Debug, Clone)]
pub enum CommandType {
ServiceRestart,
ServiceStart,
ServiceStop,
SystemRebuild,
BackupTrigger,
}
/// Panel types for focus management /// Panel types for focus management
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PanelType {
System,
Services,
Backup,
}
impl PanelType {
}
/// Widget states for a specific host
#[derive(Clone)]
@@ -73,8 +54,8 @@ pub struct HostWidgets {
pub backup_scroll_offset: usize,
/// Last update time for this host
pub last_update: Option<Instant>,
/// Pending service transitions for immediate visual feedback
pub pending_service_transitions: HashMap<String, (CommandType, String, Instant)>, // service_name -> (command_type, original_status, start_time)
}
impl HostWidgets {
@@ -87,11 +68,12 @@ impl HostWidgets {
services_scroll_offset: 0,
backup_scroll_offset: 0,
last_update: None,
pending_service_transitions: HashMap::new(),
}
}
}
/// Main TUI application
pub struct TuiApp {
/// Widget states per host (hostname -> HostWidgets)
@@ -102,25 +84,39 @@ pub struct TuiApp {
available_hosts: Vec<String>,
/// Host index for navigation
host_index: usize,
/// Currently focused panel
focused_panel: PanelType,
/// Should quit application
should_quit: bool,
/// Track if user manually navigated away from localhost
user_navigated_away: bool,
/// Dashboard configuration
config: DashboardConfig,
/// Cached localhost hostname to avoid repeated system calls
localhost: String,
}
impl TuiApp {
pub fn new(config: DashboardConfig) -> Self {
let localhost = gethostname::gethostname().to_string_lossy().to_string();
let mut app = Self {
host_widgets: HashMap::new(),
current_host: None,
available_hosts: config.hosts.keys().cloned().collect(),
host_index: 0,
focused_panel: PanelType::System, // Start with System panel focused
should_quit: false,
user_navigated_away: false,
config,
localhost,
};
// Sort predefined hosts
app.available_hosts.sort();
// Initialize with first host if available
if !app.available_hosts.is_empty() {
app.current_host = Some(app.available_hosts[0].clone());
}
app
}
/// Get or create host widgets for the given hostname
@@ -132,41 +128,41 @@ impl TuiApp {
/// Update widgets with metrics from store (only for current host)
pub fn update_metrics(&mut self, metric_store: &MetricStore) {
// Check for command timeouts first
self.check_command_timeouts();
// Check for rebuild completion by agent hash change // Check for rebuild completion by agent hash change
self.check_rebuild_completion(metric_store);
if let Some(hostname) = self.current_host.clone() {
// Only update widgets if we have metrics for this host
let all_metrics = metric_store.get_metrics_for_host(&hostname);
if !all_metrics.is_empty() {
// Single pass metric categorization for better performance
let mut cpu_metrics = Vec::new();
let mut memory_metrics = Vec::new();
let mut service_metrics = Vec::new();
let mut backup_metrics = Vec::new();
let mut nixos_metrics = Vec::new();
let mut disk_metrics = Vec::new();
for metric in all_metrics {
if metric.name.starts_with("cpu_")
|| metric.name.contains("c_state_")
|| metric.name.starts_with("process_top_") {
cpu_metrics.push(metric);
} else if metric.name.starts_with("memory_") || metric.name.starts_with("disk_tmp_") {
memory_metrics.push(metric);
} else if metric.name.starts_with("service_") {
service_metrics.push(metric);
} else if metric.name.starts_with("backup_") {
backup_metrics.push(metric);
} else if metric.name == "system_nixos_build" || metric.name == "system_active_users" || metric.name == "agent_version" {
nixos_metrics.push(metric);
} else if metric.name.starts_with("disk_") {
disk_metrics.push(metric);
}
}
// Clear completed transitions first
self.clear_completed_transitions(&hostname, &service_metrics);
// Now get host widgets and update them // Now get host widgets and update them
let host_widgets = self.get_or_create_host_widgets(&hostname); let host_widgets = self.get_or_create_host_widgets(&hostname);
@@ -174,21 +170,7 @@ impl TuiApp {
// Collect all system metrics (CPU, memory, NixOS, disk/storage) // Collect all system metrics (CPU, memory, NixOS, disk/storage)
let mut system_metrics = cpu_metrics; let mut system_metrics = cpu_metrics;
system_metrics.extend(memory_metrics); system_metrics.extend(memory_metrics);
// Add NixOS metrics - using exact matching for build display fix
let nixos_metrics: Vec<&Metric> = all_metrics
.iter()
.filter(|m| m.name == "system_nixos_build" || m.name == "system_active_users" || m.name == "system_agent_hash")
.copied()
.collect();
system_metrics.extend(nixos_metrics); system_metrics.extend(nixos_metrics);
// Add disk/storage metrics
let disk_metrics: Vec<&Metric> = all_metrics
.iter()
.filter(|m| m.name.starts_with("disk_"))
.copied()
.collect();
system_metrics.extend(disk_metrics); system_metrics.extend(disk_metrics);
host_widgets.system_widget.update_from_metrics(&system_metrics); host_widgets.system_widget.update_from_metrics(&system_metrics);
@@ -197,7 +179,7 @@ impl TuiApp {
.update_from_metrics(&service_metrics); .update_from_metrics(&service_metrics);
host_widgets host_widgets
.backup_widget .backup_widget
.update_from_metrics(&all_backup_metrics); .update_from_metrics(&backup_metrics);
host_widgets.last_update = Some(Instant::now()); host_widgets.last_update = Some(Instant::now());
} }
@@ -205,30 +187,36 @@ impl TuiApp {
    }

    /// Update available hosts with localhost prioritization
-   pub fn update_hosts(&mut self, hosts: Vec<String>) {
-       // Sort hosts alphabetically
-       let mut sorted_hosts = hosts.clone();
-       // Keep hosts that are undergoing SystemRebuild even if they're offline
+   pub fn update_hosts(&mut self, discovered_hosts: Vec<String>) {
+       // Start with configured hosts (always visible)
+       let mut all_hosts: Vec<String> = self.config.hosts.keys().cloned().collect();
+       // Add any discovered hosts that aren't already configured
+       for host in discovered_hosts {
+           if !all_hosts.contains(&host) {
+               all_hosts.push(host);
+           }
+       }
+       // Keep hosts that have pending transitions even if they're offline
        for (hostname, host_widgets) in &self.host_widgets {
-           if let Some(CommandStatus::InProgress { command_type: CommandType::SystemRebuild, .. }) = &host_widgets.command_status {
-               if !sorted_hosts.contains(hostname) {
-                   sorted_hosts.push(hostname.clone());
+           if !host_widgets.pending_service_transitions.is_empty() {
+               if !all_hosts.contains(hostname) {
+                   all_hosts.push(hostname.clone());
                }
            }
        }
-       sorted_hosts.sort();
-       self.available_hosts = sorted_hosts;
+       all_hosts.sort();
+       self.available_hosts = all_hosts;

        // Get the current hostname (localhost) for auto-selection
-       let localhost = gethostname::gethostname().to_string_lossy().to_string();
        if !self.available_hosts.is_empty() {
-           if self.available_hosts.contains(&localhost) && !self.user_navigated_away {
+           if self.available_hosts.contains(&self.localhost) && !self.user_navigated_away {
                // Localhost is available and user hasn't navigated away - switch to it
-               self.current_host = Some(localhost.clone());
+               self.current_host = Some(self.localhost.clone());
                // Find the actual index of localhost in the sorted list
-               self.host_index = self.available_hosts.iter().position(|h| h == &localhost).unwrap_or(0);
+               self.host_index = self.available_hosts.iter().position(|h| h == &self.localhost).unwrap_or(0);
            } else if self.current_host.is_none() {
                // No current host - select first available (which is localhost if available)
                self.current_host = Some(self.available_hosts[0].clone());
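To make the merge behaviour above concrete, here is a minimal standalone sketch (not part of the diff) of the same rule: configured hosts are always listed, discovered hosts are appended only when new, and the result is sorted. The host names are invented for illustration.

    fn merge_hosts(configured: &[&str], discovered: &[&str]) -> Vec<String> {
        // Configured hosts are always visible
        let mut all: Vec<String> = configured.iter().map(|h| h.to_string()).collect();
        // Discovered hosts are added only if not already configured
        for host in discovered {
            if !all.iter().any(|h| h.as_str() == *host) {
                all.push(host.to_string());
            }
        }
        all.sort();
        all
    }

    // merge_hosts(&["beta", "alpha"], &["gamma", "alpha"]) -> ["alpha", "beta", "gamma"]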
@@ -261,73 +249,162 @@ impl TuiApp {
            self.navigate_host(1);
        }
        KeyCode::Char('r') => {
-           match self.focused_panel {
-               PanelType::System => {
-                   // System rebuild command
-                   if let Some(hostname) = self.current_host.clone() {
-                       self.start_command(&hostname, CommandType::SystemRebuild, hostname.clone());
-                       return Ok(Some(UiCommand::SystemRebuild { hostname }));
-                   }
-               }
-               PanelType::Services => {
-                   // Service restart command
-                   if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
-                       self.start_command(&hostname, CommandType::ServiceRestart, service_name.clone());
-                       return Ok(Some(UiCommand::ServiceRestart { hostname, service_name }));
-                   }
-               }
-               _ => {
-                   info!("Manual refresh requested");
-               }
-           }
+           // System rebuild command - works on any panel for current host
+           if let Some(hostname) = self.current_host.clone() {
+               // Create command that shows logo, rebuilds, and waits for user input
+               let logo_and_rebuild = format!(
+                   "bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {}\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'",
+                   hostname,
+                   self.config.ssh.rebuild_user,
+                   hostname,
+                   self.config.ssh.rebuild_alias
+               );
+               std::process::Command::new("tmux")
+                   .arg("split-window")
+                   .arg("-v")
+                   .arg("-p")
+                   .arg("30")
+                   .arg(&logo_and_rebuild)
+                   .spawn()
+                   .ok(); // Ignore errors, tmux will handle them
+           }
        }
        KeyCode::Char('s') => {
-           if self.focused_panel == PanelType::Services {
-               // Service start command
-               if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
-                   self.start_command(&hostname, CommandType::ServiceStart, service_name.clone());
-                   return Ok(Some(UiCommand::ServiceStart { hostname, service_name }));
-               }
-           }
+           // Service start command
+           if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+               if self.start_command(&hostname, CommandType::ServiceStart, service_name.clone()) {
+                   return Ok(Some(UiCommand::ServiceStart { hostname, service_name }));
+               }
+           }
        }
        KeyCode::Char('S') => {
-           if self.focused_panel == PanelType::Services {
-               // Service stop command
-               if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
-                   self.start_command(&hostname, CommandType::ServiceStop, service_name.clone());
-                   return Ok(Some(UiCommand::ServiceStop { hostname, service_name }));
-               }
-           }
+           // Service stop command
+           if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+               if self.start_command(&hostname, CommandType::ServiceStop, service_name.clone()) {
+                   return Ok(Some(UiCommand::ServiceStop { hostname, service_name }));
+               }
+           }
        }
-       KeyCode::Char('b') => {
-           if self.focused_panel == PanelType::Backup {
-               // Trigger backup
-               if let Some(hostname) = self.current_host.clone() {
-                   self.start_command(&hostname, CommandType::BackupTrigger, hostname.clone());
-                   return Ok(Some(UiCommand::TriggerBackup { hostname }));
-               }
-           }
-       }
-       KeyCode::Tab => {
-           if key.modifiers.contains(KeyModifiers::SHIFT) {
-               // Shift+Tab cycles through panels
-               self.next_panel();
-           } else {
-               // Tab cycles to next host
-               self.navigate_host(1);
-           }
-       }
-       KeyCode::BackTab => {
-           // BackTab (Shift+Tab on some terminals) also cycles panels
-           self.next_panel();
-       }
-       KeyCode::Up => {
-           // Scroll up in focused panel
-           self.scroll_focused_panel(-1);
-       }
-       KeyCode::Down => {
-           // Scroll down in focused panel
-           self.scroll_focused_panel(1);
-       }
+       KeyCode::Char('J') => {
+           // Show service logs via journalctl in tmux split window
+           if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+               let journalctl_command = format!(
+                   "bash -c \"ssh -tt {}@{} 'sudo journalctl -u {}.service -f --no-pager -n 50'; exit\"",
+                   self.config.ssh.rebuild_user,
+                   hostname,
+                   service_name
+               );
+               std::process::Command::new("tmux")
+                   .arg("split-window")
+                   .arg("-v")
+                   .arg("-p")
+                   .arg("30")
+                   .arg(&journalctl_command)
+                   .spawn()
+                   .ok(); // Ignore errors, tmux will handle them
+           }
+       }
+       KeyCode::Char('L') => {
+           // Show custom service log file in tmux split window
+           if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+               // Check if this service has a custom log file configured
+               if let Some(host_logs) = self.config.service_logs.get(&hostname) {
+                   if let Some(log_config) = host_logs.iter().find(|config| config.service_name == service_name) {
+                       let tail_command = format!(
+                           "bash -c \"ssh -tt {}@{} 'sudo tail -n 50 -f {}'; exit\"",
+                           self.config.ssh.rebuild_user,
+                           hostname,
+                           log_config.log_file_path
+                       );
+                       std::process::Command::new("tmux")
+                           .arg("split-window")
+                           .arg("-v")
+                           .arg("-p")
+                           .arg("30")
+                           .arg(&tail_command)
+                           .spawn()
+                           .ok(); // Ignore errors, tmux will handle them
+                   }
+               }
+           }
+       }
+       KeyCode::Char('b') => {
+           // Trigger backup
+           if let Some(hostname) = self.current_host.clone() {
+               self.start_command(&hostname, CommandType::BackupTrigger, hostname.clone());
+               return Ok(Some(UiCommand::TriggerBackup { hostname }));
+           }
+       }
+       KeyCode::Char('w') => {
+           // Wake on LAN for offline hosts
+           if let Some(hostname) = self.current_host.clone() {
+               // Check if host has MAC address configured
+               if let Some(host_details) = self.config.hosts.get(&hostname) {
+                   if let Some(mac_address) = &host_details.mac_address {
+                       // Parse MAC address and send WoL packet
+                       let mac_bytes = Self::parse_mac_address(mac_address);
+                       match mac_bytes {
+                           Ok(mac) => {
+                               match MagicPacket::new(&mac).send() {
+                                   Ok(_) => {
+                                       info!("WakeOnLAN packet sent successfully to {} ({})", hostname, mac_address);
+                                   }
+                                   Err(e) => {
+                                       tracing::error!("Failed to send WakeOnLAN packet to {}: {}", hostname, e);
+                                   }
+                               }
+                           }
+                           Err(_) => {
+                               tracing::error!("Invalid MAC address format for {}: {}", hostname, mac_address);
+                           }
+                       }
+                   }
+               }
+           }
+       }
+       KeyCode::Char('t') => {
+           // Open SSH terminal session in tmux window
+           if let Some(hostname) = self.current_host.clone() {
+               let ssh_command = format!(
+                   "ssh -tt {}@{}",
+                   self.config.ssh.rebuild_user,
+                   hostname
+               );
+               std::process::Command::new("tmux")
+                   .arg("split-window")
+                   .arg("-v")
+                   .arg("-p")
+                   .arg("30") // Use 30% like other commands
+                   .arg(&ssh_command)
+                   .spawn()
+                   .ok(); // Ignore errors, tmux will handle them
+           }
+       }
+       KeyCode::Tab => {
+           // Tab cycles to next host
+           self.navigate_host(1);
+       }
+       KeyCode::Up | KeyCode::Char('k') => {
+           // Move service selection up
+           if let Some(hostname) = self.current_host.clone() {
+               let host_widgets = self.get_or_create_host_widgets(&hostname);
+               host_widgets.services_widget.select_previous();
+           }
+       }
+       KeyCode::Down | KeyCode::Char('j') => {
+           // Move service selection down
+           if let Some(hostname) = self.current_host.clone() {
+               let total_services = {
+                   let host_widgets = self.get_or_create_host_widgets(&hostname);
+                   host_widgets.services_widget.get_total_services_count()
+               };
+               let host_widgets = self.get_or_create_host_widgets(&hostname);
+               host_widgets.services_widget.select_next(total_services);
+           }
+       }
        _ => {}
    }
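The 'r', 'J', 'L' and 't' handlers above all spawn the same kind of 30% vertical tmux split and deliberately ignore spawn errors. A hypothetical helper (not part of this diff) could factor that repetition out; a minimal sketch under that assumption:

    use std::process::Command;

    /// Hypothetical helper: run a shell command in a 30% vertical tmux split,
    /// ignoring spawn errors the same way the handlers above do.
    fn spawn_tmux_split(shell_command: &str) {
        Command::new("tmux")
            .arg("split-window")
            .arg("-v")
            .arg("-p")
            .arg("30")
            .arg(shell_command)
            .spawn()
            .ok(); // tmux reports its own errors in the pane
    }

Each handler would then only build its command string and call spawn_tmux_split(&cmd).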
@@ -355,9 +432,8 @@ impl TuiApp {
        self.current_host = Some(self.available_hosts[self.host_index].clone());

        // Check if user navigated away from localhost
-       let localhost = gethostname::gethostname().to_string_lossy().to_string();
        if let Some(ref current) = self.current_host {
-           if current != &localhost {
+           if current != &self.localhost {
                self.user_navigated_away = true;
            } else {
                self.user_navigated_away = false; // User navigated back to localhost
@@ -367,37 +443,7 @@ impl TuiApp {
        info!("Switched to host: {}", self.current_host.as_ref().unwrap());
    }
/// Check if a host is currently rebuilding
pub fn is_host_rebuilding(&self, hostname: &str) -> bool {
if let Some(host_widgets) = self.host_widgets.get(hostname) {
matches!(
&host_widgets.command_status,
Some(CommandStatus::InProgress { command_type: CommandType::SystemRebuild, .. })
)
} else {
false
}
}
/// Switch to next panel (Shift+Tab) - only cycles through visible panels
pub fn next_panel(&mut self) {
let visible_panels = self.get_visible_panels();
if visible_panels.len() <= 1 {
return; // Can't switch if only one or no panels visible
}
// Find current panel index in visible panels
if let Some(current_index) = visible_panels.iter().position(|&p| p == self.focused_panel) {
// Move to next visible panel
let next_index = (current_index + 1) % visible_panels.len();
self.focused_panel = visible_panels[next_index];
} else {
// Current panel not visible, switch to first visible panel
self.focused_panel = visible_panels[0];
}
info!("Switched to panel: {:?}", self.focused_panel);
}
@@ -417,144 +463,89 @@ impl TuiApp {
        self.should_quit
    }

-   /// Start command execution and track status for visual feedback
-   pub fn start_command(&mut self, hostname: &str, command_type: CommandType, target: String) {
-       if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
-           host_widgets.command_status = Some(CommandStatus::InProgress {
-               command_type,
-               target,
-               start_time: Instant::now(),
-           });
-       }
-   }
+   /// Get current service status for state-aware command validation
+   fn get_current_service_status(&self, hostname: &str, service_name: &str) -> Option<String> {
+       if let Some(host_widgets) = self.host_widgets.get(hostname) {
+           return host_widgets.services_widget.get_service_status(service_name);
+       }
+       None
+   }
+
+   /// Start command execution with immediate visual feedback
+   pub fn start_command(&mut self, hostname: &str, command_type: CommandType, target: String) -> bool {
+       // Get current service status to validate command
+       let current_status = self.get_current_service_status(hostname, &target);
+       // Validate if command makes sense for current state
+       let should_execute = match (&command_type, current_status.as_deref()) {
+           (CommandType::ServiceStart, Some("inactive") | Some("failed") | Some("dead")) => true,
+           (CommandType::ServiceStop, Some("active")) => true,
+           (CommandType::ServiceStart, Some("active")) => {
+               // Already running - don't execute
+               false
+           },
+           (CommandType::ServiceStop, Some("inactive") | Some("failed") | Some("dead")) => {
+               // Already stopped - don't execute
+               false
+           },
+           (_, None) => {
+               // Unknown service state - allow command to proceed
+               true
+           },
+           _ => true, // Default: allow other combinations
+       };
+       // ALWAYS store the pending transition for immediate visual feedback, even if we don't execute
+       if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
+           host_widgets.pending_service_transitions.insert(
+               target.clone(),
+               (command_type, current_status.unwrap_or_else(|| "unknown".to_string()), Instant::now())
+           );
+       }
+       should_execute
+   }

-   /// Mark command as completed successfully
-   pub fn complete_command(&mut self, hostname: &str) {
-       if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
-           if let Some(CommandStatus::InProgress { command_type, .. }) = &host_widgets.command_status {
-               host_widgets.command_status = Some(CommandStatus::Success {
-                   command_type: command_type.clone(),
-                   completed_at: Instant::now(),
-               });
-           }
-       }
-   }
-
-   /// Check for command timeouts and automatically clear them
-   pub fn check_command_timeouts(&mut self) {
-       let now = Instant::now();
-       let mut hosts_to_clear = Vec::new();
-       for (hostname, host_widgets) in &self.host_widgets {
-           if let Some(CommandStatus::InProgress { command_type, start_time, .. }) = &host_widgets.command_status {
-               let timeout_duration = match command_type {
-                   CommandType::SystemRebuild => Duration::from_secs(300), // 5 minutes for rebuilds
-                   _ => Duration::from_secs(30), // 30 seconds for service commands
-               };
-               if now.duration_since(*start_time) > timeout_duration {
-                   hosts_to_clear.push(hostname.clone());
-               }
-           }
-           // Also clear success/failed status after display time
-           else if let Some(CommandStatus::Success { completed_at, .. }) = &host_widgets.command_status {
-               if now.duration_since(*completed_at) > Duration::from_secs(3) {
-                   hosts_to_clear.push(hostname.clone());
-               }
-           }
-       }
-       // Clear timed out commands
-       for hostname in hosts_to_clear {
-           if let Some(host_widgets) = self.host_widgets.get_mut(&hostname) {
-               host_widgets.command_status = None;
-           }
-       }
-   }
-
-   /// Check for rebuild completion by detecting agent hash changes
-   pub fn check_rebuild_completion(&mut self, metric_store: &MetricStore) {
-       let mut hosts_to_complete = Vec::new();
-       for (hostname, host_widgets) in &self.host_widgets {
-           if let Some(CommandStatus::InProgress { command_type: CommandType::SystemRebuild, .. }) = &host_widgets.command_status {
-               // Check if agent hash has changed (indicating successful rebuild)
-               if let Some(agent_hash_metric) = metric_store.get_metric(hostname, "system_agent_hash") {
-                   if let cm_dashboard_shared::MetricValue::String(current_hash) = &agent_hash_metric.value {
-                       // Compare with stored hash (if we have one)
-                       if let Some(stored_hash) = host_widgets.system_widget.get_agent_hash() {
-                           if current_hash != stored_hash {
-                               // Agent hash changed - rebuild completed successfully
-                               hosts_to_complete.push(hostname.clone());
-                           }
-                       }
-                   }
-               }
-           }
-       }
-       // Mark rebuilds as completed
-       for hostname in hosts_to_complete {
-           self.complete_command(&hostname);
-       }
-   }
+   /// Clear pending transitions when real status updates arrive or timeout
+   fn clear_completed_transitions(&mut self, hostname: &str, service_metrics: &[&Metric]) {
+       if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
+           let mut completed_services = Vec::new();
+           // Check each pending transition to see if real status has changed
+           for (service_name, (command_type, original_status, _start_time)) in &host_widgets.pending_service_transitions {
+               // Look for status metric for this service
+               for metric in service_metrics {
+                   if metric.name == format!("service_{}_status", service_name) {
+                       let new_status = metric.value.as_string();
+                       // Check if status has changed from original (command completed)
+                       if &new_status != original_status {
+                           // Verify it changed in the expected direction
+                           let expected_change = match command_type {
+                               CommandType::ServiceStart => &new_status == "active",
+                               CommandType::ServiceStop => &new_status != "active",
+                               _ => false,
+                           };
+                           if expected_change {
+                               completed_services.push(service_name.clone());
+                           }
+                       }
+                       break;
+                   }
+               }
+           }
+           // Remove completed transitions
+           for service_name in completed_services {
+               host_widgets.pending_service_transitions.remove(&service_name);
+           }
+       }
+   }
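As a worked example of the transition lifecycle above (service name and statuses are illustrative, and simplified types stand in for the real ones): a start command records the pending entry immediately, and the entry is dropped once a status metric confirms movement in the expected direction.

    use std::collections::HashMap;
    use std::time::Instant;

    #[derive(Clone, Copy)]
    enum Cmd { ServiceStart, ServiceStop }

    fn main() {
        let mut pending: HashMap<String, (Cmd, String, Instant)> = HashMap::new();

        // 's' pressed on a stopped service: the transition is stored right away,
        // so the UI can render "starting" before any new metrics arrive.
        pending.insert("nginx".into(), (Cmd::ServiceStart, "inactive".into(), Instant::now()));

        // Later, a service_<name>_status metric reports the real state.
        let reported_status = "active";
        let done = match pending.get("nginx") {
            Some((Cmd::ServiceStart, _orig, _t0)) => reported_status == "active",
            Some((Cmd::ServiceStop, _orig, _t0)) => reported_status != "active",
            None => false,
        };
        if done {
            // Mirrors clear_completed_transitions: drop the entry once the
            // status moved in the expected direction.
            pending.remove("nginx");
        }
        assert!(pending.is_empty());
    }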
/// Scroll the focused panel up or down
pub fn scroll_focused_panel(&mut self, direction: i32) {
if let Some(hostname) = self.current_host.clone() {
let focused_panel = self.focused_panel; // Get the value before borrowing
let host_widgets = self.get_or_create_host_widgets(&hostname);
match focused_panel {
PanelType::System => {
if direction > 0 {
host_widgets.system_scroll_offset = host_widgets.system_scroll_offset.saturating_add(1);
} else {
host_widgets.system_scroll_offset = host_widgets.system_scroll_offset.saturating_sub(1);
}
info!("System panel scroll offset: {}", host_widgets.system_scroll_offset);
}
PanelType::Services => {
// For services panel, Up/Down moves selection cursor, not scroll
let total_services = host_widgets.services_widget.get_total_services_count();
if direction > 0 {
host_widgets.services_widget.select_next(total_services);
info!("Services selection moved down");
} else {
host_widgets.services_widget.select_previous();
info!("Services selection moved up");
}
}
PanelType::Backup => {
if direction > 0 {
host_widgets.backup_scroll_offset = host_widgets.backup_scroll_offset.saturating_add(1);
} else {
host_widgets.backup_scroll_offset = host_widgets.backup_scroll_offset.saturating_sub(1);
}
info!("Backup panel scroll offset: {}", host_widgets.backup_scroll_offset);
}
} }
} }
} }
/// Get list of currently visible panels
fn get_visible_panels(&self) -> Vec<PanelType> {
let mut visible_panels = vec![PanelType::System, PanelType::Services];
// Check if backup panel should be shown
if let Some(hostname) = &self.current_host {
if let Some(host_widgets) = self.host_widgets.get(hostname) {
if host_widgets.backup_widget.has_data() {
visible_panels.push(PanelType::Backup);
}
}
}
visible_panels
}
    /// Render the dashboard (real btop-style multi-panel layout)
    pub fn render(&mut self, frame: &mut Frame, metric_store: &MetricStore) {
@@ -586,6 +577,21 @@ impl TuiApp {
            ])
            .split(main_chunks[1]); // main_chunks[1] is now the content area (between title and statusbar)

+       // Check if current host is offline
+       let current_host_offline = if let Some(hostname) = self.current_host.clone() {
+           self.calculate_host_status(&hostname, metric_store) == Status::Offline
+       } else {
+           true // No host selected is considered offline
+       };
+       // If host is offline, render wake-up message instead of panels
+       if current_host_offline {
+           self.render_offline_host_message(frame, main_chunks[1]);
+           self.render_btop_title(frame, main_chunks[0], metric_store);
+           self.render_statusbar(frame, main_chunks[2]);
+           return;
+       }
+
        // Check if backup panel should be shown
        let show_backup = if let Some(hostname) = self.current_host.clone() {
            let host_widgets = self.get_or_create_host_widgets(&hostname);
@@ -623,19 +629,20 @@ impl TuiApp {
        // Render services widget for current host
        if let Some(hostname) = self.current_host.clone() {
-           let is_focused = self.focused_panel == PanelType::Services;
-           let (scroll_offset, command_status) = {
+           let is_focused = true; // Always show service selection
+           let (scroll_offset, pending_transitions) = {
                let host_widgets = self.get_or_create_host_widgets(&hostname);
-               (host_widgets.services_scroll_offset, host_widgets.command_status.clone())
+               (host_widgets.services_scroll_offset, host_widgets.pending_service_transitions.clone())
            };
            let host_widgets = self.get_or_create_host_widgets(&hostname);
            host_widgets
                .services_widget
-               .render_with_command_status(frame, content_chunks[1], is_focused, scroll_offset, command_status.as_ref()); // Services takes full right side
+               .render_with_transitions(frame, content_chunks[1], is_focused, scroll_offset, &pending_transitions); // Services takes full right side
        }

        // Render statusbar at the bottom
        self.render_statusbar(frame, main_chunks[2]); // main_chunks[2] is the statusbar area
    }
/// Render btop-style minimal title with host status colors /// Render btop-style minimal title with host status colors
@@ -646,67 +653,90 @@ impl TuiApp {
if self.available_hosts.is_empty() { if self.available_hosts.is_empty() {
let title_text = "cm-dashboard • no hosts discovered"; let title_text = "cm-dashboard • no hosts discovered";
let title = Paragraph::new(title_text).style(Typography::title()); let title = Paragraph::new(title_text)
.style(Style::default().fg(Theme::background()).bg(Theme::status_color(Status::Unknown)));
frame.render_widget(title, area); frame.render_widget(title, area);
return; return;
} }
// Create spans for each host with status indicators // Calculate worst-case status across all hosts (excluding offline)
let mut spans = vec![Span::styled("cm-dashboard • ", Typography::title())]; let mut worst_status = Status::Ok;
for host in &self.available_hosts {
let host_status = self.calculate_host_status(host, metric_store);
// Don't include offline hosts in status aggregation
if host_status != Status::Offline {
worst_status = Status::aggregate(&[worst_status, host_status]);
}
}
// Use the worst status color as background
let background_color = Theme::status_color(worst_status);
// Split the title bar into left and right sections
let chunks = Layout::default()
.direction(Direction::Horizontal)
.constraints([Constraint::Length(15), Constraint::Min(0)])
.split(area);
// Left side: "cm-dashboard" text
let left_span = Span::styled(
" cm-dashboard",
Style::default().fg(Theme::background()).bg(background_color).add_modifier(Modifier::BOLD)
);
let left_title = Paragraph::new(Line::from(vec![left_span]))
.style(Style::default().bg(background_color));
frame.render_widget(left_title, chunks[0]);
// Right side: hosts with status indicators
let mut host_spans = Vec::new();
for (i, host) in self.available_hosts.iter().enumerate() { for (i, host) in self.available_hosts.iter().enumerate() {
if i > 0 { if i > 0 {
spans.push(Span::styled(" ", Typography::title())); host_spans.push(Span::styled(
" ",
Style::default().fg(Theme::background()).bg(background_color)
));
} }
// Check if this host has a command status that affects the icon // Always show normal status icon based on metrics (no command status at host level)
let (status_icon, status_color) = if let Some(host_widgets) = self.host_widgets.get(host) { let host_status = self.calculate_host_status(host, metric_store);
match &host_widgets.command_status { let status_icon = StatusIcons::get_icon(host_status);
Some(CommandStatus::InProgress { command_type: CommandType::SystemRebuild, .. }) => {
// Show blue circular arrow during rebuild
("", Theme::highlight())
}
Some(CommandStatus::Success { command_type: CommandType::SystemRebuild, .. }) => {
// Show green checkmark for successful rebuild
("", Theme::success())
}
_ => {
// Normal status icon based on metrics
let host_status = self.calculate_host_status(host, metric_store);
(StatusIcons::get_icon(host_status), Theme::status_color(host_status))
}
}
} else {
// No host widgets yet, use normal status
let host_status = self.calculate_host_status(host, metric_store);
(StatusIcons::get_icon(host_status), Theme::status_color(host_status))
};
// Add status icon // Add status icon with background color as foreground against status background
spans.push(Span::styled( host_spans.push(Span::styled(
format!("{} ", status_icon), format!("{} ", status_icon),
Style::default().fg(status_color), Style::default().fg(Theme::background()).bg(background_color),
)); ));
if Some(host) == self.current_host.as_ref() { if Some(host) == self.current_host.as_ref() {
// Selected host in bold bright white // Selected host in bold background color against status background
spans.push(Span::styled( host_spans.push(Span::styled(
host.clone(), host.clone(),
Typography::title().add_modifier(Modifier::BOLD), Style::default()
.fg(Theme::background())
.bg(background_color)
.add_modifier(Modifier::BOLD),
)); ));
} else { } else {
// Other hosts in normal style with status color // Other hosts in normal background color against status background
spans.push(Span::styled( host_spans.push(Span::styled(
host.clone(), host.clone(),
Style::default().fg(status_color), Style::default().fg(Theme::background()).bg(background_color),
)); ));
} }
} }
let title_line = Line::from(spans); // Add right padding
let title = Paragraph::new(vec![title_line]); host_spans.push(Span::styled(
" ",
Style::default().fg(Theme::background()).bg(background_color)
));
frame.render_widget(title, area); let host_line = Line::from(host_spans);
let host_title = Paragraph::new(vec![host_line])
.style(Style::default().bg(background_color))
.alignment(ratatui::layout::Alignment::Right);
frame.render_widget(host_title, chunks[1]);
} }
    /// Calculate overall status for a host based on its metrics
@@ -714,7 +744,7 @@ impl TuiApp {
        let metrics = metric_store.get_metrics_for_host(hostname);
        if metrics.is_empty() {
-           return Status::Unknown;
+           return Status::Offline;
        }

        // First check if we have the aggregated host status summary from the agent
@@ -734,7 +764,8 @@ impl TuiApp {
                Status::Warning => has_warning = true,
                Status::Pending => has_pending = true,
                Status::Ok => ok_count += 1,
-               Status::Unknown => {} // Ignore unknown for aggregation
+               Status::Unknown => {}, // Ignore unknown for aggregation
+               Status::Offline => {}, // Ignore offline for aggregation
            }
        }
@@ -769,39 +800,22 @@ impl TuiApp {
        let mut shortcuts = Vec::new();

        // Global shortcuts
-       shortcuts.push("Tab: Switch Host".to_string());
-       shortcuts.push("Shift+Tab: Switch Panel".to_string());
-       // Scroll shortcuts (always available)
-       shortcuts.push("↑↓: Scroll".to_string());
-       // Panel-specific shortcuts
-       match self.focused_panel {
-           PanelType::System => {
-               shortcuts.push("R: Rebuild".to_string());
-           }
-           PanelType::Services => {
-               shortcuts.push("S: Start".to_string());
-               shortcuts.push("Shift+S: Stop".to_string());
-               shortcuts.push("R: Restart".to_string());
-           }
-           PanelType::Backup => {
-               shortcuts.push("B: Trigger Backup".to_string());
-           }
-       }
+       shortcuts.push("Tab: Host".to_string());
+       shortcuts.push("↑↓/jk: Select".to_string());
+       shortcuts.push("r: Rebuild".to_string());
+       shortcuts.push("s/S: Start/Stop".to_string());
+       shortcuts.push("J: Logs".to_string());
+       shortcuts.push("L: Custom".to_string());
+       shortcuts.push("w: Wake".to_string());

        // Always show quit
-       shortcuts.push("Q: Quit".to_string());
+       shortcuts.push("q: Quit".to_string());

        shortcuts
    }
    fn render_system_panel(&mut self, frame: &mut Frame, area: Rect, _metric_store: &MetricStore) {
-       let system_block = if self.focused_panel == PanelType::System {
-           Components::focused_widget_block("system")
-       } else {
-           Components::widget_block("system")
-       };
+       let system_block = Components::widget_block("system");
        let inner_area = system_block.inner(area);
        frame.render_widget(system_block, area);

        // Get current host widgets, create if none exist
@@ -811,16 +825,12 @@ impl TuiApp {
                host_widgets.system_scroll_offset
            };
            let host_widgets = self.get_or_create_host_widgets(&hostname);
-           host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset);
+           host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset, &hostname);
        }
    }

    fn render_backup_panel(&mut self, frame: &mut Frame, area: Rect) {
-       let backup_block = if self.focused_panel == PanelType::Backup {
-           Components::focused_widget_block("backup")
-       } else {
-           Components::widget_block("backup")
-       };
+       let backup_block = Components::widget_block("backup");
        let inner_area = backup_block.inner(area);
        frame.render_widget(backup_block, area);
@@ -835,4 +845,91 @@ impl TuiApp {
        }
    }
/// Render offline host message with wake-up option
fn render_offline_host_message(&self, frame: &mut Frame, area: Rect) {
use ratatui::layout::Alignment;
use ratatui::style::Modifier;
use ratatui::text::{Line, Span};
use ratatui::widgets::{Block, Borders, Paragraph};
// Get hostname for message
let hostname = self.current_host.as_ref()
.map(|h| h.as_str())
.unwrap_or("Unknown");
// Check if host has MAC address for wake-on-LAN
let has_mac = self.current_host.as_ref()
.and_then(|hostname| self.config.hosts.get(hostname))
.and_then(|details| details.mac_address.as_ref())
.is_some();
// Create message content
let mut lines = vec![
Line::from(Span::styled(
format!("Host '{}' is offline", hostname),
Style::default().fg(Theme::muted_text()).add_modifier(Modifier::BOLD),
)),
Line::from(""),
];
if has_mac {
lines.push(Line::from(Span::styled(
"Press 'w' to wake up host",
Style::default().fg(Theme::primary_text()).add_modifier(Modifier::BOLD),
)));
} else {
lines.push(Line::from(Span::styled(
"No MAC address configured - cannot wake up",
Style::default().fg(Theme::muted_text()),
)));
}
// Create centered message
let message = Paragraph::new(lines)
.block(Block::default()
.borders(Borders::ALL)
.border_style(Style::default().fg(Theme::muted_text()))
.title(" Offline Host ")
.title_style(Style::default().fg(Theme::muted_text()).add_modifier(Modifier::BOLD)))
.style(Style::default().bg(Theme::background()).fg(Theme::primary_text()))
.alignment(Alignment::Center);
// Center the message in the available area
let popup_area = ratatui::layout::Layout::default()
.direction(Direction::Vertical)
.constraints([
Constraint::Percentage(40),
Constraint::Length(6),
Constraint::Percentage(40),
])
.split(area)[1];
let popup_area = ratatui::layout::Layout::default()
.direction(Direction::Horizontal)
.constraints([
Constraint::Percentage(25),
Constraint::Percentage(50),
Constraint::Percentage(25),
])
.split(popup_area)[1];
frame.render_widget(message, popup_area);
}
/// Parse MAC address string (e.g., "AA:BB:CC:DD:EE:FF") to [u8; 6]
fn parse_mac_address(mac_str: &str) -> Result<[u8; 6], &'static str> {
let parts: Vec<&str> = mac_str.split(':').collect();
if parts.len() != 6 {
return Err("MAC address must have 6 parts separated by colons");
}
let mut mac = [0u8; 6];
for (i, part) in parts.iter().enumerate() {
match u8::from_str_radix(part, 16) {
Ok(byte) => mac[i] = byte,
Err(_) => return Err("Invalid hexadecimal byte in MAC address"),
}
}
Ok(mac)
}
} }
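For illustration, a quick usage sketch of parse_mac_address as defined above; the address is a placeholder, not a configured host:

    // Called the same way the 'w' handler does (Self::parse_mac_address inside the impl).
    let mac = TuiApp::parse_mac_address("AA:BB:CC:DD:EE:FF");
    assert_eq!(mac, Ok([0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF]));
    assert!(TuiApp::parse_mac_address("AA:BB:CC").is_err()); // wrong part count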


@@ -147,6 +147,7 @@ impl Theme {
            Status::Warning => Self::warning(),
            Status::Critical => Self::error(),
            Status::Unknown => Self::muted_text(),
+           Status::Offline => Self::muted_text(), // Dark gray for offline
        }
    }
@@ -244,8 +245,9 @@ impl StatusIcons {
            Status::Ok => "",
            Status::Pending => "", // Hollow circle for pending
            Status::Warning => "",
-           Status::Critical => "",
+           Status::Critical => "!",
            Status::Unknown => "?",
+           Status::Offline => "", // Empty circle for offline
        }
    }
@@ -258,6 +260,7 @@ impl StatusIcons {
            Status::Warning => Theme::warning(), // Yellow
            Status::Critical => Theme::error(), // Red
            Status::Unknown => Theme::muted_text(), // Gray
+           Status::Offline => Theme::muted_text(), // Dark gray for offline
        };

        vec![
@@ -289,27 +292,9 @@ impl Components {
        )
    }

-   /// Widget block with focus indicator (blue border)
-   pub fn focused_widget_block(title: &str) -> Block<'_> {
-       Block::default()
-           .title(title)
-           .borders(Borders::ALL)
-           .style(Style::default().fg(Theme::highlight()).bg(Theme::background())) // Blue border for focus
-           .title_style(
-               Style::default()
-                   .fg(Theme::highlight()) // Blue title for focus
-                   .bg(Theme::background()),
-           )
-   }
}

impl Typography {
-   /// Main title style (dashboard header)
-   pub fn title() -> Style {
-       Style::default()
-           .fg(Theme::primary_text())
-           .bg(Theme::background())
-   }

    /// Widget title style (panel headers) - bold bright white
    pub fn widget_title() -> Style {


@@ -259,7 +259,12 @@ impl Widget for BackupWidget {
        services.sort_by(|a, b| a.name.cmp(&b.name));
        self.service_metrics = services;
-       self.has_data = !metrics.is_empty();
+       // Only show backup panel if we have meaningful backup data
+       self.has_data = !metrics.is_empty() && (
+           self.last_run_timestamp.is_some() ||
+           self.total_repo_size_gb.is_some() ||
+           !self.service_metrics.is_empty()
+       );

        debug!(
            "Backup widget updated: status={:?}, services={}, total_size={:?}GB",


@@ -9,7 +9,7 @@ use tracing::debug;
use super::Widget;
use crate::ui::theme::{Components, StatusIcons, Theme, Typography};
-use crate::ui::{CommandStatus, CommandType};
+use crate::ui::CommandType;
use ratatui::style::Style;

/// Services widget displaying hierarchical systemd service statuses
@@ -113,13 +113,10 @@ impl ServicesWidget {
            name.to_string()
        };

-       // Parent services always show active/inactive status
+       // Parent services always show actual systemctl status
        let status_str = match info.widget_status {
-           Status::Ok => "active".to_string(),
            Status::Pending => "pending".to_string(),
-           Status::Warning => "inactive".to_string(),
-           Status::Critical => "failed".to_string(),
-           Status::Unknown => "unknown".to_string(),
+           _ => info.status.clone(), // Use actual status from agent (active/inactive/failed)
        };

        format!(
@@ -128,26 +125,17 @@ impl ServicesWidget {
        )
    }
/// Get status icon for service, considering command status for visual feedback /// Get status icon for service, considering pending transitions for visual feedback
fn get_service_icon_and_status(&self, service_name: &str, info: &ServiceInfo, command_status: Option<&CommandStatus>) -> (String, String, ratatui::prelude::Color) { fn get_service_icon_and_status(&self, service_name: &str, info: &ServiceInfo, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>) -> (String, String, ratatui::prelude::Color) {
// Check if this service is currently being operated on // Check if this service has a pending transition
if let Some(status) = command_status { if let Some((command_type, _original_status, _start_time)) = pending_transitions.get(service_name) {
match status { // Show transitional icons for pending commands
CommandStatus::InProgress { command_type, target, .. } => { let (icon, status_text) = match command_type {
if target == service_name { CommandType::ServiceStart => ("", "starting"),
// Only show special icons for service commands CommandType::ServiceStop => ("", "stopping"),
if let Some((icon, status_text)) = match command_type { _ => return (StatusIcons::get_icon(info.widget_status).to_string(), info.status.clone(), Theme::status_color(info.widget_status)), // Not a service command
CommandType::ServiceRestart => Some(("", "restarting")), };
CommandType::ServiceStart => Some(("", "starting")), return (icon.to_string(), status_text.to_string(), Theme::highlight());
CommandType::ServiceStop => Some(("", "stopping")),
_ => None, // Don't handle non-service commands here
} {
return (icon.to_string(), status_text.to_string(), Theme::highlight());
}
}
}
_ => {} // Success/Failed states will show normal status
}
} }
// Normal status display // Normal status display
@@ -158,19 +146,20 @@ impl ServicesWidget {
Status::Warning => Theme::warning(), Status::Warning => Theme::warning(),
Status::Critical => Theme::error(), Status::Critical => Theme::error(),
Status::Unknown => Theme::muted_text(), Status::Unknown => Theme::muted_text(),
Status::Offline => Theme::muted_text(),
}; };
(icon.to_string(), info.status.clone(), status_color) (icon.to_string(), info.status.clone(), status_color)
} }
/// Create spans for sub-service with icon next to name, considering command status /// Create spans for sub-service with icon next to name, considering pending transitions
fn create_sub_service_spans_with_status( fn create_sub_service_spans_with_transitions(
&self, &self,
name: &str, name: &str,
info: &ServiceInfo, info: &ServiceInfo,
is_last: bool, is_last: bool,
command_status: Option<&CommandStatus>, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>,
) -> Vec<ratatui::text::Span<'static>> { ) -> Vec<ratatui::text::Span<'static>> {
// Truncate long sub-service names to fit layout (accounting for indentation) // Truncate long sub-service names to fit layout (accounting for indentation)
let short_name = if name.len() > 18 { let short_name = if name.len() > 18 {
@@ -179,11 +168,11 @@ impl ServicesWidget {
name.to_string() name.to_string()
}; };
// Get status icon and text, considering command status // Get status icon and text, considering pending transitions
let (icon, mut status_str, status_color) = self.get_service_icon_and_status(name, info, command_status); let (icon, mut status_str, status_color) = self.get_service_icon_and_status(name, info, pending_transitions);
// For sub-services, prefer latency if available (unless command is in progress) // For sub-services, prefer latency if available (unless transition is pending)
if command_status.is_none() { if !pending_transitions.contains_key(name) {
if let Some(latency) = info.latency_ms { if let Some(latency) = info.latency_ms {
status_str = if latency < 0.0 { status_str = if latency < 0.0 {
"timeout".to_string() "timeout".to_string()
@@ -241,13 +230,14 @@ impl ServicesWidget {
/// Get currently selected service name (for actions) /// Get currently selected service name (for actions)
pub fn get_selected_service(&self) -> Option<String> { pub fn get_selected_service(&self) -> Option<String> {
// Build the same display list to find the selected service // Build the same display list to find the selected service
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>)> = Vec::new(); let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>, String)> = Vec::new();
let mut parent_services: Vec<_> = self.parent_services.iter().collect(); let mut parent_services: Vec<_> = self.parent_services.iter().collect();
parent_services.sort_by(|(a, _), (b, _)| a.cmp(b)); parent_services.sort_by(|(a, _), (b, _)| a.cmp(b));
for (parent_name, parent_info) in parent_services { for (parent_name, parent_info) in parent_services {
display_lines.push((parent_name.clone(), parent_info.widget_status, false, None)); let parent_line = self.format_parent_service_line(parent_name, parent_info);
display_lines.push((parent_line, parent_info.widget_status, false, None, parent_name.clone()));
if let Some(sub_list) = self.sub_services.get(parent_name) { if let Some(sub_list) = self.sub_services.get(parent_name) {
let mut sorted_subs = sub_list.clone(); let mut sorted_subs = sub_list.clone();
@@ -255,17 +245,19 @@ impl ServicesWidget {
for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() { for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() {
let is_last_sub = i == sorted_subs.len() - 1; let is_last_sub = i == sorted_subs.len() - 1;
let full_sub_name = format!("{}_{}", parent_name, sub_name);
display_lines.push(( display_lines.push((
format!("{}_{}", parent_name, sub_name), // Use parent_sub format for sub-services sub_name.clone(),
sub_info.widget_status, sub_info.widget_status,
true, true,
Some((sub_info.clone(), is_last_sub)), Some((sub_info.clone(), is_last_sub)),
full_sub_name,
)); ));
} }
} }
} }
display_lines.get(self.selected_index).map(|(name, _, _, _)| name.clone()) display_lines.get(self.selected_index).map(|(_, _, _, _, raw_name)| raw_name.clone())
} }
/// Get total count of selectable services (parent services only, not sub-services) /// Get total count of selectable services (parent services only, not sub-services)
@@ -274,6 +266,26 @@ impl ServicesWidget {
self.parent_services.len() self.parent_services.len()
} }
/// Get current status of a specific service by name
pub fn get_service_status(&self, service_name: &str) -> Option<String> {
// Check if it's a parent service
if let Some(parent_info) = self.parent_services.get(service_name) {
return Some(parent_info.status.clone());
}
// Check sub-services (format: parent_sub)
for (parent_name, sub_list) in &self.sub_services {
for (sub_name, sub_info) in sub_list {
let full_sub_name = format!("{}_{}", parent_name, sub_name);
if full_sub_name == service_name {
return Some(sub_info.status.clone());
}
}
}
None
}
/// Calculate which parent service index corresponds to a display line index /// Calculate which parent service index corresponds to a display line index
fn calculate_parent_service_index(&self, display_line_index: &usize) -> usize { fn calculate_parent_service_index(&self, display_line_index: &usize) -> usize {
// Build the same display list to map line index to parent service index // Build the same display list to map line index to parent service index
@@ -427,13 +439,9 @@ impl Widget for ServicesWidget {
impl ServicesWidget { impl ServicesWidget {
/// Render with focus, scroll, and command status for visual feedback /// Render with focus, scroll, and pending transitions for visual feedback
pub fn render_with_command_status(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, command_status: Option<&CommandStatus>) { pub fn render_with_transitions(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>) {
let services_block = if is_focused { let services_block = Components::widget_block("services");
Components::focused_widget_block("services")
} else {
Components::widget_block("services")
};
let inner_area = services_block.inner(area); let inner_area = services_block.inner(area);
frame.render_widget(services_block, area); frame.render_widget(services_block, area);
@@ -457,14 +465,14 @@ impl ServicesWidget {
return; return;
} }
// Use the existing render logic but with command status // Use the existing render logic but with pending transitions
self.render_services_with_status(frame, content_chunks[1], is_focused, scroll_offset, command_status); self.render_services_with_transitions(frame, content_chunks[1], is_focused, scroll_offset, pending_transitions);
} }
/// Render services list with command status awareness /// Render services list with pending transitions awareness
fn render_services_with_status(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, command_status: Option<&CommandStatus>) { fn render_services_with_transitions(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>) {
// Build hierarchical service list for display (same as existing logic) // Build hierarchical service list for display - include raw service name for pending transition lookups
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>)> = Vec::new(); let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>, String)> = Vec::new(); // Added raw service name
// Sort parent services alphabetically for consistent order // Sort parent services alphabetically for consistent order
let mut parent_services: Vec<_> = self.parent_services.iter().collect(); let mut parent_services: Vec<_> = self.parent_services.iter().collect();
@@ -473,7 +481,7 @@ impl ServicesWidget {
for (parent_name, parent_info) in parent_services { for (parent_name, parent_info) in parent_services {
// Add parent service line // Add parent service line
let parent_line = self.format_parent_service_line(parent_name, parent_info); let parent_line = self.format_parent_service_line(parent_name, parent_info);
display_lines.push((parent_line, parent_info.widget_status, false, None)); // false = not sub-service display_lines.push((parent_line, parent_info.widget_status, false, None, parent_name.clone())); // Include raw name
// Add sub-services for this parent (if any) // Add sub-services for this parent (if any)
if let Some(sub_list) = self.sub_services.get(parent_name) { if let Some(sub_list) = self.sub_services.get(parent_name) {
@@ -483,12 +491,14 @@ impl ServicesWidget {
for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() { for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() {
let is_last_sub = i == sorted_subs.len() - 1; let is_last_sub = i == sorted_subs.len() - 1;
let full_sub_name = format!("{}_{}", parent_name, sub_name);
// Store sub-service info for custom span rendering // Store sub-service info for custom span rendering
display_lines.push(( display_lines.push((
sub_name.clone(), sub_name.clone(),
sub_info.widget_status, sub_info.widget_status,
true, true,
Some((sub_info.clone(), is_last_sub)), Some((sub_info.clone(), is_last_sub)),
full_sub_name, // Raw service name for pending transition lookup
)); // true = sub-service, with is_last info )); // true = sub-service, with is_last info
} }
} }
@@ -521,7 +531,7 @@ impl ServicesWidget {
.constraints(vec![Constraint::Length(1); lines_to_show]) .constraints(vec![Constraint::Length(1); lines_to_show])
.split(area); .split(area);
for (i, (line_text, line_status, is_sub, sub_info)) in visible_lines.iter().enumerate() for (i, (line_text, line_status, is_sub, sub_info, raw_service_name)) in visible_lines.iter().enumerate()
{ {
let actual_index = effective_scroll + i; // Real index in the full list let actual_index = effective_scroll + i; // Real index in the full list
@@ -535,47 +545,48 @@ impl ServicesWidget {
}; };
let mut spans = if *is_sub && sub_info.is_some() { let mut spans = if *is_sub && sub_info.is_some() {
// Use custom sub-service span creation WITH command status // Use custom sub-service span creation WITH pending transitions
let (service_info, is_last) = sub_info.as_ref().unwrap(); let (service_info, is_last) = sub_info.as_ref().unwrap();
self.create_sub_service_spans_with_status(line_text, service_info, *is_last, command_status) self.create_sub_service_spans_with_transitions(line_text, service_info, *is_last, pending_transitions)
} else { } else {
// Parent services - check if this parent service has a command in progress // Parent services - check if this parent service has a pending transition using RAW service name
let service_spans = if let Some(status) = command_status { if pending_transitions.contains_key(raw_service_name) {
match status { // Create spans with transitional status
CommandStatus::InProgress { target, .. } => { let (icon, status_text, _) = self.get_service_icon_and_status(raw_service_name, &ServiceInfo {
if target == line_text { status: "".to_string(),
// Create spans with progress status memory_mb: None,
let (icon, status_text, status_color) = self.get_service_icon_and_status(line_text, &ServiceInfo { disk_gb: None,
status: "".to_string(), latency_ms: None,
memory_mb: None, widget_status: *line_status
disk_gb: None, }, pending_transitions);
latency_ms: None,
widget_status: *line_status // Use blue for transitional icons when not selected, background color when selected
}, command_status); let icon_color = if is_selected && !*is_sub && is_focused {
vec![ Theme::background() // Dark background color for visibility against blue selection
ratatui::text::Span::styled(format!("{} ", icon), Style::default().fg(status_color)), } else {
ratatui::text::Span::styled(line_text.clone(), Style::default().fg(Theme::primary_text())), Theme::highlight() // Blue for normal case
ratatui::text::Span::styled(format!(" {}", status_text), Style::default().fg(status_color)), };
]
} else { vec![
StatusIcons::create_status_spans(*line_status, line_text) ratatui::text::Span::styled(format!("{} ", icon), Style::default().fg(icon_color)),
} ratatui::text::Span::styled(line_text.clone(), Style::default().fg(Theme::primary_text())),
} ratatui::text::Span::styled(format!(" {}", status_text), Style::default().fg(icon_color)),
_ => StatusIcons::create_status_spans(*line_status, line_text) ]
}
} else { } else {
StatusIcons::create_status_spans(*line_status, line_text) StatusIcons::create_status_spans(*line_status, line_text)
}; }
service_spans
}; };
// Apply selection highlighting to parent services only, preserving status icon color // Apply selection highlighting to parent services only, making icons background color when selected
// Only show selection when Services panel is focused // Only show selection when Services panel is focused
// Show selection highlighting even when transitional icons are present
if is_selected && !*is_sub && is_focused { if is_selected && !*is_sub && is_focused {
for (i, span) in spans.iter_mut().enumerate() { for (i, span) in spans.iter_mut().enumerate() {
if i == 0 { if i == 0 {
// First span is the status icon - preserve its color // First span is the status icon - use background color for visibility against blue selection
span.style = span.style.bg(Theme::highlight()); span.style = span.style
.bg(Theme::highlight())
.fg(Theme::background());
} else { } else {
// Other spans (text) get full selection highlighting // Other spans (text) get full selection highlighting
span.style = span.style span.style = span.style


@@ -15,7 +15,6 @@ pub struct SystemWidget {
// NixOS information // NixOS information
nixos_build: Option<String>, nixos_build: Option<String>,
config_hash: Option<String>, config_hash: Option<String>,
active_users: Option<String>,
agent_hash: Option<String>, agent_hash: Option<String>,
// CPU metrics // CPU metrics
@@ -33,6 +32,7 @@ pub struct SystemWidget {
tmp_used_gb: Option<f32>, tmp_used_gb: Option<f32>,
tmp_total_gb: Option<f32>, tmp_total_gb: Option<f32>,
memory_status: Status, memory_status: Status,
tmp_status: Status,
// Storage metrics (collected from disk metrics) // Storage metrics (collected from disk metrics)
storage_pools: Vec<StoragePool>, storage_pools: Vec<StoragePool>,
@@ -66,7 +66,6 @@ impl SystemWidget {
Self { Self {
nixos_build: None, nixos_build: None,
config_hash: None, config_hash: None,
active_users: None,
agent_hash: None, agent_hash: None,
cpu_load_1min: None, cpu_load_1min: None,
cpu_load_5min: None, cpu_load_5min: None,
@@ -80,6 +79,7 @@ impl SystemWidget {
tmp_used_gb: None, tmp_used_gb: None,
tmp_total_gb: None, tmp_total_gb: None,
memory_status: Status::Unknown, memory_status: Status::Unknown,
tmp_status: Status::Unknown,
storage_pools: Vec::new(), storage_pools: Vec::new(),
has_data: false, has_data: false,
} }
@@ -129,7 +129,7 @@ impl SystemWidget {
} }
/// Get the current agent hash for rebuild completion detection /// Get the current agent hash for rebuild completion detection
pub fn get_agent_hash(&self) -> Option<&String> { pub fn _get_agent_hash(&self) -> Option<&String> {
self.agent_hash.as_ref() self.agent_hash.as_ref()
} }
@@ -230,9 +230,30 @@ impl SystemWidget {
     /// Extract pool name from disk metric name
     fn extract_pool_name(&self, metric_name: &str) -> Option<String> {
-        if let Some(captures) = metric_name.strip_prefix("disk_") {
-            if let Some(pos) = captures.find('_') {
-                return Some(captures[..pos].to_string());
+        // Pattern: disk_{pool_name}_{drive_name}_{metric_type}
+        // Since pool_name can contain underscores, work backwards from known metric suffixes
+        if metric_name.starts_with("disk_") {
+            // First try drive-specific metrics that have device names
+            if let Some(suffix_pos) = metric_name.rfind("_temperature")
+                .or_else(|| metric_name.rfind("_wear_percent"))
+                .or_else(|| metric_name.rfind("_health")) {
+                // Find the second-to-last underscore to get pool name
+                let before_suffix = &metric_name[..suffix_pos];
+                if let Some(drive_start) = before_suffix.rfind('_') {
+                    return Some(metric_name[5..drive_start].to_string()); // Skip "disk_"
+                }
+            }
+            // For pool-level metrics (usage_percent, used_gb, total_gb), take everything before the metric suffix
+            else if let Some(suffix_pos) = metric_name.rfind("_usage_percent")
+                .or_else(|| metric_name.rfind("_used_gb"))
+                .or_else(|| metric_name.rfind("_total_gb")) {
+                return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
+            }
+            // Fallback to old behavior for unknown patterns
+            else if let Some(captures) = metric_name.strip_prefix("disk_") {
+                if let Some(pos) = captures.find('_') {
+                    return Some(captures[..pos].to_string());
+                }
             }
         }
         None
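For illustration, a standalone sketch of the suffix-anchored parsing above, written as a free function rather than a `SystemWidget` method; the metric names in `main` are made-up examples, not values emitted by the agent:

```rust
/// Sketch of extract_pool_name: anchor on known metric suffixes so pool
/// names that themselves contain underscores are kept intact.
fn pool_name(metric: &str) -> Option<String> {
    if !metric.starts_with("disk_") {
        return None;
    }
    // Drive-level metrics: disk_{pool}_{drive}_temperature / _wear_percent / _health
    if let Some(suffix) = metric
        .rfind("_temperature")
        .or_else(|| metric.rfind("_wear_percent"))
        .or_else(|| metric.rfind("_health"))
    {
        let before_suffix = &metric[..suffix];
        if let Some(drive_start) = before_suffix.rfind('_') {
            return Some(metric[5..drive_start].to_string()); // skip "disk_"
        }
    }
    // Pool-level metrics: disk_{pool}_usage_percent / _used_gb / _total_gb
    if let Some(suffix) = metric
        .rfind("_usage_percent")
        .or_else(|| metric.rfind("_used_gb"))
        .or_else(|| metric.rfind("_total_gb"))
    {
        return Some(metric[5..suffix].to_string()); // skip "disk_"
    }
    None
}

fn main() {
    // Hypothetical metric names; the underscore inside the pool name survives.
    assert_eq!(pool_name("disk_fast_pool_nvme0n1_temperature"), Some("fast_pool".to_string()));
    assert_eq!(pool_name("disk_fast_pool_usage_percent"), Some("fast_pool".to_string()));
}
```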
@@ -240,10 +261,18 @@ impl SystemWidget {
     /// Extract drive name from disk metric name
     fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
-        // Pattern: disk_pool_drive_metric
-        let parts: Vec<&str> = metric_name.split('_').collect();
-        if parts.len() >= 3 && parts[0] == "disk" {
-            return Some(parts[2].to_string());
+        // Pattern: disk_{pool_name}_{drive_name}_{metric_type}
+        // Since pool_name can contain underscores, work backwards from known metric suffixes
+        if metric_name.starts_with("disk_") {
+            if let Some(suffix_pos) = metric_name.rfind("_temperature")
+                .or_else(|| metric_name.rfind("_wear_percent"))
+                .or_else(|| metric_name.rfind("_health")) {
+                // Find the second-to-last underscore to get the drive name
+                let before_suffix = &metric_name[..suffix_pos];
+                if let Some(drive_start) = before_suffix.rfind('_') {
+                    return Some(before_suffix[drive_start + 1..].to_string());
+                }
+            }
         }
         None
     }
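The drive-name counterpart, sketched the same way (free function, hypothetical metric name):

```rust
/// Sketch of extract_drive_name: the drive is the segment between the
/// last underscore of the prefix and the drive-level metric suffix.
fn drive_name(metric: &str) -> Option<String> {
    if !metric.starts_with("disk_") {
        return None;
    }
    let suffix = metric
        .rfind("_temperature")
        .or_else(|| metric.rfind("_wear_percent"))
        .or_else(|| metric.rfind("_health"))?;
    let before_suffix = &metric[..suffix];
    let drive_start = before_suffix.rfind('_')?;
    Some(before_suffix[drive_start + 1..].to_string())
}

fn main() {
    assert_eq!(drive_name("disk_fast_pool_nvme0n1_temperature"), Some("nvme0n1".to_string()));
    assert_eq!(drive_name("disk_fast_pool_usage_percent"), None); // pool-level metric has no drive
}
```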
@@ -334,14 +363,9 @@ impl Widget for SystemWidget {
                     self.config_hash = Some(hash.clone());
                 }
             }
-            "system_active_users" => {
-                if let MetricValue::String(users) = &metric.value {
-                    self.active_users = Some(users.clone());
-                }
-            }
-            "system_agent_hash" => {
-                if let MetricValue::String(hash) = &metric.value {
-                    self.agent_hash = Some(hash.clone());
+            "agent_version" => {
+                if let MetricValue::String(version) = &metric.value {
+                    self.agent_hash = Some(version.clone());
                 }
             }
@@ -390,6 +414,7 @@ impl Widget for SystemWidget {
"memory_tmp_usage_percent" => { "memory_tmp_usage_percent" => {
if let MetricValue::Float(usage) = metric.value { if let MetricValue::Float(usage) = metric.value {
self.tmp_usage_percent = Some(usage); self.tmp_usage_percent = Some(usage);
self.tmp_status = metric.status.clone();
} }
} }
"memory_tmp_used_gb" => { "memory_tmp_used_gb" => {
@@ -414,29 +439,25 @@ impl Widget for SystemWidget {
 impl SystemWidget {
     /// Render with scroll offset support
-    pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize) {
+    pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize, hostname: &str) {
         let mut lines = Vec::new();
         // NixOS section
         lines.push(Line::from(vec![
-            Span::styled("NixOS:", Typography::widget_title())
+            Span::styled(format!("NixOS {}:", hostname), Typography::widget_title())
         ]));
-        let config_text = self.config_hash.as_deref().unwrap_or("unknown");
+        let build_text = self.nixos_build.as_deref().unwrap_or("unknown");
         lines.push(Line::from(vec![
-            Span::styled(format!("Build: {}", config_text), Typography::secondary())
+            Span::styled(format!("Build: {}", build_text), Typography::secondary())
         ]));
-        let agent_hash_text = self.agent_hash.as_deref().unwrap_or("unknown");
-        let short_hash = if agent_hash_text.len() > 8 && agent_hash_text != "unknown" {
-            &agent_hash_text[..8]
-        } else {
-            agent_hash_text
-        };
+        let agent_version_text = self.agent_hash.as_deref().unwrap_or("unknown");
         lines.push(Line::from(vec![
-            Span::styled(format!("Agent: {}", short_hash), Typography::secondary())
+            Span::styled(format!("Agent: {}", agent_version_text), Typography::secondary())
         ]));
         // CPU section
         lines.push(Line::from(vec![
             Span::styled("CPU:", Typography::widget_title())
@@ -472,7 +493,7 @@ impl SystemWidget {
Span::styled(" └─ ", Typography::tree()), Span::styled(" └─ ", Typography::tree()),
]; ];
tmp_spans.extend(StatusIcons::create_status_spans( tmp_spans.extend(StatusIcons::create_status_spans(
self.memory_status.clone(), self.tmp_status.clone(),
&format!("/tmp: {}", tmp_text) &format!("/tmp: {}", tmp_text)
)); ));
lines.push(Line::from(tmp_spans)); lines.push(Line::from(tmp_spans));

View File

@@ -1,6 +1,6 @@
 [package]
 name = "cm-dashboard-shared"
-version = "0.1.0"
+version = "0.1.65"
 edition = "2021"
 [dependencies]

View File

@@ -87,6 +87,7 @@ pub enum Status {
     Warning,
     Critical,
     Unknown,
+    Offline,
 }
 impl Status {
@@ -190,6 +191,16 @@ impl HysteresisThresholds {
                     Status::Ok
                 }
             }
+            Status::Offline => {
+                // Host coming back online, use normal thresholds like first measurement
+                if value >= self.critical_high {
+                    Status::Critical
+                } else if value >= self.warning_high {
+                    Status::Warning
+                } else {
+                    Status::Ok
+                }
+            }
         }
     }
 }
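A minimal standalone sketch of what the new Offline arm does: a host's first sample after coming back online is graded against the plain high thresholds, with no hysteresis carried over from its pre-offline state. The struct, function, and numbers below are illustrative, not the crate's actual `HysteresisThresholds` API:

```rust
#[derive(Debug, PartialEq)]
enum Status { Ok, Warning, Critical }

/// Illustrative subset of the threshold struct: only the fields the Offline branch needs.
struct Thresholds {
    warning_high: f32,
    critical_high: f32,
}

/// Grade the first sample after an Offline period like a first measurement:
/// plain comparisons, no hysteresis bands.
fn status_after_offline(value: f32, t: &Thresholds) -> Status {
    if value >= t.critical_high {
        Status::Critical
    } else if value >= t.warning_high {
        Status::Warning
    } else {
        Status::Ok
    }
}

fn main() {
    let t = Thresholds { warning_high: 80.0, critical_high: 95.0 };
    assert_eq!(status_after_offline(70.0, &t), Status::Ok);
    assert_eq!(status_after_offline(85.0, &t), Status::Warning);
    assert_eq!(status_after_offline(97.0, &t), Status::Critical);
}
```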

View File

@@ -9,6 +9,17 @@ pub struct MetricMessage {
     pub metrics: Vec<Metric>,
 }
+/// Command output streaming message
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct CommandOutputMessage {
+    pub hostname: String,
+    pub command_id: String,
+    pub command_type: String,
+    pub output_line: String,
+    pub is_complete: bool,
+    pub timestamp: u64,
+}
 impl MetricMessage {
     pub fn new(hostname: String, metrics: Vec<Metric>) -> Self {
         Self {
@@ -19,6 +30,19 @@ impl MetricMessage {
     }
 }
+impl CommandOutputMessage {
+    pub fn new(hostname: String, command_id: String, command_type: String, output_line: String, is_complete: bool) -> Self {
+        Self {
+            hostname,
+            command_id,
+            command_type,
+            output_line,
+            is_complete,
+            timestamp: chrono::Utc::now().timestamp() as u64,
+        }
+    }
+}
 /// Commands that can be sent from dashboard to agent
 #[derive(Debug, Serialize, Deserialize)]
 pub enum Command {
@@ -55,6 +79,7 @@ pub enum MessageType {
     Metrics,
     Command,
     CommandResponse,
+    CommandOutput,
     Heartbeat,
 }
@@ -80,6 +105,13 @@ impl MessageEnvelope {
         })
     }
+    pub fn command_output(message: CommandOutputMessage) -> Result<Self, crate::SharedError> {
+        Ok(Self {
+            message_type: MessageType::CommandOutput,
+            payload: serde_json::to_vec(&message)?,
+        })
+    }
     pub fn heartbeat() -> Result<Self, crate::SharedError> {
         Ok(Self {
             message_type: MessageType::Heartbeat,
@@ -113,4 +145,13 @@ impl MessageEnvelope {
             }),
         }
     }
+    pub fn decode_command_output(&self) -> Result<CommandOutputMessage, crate::SharedError> {
+        match self.message_type {
+            MessageType::CommandOutput => Ok(serde_json::from_slice(&self.payload)?),
+            _ => Err(crate::SharedError::Protocol {
+                message: "Expected command output message".to_string(),
+            }),
+        }
+    }
 }
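A hedged usage sketch of the new command-output plumbing, assuming `CommandOutputMessage`, `MessageEnvelope`, and `SharedError` are re-exported at the crate root of cm-dashboard-shared; the command id and command type strings are placeholders, and the ZMQ transport step is omitted:

```rust
use cm_dashboard_shared::{CommandOutputMessage, MessageEnvelope, SharedError};

/// Agent side: wrap one line of streamed command output in an envelope;
/// dashboard side: decode it again.
fn roundtrip_output_line(hostname: &str, line: &str, is_complete: bool) -> Result<(), SharedError> {
    let message = CommandOutputMessage::new(
        hostname.to_string(),
        "cmd-0001".to_string(),  // command_id: placeholder
        "rebuild".to_string(),   // command_type: placeholder
        line.to_string(),
        is_complete,
    );

    let envelope = MessageEnvelope::command_output(message)?;
    // ...the envelope would be serialized and sent over ZMQ here...
    let decoded = envelope.decode_command_output()?;
    assert_eq!(decoded.output_line, line);
    Ok(())
}

fn main() {
    let _ = roundtrip_output_line("myhost", "building derivation...", false);
}
```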