34 Commits

SHA1 Message Date
627c533724 Update to v0.1.18 with per-collector intervals and tmux check
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
- Implement per-collector interval timing respecting NixOS config
- Remove all hardcoded timeout/interval values and make configurable
- Add tmux session requirement check for TUI mode (bypassed for headless)
- Update agent to send config hash in Build field instead of NixOS version
- Add nginx check interval, HTTP timeouts, and ZMQ transmission interval configs
- Update NixOS configuration with new configurable values

Breaking changes:
- Build field now shows Nix store config hash (8 chars) instead of NixOS version
- All intervals now follow individual collector configuration instead of global

New configuration fields:
- systemd.nginx_check_interval_seconds
- systemd.http_timeout_seconds
- systemd.http_connect_timeout_seconds
- zmq.transmission_interval_seconds
2025-10-28 10:08:25 +01:00
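Per-collector timing could look roughly like this minimal sketch. It assumes each collector carries the interval from its own config section. The `Collector` struct and `due_collectors` are hypothetical names, not the agent's actual types.

```rust
use std::time::{Duration, Instant};

/// Hypothetical collector entry: a name plus the interval read from its own config section.
struct Collector {
    name: &'static str,
    interval: Duration,
    last_run: Option<Instant>,
}

impl Collector {
    fn new(name: &'static str, interval_secs: u64) -> Self {
        Collector { name, interval: Duration::from_secs(interval_secs), last_run: None }
    }

    /// Due if it has never run, or if its own interval (not a global one) has elapsed.
    fn is_due(&self, now: Instant) -> bool {
        match self.last_run {
            None => true,
            Some(t) => now.duration_since(t) >= self.interval,
        }
    }
}

/// Return the names of collectors due at `now` and mark them as run.
fn due_collectors(collectors: &mut [Collector], now: Instant) -> Vec<&'static str> {
    let mut due = Vec::new();
    for c in collectors.iter_mut() {
        if c.is_due(now) {
            c.last_run = Some(now);
            due.push(c.name);
        }
    }
    due
}
```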
e61a845965 Replace complex SystemRebuild with simple SSH + tmux popup approach
All checks were successful
Build and Release / build-and-release (push) Successful in 2m6s
- Remove all SystemRebuild command infrastructure from agent and dashboard
- Replace with direct tmux popup execution: ssh {user}@{host} {alias}
- Add configurable SSH user and rebuild alias in dashboard config
- Eliminate agent process crashes during rebuilds
- Simplify architecture by removing ZMQ command streaming complexity
- Clean up all related dead code and fix compilation warnings

Benefits:
- Process isolation: rebuild runs independently via SSH
- Crash resilience: agent/dashboard can restart without affecting rebuilds
- Configuration flexibility: SSH user and alias configurable per deployment
- Operational simplicity: standard tmux popup interface
2025-10-27 14:25:45 +01:00
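A sketch of building the popup command from the configured SSH user and alias. It relies on tmux's `display-popup -E`, which closes the popup when the command exits. The function name and the example values are illustrative assumptions. The command is only constructed here, not spawned.

```rust
use std::process::Command;

/// Build (but do not run) a tmux popup that runs `ssh {user}@{host} {alias}`.
/// `user` and `alias` stand in for the dashboard config values.
fn rebuild_popup_command(user: &str, host: &str, alias: &str) -> Command {
    let mut cmd = Command::new("tmux");
    cmd.arg("display-popup")
        .arg("-E") // close the popup when the ssh session exits
        .arg(format!("ssh {}@{} {}", user, host, alias));
    cmd
}
```

Because the rebuild runs in its own ssh process inside tmux, the dashboard only has to launch it. Restarting the agent or dashboard does not touch the rebuild.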
69892a2d84 Implement systemd service approach for nixos-rebuild operations
Some checks failed
Build and Release / build-and-release (push) Failing after 1m58s
- Add cm-rebuild systemd service for process isolation
- Add sudo permissions for service control and journal access
- Remove verbose flag for cleaner output
- Ensures reliable rebuild operations without agent crashes
2025-10-26 23:18:09 +01:00
b6da71b7e7 Implement real-time terminal popup for system rebuild operations
All checks were successful
Build and Release / build-and-release (push) Successful in 1m21s
- Add terminal popup UI component with 80% screen coverage and terminal styling
- Extend ZMQ protocol with CommandOutputMessage for streaming output
- Implement real-time output streaming in agent system rebuild handler
- Add keyboard controls (ESC/Q to close, ↑↓ to scroll) for popup interaction
- Fix system panel Build display to show actual NixOS build instead of config hash
- Update service filters in README with wildcard patterns for better matching
- Add periodic progress updates during nixos-rebuild execution
- Integrate command output handling in dashboard main loop
2025-10-26 11:39:03 +01:00
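The popup geometry and scroll handling reduce to a little arithmetic. This sketch assumes a minimal `Rect` type standing in for the TUI library's rectangle. `centered_rect` and `clamp_scroll` are hypothetical helpers.

```rust
/// Minimal rectangle type (assumption; stands in for the TUI library's Rect).
#[derive(Debug, PartialEq, Clone, Copy)]
struct Rect { x: u16, y: u16, width: u16, height: u16 }

/// Centered rectangle covering `percent` of `area` in each dimension.
fn centered_rect(percent: u16, area: Rect) -> Rect {
    let percent = u32::from(percent.min(100));
    // Widen to u32 so large terminals cannot overflow the multiplication.
    let width = (u32::from(area.width) * percent / 100) as u16;
    let height = (u32::from(area.height) * percent / 100) as u16;
    Rect {
        x: area.x + (area.width - width) / 2,
        y: area.y + (area.height - height) / 2,
        width,
        height,
    }
}

/// Clamp a scroll offset (moved by the up/down keys) so the last page stays in view.
fn clamp_scroll(offset: usize, total_lines: usize, visible: usize) -> usize {
    offset.min(total_lines.saturating_sub(visible))
}
```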
b310206f1f Document automated binary release system
- Replace source build instructions with release workflow
- Document tag-based release process with Gitea Actions
- Include NixOS config update process for releases
- Highlight benefits of static binary approach
2025-10-25 16:36:07 +02:00
8dd943e8f1 Fix config hash to use nix store hash and disable cache persistence 2025-10-25 12:57:47 +02:00
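A Nix store path has the form `/nix/store/<hash>-<name>`, so a short config hash can be sliced from the hash component. This sketch assumes the 8-character Build value comes from such a path. The function name and the path in the test are made up.

```rust
/// First `len` characters of the hash component of a Nix store path,
/// e.g. "/nix/store/<hash>-agent.toml" -> first 8 chars of <hash>.
fn short_store_hash(store_path: &str, len: usize) -> Option<&str> {
    let rest = store_path.strip_prefix("/nix/store/")?;
    let hash = rest.split('-').next()?;
    // Store hashes are ASCII alphanumeric, so byte slicing is safe after this check.
    if hash.len() >= len && hash.chars().all(|c| c.is_ascii_alphanumeric()) {
        Some(&hash[..len])
    } else {
        None
    }
}
```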
c48a105c28 Implement rebuild progress indicator with host persistence
- Add blue circular arrow (↻) status icon during SystemRebuild commands
- Keep rebuilding hosts visible in dashboard even when temporarily offline
- Extend connection timeout to 5 minutes for hosts undergoing rebuild
- Prevent host switching during rebuild operations
- Update status bar to show rebuild progress immediately when R key pressed
2025-10-25 10:16:39 +02:00
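Keeping rebuilding hosts visible amounts to picking a timeout per host state. The 5-minute rebuild timeout comes from the commit. The normal timeout, `HostState` and both helper names are assumptions for illustration.

```rust
use std::time::Duration;

/// Hypothetical per-host state as tracked by the dashboard.
#[derive(Clone, Copy, PartialEq, Debug)]
enum HostState { Normal, Rebuilding }

const NORMAL_TIMEOUT: Duration = Duration::from_secs(10); // assumed value
const REBUILD_TIMEOUT: Duration = Duration::from_secs(5 * 60); // 5 minutes, per the commit

fn connection_timeout(state: HostState) -> Duration {
    match state {
        HostState::Normal => NORMAL_TIMEOUT,
        HostState::Rebuilding => REBUILD_TIMEOUT,
    }
}

/// A host stays listed while its last message is within the timeout for its state.
fn is_host_visible(state: HostState, since_last_message: Duration) -> bool {
    since_last_message <= connection_timeout(state)
}
```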
6a1324ba6c Update CLAUDE.md status after implementing separate service start/stop commands 2025-10-24 18:26:31 +02:00
99da289183 Implement remote command execution and visual feedback for service control
This implements the core functionality for executing remote commands through
the dashboard and providing real-time visual feedback to users.

Key Features:
- Remote service control (start/stop/restart) via existing keyboard shortcuts
- System rebuild command with maintenance mode integration
- Real-time visual feedback with service status transitions
- ZMQ command protocol extension for service and system operations

Implementation Details:
- Extended AgentCommand enum with ServiceControl and SystemRebuild variants
- Added agent-side handlers for systemctl and nixos-rebuild execution
- Implemented command status tracking system for visual feedback
- Enhanced services widget to show in-progress states (e.g. restarting)
- Integrated command execution with existing keyboard navigation

Keyboard Controls:
- Services Panel: Space (start/stop), R (restart)
- System Panel: R (nixos-rebuild switch)
- Backup Panel: B (trigger backup)

Technical Architecture:
- Command flow: UI → Dashboard → ZMQ → Agent → systemctl/nixos-rebuild
- Status tracking: InProgress/Success/Failed states with visual indicators
- Maintenance mode: Automatic /tmp/cm-maintenance file management
- Service feedback: icon transitions (● → in-progress icon → ● with status text)
2025-10-23 22:55:44 +02:00
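The command flow can be sketched as an enum mapped to the program the agent executes. The variant names `ServiceControl`, `SystemRebuild`, `InProgress`, `Success` and `Failed` come from the commit. The fields, `ServiceAction` and both helper functions are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ServiceAction { Start, Stop, Restart }

#[derive(Debug, Clone, PartialEq)]
enum AgentCommand {
    ServiceControl { service: String, action: ServiceAction },
    SystemRebuild,
}

/// InProgress would be set when the command is sent; the exit code decides the final state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CommandStatus { InProgress, Success, Failed }

/// Program and arguments the agent would execute for a command.
fn command_line(cmd: &AgentCommand) -> (String, Vec<String>) {
    match cmd {
        AgentCommand::ServiceControl { service, action } => {
            let verb = match action {
                ServiceAction::Start => "start",
                ServiceAction::Stop => "stop",
                ServiceAction::Restart => "restart",
            };
            ("systemctl".to_string(), vec![verb.to_string(), service.clone()])
        }
        AgentCommand::SystemRebuild => ("nixos-rebuild".to_string(), vec!["switch".to_string()]),
    }
}

/// Exit code 0 means success; any other code, or termination by signal (None), means failure.
fn status_from_exit(code: Option<i32>) -> CommandStatus {
    match code {
        Some(0) => CommandStatus::Success,
        _ => CommandStatus::Failed,
    }
}
```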
b0b1ea04a1 Update CLAUDE.md with completed keyboard navigation and service selection features
- Mark keyboard navigation and service management as completed
- Document all implemented navigation controls and features
- Update current status to reflect working build display
- Replace old future priorities with actual service management tasks
- Add focus-aware selection and visual feedback documentation
2025-10-23 22:04:15 +02:00
8cb5650fbb Implement complete keyboard navigation and UI enhancement
Phase 1 - Panel Navigation:
- Add PanelType enum and panel focus state management
- Implement Shift+Tab cycling between panels (System → Services → Backup → Network)
- Add visual focus indicators with blue borders for focused panels
- Preserve existing Tab behavior for host switching

Phase 2 - Dynamic Statusbar:
- Add bottom statusbar with context-aware shortcuts
- Display different shortcuts based on focused panel
- Global shortcuts: Tab, Shift+Tab, Up/Down arrows, Q
- Panel-specific shortcuts: R (Rebuild), Space/R (Services), B (Backup), N (Network)

Phase 3 - Scrolling Support:
- Add scroll state management per host and panel type
- Implement Up/Down arrow key scrolling within focused panels
- Smart scrolling that activates only when content exceeds panel height
- Scroll bounds checking to prevent over-scrolling

Complete keyboard navigation experience with visual feedback and contextual help.
2025-10-23 20:34:45 +02:00
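Panel cycling and bounded scrolling, sketched with the `PanelType` name from the commit. The method and helper names are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PanelType { System, Services, Backup, Network }

impl PanelType {
    /// Next panel in Shift+Tab order, wrapping from Network back to System.
    fn next(self) -> PanelType {
        match self {
            Self::System => Self::Services,
            Self::Services => Self::Backup,
            Self::Backup => Self::Network,
            Self::Network => Self::System,
        }
    }
}

/// Scroll down one line; stays at 0 when the content fits the panel height.
fn scroll_down(offset: usize, content_lines: usize, panel_height: usize) -> usize {
    (offset + 1).min(content_lines.saturating_sub(panel_height))
}

/// Scroll up one line without going below the top.
fn scroll_up(offset: usize) -> usize {
    offset.saturating_sub(1)
}
```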
84e21dc79a Update CLAUDE.md with current system panel implementation status 2025-10-23 15:47:17 +02:00
39fc9cd22f Implement unified system widget with NixOS info, CPU, RAM, and Storage
- Create NixOS collector for version and active users detection
- Add SystemWidget combining all system information in TODO.md layout
- Replace separate CPU/Memory widgets with unified system display
- Add tree structure for storage with drive temperature/wear info
- Support NixOS version, active users, load averages, memory usage
- Follow exact decimal formatting from specification
2025-10-23 14:01:14 +02:00
7bb5c1cf84 Updated documentation 2025-10-23 12:21:18 +02:00
245e546f18 Updated documentation 2025-10-23 12:12:33 +02:00
b1f294cf2f Implement storage widget tree structure with themed status icons
Add proper hierarchical tree display for storage pools and drives:
- Pool headers with status icons and type indication (Single/multi-drive)
- Individual drive lines with ├─ tree symbols and health status
- Usage summary with └─ end symbol and capacity status
- T: and W: prefixes for temperature and wear level metrics
- Themed status icons using StatusIcons::get_icon() with proper colors
- 2-space indentation for clean tree structure appearance

Replace flat storage display with a tree format:
● Storage steampool (multi-drive):
  ├─ ● sdb T:35°C W:12%
  ├─ ● sdc T:38°C W:8%
  └─ ● 78.1% 1250.3GB/1600.0GB

Uses agent-calculated status from NixOS-configured thresholds.
Update CLAUDE.md with complete implementation specification.
2025-10-22 21:17:33 +02:00
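This sketch renders the tree format from the example above. It simplifies by passing one status icon as a plain `char`, whereas the widget uses themed icons from `StatusIcons::get_icon()`. The `Drive` struct, `render_pool`, and the "single-drive" label are assumptions.

```rust
/// Per-drive data; field names are illustrative.
struct Drive { name: String, temp_c: u32, wear_pct: u32 }

/// Render a pool as tree lines: header, one ├─ line per drive, └─ usage summary.
fn render_pool(pool: &str, drives: &[Drive], used_gb: f64, total_gb: f64, icon: char) -> Vec<String> {
    let kind = if drives.len() > 1 { "multi-drive" } else { "single-drive" };
    let mut lines = vec![format!("{} Storage {} ({}):", icon, pool, kind)];
    for d in drives {
        lines.push(format!("  ├─ {} {} T:{}°C W:{}%", icon, d.name, d.temp_c, d.wear_pct));
    }
    let pct = if total_gb > 0.0 { used_gb / total_gb * 100.0 } else { 0.0 };
    lines.push(format!("  └─ {} {:.1}% {:.1}GB/{:.1}GB", icon, pct, used_gb, total_gb));
    lines
}
```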
41208aa2a0 Implement status aggregation with notification batching 2025-10-21 18:12:42 +02:00
a937032eb1 Remove hardcoded defaults, require configuration file
- Remove all Default implementations from agent configuration structs
- Make configuration file required for agent startup
- Update NixOS module to generate complete agent.toml configuration
- Add comprehensive configuration options to NixOS module including:
  - Service include/exclude patterns for systemd collector
  - All thresholds and intervals
  - ZMQ communication settings
  - Notification and cache configuration
- Agent now fails fast if no configuration provided
- Eliminates configuration drift between defaults and NixOS settings
2025-10-21 00:01:26 +02:00
00a8ed3da2 Implement hysteresis for metric status changes to prevent flapping
Add comprehensive hysteresis support to prevent status oscillation near
threshold boundaries while maintaining responsive alerting.

Key Features:
- HysteresisThresholds with configurable upper/lower limits
- StatusTracker for per-metric status history
- Default gaps: CPU load 10%, memory 5%, disk temp 5°C

Updated Components:
- CPU load collector (5-minute average with hysteresis)
- Memory usage collector (percentage-based thresholds)
- Disk temperature collector (SMART data monitoring)
- All collectors updated to support StatusTracker interface

Cache Interval Adjustments:
- Service status: 60s → 10s (faster response)
- Disk usage: 300s → 60s (more frequent checks)
- Backup status: 900s → 60s (quicker updates)
- SMART data: moved to 600s tier (10 minutes)

Architecture:
- Individual metric status calculation in collectors
- Centralized StatusTracker in MetricCollectionManager
- Status aggregation preserved in dashboard widgets
2025-10-20 18:45:41 +02:00
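Hysteresis means a status enters a state at the upper limit and only leaves it below the lower limit. Values inside the gap keep the previous status. This sketch borrows the `HysteresisThresholds` and `StatusTracker` names from the commit. The two-state model and the field names are simplifying assumptions. The test uses the commit's 5% memory gap with an assumed 80% limit.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Normal, Warning }

/// Enter Warning at `upper`, return to Normal only below `lower`.
struct HysteresisThresholds { upper: f64, lower: f64 }

/// Remembers the last status so values inside the gap do not cause flapping.
struct StatusTracker { current: Status }

impl StatusTracker {
    fn new() -> Self {
        StatusTracker { current: Status::Normal }
    }

    fn update(&mut self, value: f64, t: &HysteresisThresholds) -> Status {
        self.current = match self.current {
            Status::Normal if value >= t.upper => Status::Warning,
            Status::Warning if value < t.lower => Status::Normal,
            s => s, // inside the gap: keep the previous status
        };
        self.current
    }
}
```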
dcca5bbea3 Fix cache tier test to match actual configuration
- Update test expectations from 5s to 2s intervals for realtime tier
- Fix comment to reflect actual 2s interval instead of outdated 5s reference
- All tests now pass correctly
2025-10-18 18:44:13 +02:00
8a36472a3d Implement real-time process monitoring and fix UI hardcoded data
This commit addresses several key issues identified during development:

Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using ps command
- Fix disk metrics permission issues in systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information

Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"

Service Collection Optimization:
- Removed CPU monitoring from systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy

UI and Dashboard Improvements:
- Reorganized dashboard layout with btop-inspired multi-panel design
- Updated system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface

Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details
2025-10-16 23:55:05 +02:00
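A sketch of parsing `ps` output for the top CPU process while skipping `ps` itself. It assumes output shaped like procps `ps -eo pid,pcpu,comm --sort=-pcpu`. The exact flags the collector uses are not stated. `TopProcess` and both functions are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct TopProcess { name: String, pid: u32, cpu: f32 }

/// Parse "PID %CPU COMMAND" rows (already sorted by CPU), skipping the header
/// and any `ps` row so the probe does not report itself.
fn parse_top_process(output: &str) -> Option<TopProcess> {
    output.lines().skip(1).find_map(|line| {
        let mut fields = line.split_whitespace();
        let pid = fields.next()?.parse::<u32>().ok()?;
        let cpu = fields.next()?.parse::<f32>().ok()?;
        let name = fields.collect::<Vec<_>>().join(" ");
        if name.is_empty() || name == "ps" {
            return None;
        }
        Some(TopProcess { name, pid, cpu })
    })
}

/// Format as shown in the dashboard, e.g. "claude (PID 2974) 11.0%".
fn format_top(p: &TopProcess) -> String {
    format!("{} (PID {}) {:.1}%", p.name, p.pid, p.cpu)
}
```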
6bc7f97375 Add refresh shortkey 'r' for on-demand metrics refresh
Implements ZMQ command protocol for dashboard-to-agent communication:
- Agents listen on port 6131 for REQ/REP commands
- Dashboard sends "refresh" command when 'r' key is pressed
- Agents force immediate collection of all metrics via force_refresh_all()
- Fresh data is broadcast immediately to dashboard
- Updated help text to show "r: Refresh all metrics"

Also includes metric-level caching architecture foundation for future
granular control over individual metric update frequencies.
2025-10-15 22:30:04 +02:00
1b572c5c1d Implement intelligent caching system for optimal CPU performance
Replace traditional 5-second polling with tiered collection strategy:
- RealTime (5s): CPU load, memory usage
- Medium (5min): Service status, disk usage
- Slow (15min): SMART data, backup status

Key improvements:
- Reduce CPU usage from 9.5% to <2%
- Cache warming for instant dashboard responsiveness
- Background refresh at 80% of tier intervals
- Thread-safe cache with automatic cleanup

Remove legacy polling code - smart caching is now the default and only mode.
Agent startup enhanced with parallel cache population for immediate data availability.

Architecture: SmartCache + CachedCollector + tiered CollectionScheduler
2025-10-15 11:21:36 +02:00
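The tiered schedule and the "refresh at 80% of the interval" rule could be sketched as follows. Tier intervals are taken from this commit; a later commit changed some of them. `CacheTier` and `needs_refresh` are hypothetical names.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheTier { RealTime, Medium, Slow }

impl CacheTier {
    fn interval(self) -> Duration {
        match self {
            Self::RealTime => Duration::from_secs(5),
            Self::Medium => Duration::from_secs(5 * 60),
            Self::Slow => Duration::from_secs(15 * 60),
        }
    }

    /// Refresh in the background at 80% of the interval so cached data never goes stale.
    fn refresh_after(self) -> Duration {
        self.interval() * 4 / 5
    }
}

fn needs_refresh(tier: CacheTier, age: Duration) -> bool {
    age >= tier.refresh_after()
}
```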
efdd713f62 Improve dashboard display and fix service issues
- Remove unreachable descriptions from failed nginx sites
- Show complete site URLs instead of truncating at first dot
- Implement service-specific disk quotas (docker: 4GB, immich: 4GB, others: 1-2GB)
- Truncate process names to show only executable name without full path
- Display only highest C-state instead of all C-states for cleaner output
- Format system RAM as xxxMB/GB (totalGB) to match services format
2025-10-15 09:36:03 +02:00
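Truncating a process to its executable name means taking the first word of the command line and dropping the directory part. A sketch with a hypothetical helper name:

```rust
/// Executable name from a full command line, e.g. "/usr/bin/nginx -g daemon" -> "nginx".
fn executable_name(cmd: &str) -> &str {
    let first = cmd.split_whitespace().next().unwrap_or("");
    // rsplit always yields at least one piece, so the fallback is never used.
    first.rsplit('/').next().unwrap_or(first)
}
```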
1ee398e648 Improve widget formatting and add logged-in users support
Services widget:
- Fix disk quota formatting with proper rounding instead of truncation
- Remove decimals from RAM quotas and use GB instead of G
- Change quota display to use GB consistently

Backups widget:
- Change GiB to GB for consistency
- Remove spaces between numbers and units
- Update disk usage format to match other widgets: used (totalGB)
- Remove percentage display for cleaner format

System widget:
- Add support for logged-in users in description lines
- Format C-states with "C-State:" prefix on first line, indent subsequent lines
- Add logged_in_users field to SystemSummary data structure

Documentation:
- Add example hash error output to NixOS update instructions
2025-10-14 18:59:31 +02:00
3e5e91f078 Remove SB column and improve widget formatting
Services widget:
- Remove SB (sandbox) column and related formatting function
- Fix quota formatting to show decimals when needed (1.5G not 1G)
- Remove spaces in unit display (128MB not 128 MB)

Storage widget:
- Change usage format to 23GB (932GB) for better readability

Documentation:
- Add NixOS configuration update process to CLAUDE.md
2025-10-14 18:40:12 +02:00
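"Decimals when needed" can be implemented by rounding to one decimal place and dropping the fraction when it is zero. A sketch using this commit's "G" suffix; a later commit switched to "GB". The function name is hypothetical.

```rust
/// Round to one decimal; print it only when non-zero, e.g. 1.5 -> "1.5G", 4.0 -> "4G".
fn format_quota(gb: f64) -> String {
    let rounded = (gb * 10.0).round() / 10.0;
    if rounded.fract() == 0.0 {
        format!("{}G", rounded as u64)
    } else {
        format!("{:.1}G", rounded)
    }
}
```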
b0d7d5ce35 Testing 2025-10-13 11:23:49 +02:00
c68ccf023e Testing 2025-10-13 00:28:06 +02:00
57b676ad25 Testing 2025-10-13 00:16:24 +02:00
9e344fb66d Testing 2025-10-12 22:31:46 +02:00
c3dbaeead2 Optimize agent CPU usage with throttled service descriptions 2025-10-12 15:13:42 +02:00
2239badc8a Testing 2025-10-12 14:53:27 +02:00
2581435b10 Implement per-service disk usage monitoring
Replaced system-wide disk usage with accurate per-service tracking by scanning
service-specific directories. Services like sshd now correctly show minimal
disk usage instead of misleading system totals.

- Rename storage widget and add drive capacity/usage columns
- Move host display to main dashboard title for cleaner layout
- Replace separate alert displays with color-coded row highlighting
- Add per-service disk usage collection using du command
- Update services widget formatting to handle small disk values
- Restructure into workspace with dedicated agent and dashboard packages
2025-10-11 22:59:16 +02:00
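Per-service disk usage from `du` comes down to parsing the size column and formatting small values. This sketch assumes GNU `du -sb <dir>` output ("bytes<TAB>path") and decimal megabytes. Both helpers and the "<1MB" rendering are hypothetical.

```rust
/// Byte count from the first line of `du -sb` output, e.g. "4096\t/var/lib/sshd".
fn parse_du_bytes(output: &str) -> Option<u64> {
    output.lines().next()?.split('\t').next()?.trim().parse::<u64>().ok()
}

/// Show tiny directories as "<1MB" instead of a misleading "0MB".
fn format_disk(bytes: u64) -> String {
    const MB: u64 = 1_000_000;
    if bytes < MB {
        "<1MB".to_string()
    } else {
        format!("{}MB", bytes / MB)
    }
}
```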
656cb5943b Switch dashboard to ZMQ gossip data source 2025-10-11 13:36:46 +02:00