Replace separate service_logs_cmd with service-manage logs action
to unify service management through single script interface.
Dashboard now calls 'service-manage logs <service>' which provides
intelligent log viewing based on service state and configuration.
Replace separate J/L keys with single L key that calls configurable
service_logs_cmd from dashboard config. Script handles both journalctl
and custom log files automatically based on service configuration.
Update status bar to show all available keybindings including
previously missing backup and terminal commands.
- Replace complex SSH command patterns with simple script calls
- Create service-manage script for start/stop operations with proper logging
- Create rebuild script equivalent to rebuild_git alias with user feedback
- Update dashboard to use unified command pattern: sudo service-manage, sudo rebuild
- Simplify backup to use service management: service-manage start borgbackup
- Configure sudoers with wildcards for Nix store path compatibility
- Remove cmtec references from script names for better genericity
- Update version to 0.1.90
Replace ZMQ-based service start/stop commands with SSH execution in tmux
popups. This provides better user feedback with real-time systemctl output
while eliminating blocking operations from the main message processing loop.
Changes:
- Service start/stop now use SSH with progress display
- Added backup functionality with 'B' key
- Preserved transitional icons (↑/↓) for immediate visual feedback
- Removed all ZMQ service control commands and handlers
- Updated configuration to include backup_alias setting
- All operations (rebuild, backup, services) now use consistent SSH interface
This ensures stable heartbeat processing while providing superior user
experience with live command output and service status feedback.
Simplifies host connection configuration by removing tailscale_ip field,
connection_type preferences, and fallback retry logic. Now uses only the
ip field or hostname as fallback. Eliminates blocking TCP connectivity
tests that interfered with heartbeat processing.
This resolves intermittent host lost/found issues by removing the
connection retry timeouts that blocked the ZMQ message processing loop.
Update the auto connection type logic to try local network connections
before falling back to Tailscale. This provides better performance by
using faster local connections when available while maintaining Tailscale
as a reliable fallback.
Changes:
- Auto connection priority: local → tailscale → hostname (was tailscale → local)
- Fallback retry order updated to match new priority
- Supports omitting IP field in config for hosts without static local IP
Implement configurable network routing for both local and Tailscale networks.
Dashboard now supports intelligent connection selection with automatic fallback
between network types. Add IP configuration fields and connection routing logic
for ZMQ and SSH operations.
Features:
- Host configuration with local and Tailscale IP addresses
- Configurable connection types (local/tailscale/auto)
- Automatic fallback between network connections
- Updated ZMQ connection logic with retry support
- SSH command routing through configured IP addresses
- Add agent_heartbeat metric to agent transmission for reliable host detection
- Update dashboard to track heartbeat timestamps per host instead of general metrics
- Add configurable heartbeat_timeout_seconds to dashboard ZMQ config (default 10s)
- Remove unused timeout_ms from agent config and revert to non-blocking command reception
- Remove unused heartbeat_interval_ms from agent configuration
- Host disconnect detection now uses dedicated heartbeat metrics for improved reliability
- Bump version to 0.1.57
- Add Status::Offline enum variant for disconnected hosts
- All configured hosts now always visible showing offline status when disconnected
- Add WakeOnLAN support using wake-on-lan Rust crate
- Implement w key binding to wake offline hosts with MAC addresses
- Simplify configuration to single [hosts] section with MAC addresses only
- Change critical status icon from ◯ to ! for better visibility
- Add proper MAC address parsing and error handling
- Silent WakeOnLAN operation with logging for success/failure
Configuration format:
[hosts]
hostname = { mac_address = "AA:BB:CC:DD:EE:FF" }
- Add ServiceLogConfig structure for per-host service log paths
- Implement L key handler for custom log file viewing via tmux popup
- Update dashboard config to support service_logs HashMap
- Add tail -f command execution over SSH for real-time log streaming
- Update status line to show L: Custom shortcut
- Document configuration format in CLAUDE.md
Each service can now have custom log file paths configured per host,
accessible via L key with same tmux popup interface as journalctl.
- Remove all SystemRebuild command infrastructure from agent and dashboard
- Replace with direct tmux popup execution: ssh {user}@{host} {alias}
- Add configurable SSH user and rebuild alias in dashboard config
- Eliminate agent process crashes during rebuilds
- Simplify architecture by removing ZMQ command streaming complexity
- Clean up all related dead code and fix compilation warnings
Benefits:
- Process isolation: rebuild runs independently via SSH
- Crash resilience: agent/dashboard can restart without affecting rebuilds
- Configuration flexibility: SSH user and alias configurable per deployment
- Operational simplicity: standard tmux popup interface
- Add nixos_config_api_key_file option to NixOS configuration
- Support reading API token from file for private repositories
- Automatically inject token into HTTPS URLs (https://token@host/repo.git)
- Graceful fallback to original URL if key file missing/empty
- Default key file location: /var/lib/cm-dashboard/git-api-key
Usage: echo 'your-api-token' | sudo tee /var/lib/cm-dashboard/git-api-key
Replace direct directory access with git clone/pull approach:
- Add git configuration options (url, branch, working_dir) to NixOS module
- Update SystemConfig and AgentCommand to use git parameters
- Implement ensure_git_repository() method for clone/pull operations
- Agent clones nixosbox to /var/lib/cm-dashboard/nixos-config
- Maintains security while solving permission denied issues
The agent now manages its own copy of the configuration without
needing access to /home/cm directory.
- Remove all unused configuration options from dashboard config module
- Eliminate hardcoded defaults - dashboard now requires config file like agent
- Keep only actually used config: zmq.subscriber_ports and hosts.predefined_hosts
- Remove unused get_host_metrics function from metric store
- Clean up missing module imports (hosts, utils)
- Make dashboard fail fast if no configuration provided
- Align dashboard config approach with agent configuration pattern
Add comprehensive hysteresis support to prevent status oscillation near
threshold boundaries while maintaining responsive alerting.
Key Features:
- HysteresisThresholds with configurable upper/lower limits
- StatusTracker for per-metric status history
- Default gaps: CPU load 10%, memory 5%, disk temp 5°C
Updated Components:
- CPU load collector (5-minute average with hysteresis)
- Memory usage collector (percentage-based thresholds)
- Disk temperature collector (SMART data monitoring)
- All collectors updated to support StatusTracker interface
Cache Interval Adjustments:
- Service status: 60s → 10s (faster response)
- Disk usage: 300s → 60s (more frequent checks)
- Backup status: 900s → 60s (quicker updates)
- SMART data: moved to 600s tier (10 minutes)
Architecture:
- Individual metric status calculation in collectors
- Centralized StatusTracker in MetricCollectionManager
- Status aggregation preserved in dashboard widgets
This commit addresses several key issues identified during development:
Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using ps command
- Fix disk metrics permission issues in systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information
Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"
Service Collection Optimization:
- Removed CPU monitoring from systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy
UI and Dashboard Improvements:
- Reorganized dashboard layout with btop-inspired multi-panel design
- Updated system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface
Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details