Optimize dashboard performance for responsive Tab key navigation

- Replace 6 separate filter operations with single-pass metric categorization in update_metrics
- Reduce CPU overhead from 6x to 1x work per metric update cycle
- Fix Tab key sluggishness caused by competing expensive filtering operations
- Maintain exact same functionality with significantly better performance
- Improve UI responsiveness for host switching and navigation
- Bump version to 0.1.58
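
The single-pass categorization described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from `update_metrics`: the `Metric` and `Categorized` types, the `categorize` function, and the six name prefixes are assumptions (only `cpu_load_1min` and `memory_usage_percent` appear as metric names in this repository's README). Instead of running six `filter` passes over the same metric list, each metric is inspected once and routed to its bucket.

```rust
/// Hypothetical metric type; the real struct in the repository may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: f32,
}

/// One bucket per consumer. The six categories are assumed for illustration.
#[derive(Debug, Default)]
pub struct Categorized {
    pub cpu: Vec<Metric>,
    pub memory: Vec<Metric>,
    pub disk: Vec<Metric>,
    pub service: Vec<Metric>,
    pub backup: Vec<Metric>,
    pub system: Vec<Metric>,
}

/// Walk the metric list once and route each metric by its name prefix,
/// rather than filtering the whole list separately for every category.
pub fn categorize(metrics: &[Metric]) -> Categorized {
    let mut out = Categorized::default();
    for m in metrics {
        // The prefix is the text before the first underscore, e.g. "cpu".
        let bucket = match m.name.split('_').next().unwrap_or("") {
            "cpu" => &mut out.cpu,
            "memory" => &mut out.memory,
            "disk" => &mut out.disk,
            "service" => &mut out.service,
            "backup" => &mut out.backup,
            "system" => &mut out.system,
            _ => continue, // Unrecognized metrics are skipped in this sketch.
        };
        bucket.push(m.clone());
    }
    out
}
```

With six categories, the filtering approach visits every metric six times per update cycle, while this loop visits each metric once, which matches the "6x to 1x" reduction claimed above.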
Implement heartbeat-based host connectivity detection
2025-11-06 11:18:39 +01:00 · 2025-11-06 11:04:01 +01:00 · 2025-11-06 10:31:25 +01:00 · 2025-10-31 09:28:31 +01:00 · 2025-10-31 09:03:01 +01:00 · 2025-10-30 17:00:39 +01:00
32 changed files with 1119 additions and 1278 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,3 +0,0 @@
-# Agent Guide
-
-Agents working in this repo must follow the instructions in `CLAUDE.md`.
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -2,276 +2,76 @@

 ## Overview

-A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for our specific monitoring needs and ZMQ-based metric collection.
+A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.

-## Implementation Strategy
+## Current Features

-### Current Implementation Status
+### Core Functionality
+- **Real-time Monitoring**: CPU, RAM, Storage, and Service status
+- **Service Management**: Start/stop services with user-stopped tracking
+- **Multi-host Support**: Monitor multiple servers from single dashboard
+- **NixOS Integration**: System rebuild via SSH + tmux popup
+- **Backup Monitoring**: Borgbackup status and scheduling

-**System Panel Enhancement - COMPLETED** ✅
+### User-Stopped Service Tracking
+- Services stopped via dashboard are marked as "user-stopped"
+- User-stopped services report Status::OK instead of Warning
+- Prevents false alerts during intentional maintenance
+- Persistent storage survives agent restarts
+- Automatic flag clearing when services are restarted via dashboard

-All system panel features successfully implemented:
- ✅ **NixOS Collector**: Created collector for version and active users  
- ✅ **System Widget**: Unified widget combining NixOS, CPU, RAM, and Storage
- ✅ **Build Display**: Shows NixOS build information without codename
- ✅ **Active Users**: Displays currently logged in users
- ✅ **Tmpfs Monitoring**: Added /tmp usage to RAM section
- ✅ **Agent Deployment**: NixOS collector working in production
-
-**Keyboard Navigation and Service Management - COMPLETED** ✅
-
-All keyboard navigation and service selection features successfully implemented:
- ✅ **Panel Navigation**: Shift+Tab cycles through visible panels only (System → Services → Backup)
- ✅ **Service Selection**: Up/Down arrows navigate through parent services with visual cursor
- ✅ **Focus Management**: Selection highlighting only visible when Services panel focused
- ✅ **Status Preservation**: Service health colors maintained during selection (green/red icons)
- ✅ **Smart Panel Switching**: Only cycles through panels with data (backup panel conditional)
- ✅ **Scroll Support**: All panels support content scrolling with proper overflow indicators
-
-**Current Status - October 27, 2025:**
- All keyboard navigation features working correctly ✅
- Service selection cursor implemented with focus-aware highlighting ✅
- Panel scrolling fixed for System, Services, and Backup panels ✅
- Build display working: "Build: 25.05.20251004.3bcc93c" ✅
- Agent version display working: "Agent: v0.1.17" ✅
- Cross-host version comparison implemented ✅
- Automated binary release system working ✅
- SMART data consolidated into disk collector ✅
-
-**RESOLVED - Remote Rebuild Functionality:**
- ✅ **System Rebuild**: Now uses simple SSH + tmux popup approach
- ✅ **Process Isolation**: Rebuild runs independently via SSH, survives agent/dashboard restarts
- ✅ **Configuration**: SSH user and rebuild alias configurable in dashboard config
- ✅ **Service Control**: Works correctly for start/stop/restart of services
-
-**Solution Implemented:**
- Replaced complex SystemRebuild command infrastructure with direct tmux popup
- Uses `tmux display-popup "ssh -tt {user}@{hostname} 'bash -ic {alias}'"`
- Configurable SSH user and rebuild alias in dashboard config
- Eliminates all agent crashes during rebuilds
- Simple, reliable, and follows standard tmux interface patterns
-
-**Current Layout:**
-```
-NixOS:
-Build: 25.05.20251004.3bcc93c
-Agent: v0.1.17   # Shows agent version from Cargo.toml
-Active users: cm, simon
-CPU:
-● Load: 0.02 0.31 0.86 • 3000MHz
-RAM:
-● Usage: 33% 2.6GB/7.6GB  
-● /tmp: 0% 0B/2.0GB  
-Storage:  
-● root (Single):  
- ├─ ● nvme0n1 W: 1%
- └─ ● 18% 167.4GB/928.2GB
+### Custom Service Logs
+- Configure service-specific log file paths per host in dashboard config
+- Press `L` on any service to view custom log files via `tail -f`
+- Configuration format in dashboard config:
+```toml
+[service_logs]
+hostname1 = [
+  { service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
+  { service_name = "app", log_file_path = "/var/log/myapp/app.log" }
+]
+hostname2 = [
+  { service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
+]
 ```

-**System panel layout fully implemented with blue tree symbols ✅**
-**Tree symbols now use consistent blue theming across all panels ✅**
-**Overflow handling restored for all widgets ("... and X more") ✅**
-**Agent version display working correctly ✅**
-**Cross-host version comparison logging warnings ✅**
-**Backup panel visibility fixed - only shows when meaningful data exists ✅**
-**SSH-based rebuild system fully implemented and working ✅**
+### Service Management
+- **Direct Control**: Arrow keys (↑↓) or vim keys (j/k) navigate services
+- **Service Actions**: 
+  - `s` - Start service (sends UserStart command)
+  - `S` - Stop service (sends UserStop command)
+  - `J` - Show service logs (journalctl in tmux popup)
+  - `L` - Show custom log files (tail -f custom paths in tmux popup)
+  - `R` - Rebuild current host
+- **Visual Status**: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
+- **Transitional Icons**: Blue arrows during operations

-### Current Keyboard Navigation Implementation
-
-**Navigation Controls:**
- **Tab**: Switch between hosts (cmbox, srv01, srv02, steambox, etc.)
- **Shift+Tab**: Cycle through visible panels (System → Services → Backup → System)
- **Up/Down (System/Backup)**: Scroll through panel content
- **Up/Down (Services)**: Move service selection cursor between parent services
+### Navigation
+- **Tab**: Switch between hosts
+- **↑↓ or j/k**: Select services
+- **J**: Show service logs (journalctl)
+- **L**: Show custom log files
 - **q**: Quit dashboard

-**Panel-Specific Features:**
- **System Panel**: Scrollable content with CPU, RAM, Storage details
- **Services Panel**: Service selection cursor for parent services only (docker, nginx, postgresql, etc.)
- **Backup Panel**: Scrollable repository list with proper overflow handling
-
-**Visual Feedback:**
- **Focused Panel**: Blue border and title highlighting
- **Service Selection**: Blue background with preserved status icon colors (green ● for active, red ● for failed)
- **Focus-Aware Selection**: Selection highlighting only visible when Services panel focused
- **Dynamic Statusbar**: Context-aware shortcuts based on focused panel
-
-### Remote Command Execution - WORKING ✅
-
-**All Issues Resolved (as of 2025-10-24):**
- ✅ **ZMQ Command Protocol**: Extended with ServiceControl and SystemRebuild variants
- ✅ **Agent Handlers**: systemctl and nixos-rebuild execution with maintenance mode
- ✅ **Dashboard Integration**: Keyboard shortcuts execute commands
- ✅ **Service Control**: Fixed toggle logic - replaced with separate 's' (start) and 'S' (stop)
- ✅ **System Rebuild**: Fixed permission issues and sandboxing problems
- ✅ **Git Clone Approach**: Implemented for nixos-rebuild to avoid directory permissions
- ✅ **Visual Feedback**: Directional arrows for service status (↑ starting, ↓ stopping, ↻ restarting)
-
-### Terminal Popup for Real-time Output - IMPLEMENTED ✅
-
-**Status (as of 2025-10-26):**
- ✅ **Terminal Popup UI**: 80% screen coverage with terminal styling and color-coded output
- ✅ **ZMQ Streaming Protocol**: CommandOutputMessage for real-time output transmission
- ✅ **Keyboard Controls**: ESC/Q to close, ↑↓ to scroll, manual close (no auto-close)
- ✅ **Real-time Display**: Live streaming of command output as it happens
- ✅ **Version-based Agent Reporting**: Shows "Agent: v0.1.13" instead of nix store hash
-
-**Current Implementation Issues:**
- ❌ **Agent Process Crashes**: Agent dies during nixos-rebuild execution
- ❌ **Inconsistent Output**: Different outputs each time 'R' is pressed
- ❌ **Limited Output Visibility**: Not capturing all nixos-rebuild progress
-
-**PLANNED SOLUTION - Systemd Service Approach:**
-
-**Problem**: Direct nixos-rebuild execution in agent causes process crashes and inconsistent output.
-
-**Solution**: Create dedicated systemd service for rebuild operations.
-
-**Implementation Plan:**
-1. **NixOS Systemd Service**:
-   ```nix
-   systemd.services.cm-rebuild = {
-     description = "CM Dashboard NixOS Rebuild";
-     serviceConfig = {
-       Type = "oneshot";
-       ExecStart = "${pkgs.nixos-rebuild}/bin/nixos-rebuild switch --flake . --option sandbox false";
-       WorkingDirectory = "/var/lib/cm-dashboard/nixos-config";
-       User = "root";
-       StandardOutput = "journal";
-       StandardError = "journal";
-     };
-   };
-   ```
-
-2. **Agent Modification**:
-   - Replace direct nixos-rebuild execution with: `systemctl start cm-rebuild`
-   - Stream output via: `journalctl -u cm-rebuild -f --no-pager`
-   - Monitor service status for completion detection
-
-3. **Benefits**:
-   - **Process Isolation**: Service runs independently, won't crash agent
-   - **Consistent Output**: Always same deterministic rebuild process
-   - **Proper Logging**: systemd journal handles all output management
-   - **Resource Management**: systemd manages cleanup and resource limits
-   - **Status Tracking**: Can query service status (running/failed/success)
-
-**Next Priority**: Implement systemd service approach for reliable rebuild operations.
-
-**Keyboard Controls Status:**
- **Services Panel**: 
-  - R (restart) ✅ Working
-  - s (start) ✅ Working  
-  - S (stop) ✅ Working
- **System Panel**: R (nixos-rebuild) ✅ Working with --option sandbox false
- **Backup Panel**: B (trigger backup) ❓ Not implemented
-
-**Visual Feedback Implementation - IN PROGRESS:**
-
-Context-appropriate progress indicators for each panel:
-
-**Services Panel** (Service status transitions):
-```
-● nginx          active    →  ⏳ nginx      restarting  →  ● nginx          active
-● docker         active    →  ⏳ docker     stopping    →  ● docker         inactive  
-```
-
-**System Panel** (Build progress in NixOS section):
-```
-NixOS:
-Build: 25.05.20251004.3bcc93c    →    Build: [████████████     ] 65%
-Active users: cm, simon               Active users: cm, simon
-```
-
-**Backup Panel** (OnGoing status with progress):
-```
-Latest backup:              →    Latest backup:
-● 2024-10-23 14:32:15            ● OnGoing  
-└─ Duration: 1.3m                 └─ [██████       ] 60%
-```
-
-**Critical Configuration Hash Fix - HIGH PRIORITY:**
-
-**Problem:** Configuration hash currently shows git commit hash instead of actual deployed system hash.
-
-**Current (incorrect):** 
- Shows git hash: `db11f82` (source repository commit)
- Not accurate - doesn't reflect what's actually deployed
-
-**Target (correct):**
- Show nix store hash: `d8ivwiar` (first 8 chars from deployed system)  
- Source: `/nix/store/d8ivwiarhwhgqzskj6q2482r58z46qjf-nixos-system-cmbox-25.05.20251004.3bcc93c`
- Pattern: Extract hash from `/nix/store/HASH-nixos-system-HOSTNAME-VERSION`
-
-**Benefits:**
-1. **Deployment Verification:** Confirms rebuild actually succeeded
-2. **Accurate Status:** Shows what's truly running, not just source
-3. **Rebuild Completion Detection:** Hash change = rebuild completed
-4. **Rollback Tracking:** Each deployment has unique identifier
-
-**Implementation Required:**
-1. Agent extracts nix store hash from `ls -la /run/current-system` 
-2. Reports this as `system_config_hash` metric instead of git hash
-3. Dashboard displays first 8 characters: `Config: d8ivwiar`
-
-**Next Session Priority Tasks:**
-
-**Remaining Features:**
-1. **Fix Configuration Hash Display (CRITICAL)**:
-   - Use nix store hash instead of git commit hash
-   - Extract from `/run/current-system` -> `/nix/store/HASH-nixos-system-*`
-   - Enables proper rebuild completion detection
-
-2. **Command Response Protocol**:
-   - Agent sends command completion/failure back to dashboard via ZMQ
-   - Dashboard updates UI status from ⏳ to ● when commands complete
-   - Clear success/failure status after timeout
-
-3. **Backup Panel Features**:
-   - Implement backup trigger functionality (B key)
-   - Complete visual feedback for backup operations
-   - Add backup progress indicators
-
-**Enhancement Tasks:**
- Add confirmation dialogs for destructive actions (stop/restart/rebuild)
- Implement command history/logging
- Add keyboard shortcuts help overlay
-
-**Future Enhanced Navigation:**
- Add Page Up/Down for faster scrolling through long service lists
- Implement search/filter functionality for services
- Add jump-to-service shortcuts (first letter navigation)
-
-**Future Advanced Features:**
- Service dependency visualization
- Historical service status tracking
- Real-time log viewing integration
-
-## Core Architecture Principles - CRITICAL
+## Core Architecture Principles

 ### Individual Metrics Philosophy
-
-**NEW ARCHITECTURE**: Agent collects individual metrics, dashboard composes widgets from those metrics.
+- Agent collects individual metrics, dashboard composes widgets
+- Each metric collected, transmitted, and stored individually
+- Agent calculates status for each metric using thresholds
+- Dashboard aggregates individual metric statuses for widget status

 ### Maintenance Mode
-
-**Purpose:**
-
- Suppress email notifications during planned maintenance or backups
- Prevents false alerts when services are intentionally stopped
-
-**Implementation:**
-
 - Agent checks for `/tmp/cm-maintenance` file before sending notifications
 - File presence suppresses all email notifications while continuing monitoring
 - Dashboard continues to show real status, only notifications are blocked

-**Usage:**
-
+Usage:
 ```bash
 # Enable maintenance mode
 touch /tmp/cm-maintenance

-# Run maintenance tasks (backups, service restarts, etc.)
+# Run maintenance tasks
 systemctl stop service
 # ... maintenance work ...
 systemctl start service
@@ -280,61 +80,84 @@ systemctl start service
 rm /tmp/cm-maintenance
 ```

-**NixOS Integration:**
+## Development and Deployment Architecture

- Borgbackup script automatically creates/removes maintenance file
- Automatic cleanup via trap ensures maintenance mode doesn't stick
- All cinfiguration are shall be done from nixos config
+### Development Path
+- **Location:** `~/projects/cm-dashboard` 
+- **Purpose:** Development workflow only - for committing new code
+- **Access:** Only for developers to commit changes

-**ARCHITECTURE ENFORCEMENT**:
+### Deployment Path  
+- **Location:** `/var/lib/cm-dashboard/nixos-config`
+- **Purpose:** Production deployment only - agent clones/pulls from git
+- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild

- **ZERO legacy code reuse** - Fresh implementation following ARCHITECT.md exactly
- **Individual metrics only** - NO grouped metric structures
- **Reference-only legacy** - Study old functionality, implement new architecture
- **Clean slate mindset** - Build as if legacy codebase never existed
+### Git Flow
+```
+Development: ~/projects/cm-dashboard → git commit → git push
+Deployment:  git pull → /var/lib/cm-dashboard/nixos-config → rebuild
+```

-**Implementation Rules**:
+## Automated Binary Release System

-1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
-2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
-3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
-4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
-   **Testing & Building**:
+CM Dashboard uses automated binary releases instead of source builds.

- **Workspace builds**: `cargo build --workspace` for all testing
- **Clean compilation**: Remove `target/` between architecture changes
- **ZMQ testing**: Test agent-dashboard communication independently
- **Widget testing**: Verify UI layout matches legacy appearance exactly
+### Creating New Releases
+```bash
+cd ~/projects/cm-dashboard
+git tag v0.1.X
+git push origin v0.1.X
+```

-**NEVER in New Implementation**:
+This automatically:
+- Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
+- Creates GitHub-style release with tarball
+- Uploads binaries via Gitea API

- Copy/paste ANY code from legacy backup
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
+### NixOS Configuration Updates
+Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:

-# Important Communication Guidelines
+```nix
+version = "v0.1.X";
+src = pkgs.fetchurl {
+  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
+  sha256 = "sha256-NEW_HASH_HERE";
+};
+```

-NEVER write that you have "successfully implemented" something or generate extensive summary text without first verifying with the user that the implementation is correct. This wastes tokens. Keep responses concise.
+### Get Release Hash
+```bash
+cd ~/projects/nixosbox
+nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
+  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
+  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
+}' 2>&1 | grep "got:"
+```

-NEVER implement code without first getting explicit user agreement on the approach. Always ask for confirmation before proceeding with implementation.
+### Building
+
+**Testing & Building:**
+- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
+- **Clean compilation**: Remove `target/` between major changes
+
+## Important Communication Guidelines
+
+Keep responses concise and focused. Avoid extensive implementation summaries unless requested.

 ## Commit Message Guidelines

 **NEVER mention:**
-
 - Claude or any AI assistant names
 - Automation or AI-generated content
 - Any reference to automated code generation

 **ALWAYS:**
-
 - Focus purely on technical changes and their purpose
 - Use standard software development commit message format
 - Describe what was changed and why, not how it was created
 - Write from the perspective of a human developer

 **Examples:**
-
 - ❌ "Generated with Claude Code"
 - ❌ "AI-assisted implementation"
 - ❌ "Automated refactoring"
@@ -342,83 +165,22 @@ NEVER implement code without first getting explicit user agreement on the approa
 - ✅ "Restructure storage widget with improved layout"
 - ✅ "Update CPU thresholds to production values"

-## Development and Deployment Architecture
+## Implementation Rules

-**CRITICAL:** Development and deployment paths are completely separate:
+1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
+2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
+3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
+4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status

-### Development Path
- **Location:** `~/projects/nixosbox` 
- **Purpose:** Development workflow only - for committing new cm-dashboard code
- **Access:** Only for developers to commit changes
- **Code Access:** Running cm-dashboard code shall NEVER access this path
+**NEVER:**
+- Copy/paste ANY code from legacy implementations
+- Calculate status in dashboard widgets
+- Hardcode metric names in widgets (use const arrays)
+- Create files unless absolutely necessary for achieving goals
+- Create documentation files unless explicitly requested

-### Deployment Path  
- **Location:** `/var/lib/cm-dashboard/nixos-config`
- **Purpose:** Production deployment only - agent clones/pulls from git
- **Access:** Only cm-dashboard agent for deployment operations
- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild
-
-### Git Flow
-```
-Development: ~/projects/nixosbox → git commit → git push
-Deployment:  git pull → /var/lib/cm-dashboard/nixos-config → rebuild
-```
-
-## Automated Binary Release System
-
-**IMPLEMENTED:** cm-dashboard now uses automated binary releases instead of source builds.
-
-### Release Workflow
-
-1. **Automated Release Creation**
-   - Gitea Actions workflow builds static binaries on tag push
-   - Creates release with `cm-dashboard-linux-x86_64.tar.gz` tarball
-   - No manual intervention required for binary generation
-
-2. **Creating New Releases**
-   ```bash
-   cd ~/projects/cm-dashboard
-   git tag v0.1.X
-   git push origin v0.1.X
-   ```
-   
-   This automatically:
-   - Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
-   - Creates GitHub-style release with tarball
-   - Uploads binaries via Gitea API
-
-3. **NixOS Configuration Updates**
-   Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
-
-   ```nix
-   version = "v0.1.X";
-   src = pkgs.fetchurl {
-     url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
-     sha256 = "sha256-NEW_HASH_HERE";
-   };
-   ```
-
-4. **Get Release Hash**
-   ```bash
-   cd ~/projects/nixosbox
-   nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
-     url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
-     sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
-   }' 2>&1 | grep "got:"
-   ```
-
-5. **Commit and Deploy**
-   ```bash
-   cd ~/projects/nixosbox
-   git add hosts/common/cm-dashboard.nix
-   git commit -m "Update cm-dashboard to v0.1.X with static binaries"
-   git push
-   ```
-
-### Benefits
-
- **No compilation overhead** on each host
- **Consistent static binaries** across all hosts
- **Faster deployments** - download vs compile
- **No library dependency issues** - static linking
- **Automated pipeline** - tag push triggers everything
+**ALWAYS:**
+- Prefer editing existing files to creating new ones
+- Follow existing code conventions and patterns
+- Use existing libraries and utilities
+- Follow security best practices
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -270,7 +270,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"

 [[package]]
 name = "cm-dashboard"
-version = "0.1.30"
+version = "0.1.57"
 dependencies = [
 "anyhow",
 "chrono",
@@ -286,12 +286,13 @@ dependencies = [
 "toml",
 "tracing",
 "tracing-subscriber",
+ "wake-on-lan",
 "zmq",
 ]

 [[package]]
 name = "cm-dashboard-agent"
-version = "0.1.30"
+version = "0.1.57"
 dependencies = [
 "anyhow",
 "async-trait",
@@ -314,7 +315,7 @@ dependencies = [

 [[package]]
 name = "cm-dashboard-shared"
-version = "0.1.30"
+version = "0.1.57"
 dependencies = [
 "chrono",
 "serde",
@@ -2064,6 +2065,12 @@ version = "0.9.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"

+[[package]]
+name = "wake-on-lan"
+version = "0.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1ccf60b60ad7e5b1b37372c5134cbcab4db0706c231d212e0c643a077462bc8f"
+
 [[package]]
 name = "walkdir"
 version = "2.5.0"
--- a/README.md
+++ b/README.md
@@ -1,88 +1,106 @@
 # CM Dashboard

-A real-time infrastructure monitoring system with intelligent status aggregation and email notifications, built with Rust and ZMQ.
+A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.

-## Current Implementation
+## Features

-This is a complete rewrite implementing an **individual metrics architecture** where:
+### Core Monitoring
+- **Real-time metrics**: CPU, RAM, Storage, and Service status
+- **Multi-host support**: Monitor multiple servers from single dashboard  
+- **Service management**: Start/stop services with intelligent status tracking
+- **NixOS integration**: System rebuild via SSH + tmux popup
+- **Backup monitoring**: Borgbackup status and scheduling
+- **Email notifications**: Intelligent batching prevents spam

- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
- **Dashboard** subscribes to specific metrics and composes widgets
- **Status Aggregation** provides intelligent email notifications with batching
- **Persistent Cache** prevents false notifications on restart
+### User-Stopped Service Tracking
+Services stopped via the dashboard are intelligently tracked to prevent false alerts:

-## Dashboard Interface
+- **Smart status reporting**: User-stopped services show as Status::OK instead of Warning
+- **Persistent storage**: Tracking survives agent restarts via JSON storage
+- **Automatic management**: Flags cleared when services restarted via dashboard
+- **Maintenance friendly**: No false alerts during intentional service operations
+
+## Architecture
+
+### Individual Metrics Philosophy
+- **Agent**: Collects individual metrics, calculates status using thresholds
+- **Dashboard**: Subscribes to specific metrics, composes widgets from individual data
+- **ZMQ Communication**: Efficient real-time metric transmission
+- **Status Aggregation**: Host-level status calculated from all service metrics
+
+### Components
+
+```
+┌─────────────────┐    ZMQ     ┌─────────────────┐
+│                 │◄──────────►│                 │
+│   Agent         │  Metrics   │   Dashboard     │
+│   - Collectors  │            │   - TUI         │
+│   - Status      │            │   - Widgets     │
+│   - Tracking    │            │   - Commands    │
+│                 │            │                 │
+└─────────────────┘            └─────────────────┘
+         │                              │
+         ▼                              ▼
+┌─────────────────┐            ┌─────────────────┐
+│ JSON Storage    │            │ SSH + tmux      │
+│ - User-stopped  │            │ - Remote rebuild│
+│ - Cache         │            │ - Process       │
+│ - State         │            │   isolation     │
+└─────────────────┘            └─────────────────┘
+```
+
+### Service Control Flow
+
+1. **User Action**: Dashboard sends `UserStart`/`UserStop` commands
+2. **Agent Processing**: 
+   - Marks service as user-stopped (if stopping)
+   - Executes `systemctl start/stop service`
+   - Syncs state to global tracker
+3. **Status Calculation**: 
+   - Systemd collector checks user-stopped flag
+   - Reports Status::OK for user-stopped inactive services
+   - Normal Warning status for system failures
+
+## Interface

 ```
 cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
 ┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
-│CPU:                                ││Service:                  Status:  RAM:   Disk:  │
-│● Load: 0.10 0.52 0.88 • 400.0 MHz  ││● docker                  active   27M    496MB  │
-│RAM:                                ││● docker-registry         active   19M    496MB  │
-│● Used: 30% 2.3GB/7.6GB             ││● gitea                   active   579M   2.6GB  │
-│● tmp: 0.0% 0B/2.0GB                ││● gitea-runner-default    active   11M    2.6GB  │
-│Disk nvme0n1:                       ││● haasp-core              active   9M     1MB    │
-│● Health: PASSED                    ││● haasp-mqtt              active   3M     1MB    │
-│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid           active   10M    1MB    │
-│● Usage @boot: 5.9% • 0.1/1.0 GB    ││● immich-server           active   240M   45.1GB │
-│                                    ││● mosquitto               active   1M     1MB    │
-│                                    ││● mysql                   active   38M    225MB  │
-│                                    ││● nginx                   active   28M    24MB   │
-│                                    ││  ├─ ● gitea.cmtec.se     51ms                   │
-│                                    ││  ├─ ● haasp.cmtec.se     43ms                   │
-│                                    ││  ├─ ● haasp.net          43ms                   │
-│                                    ││  ├─ ● pages.cmtec.se     45ms                   │
-└────────────────────────────────────┘│  ├─ ● photos.cmtec.se    41ms                   │
-┌backup──────────────────────────────┐│  ├─ ● unifi.cmtec.se     46ms                   │
-│Latest backup:                      ││  ├─ ● vault.cmtec.se     47ms                   │
-│● Status: OK                        ││  ├─ ● www.kryddorten.se  81ms                   │
-│Duration: 54s • Last: 4h ago        ││  ├─ ● www.mariehall2.se  86ms                   │
-│Disk usage: 48.2GB/915.8GB          ││● postgresql              active   112M   357MB  │
-│P/N: Samsung SSD 870 QVO 1TB        ││● redis-immich            active   8M     45.1GB │
-│S/N: S5RRNF0W800639Y                ││● sshd                    active   2M     0      │
-│● gitea 2 archives 2.7GB            ││● unifi                   active   594M   495MB  │
-│● immich 2 archives 45.0GB          ││● vaultwarden             active   12M    1MB    │
-│● kryddorten 2 archives 67.6MB      ││                                                 │
-│● mariehall2 2 archives 321.8MB     ││                                                 │
-│● nixosbox 2 archives 4.5MB         ││                                                 │
-│● unifi 2 archives 2.9MB            ││                                                 │
-│● vaultwarden 2 archives 305kB      ││                                                 │
+│NixOS:                              ││Service:                  Status:  RAM:   Disk:  │
+│Build: 25.05.20251004.3bcc93c       ││● docker                  active   27M    496MB  │
+│Agent: v0.1.43                      ││● gitea                   active   579M   2.6GB  │
+│Active users: cm, simon             ││● nginx                   active   28M    24MB   │
+│CPU:                                ││  ├─ ● gitea.cmtec.se     51ms                   │
+│● Load: 0.10 0.52 0.88 • 3000MHz    ││  ├─ ● photos.cmtec.se    41ms                   │
+│RAM:                                ││● postgresql              active   112M   357MB  │
+│● Usage: 33% 2.6GB/7.6GB            ││● redis-immich            user-stopped           │
+│● /tmp: 0% 0B/2.0GB                 ││● sshd                    active   2M     0      │
+│Storage:                            ││● unifi                   active   594M   495MB  │
+│● root (Single):                    ││                                                 │
+│ ├─ ● nvme0n1 W: 1%                 ││                                                 │
+│ └─ ● 18% 167.4GB/928.2GB           ││                                                 │
 └────────────────────────────────────┘└─────────────────────────────────────────────────┘
 ```

-**Navigation**: `←→` switch hosts, `r` refresh, `q` quit
+### Navigation
+- **Tab**: Switch between hosts
+- **↑↓ or j/k**: Navigate services
+- **s**: Start selected service (UserStart)  
+- **S**: Stop selected service (UserStop)
+- **J**: Show service logs (journalctl in tmux popup)
+- **R**: Rebuild current host
+- **q**: Quit

-## Features
-
-- **Real-time monitoring** - Dashboard updates every 1-2 seconds
-- **Individual metric collection** - Granular data for flexible dashboard composition
-- **Intelligent status aggregation** - Host-level status calculated from all services
-- **Smart email notifications** - Batched, detailed alerts with service groupings
-- **Persistent state** - Prevents false notifications on restarts
-- **ZMQ communication** - Efficient agent-to-dashboard messaging
-- **Clean TUI** - Terminal-based dashboard with color-coded status indicators
-
-## Architecture
-
-### Core Components
-
-- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
-- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
-- **Shared** (`cm-dashboard-shared`) - Common types and protocol
-- **Status Aggregation** - Intelligent batching and notification management
-- **Persistent Cache** - Maintains state across restarts
-
-### Status Levels
-
-- **🟢 Ok** - Service running normally
-- **🔵 Pending** - Service starting/stopping/reloading
-- **🟡 Warning** - Service issues (high load, memory, disk usage)
-- **🔴 Critical** - Service failed or critical thresholds exceeded
-- **❓ Unknown** - Service state cannot be determined
+### Status Indicators
+- **Green ●**: Active service
+- **Yellow ◐**: Inactive service (system issue)
+- **Red ◯**: Failed service
+- **Blue arrows**: Service transitioning (↑ starting, ↓ stopping, ↻ restarting)
+- **"user-stopped"**: Service stopped via dashboard (Status::OK)

 ## Quick Start

-### Build
+### Building

 ```bash
 # With Nix (recommended)
@@ -93,21 +111,20 @@ sudo apt install libssl-dev pkg-config  # Ubuntu/Debian
 cargo build --workspace
 ```

-### Run
+### Running

 ```bash
-# Start agent (requires configuration file)
+# Start agent (requires configuration)
 ./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml

-# Start dashboard
-./target/debug/cm-dashboard --config /path/to/dashboard.toml
+# Start dashboard (inside tmux session)
+tmux
+./target/debug/cm-dashboard --config /etc/cm-dashboard/dashboard.toml
 ```

 ## Configuration

-### Agent Configuration (`agent.toml`)
-
-The agent requires a comprehensive TOML configuration file:
+### Agent Configuration

 ```toml
 collection_interval_seconds = 2
@@ -116,50 +133,27 @@ collection_interval_seconds = 2
 publisher_port = 6130
 command_port = 6131
 bind_address = "0.0.0.0"
-timeout_ms = 5000
-heartbeat_interval_ms = 30000
+transmission_interval_seconds = 2

 [collectors.cpu]
 enabled = true
 interval_seconds = 2
-load_warning_threshold = 9.0
+load_warning_threshold = 5.0
 load_critical_threshold = 10.0
-temperature_warning_threshold = 100.0
-temperature_critical_threshold = 110.0

 [collectors.memory]
 enabled = true
 interval_seconds = 2
 usage_warning_percent = 80.0
-usage_critical_percent = 95.0
-
-[collectors.disk]
-enabled = true
-interval_seconds = 300
-usage_warning_percent = 80.0
 usage_critical_percent = 90.0

-[[collectors.disk.filesystems]]
-name = "root"
-uuid = "4cade5ce-85a5-4a03-83c8-dfd1d3888d79"
-mount_point = "/"
-fs_type = "ext4"
-monitor = true
-
 [collectors.systemd]
 enabled = true
 interval_seconds = 10
-memory_warning_mb = 1000.0
-memory_critical_mb = 2000.0
-service_name_filters = [
-  "nginx*", "postgresql*", "redis*", "docker*", "sshd*", 
-  "gitea*", "immich*", "haasp*", "mosquitto*", "mysql*", 
-  "unifi*", "vaultwarden*"
-]
-excluded_services = [
-  "nginx-config-reload", "sshd-keygen", "systemd-", 
-  "getty@", "user@", "dbus-", "NetworkManager-"
-]
+service_name_filters = ["nginx*", "postgresql*", "docker*", "sshd*"]
+excluded_services = ["nginx-config-reload", "systemd-", "getty@"]
+nginx_latency_critical_ms = 1000.0
+http_timeout_seconds = 10

 [notifications]
 enabled = true
@@ -167,251 +161,202 @@ smtp_host = "localhost"
 smtp_port = 25
 from_email = "{hostname}@example.com"
 to_email = "admin@example.com"
-rate_limit_minutes = 0
-trigger_on_warnings = true
-trigger_on_failures = true
-recovery_requires_all_ok = true
-suppress_individual_recoveries = true
-
-[status_aggregation]
-enabled = true
-aggregation_method = "worst_case"
-notification_interval_seconds = 30
-
-[cache]
-persist_path = "/var/lib/cm-dashboard/cache.json"
+aggregation_interval_seconds = 30
 ```

-### Dashboard Configuration (`dashboard.toml`)
+### Dashboard Configuration

 ```toml
 [zmq]
-hosts = [
-  { name = "server1", address = "192.168.1.100", port = 6130 },
-  { name = "server2", address = "192.168.1.101", port = 6130 }
-]
-connection_timeout_ms = 5000
-reconnect_interval_ms = 10000
+subscriber_ports = [6130]
+
+[hosts]
+predefined_hosts = ["cmbox", "srv01", "srv02"]

 [ui]
-refresh_interval_ms = 1000
-theme = "dark"
+ssh_user = "cm"
+rebuild_alias = "nixos-rebuild-cmtec"
 ```

-## Collectors
+## Technical Implementation

-The agent implements several specialized collectors:
+### Collectors

-### CPU Collector (`cpu.rs`)
+#### Systemd Collector
+- **Service Discovery**: Uses `systemctl list-unit-files` + `list-units --all`
+- **Status Calculation**: Checks user-stopped flag before assigning Warning status
+- **Memory Tracking**: Per-service memory usage via `systemctl show`
+- **Sub-services**: Nginx site latency, Docker containers
+- **User-stopped Integration**: `UserStoppedServiceTracker::is_service_user_stopped()`

-- Load average (1, 5, 15 minute)
-- CPU temperature monitoring
-- Real-time process monitoring (top CPU consumers)
-- Status calculation with configurable thresholds
+#### User-Stopped Service Tracker
+- **Storage**: `/var/lib/cm-dashboard/user-stopped-services.json`
+- **Thread Safety**: Global singleton with `Arc<Mutex<>>`
+- **Persistence**: Automatic save on state changes
+- **Global Access**: Static methods for collector integration

-### Memory Collector (`memory.rs`)
+#### Other Collectors
+- **CPU**: Load average, temperature, frequency monitoring
+- **Memory**: RAM/swap usage, tmpfs monitoring  
+- **Disk**: Filesystem usage, SMART health data
+- **NixOS**: Build version, active users, agent version
+- **Backup**: Borgbackup repository status and metrics

-- RAM usage (total, used, available)
-- Swap monitoring
-- Real-time process monitoring (top RAM consumers)
-- Memory pressure detection
+### ZMQ Protocol

-### Disk Collector (`disk.rs`)
+```rust
+// Metric Message
+#[derive(Serialize, Deserialize)]
+pub struct MetricMessage {
+    pub hostname: String,
+    pub timestamp: u64,
+    pub metrics: Vec<Metric>,
+}

-- Filesystem usage per mount point
-- SMART health monitoring
-- Temperature and wear tracking
-- Configurable filesystem monitoring
+// Service Commands
+pub enum AgentCommand {
+    ServiceControl {
+        service_name: String,
+        action: ServiceAction,
+    },
+    SystemRebuild { /* SSH config */ },
+    CollectNow,
+}

-### Systemd Collector (`systemd.rs`)
+pub enum ServiceAction {
+    Start,           // System-initiated
+    Stop,            // System-initiated  
+    UserStart,       // User via dashboard (clears user-stopped)
+    UserStop,        // User via dashboard (marks user-stopped)
+    Status,
+}
+```

-- Service status monitoring (`active`, `inactive`, `failed`)
-- Memory usage per service
-- Service filtering and exclusions
-- Handles transitional states (`Status::Pending`)
+### Maintenance Mode

-### Backup Collector (`backup.rs`)
+Suppress notifications during planned maintenance:

-- Reads TOML status files from backup systems
-- Archive age verification
-- Disk usage tracking
-- Repository health monitoring
+```bash
+# Enable maintenance mode
+touch /tmp/cm-maintenance
+
+# Perform maintenance
+systemctl stop service
+# ... work ...
+systemctl start service  
+
+# Disable maintenance mode
+rm /tmp/cm-maintenance
+```

 ## Email Notifications

 ### Intelligent Batching
+- **Real-time dashboard**: Immediate status updates
+- **Batched emails**: Aggregated every 30 seconds
+- **Smart grouping**: Services organized by severity
+- **Recovery suppression**: Reduces notification spam

-The system implements smart notification batching to prevent email spam:
-
-- **Real-time dashboard updates** - Status changes appear immediately
-- **Batched email notifications** - Aggregated every 30 seconds
-- **Detailed groupings** - Services organized by severity
-
-### Example Alert Email
-
+### Example Alert
 ```
-Subject: Status Alert: 2 critical, 1 warning, 15 started
+Subject: Status Alert: 1 critical, 2 warnings, 0 recoveries

 Status Summary (30s duration)
 Host Status: Ok → Warning

-🔴 CRITICAL ISSUES (2):
-  postgresql: Ok → Critical
-  nginx: Warning → Critical
+🔴 CRITICAL ISSUES (1):
+  postgresql: Ok → Critical (memory usage 95%)

-🟡 WARNINGS (1):
-  redis: Ok → Warning (memory usage 85%)
+🟡 WARNINGS (2):
+  nginx: Ok → Warning (high load 8.5)
+  redis: user-stopped → Warning (restarted by system)

 ✅ RECOVERIES (0):

-🟢 SERVICE STARTUPS (15):
-  docker: Unknown → Ok
-  sshd: Unknown → Ok
-  ...
-
 --
-CM Dashboard Agent
-Generated at 2025-10-21 19:42:42 CET
+CM Dashboard Agent v0.1.43
 ```

-## Individual Metrics Architecture
-
-The system follows a **metrics-first architecture**:
-
-### Agent Side
-
-```rust
-// Agent collects individual metrics
-vec![
-    Metric::new("cpu_load_1min".to_string(), MetricValue::Float(2.5), Status::Ok),
-    Metric::new("memory_usage_percent".to_string(), MetricValue::Float(78.5), Status::Warning),
-    Metric::new("service_nginx_status".to_string(), MetricValue::String("active".to_string()), Status::Ok),
-]
-```
-
-### Dashboard Side
-
-```rust
-// Widgets subscribe to specific metrics
-impl Widget for CpuWidget {
-    fn update_from_metrics(&mut self, metrics: &[&Metric]) {
-        for metric in metrics {
-            match metric.name.as_str() {
-                "cpu_load_1min" => self.load_1min = metric.value.as_f32(),
-                "cpu_load_5min" => self.load_5min = metric.value.as_f32(),
-                "cpu_temperature_celsius" => self.temperature = metric.value.as_f32(),
-                _ => {}
-            }
-        }
-    }
-}
-```
-
-## Persistent Cache
-
-The cache system prevents false notifications:
-
-- **Automatic saving** - Saves when service status changes
-- **Persistent storage** - Maintains state across agent restarts
-- **Simple design** - No complex TTL or cleanup logic
-- **Status preservation** - Prevents duplicate notifications
-
 ## Development

 ### Project Structure
-
 ```
 cm-dashboard/
-├── agent/                  # Metrics collection agent
+├── agent/                     # Metrics collection agent
 │   ├── src/
-│   │   ├── collectors/     # CPU, memory, disk, systemd, backup
-│   │   ├── status/         # Status aggregation and notifications
-│   │   ├── cache/          # Persistent metric caching
-│   │   ├── config/         # TOML configuration loading
-│   │   └── notifications/  # Email notification system
-├── dashboard/              # TUI dashboard application
+│   │   ├── collectors/        # CPU, memory, disk, systemd, backup, nixos
+│   │   ├── service_tracker.rs # User-stopped service tracking
+│   │   ├── status/            # Status aggregation and notifications
+│   │   ├── config/            # TOML configuration loading
+│   │   └── communication/     # ZMQ message handling
+├── dashboard/                 # TUI dashboard application  
 │   ├── src/
-│   │   ├── ui/widgets/     # CPU, memory, services, backup widgets
-│   │   ├── metrics/        # Metric storage and filtering
-│   │   └── communication/  # ZMQ metric consumption
-├── shared/                 # Shared types and utilities
+│   │   ├── ui/widgets/        # CPU, memory, services, backup, system
+│   │   ├── communication/     # ZMQ consumption and commands
+│   │   └── app.rs            # Main application loop
+├── shared/                    # Shared types and utilities
 │   └── src/
-│       ├── metrics.rs      # Metric, Status, and Value types
-│       ├── protocol.rs     # ZMQ message format
-│       └── cache.rs        # Cache configuration
-└── README.md              # This file
+│       ├── metrics.rs         # Metric, Status, StatusTracker types
+│       ├── protocol.rs        # ZMQ message format
+│       └── cache.rs           # Cache configuration
+└── CLAUDE.md                  # Development guidelines and rules
 ```

-### Building
-
+### Testing
 ```bash
-# Debug build
-cargo build --workspace
+# Build and test
+nix-shell -p openssl pkg-config --run "cargo build --workspace"
+nix-shell -p openssl pkg-config --run "cargo test --workspace"

-# Release build
-cargo build --workspace --release
-
-# Run tests
-cargo test --workspace
-
-# Check code formatting
-cargo fmt --all -- --check
-
-# Run clippy linter
+# Code quality
+cargo fmt --all
 cargo clippy --workspace -- -D warnings
 ```

-### Dependencies
+## Deployment

-- **tokio** - Async runtime
-- **zmq** - Message passing between agent and dashboard
-- **ratatui** - Terminal user interface
-- **serde** - Serialization for metrics and config
-- **anyhow/thiserror** - Error handling
-- **tracing** - Structured logging
-- **lettre** - SMTP email notifications
-- **clap** - Command-line argument parsing
-- **toml** - Configuration file parsing
+### Automated Binary Releases
+```bash
+# Create new release
+cd ~/projects/cm-dashboard
+git tag v0.1.X
+git push origin v0.1.X
+```

-## NixOS Integration
+This triggers automated:
+- Static binary compilation with `RUSTFLAGS="-C target-feature=+crt-static"`
+- GitHub-style release creation
+- Tarball upload to Gitea

-This project is designed for declarative deployment via NixOS:
-
-### Configuration Generation
-
-The NixOS module automatically generates the agent configuration:
+### NixOS Integration
+Update `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:

 ```nix
-# hosts/common/cm-dashboard.nix
-services.cm-dashboard-agent = {
-  enable = true;
-  port = 6130;
+version = "v0.1.43";
+src = pkgs.fetchurl {
+  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
+  sha256 = "sha256-HASH";
 };
 ```

-### Deployment
-
+Get hash via:
 ```bash
-# Update NixOS configuration
-git add hosts/common/cm-dashboard.nix
-git commit -m "Update cm-dashboard configuration"
-git push
-
-# Rebuild system (user-performed)
-sudo nixos-rebuild switch --flake .
+cd ~/projects/nixosbox
+nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
+  url = "URL_HERE";
+  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
+}' 2>&1 | grep "got:"
 ```

 ## Monitoring Intervals

-- **CPU/Memory**: 2 seconds (real-time monitoring)
-- **Disk usage**: 300 seconds (5 minutes)
-- **Systemd services**: 10 seconds
-- **SMART health**: 600 seconds (10 minutes)
-- **Backup status**: 60 seconds (1 minute)
-- **Email notifications**: 30 seconds (batched)
-- **Dashboard updates**: 1 second (real-time display)
+- **Metrics Collection**: 2 seconds (CPU, memory, services)
+- **Metric Transmission**: 2 seconds (ZMQ publish)
+- **Dashboard Updates**: 1 second (UI refresh)
+- **Email Notifications**: 30 seconds (batched)
+- **Disk Monitoring**: 300 seconds (5 minutes)
+- **Service Discovery**: 300 seconds (5 minutes cache)

 ## License

-MIT License - see LICENSE file for details
-
+MIT License - see LICENSE file for details.
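Reviewer note on the README changes above: the new "Status Indicators" and "User-Stopped Service Tracking" sections describe a status rule that could be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's actual `calculate_service_status`: the local `Status` enum stands in for the shared `cm_dashboard_shared::Status` type, and the function name `service_status` and the set of transitional state strings are hypothetical.

```rust
/// Local stand-in for the shared Status type (assumption for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

/// Map a systemd active state plus the user-stopped flag to a status.
/// Hypothetical helper: the README only states that a user-stopped
/// service reports OK instead of Warning.
pub fn service_status(active_state: &str, user_stopped: bool) -> Status {
    match active_state {
        "active" => Status::Ok,
        // Intentionally stopped via the dashboard: do not raise an alert.
        "inactive" if user_stopped => Status::Ok,
        // Stopped for any other reason: treated as a system issue.
        "inactive" => Status::Warning,
        "failed" => Status::Critical,
        // Transitional states are shown as pending.
        "activating" | "deactivating" | "reloading" => Status::Pending,
        _ => Status::Unknown,
    }
}
```

For example, `service_status("inactive", true)` yields `Status::Ok`, while the same state without the flag yields `Status::Warning`, which matches the `redis-immich  user-stopped` row in the screenshot above.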
--- a/TODO.md
+++ b/TODO.md
@@ -1,63 +0,0 @@
-# TODO
-
-## Systemd filtering (agent)
-
-- remove user systemd collection
-- reduce number of systemctl call
-- Cahnge so only services in include list are detected
-- Filter on exact name
-- Add support for "\*" in filtering
-
-## System panel (agent/dashboard)
-
-use following layout:
-'''
-NixOS:
-Build: xxxxxx
-Agen: xxxxxx
-CPU:
-● Load: 0.02 0.31 0.86
-└─ Freq: 3000MHz
-RAM:
-● Usage: 33% 2.6GB/7.6GB  
- └─ ● /tmp: 0% 0B/2.0GB
-Storage:
-● /:  
- ├─ ● nvme0n1 T: 40C • W: 4%  
- └─ ● 8% 75.0GB/906.2GB
-'''
-
-- Add support to show login/active users
-- Add support to show timestamp/version for latest nixos rebuild
-
-## Backup panel (dashboard)
-
-use following layout:
-'''
-Latest backup:  
-● <timestamp>
-└─ Duration: 1.3m
-Disk:
-● Samsung SSD 870 QVO 1TB  
- ├─ S/N: S5RRNF0W800639Y
-└─ Usage: 50.5GB/915.8GB
-Repos:
-● gitea (4) 5.1GB  
-● immich (4) 45.0GB  
-● kryddorten (4) 67.8MB  
-● mariehall2 (4) 322.7MB
-● nixosbox (4) 5.5MB  
-● unifi (4) 5.7MB  
-● vaultwarden (4) 508kB
-'''
-
-## Keyboard navigation and scrolling (dashboard)
-
-- Add keyboard navigation between panels "Shift-Tab"
-- Add lower statusbar with dynamic updated shortcuts when switchng between panels
-
-## Remote execution (agent/dashboard)
-
-- Add support for send command via dashboard to agent to do nixos rebuid
-- Add support for navigating services in dashboard and trigger start/stop/restart
-- Add support for trigger backup
--- a/agent/Cargo.toml
+++ b/agent/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "cm-dashboard-agent"
-version = "0.1.31"
+version = "0.1.58"
 edition = "2021"

 [dependencies]
--- a/agent/src/agent.rs
+++ b/agent/src/agent.rs
@@ -8,6 +8,7 @@ use crate::communication::{AgentCommand, ServiceAction, ZmqHandler};
 use crate::config::AgentConfig;
 use crate::metrics::MetricCollectionManager;
 use crate::notifications::NotificationManager;
+use crate::service_tracker::UserStoppedServiceTracker;
 use crate::status::HostStatusManager;
 use cm_dashboard_shared::{Metric, MetricMessage, MetricValue, Status};

@@ -18,6 +19,7 @@ pub struct Agent {
    metric_manager: MetricCollectionManager,
    notification_manager: NotificationManager,
    host_status_manager: HostStatusManager,
+    service_tracker: UserStoppedServiceTracker,
 }

 impl Agent {
@@ -50,6 +52,10 @@ impl Agent {
        let host_status_manager = HostStatusManager::new(config.status_aggregation.clone());
        info!("Host status manager initialized");

+        // Initialize user-stopped service tracker
+        let service_tracker = UserStoppedServiceTracker::init_global()?;
+        info!("User-stopped service tracker initialized");
+
        Ok(Self {
            hostname,
            config,
@@ -57,6 +63,7 @@ impl Agent {
            metric_manager,
            notification_manager,
            host_status_manager,
+            service_tracker,
        })
    }

@@ -173,6 +180,13 @@ impl Agent {
        let version_metric = self.get_agent_version_metric();
        metrics.push(version_metric);

+        // Add heartbeat metric for host connectivity detection
+        let heartbeat_metric = self.get_heartbeat_metric();
+        metrics.push(heartbeat_metric);
+
+        // Check for user-stopped services that are now active and clear their flags
+        self.clear_user_stopped_flags_for_active_services(&metrics);
+
        if metrics.is_empty() {
            debug!("No metrics to broadcast");
            return Ok(());
@@ -191,6 +205,12 @@ impl Agent {
    async fn process_metrics(&mut self, metrics: &[Metric]) -> bool {
        let mut status_changed = false;
        for metric in metrics {
+            // Filter excluded metrics from email notification processing only
+            if self.config.exclude_email_metrics.contains(&metric.name) {
+                debug!("Excluding metric '{}' from email notification processing", metric.name);
+                continue;
+            }
+            
            if self.host_status_manager.process_metric(metric, &mut self.notification_manager).await {
                status_changed = true;
            }
@@ -216,6 +236,22 @@ impl Agent {
        format!("v{}", env!("CARGO_PKG_VERSION"))
    }

+    /// Create heartbeat metric for host connectivity detection
+    fn get_heartbeat_metric(&self) -> Metric {
+        use std::time::{SystemTime, UNIX_EPOCH};
+        
+        let timestamp = SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .unwrap()
+            .as_secs();
+        
+        Metric::new(
+            "agent_heartbeat".to_string(),
+            MetricValue::Integer(timestamp as i64),
+            Status::Ok,
+        )
+    }
+
    async fn handle_commands(&mut self) -> Result<()> {
        // Try to receive commands (non-blocking)
        match self.zmq_handler.try_receive_command() {
@@ -271,18 +307,34 @@ impl Agent {

    /// Handle systemd service control commands
    async fn handle_service_control(&mut self, service_name: &str, action: &ServiceAction) -> Result<()> {
-        let action_str = match action {
-            ServiceAction::Start => "start",
-            ServiceAction::Stop => "stop", 
-            ServiceAction::Status => "status",
+        let (action_str, is_user_action) = match action {
+            ServiceAction::Start => ("start", false),
+            ServiceAction::Stop => ("stop", false), 
+            ServiceAction::Status => ("status", false),
+            ServiceAction::UserStart => ("start", true),
+            ServiceAction::UserStop => ("stop", true),
        };

-        info!("Executing systemctl {} {}", action_str, service_name);
+        info!("Executing systemctl {} {} (user action: {})", action_str, service_name, is_user_action);
+
+        // Handle user-stopped service tracking before systemctl execution (stop only)
+        match action {
+            ServiceAction::UserStop => {
+                info!("Marking service '{}' as user-stopped", service_name);
+                if let Err(e) = self.service_tracker.mark_user_stopped(service_name) {
+                    error!("Failed to mark service as user-stopped: {}", e);
+                } else {
+                    // Sync to global tracker
+                    UserStoppedServiceTracker::update_global(&self.service_tracker);
+                }
+            }
+            _ => {}
+        }

        let output = tokio::process::Command::new("sudo")
            .arg("systemctl")
            .arg(action_str)
-            .arg(service_name)
+            .arg(format!("{}.service", service_name))
            .output()
            .await?;

@@ -291,6 +343,9 @@ impl Agent {
            if !output.stdout.is_empty() {
                debug!("stdout: {}", String::from_utf8_lossy(&output.stdout));
            }
+            
+            // Note: User-stopped flag will be cleared by systemd collector 
+            // when service actually reaches 'active' state, not here
        } else {
            let stderr = String::from_utf8_lossy(&output.stderr);
            error!("Service {} {} failed: {}", service_name, action_str, stderr);
@@ -298,7 +353,7 @@ impl Agent {
        }

        // Force refresh metrics after service control to update service status
-        if matches!(action, ServiceAction::Start | ServiceAction::Stop) {
+        if matches!(action, ServiceAction::Start | ServiceAction::Stop | ServiceAction::UserStart | ServiceAction::UserStop) {
            info!("Triggering immediate metric refresh after service control");
            if let Err(e) = self.collect_metrics_only().await {
                error!("Failed to refresh metrics after service control: {}", e);
@@ -310,4 +365,33 @@ impl Agent {
        Ok(())
    }

+    /// Check metrics for user-stopped services that are now active and clear their flags
+    fn clear_user_stopped_flags_for_active_services(&mut self, metrics: &[Metric]) {
+        for metric in metrics {
+            // Look for service status metrics that are active
+            if metric.name.starts_with("service_") && metric.name.ends_with("_status") {
+                if let MetricValue::String(status) = &metric.value {
+                    if status == "active" {
+                        // Extract service name from metric name (service_nginx_status -> nginx)
+                        let service_name = metric.name
+                            .strip_prefix("service_")
+                            .and_then(|s| s.strip_suffix("_status"))
+                            .unwrap_or("");
+                        
+                        if !service_name.is_empty() && UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                            info!("Service '{}' is now active - clearing user-stopped flag", service_name);
+                            if let Err(e) = self.service_tracker.clear_user_stopped(service_name) {
+                                error!("Failed to clear user-stopped flag for '{}': {}", service_name, e);
+                            } else {
+                                // Sync to global tracker
+                                UserStoppedServiceTracker::update_global(&self.service_tracker);
+                                debug!("Cleared user-stopped flag for service '{}'", service_name);
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
+
 }
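Reviewer note on the heartbeat change above: the agent now publishes `agent_heartbeat` as a Unix timestamp in seconds, but the consuming side is not part of this excerpt. A staleness check on the dashboard could look roughly like the sketch below. The function name `is_host_online`, the helper `unix_now`, and the 10-second timeout are assumptions made for illustration only.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical timeout; the real dashboard value is not shown in this diff.
pub const HEARTBEAT_TIMEOUT_SECS: u64 = 10;

/// Return true if the last heartbeat is recent enough to consider the host reachable.
/// `last_heartbeat_secs` mirrors the i64 carried by `MetricValue::Integer`.
pub fn is_host_online(last_heartbeat_secs: i64, now_secs: u64, timeout_secs: u64) -> bool {
    // A negative timestamp cannot come from a sane clock; treat it as offline.
    if last_heartbeat_secs < 0 {
        return false;
    }
    let last = last_heartbeat_secs as u64;
    // saturating_sub tolerates small clock skew where the agent is ahead of the dashboard.
    now_secs.saturating_sub(last) <= timeout_secs
}

/// Current Unix time in seconds, falling back to 0 instead of panicking
/// if the system clock is set before the epoch.
pub fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

A caller would pass the most recent `agent_heartbeat` value together with `unix_now()`. Note that comparing agent and dashboard clocks assumes they are roughly synchronized; an alternative is to record the local receive time of each heartbeat and compare against that instead.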
--- a/agent/src/collectors/backup.rs
+++ b/agent/src/collectors/backup.rs
@@ -140,6 +140,7 @@ impl Collector for BackupCollector {
                Status::Warning => "warning".to_string(),
                Status::Critical => "critical".to_string(),
                Status::Unknown => "unknown".to_string(),
+                Status::Offline => "offline".to_string(),
            }),
            status: overall_status,
            timestamp,
@@ -202,6 +203,7 @@ impl Collector for BackupCollector {
                    Status::Warning => "warning".to_string(),
                    Status::Critical => "critical".to_string(),
                    Status::Unknown => "unknown".to_string(),
+                    Status::Offline => "offline".to_string(),
                }),
                status: service_status,
                timestamp,
--- a/agent/src/collectors/nixos.rs
+++ b/agent/src/collectors/nixos.rs
@@ -37,6 +37,22 @@ impl NixOSCollector {
    }

    /// Get configuration hash from deployed nix store system
+    /// Get git commit hash from rebuild process
+    fn get_git_commit(&self) -> Result<String, Box<dyn std::error::Error>> {
+        let commit_file = "/var/lib/cm-dashboard/git-commit";
+        match std::fs::read_to_string(commit_file) {
+            Ok(content) => {
+                let commit_hash = content.trim();
+                if commit_hash.len() >= 7 {
+                    Ok(commit_hash.to_string())
+                } else {
+                    Err("Git commit hash too short".into())
+                }
+            }
+            Err(e) => Err(format!("Failed to read git commit file: {}", e).into())
+        }
+    }
+
    fn get_config_hash(&self) -> Result<String, Box<dyn std::error::Error>> {
        // Read the symlink target of /run/current-system to get nix store path
        let output = Command::new("readlink")
@@ -74,25 +90,25 @@ impl Collector for NixOSCollector {
        let mut metrics = Vec::new();
        let timestamp = chrono::Utc::now().timestamp() as u64;

-        // Collect NixOS build information (config hash)
-        match self.get_config_hash() {
-            Ok(config_hash) => {
+        // Collect git commit information (shows what's actually deployed)
+        match self.get_git_commit() {
+            Ok(git_commit) => {
                metrics.push(Metric {
                    name: "system_nixos_build".to_string(),
-                    value: MetricValue::String(config_hash),
+                    value: MetricValue::String(git_commit),
                    unit: None,
-                    description: Some("NixOS deployed configuration hash".to_string()),
+                    description: Some("Git commit hash of deployed configuration".to_string()),
                    status: Status::Ok,
                    timestamp,
                });
            }
            Err(e) => {
-                debug!("Failed to get config hash: {}", e);
+                debug!("Failed to get git commit: {}", e);
                metrics.push(Metric {
                    name: "system_nixos_build".to_string(),
                    value: MetricValue::String("unknown".to_string()),
                    unit: None,
-                    description: Some("NixOS config hash (failed to detect)".to_string()),
+                    description: Some("Git commit hash (failed to detect)".to_string()),
                    status: Status::Unknown,
                    timestamp,
                });
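Reviewer note on `get_git_commit` above: the function accepts any trimmed file content of at least seven characters. If stricter validation is wanted, the parsing step could be pulled into a pure function that is easy to unit test. The sketch below is a suggestion under assumptions, not the committed code: the name `parse_git_commit` and the hex-digit check are additions that do not appear in the diff.

```rust
/// Validate the contents of the git-commit file.
/// Hypothetical helper: keeps the length rule from the diff (at least 7 characters)
/// and additionally requires hexadecimal digits, which is an assumption.
pub fn parse_git_commit(content: &str) -> Result<String, String> {
    let hash = content.trim();
    if hash.len() < 7 {
        return Err("Git commit hash too short".to_string());
    }
    if !hash.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err("Git commit hash contains non-hex characters".to_string());
    }
    Ok(hash.to_string())
}
```

With this split, `get_git_commit` would only read the file and delegate to the parser, so the validation rules can be tested without touching `/var/lib/cm-dashboard/git-commit`.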
--- a/agent/src/collectors/systemd.rs
+++ b/agent/src/collectors/systemd.rs
@@ -8,6 +8,7 @@ use tracing::debug;

 use super::{Collector, CollectorError};
 use crate::config::SystemdConfig;
+use crate::service_tracker::UserStoppedServiceTracker;

 /// Systemd collector for monitoring systemd services
 pub struct SystemdCollector {
@@ -136,45 +137,84 @@ impl SystemdCollector {
    /// Auto-discover interesting services to monitor (internal version that doesn't update state)
    fn discover_services_internal(&self) -> Result<(Vec<String>, std::collections::HashMap<String, ServiceStatusInfo>)> {
        debug!("Starting systemd service discovery with status caching");
-        // Get all service unit files (includes services that have never been started)
-        let units_output = Command::new("systemctl")
+        
+        // First: Get all service unit files (includes services that have never been started)
+        let unit_files_output = Command::new("systemctl")
            .arg("list-unit-files")
            .arg("--type=service")
            .arg("--no-pager")
            .arg("--plain")
            .output()?;

-        if !units_output.status.success() {
-            return Err(anyhow::anyhow!("systemctl system command failed"));
+        if !unit_files_output.status.success() {
+            return Err(anyhow::anyhow!("systemctl list-unit-files command failed"));
        }

-        let units_str = String::from_utf8(units_output.stdout)?;
+        // Second: Get runtime status of all units
+        let units_status_output = Command::new("systemctl")
+            .arg("list-units")
+            .arg("--type=service")
+            .arg("--all")
+            .arg("--no-pager")
+            .arg("--plain")
+            .output()?;
+
+        if !units_status_output.status.success() {
+            return Err(anyhow::anyhow!("systemctl list-units command failed"));
+        }
+
+        let unit_files_str = String::from_utf8(unit_files_output.stdout)?;
+        let units_status_str = String::from_utf8(units_status_output.stdout)?;
        let mut services = Vec::new();

        // Use configuration instead of hardcoded values
        let excluded_services = &self.config.excluded_services;
        let service_name_filters = &self.config.service_name_filters;

-        // Parse all services and cache their status information
+        // Parse all service unit files to get complete service list
        let mut all_service_names = std::collections::HashSet::new();
-        let mut status_cache = std::collections::HashMap::new();
        
-        for line in units_str.lines() {
+        for line in unit_files_str.lines() {
            let fields: Vec<&str> = line.split_whitespace().collect();
            if fields.len() >= 2 && fields[0].ends_with(".service") {
                let service_name = fields[0].trim_end_matches(".service");
-                let unit_file_state = fields.get(1).unwrap_or(&"unknown").to_string();
+                all_service_names.insert(service_name.to_string());
+                debug!("Found service unit file: {}", service_name);
+            }
+        }
+
+        // Parse runtime status for all units
+        let mut status_cache = std::collections::HashMap::new();
+        for line in units_status_str.lines() {
+            let fields: Vec<&str> = line.split_whitespace().collect();
+            if fields.len() >= 4 && fields[0].ends_with(".service") {
+                let service_name = fields[0].trim_end_matches(".service");
                
-                // For unit files, we don't have runtime status info yet - will be fetched individually
-                // Set placeholder values for status cache (actual status will be fetched when collecting metrics)
+                // Extract status information from systemctl list-units output
+                let load_state = fields.get(1).unwrap_or(&"unknown").to_string();
+                let active_state = fields.get(2).unwrap_or(&"unknown").to_string();
+                let sub_state = fields.get(3).unwrap_or(&"unknown").to_string();
+
+                // Cache the status information
                status_cache.insert(service_name.to_string(), ServiceStatusInfo {
-                    load_state: "unknown".to_string(),  // Will be determined when we check individual status
-                    active_state: "unknown".to_string(), // Will be determined when we check individual status
-                    sub_state: unit_file_state.clone(), // Use unit file state as placeholder
+                    load_state: load_state.clone(),
+                    active_state: active_state.clone(),
+                    sub_state: sub_state.clone(),
                });

-                all_service_names.insert(service_name.to_string());
-                debug!("Found service unit file: {} (file_state: {})", service_name, unit_file_state);
+                debug!("Got runtime status for service: {} (load:{}, active:{}, sub:{})", service_name, load_state, active_state, sub_state);
+            }
+        }
+
+        // For services found in unit files but not in runtime status, set default inactive status
+        for service_name in &all_service_names {
+            if !status_cache.contains_key(service_name) {
+                status_cache.insert(service_name.to_string(), ServiceStatusInfo {
+                    load_state: "not-loaded".to_string(),
+                    active_state: "inactive".to_string(),
+                    sub_state: "dead".to_string(),
+                });
+                debug!("Service {} found in unit files but not runtime - marked as inactive", service_name);
            }
        }

@@ -314,13 +354,37 @@ impl SystemdCollector {
        Ok((active_status, detailed_info))
    }

-    /// Calculate service status
-    fn calculate_service_status(&self, active_status: &str) -> Status {
+    /// Calculate service status, taking user-stopped services into account
+    fn calculate_service_status(&self, service_name: &str, active_status: &str) -> Status {
        match active_status.to_lowercase().as_str() {
-            "active" => Status::Ok,
-            "inactive" | "dead" => Status::Warning,
+            "active" => {
+                // If service is now active and was marked as user-stopped, clear the flag
+                if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                    debug!("Service '{}' is now active - clearing user-stopped flag", service_name);
+                    // Note: We can't directly clear here because this is a read-only context
+                    // The agent will need to handle this differently
+                }
+                Status::Ok
+            },
+            "inactive" | "dead" => {
+                // Check if this service was stopped by user action
+                if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                    debug!("Service '{}' is inactive but marked as user-stopped - treating as OK", service_name);
+                    Status::Ok
+                } else {
+                    Status::Warning
+                }
+            },
            "failed" | "error" => Status::Critical,
-            "activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => Status::Pending,
+            "activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => {
+                // For user-stopped services that are transitioning, keep them as OK during transition
+                if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                    debug!("Service '{}' is transitioning but was user-stopped - treating as OK", service_name);
+                    Status::Ok
+                } else {
+                    Status::Pending
+                }
+            },
            _ => Status::Unknown,
        }
    }
@@ -441,7 +505,7 @@ impl Collector for SystemdCollector {
        for service in &monitored_services {
            match self.get_service_status(service) {
                Ok((active_status, _detailed_info)) => {
-                    let status = self.calculate_service_status(&active_status);
+                    let status = self.calculate_service_status(service, &active_status);

                    // Individual service status metric
                    metrics.push(Metric {
@@ -516,10 +580,8 @@ impl SystemdCollector {
        for (site_name, url) in &sites {
            match self.check_site_latency(url) {
                Ok(latency_ms) => {
-                    let status = if latency_ms < 500.0 {
+                    let status = if latency_ms < self.config.nginx_latency_critical_ms {
                        Status::Ok
-                    } else if latency_ms < 2000.0 {
-                        Status::Warning
                    } else {
                        Status::Critical
                    };
--- a/agent/src/communication/mod.rs
+++ b/agent/src/communication/mod.rs
@@ -66,8 +66,6 @@ impl ZmqHandler {
    }


-    /// Send heartbeat (placeholder for future use)
-
    /// Try to receive a command (non-blocking)
    pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> {
        match self.command_receiver.recv_bytes(zmq::DONTWAIT) {
@@ -113,4 +111,6 @@ pub enum ServiceAction {
    Start,
    Stop,
    Status,
+    UserStart,  // User-initiated start (clears user-stopped flag)
+    UserStop,   // User-initiated stop (marks as user-stopped)
 }
--- a/agent/src/config/mod.rs
+++ b/agent/src/config/mod.rs
@@ -17,6 +17,9 @@ pub struct AgentConfig {
    pub notifications: NotificationConfig,
    pub status_aggregation: HostStatusConfig,
    pub collection_interval_seconds: u64,
+    /// List of metric names to exclude from email notifications
+    #[serde(default)]
+    pub exclude_email_metrics: Vec<String>,
 }

 /// ZMQ communication configuration
@@ -25,8 +28,6 @@ pub struct ZmqConfig {
    pub publisher_port: u16,
    pub command_port: u16,
    pub bind_address: String,
-    pub timeout_ms: u64,
-    pub heartbeat_interval_ms: u64,
    pub transmission_interval_seconds: u64,
 }

@@ -108,6 +109,7 @@ pub struct SystemdConfig {
    pub nginx_check_interval_seconds: u64,
    pub http_timeout_seconds: u64,
    pub http_connect_timeout_seconds: u64,
+    pub nginx_latency_critical_ms: f32,
 }


--- a/agent/src/config/validation.rs
+++ b/agent/src/config/validation.rs
@@ -19,10 +19,6 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
        bail!("ZMQ bind address cannot be empty");
    }

-    if config.zmq.timeout_ms == 0 {
-        bail!("ZMQ timeout cannot be 0");
-    }
-
    // Validate collection interval
    if config.collection_interval_seconds == 0 {
        bail!("Collection interval cannot be 0");
@@ -83,6 +79,13 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
        }
    }

+    // Validate systemd configuration
+    if config.collectors.systemd.enabled {
+        if config.collectors.systemd.nginx_latency_critical_ms <= 0.0 {
+            bail!("Nginx latency critical threshold must be positive");
+        }
+    }
+
    // Validate SMTP configuration
    if config.notifications.enabled {
        if config.notifications.smtp_host.is_empty() {
--- a/agent/src/main.rs
+++ b/agent/src/main.rs
@@ -9,6 +9,7 @@ mod communication;
 mod config;
 mod metrics;
 mod notifications;
+mod service_tracker;
 mod status;

 use agent::Agent;
--- a/agent/src/service_tracker.rs
+++ b/agent/src/service_tracker.rs
@@ -0,0 +1,172 @@
+use anyhow::Result;
+use serde::{Deserialize, Serialize};
+use std::collections::HashSet;
+use std::fs;
+use std::path::Path;
+use std::sync::{Arc, Mutex, OnceLock};
+use tracing::{debug, info, warn};
+
+/// Shared instance for global access
+static GLOBAL_TRACKER: OnceLock<Arc<Mutex<UserStoppedServiceTracker>>> = OnceLock::new();
+
+/// Tracks services that have been stopped by user action
+/// These services should be treated as OK status instead of Warning
+#[derive(Debug)]
+pub struct UserStoppedServiceTracker {
+    /// Set of services stopped by user action
+    user_stopped_services: HashSet<String>,
+    /// Path to persistent storage file
+    storage_path: String,
+}
+
+/// Serializable data structure for persistence
+#[derive(Debug, Serialize, Deserialize)]
+struct UserStoppedData {
+    services: Vec<String>,
+}
+
+impl UserStoppedServiceTracker {
+    /// Create new tracker with default storage path
+    pub fn new() -> Self {
+        Self::with_storage_path("/var/lib/cm-dashboard/user-stopped-services.json")
+    }
+
+    /// Initialize global instance (called by agent)
+    pub fn init_global() -> Result<Self> {
+        let tracker = Self::new();
+        
+        // Set global instance
+        let global_instance = Arc::new(Mutex::new(tracker));
+        if GLOBAL_TRACKER.set(global_instance).is_err() {
+            warn!("Global service tracker was already initialized");
+        }
+        
+        // Return a new instance for the agent to use
+        Ok(Self::new())
+    }
+
+    /// Check if a service is user-stopped (global access for collectors)
+    pub fn is_service_user_stopped(service_name: &str) -> bool {
+        if let Some(global) = GLOBAL_TRACKER.get() {
+            if let Ok(tracker) = global.lock() {
+                tracker.is_user_stopped(service_name)
+            } else {
+                debug!("Failed to lock global service tracker");
+                false
+            }
+        } else {
+            debug!("Global service tracker not initialized");
+            false
+        }
+    }
+
+    /// Update global tracker (called by agent when tracker state changes)
+    pub fn update_global(updated_tracker: &UserStoppedServiceTracker) {
+        if let Some(global) = GLOBAL_TRACKER.get() {
+            if let Ok(mut tracker) = global.lock() {
+                tracker.user_stopped_services = updated_tracker.user_stopped_services.clone();
+            } else {
+                debug!("Failed to lock global service tracker for update");
+            }
+        } else {
+            debug!("Global service tracker not initialized for update");
+        }
+    }
+
+    /// Create new tracker with custom storage path
+    pub fn with_storage_path<P: AsRef<Path>>(storage_path: P) -> Self {
+        let storage_path = storage_path.as_ref().to_string_lossy().to_string();
+        let mut tracker = Self {
+            user_stopped_services: HashSet::new(),
+            storage_path,
+        };
+
+        // Load existing data from storage
+        if let Err(e) = tracker.load_from_storage() {
+            warn!("Failed to load user-stopped services from storage: {}", e);
+            info!("Starting with empty user-stopped services list");
+        }
+
+        tracker
+    }
+
+    /// Mark a service as user-stopped
+    pub fn mark_user_stopped(&mut self, service_name: &str) -> Result<()> {
+        info!("Marking service '{}' as user-stopped", service_name);
+        self.user_stopped_services.insert(service_name.to_string());
+        self.save_to_storage()?;
+        debug!("Service '{}' marked as user-stopped and saved to storage", service_name);
+        Ok(())
+    }
+
+    /// Clear user-stopped flag for a service (when user starts it)
+    pub fn clear_user_stopped(&mut self, service_name: &str) -> Result<()> {
+        if self.user_stopped_services.remove(service_name) {
+            info!("Cleared user-stopped flag for service '{}'", service_name);
+            self.save_to_storage()?;
+            debug!("Service '{}' user-stopped flag cleared and saved to storage", service_name);
+        } else {
+            debug!("Service '{}' was not marked as user-stopped", service_name);
+        }
+        Ok(())
+    }
+
+    /// Check if a service is marked as user-stopped
+    pub fn is_user_stopped(&self, service_name: &str) -> bool {
+        let is_stopped = self.user_stopped_services.contains(service_name);
+        debug!("Service '{}' user-stopped status: {}", service_name, is_stopped);
+        is_stopped
+    }
+
+
+    /// Save current state to persistent storage
+    fn save_to_storage(&self) -> Result<()> {
+        // Create parent directory if it doesn't exist
+        if let Some(parent_dir) = Path::new(&self.storage_path).parent() {
+            if !parent_dir.exists() {
+                fs::create_dir_all(parent_dir)?;
+                debug!("Created parent directory: {}", parent_dir.display());
+            }
+        }
+
+        let data = UserStoppedData {
+            services: self.user_stopped_services.iter().cloned().collect(),
+        };
+
+        let json_data = serde_json::to_string_pretty(&data)?;
+        fs::write(&self.storage_path, json_data)?;
+
+        debug!(
+            "Saved {} user-stopped services to {}",
+            data.services.len(),
+            self.storage_path
+        );
+        Ok(())
+    }
+
+    /// Load state from persistent storage
+    fn load_from_storage(&mut self) -> Result<()> {
+        if !Path::new(&self.storage_path).exists() {
+            debug!("Storage file {} does not exist, starting fresh", self.storage_path);
+            return Ok(());
+        }
+
+        let json_data = fs::read_to_string(&self.storage_path)?;
+        let data: UserStoppedData = serde_json::from_str(&json_data)?;
+
+        self.user_stopped_services = data.services.into_iter().collect();
+
+        info!(
+            "Loaded {} user-stopped services from {}",
+            self.user_stopped_services.len(),
+            self.storage_path
+        );
+
+        if !self.user_stopped_services.is_empty() {
+            debug!("User-stopped services: {:?}", self.user_stopped_services);
+        }
+
+        Ok(())
+    }
+}
+
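The collector consults the tracker above when it maps systemd states to a dashboard status. The sketch below restates that mapping in isolation so the effect of the user-stopped flag is easy to see. It is only an illustration under stated assumptions: `Tracker` and `service_status` are hypothetical stand-ins for `UserStoppedServiceTracker` and `calculate_service_status`, persistence and the global instance are left out, and the `Status` enum is redeclared locally.

```rust
use std::collections::HashSet;

/// Local copy of the status levels used by the diff (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    Warning,
    Critical,
    Pending,
    Unknown,
}

/// Hypothetical in-memory stand-in for the tracker (no JSON persistence).
struct Tracker {
    user_stopped: HashSet<String>,
}

impl Tracker {
    fn new() -> Self {
        Self { user_stopped: HashSet::new() }
    }

    fn mark_user_stopped(&mut self, name: &str) {
        self.user_stopped.insert(name.to_string());
    }

    fn clear_user_stopped(&mut self, name: &str) {
        self.user_stopped.remove(name);
    }

    fn is_user_stopped(&self, name: &str) -> bool {
        self.user_stopped.contains(name)
    }
}

/// Map a systemd active state to a status, treating user-stopped
/// services as OK while they are inactive or transitioning.
fn service_status(tracker: &Tracker, name: &str, active: &str) -> Status {
    let user_stopped = tracker.is_user_stopped(name);
    match active.to_lowercase().as_str() {
        "active" => Status::Ok,
        "inactive" | "dead" if user_stopped => Status::Ok,
        "inactive" | "dead" => Status::Warning,
        "failed" | "error" => Status::Critical,
        "activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => {
            if user_stopped {
                Status::Ok
            } else {
                Status::Pending
            }
        }
        _ => Status::Unknown,
    }
}
```

Note that a failed service stays Critical even when it is marked as user-stopped, which matches the match arms in the collector.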
--- a/agent/src/status/mod.rs
+++ b/agent/src/status/mod.rs
@@ -272,11 +272,13 @@ impl HostStatusManager {
    /// Check if a status change is significant enough for notification
    fn is_significant_change(&self, old_status: Status, new_status: Status) -> bool {
        match (old_status, new_status) {
-            // Always notify on problems
+            // Don't notify on transitions from Unknown (startup/restart scenario)
+            (Status::Unknown, _) => false,
+            // Always notify on problems (but not from Unknown)
            (_, Status::Warning) | (_, Status::Critical) => true,
            // Only notify on recovery if it's from a problem state to OK and all services are OK
            (Status::Warning | Status::Critical, Status::Ok) => self.current_host_status == Status::Ok,
-            // Don't notify on startup or other transitions
+            // Don't notify on other transitions
            _ => false,
        }
    }
@@ -374,8 +376,8 @@ impl HostStatusManager {
            details.push('\n');
        }

-        // Show recoveries
-        if !recovery_changes.is_empty() {
+        // Show recoveries only if host status is now OK (all services recovered)
+        if !recovery_changes.is_empty() && aggregated.host_status_final == Status::Ok {
            details.push_str(&format!("✅ RECOVERIES ({}):\n", recovery_changes.len()));
            for change in recovery_changes {
                details.push_str(&format!("  {}\n", change));
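The notification rule changed in `is_significant_change` can be read as a small decision table. The sketch below is a standalone restatement, assuming a free function that takes the aggregated host status as a third argument instead of reading `self.current_host_status`; the `Status` enum is redeclared locally for the example.

```rust
/// Local copy of the status levels used by the diff (assumed shape).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    Warning,
    Critical,
    Pending,
    Unknown,
}

/// Decide whether a status transition should trigger a notification.
/// `host_status` is the aggregated host status after the change
/// (a hypothetical parameter replacing `self.current_host_status`).
fn is_significant_change(old: Status, new: Status, host_status: Status) -> bool {
    match (old, new) {
        // Transitions out of Unknown happen at startup or restart: stay quiet.
        (Status::Unknown, _) => false,
        // Any other transition into a problem state is reported.
        (_, Status::Warning) | (_, Status::Critical) => true,
        // A recovery is reported only once the whole host is OK again.
        (Status::Warning | Status::Critical, Status::Ok) => host_status == Status::Ok,
        // Everything else (for example Pending to Ok) is ignored.
        _ => false,
    }
}
```

Arm order matters here: the `Unknown` arm must come first, otherwise an `Unknown` to `Critical` transition at agent startup would match the "problem" arm and send an alert.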
--- a/dashboard/Cargo.toml
+++ b/dashboard/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "cm-dashboard"
-version = "0.1.31"
+version = "0.1.58"
 edition = "2021"

 [dependencies]
@@ -18,4 +18,5 @@ tracing-subscriber = { workspace = true }
 ratatui = { workspace = true }
 crossterm = { workspace = true }
 toml = { workspace = true }
-gethostname = { workspace = true }
+gethostname = { workspace = true }
+wake-on-lan = "0.2"
--- a/dashboard/src/app.rs
+++ b/dashboard/src/app.rs
@@ -22,7 +22,7 @@ pub struct Dashboard {
    terminal: Option<Terminal<CrosstermBackend<io::Stdout>>>,
    headless: bool,
    initial_commands_sent: std::collections::HashSet<String>,
-    _config: DashboardConfig,
+    config: DashboardConfig,
 }

 impl Dashboard {
@@ -67,8 +67,8 @@ impl Dashboard {
            }
        };

-        // Connect to predefined hosts from configuration
-        let hosts = config.hosts.predefined_hosts.clone();
+        // Connect to the hosts listed in the configuration
+        let hosts: Vec<String> = config.hosts.keys().cloned().collect();

        // Try to connect to hosts but don't fail if none are available
        match zmq_consumer.connect_to_predefined_hosts(&hosts).await {
@@ -133,7 +133,7 @@ impl Dashboard {
            terminal,
            headless,
            initial_commands_sent: std::collections::HashSet::new(),
-            _config: config,
+            config,
        })
    }

@@ -247,7 +247,7 @@ impl Dashboard {
                    if let Some(ref mut tui_app) = self.tui_app {
                        let connected_hosts = self
                            .metric_store
-                            .get_connected_hosts(Duration::from_secs(30));
+                            .get_connected_hosts(Duration::from_secs(self.config.zmq.heartbeat_timeout_seconds));
                        
                        
                        tui_app.update_hosts(connected_hosts);
@@ -295,18 +295,18 @@ impl Dashboard {
    async fn execute_ui_command(&self, command: UiCommand) -> Result<()> {
        match command {
            UiCommand::ServiceStart { hostname, service_name } => {
-                info!("Sending start command for service {} on {}", service_name, hostname);
+                info!("Sending user start command for service {} on {}", service_name, hostname);
                let agent_command = AgentCommand::ServiceControl {
                    service_name: service_name.clone(),
-                    action: ServiceAction::Start,
+                    action: ServiceAction::UserStart,
                };
                self.zmq_command_sender.send_command(&hostname, agent_command).await?;
            }
            UiCommand::ServiceStop { hostname, service_name } => {
-                info!("Sending stop command for service {} on {}", service_name, hostname);
+                info!("Sending user stop command for service {} on {}", service_name, hostname);
                let agent_command = AgentCommand::ServiceControl {
                    service_name: service_name.clone(),
-                    action: ServiceAction::Stop,
+                    action: ServiceAction::UserStop,
                };
                self.zmq_command_sender.send_command(&hostname, agent_command).await?;
            }
--- a/dashboard/src/communication/mod.rs
+++ b/dashboard/src/communication/mod.rs
@@ -36,6 +36,8 @@ pub enum ServiceAction {
    Start,
    Stop,
    Status,
+    UserStart,  // User-initiated start (clears user-stopped flag)
+    UserStop,   // User-initiated stop (marks as user-stopped)
 }

 /// ZMQ consumer for receiving metrics from agents
@@ -139,9 +141,9 @@ impl ZmqConsumer {
        }
    }

-    /// Receive metrics from any connected agent (non-blocking)
+    /// Receive metrics from any connected agent (with timeout)
    pub async fn receive_metrics(&mut self) -> Result<Option<MetricMessage>> {
-        match self.subscriber.recv_bytes(zmq::DONTWAIT) {
+        match self.subscriber.recv_bytes(0) {
            Ok(data) => {
                debug!("Received {} bytes from ZMQ", data.len());

--- a/dashboard/src/config/mod.rs
+++ b/dashboard/src/config/mod.rs
@@ -6,21 +6,29 @@ use std::path::Path;
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct DashboardConfig {
    pub zmq: ZmqConfig,
-    pub hosts: HostsConfig,
+    pub hosts: std::collections::HashMap<String, HostDetails>,
    pub system: SystemConfig,
    pub ssh: SshConfig,
+    pub service_logs: std::collections::HashMap<String, Vec<ServiceLogConfig>>,
 }

 /// ZMQ consumer configuration
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct ZmqConfig {
    pub subscriber_ports: Vec<u16>,
+    /// Heartbeat timeout in seconds - hosts considered offline if no heartbeat received within this time
+    #[serde(default = "default_heartbeat_timeout_seconds")]
+    pub heartbeat_timeout_seconds: u64,
 }

-/// Hosts configuration
+fn default_heartbeat_timeout_seconds() -> u64 {
+    10 // Default to 10 seconds - allows for multiple missed heartbeats
+}
+
+/// Individual host configuration details
 #[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct HostsConfig {
-    pub predefined_hosts: Vec<String>,
+pub struct HostDetails {
+    pub mac_address: Option<String>,
 }

 /// System configuration
@@ -39,6 +47,13 @@ pub struct SshConfig {
    pub rebuild_alias: String,
 }

+/// Service log file configuration per host
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ServiceLogConfig {
+    pub service_name: String,
+    pub log_file_path: String,
+}
+
 impl DashboardConfig {
    pub fn load_from_file<P: AsRef<Path>>(path: P) -> Result<Self> {
        let path = path.as_ref();
@@ -60,8 +75,3 @@ impl Default for ZmqConfig {
    }
 }

-impl Default for HostsConfig {
-    fn default() -> Self {
-        panic!("Dashboard configuration must be loaded from file - no hardcoded defaults allowed")
-    }
-}
--- a/dashboard/src/main.rs
+++ b/dashboard/src/main.rs
@@ -12,10 +12,6 @@ mod ui;

 use app::Dashboard;

-/// Get hardcoded version
-fn get_version() -> &'static str {
-    "v0.1.31"
-}

 /// Check if running inside tmux session
 fn check_tmux_session() {
@@ -42,7 +38,7 @@ fn check_tmux_session() {
 #[derive(Parser)]
 #[command(name = "cm-dashboard")]
 #[command(about = "CM Dashboard TUI with individual metric consumption")]
-#[command(version = get_version())]
+#[command(version)]
 struct Cli {
    /// Increase logging verbosity (-v, -vv)
    #[arg(short, long, action = clap::ArgAction::Count)]
--- a/dashboard/src/metrics/store.rs
+++ b/dashboard/src/metrics/store.rs
@@ -11,8 +11,8 @@ pub struct MetricStore {
    current_metrics: HashMap<String, HashMap<String, Metric>>,
    /// Historical metrics for trending
    historical_metrics: HashMap<String, Vec<MetricDataPoint>>,
-    /// Last update timestamp per host
-    last_update: HashMap<String, Instant>,
+    /// Last heartbeat timestamp per host
+    last_heartbeat: HashMap<String, Instant>,
    /// Configuration
    max_metrics_per_host: usize,
    history_retention: Duration,
@@ -23,7 +23,7 @@ impl MetricStore {
        Self {
            current_metrics: HashMap::new(),
            historical_metrics: HashMap::new(),
-            last_update: HashMap::new(),
+            last_heartbeat: HashMap::new(),
            max_metrics_per_host,
            history_retention: Duration::from_secs(history_retention_hours * 3600),
        }
@@ -56,10 +56,13 @@ impl MetricStore {

            // Add to history
            host_history.push(MetricDataPoint { received_at: now });
-        }

-        // Update last update timestamp
-        self.last_update.insert(hostname.to_string(), now);
+            // Track heartbeat metrics for connectivity detection
+            if metric_name == "agent_heartbeat" {
+                self.last_heartbeat.insert(hostname.to_string(), now);
+                debug!("Updated heartbeat for host {}", hostname);
+            }
+        }

        // Get metrics count before cleanup
        let metrics_count = host_metrics.len();
@@ -88,16 +91,18 @@ impl MetricStore {
        }
    }

-    /// Get connected hosts (hosts with recent updates)
+    /// Get connected hosts (hosts with recent heartbeats)
    pub fn get_connected_hosts(&self, timeout: Duration) -> Vec<String> {
        let now = Instant::now();

-        self.last_update
+        self.last_heartbeat
            .iter()
-            .filter_map(|(hostname, &last_update)| {
-                if now.duration_since(last_update) <= timeout {
+            .filter_map(|(hostname, &last_heartbeat)| {
+                if now.duration_since(last_heartbeat) <= timeout {
                    Some(hostname.clone())
                } else {
+                    debug!("Host {} considered offline - last heartbeat was {:?} ago", 
+                           hostname, now.duration_since(last_heartbeat));
                    None
                }
            })
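The heartbeat-based connectivity check in `MetricStore` boils down to a per-host timestamp table plus a timeout filter. A minimal sketch follows, assuming a hypothetical `HeartbeatTable` type that takes the current time as a parameter so the behaviour can be exercised deterministically; the real store calls `Instant::now()` internally and also keeps the metrics themselves.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical reduced version of the store: heartbeat timestamps only.
struct HeartbeatTable {
    last_heartbeat: HashMap<String, Instant>,
}

impl HeartbeatTable {
    fn new() -> Self {
        Self { last_heartbeat: HashMap::new() }
    }

    /// Record a metric; only `agent_heartbeat` refreshes the timestamp,
    /// so other metrics no longer keep a host marked as connected.
    fn record(&mut self, host: &str, metric_name: &str, now: Instant) {
        if metric_name == "agent_heartbeat" {
            self.last_heartbeat.insert(host.to_string(), now);
        }
    }

    /// Hosts whose last heartbeat is no older than `timeout`, sorted by name.
    fn connected_hosts(&self, now: Instant, timeout: Duration) -> Vec<String> {
        let mut hosts: Vec<String> = self
            .last_heartbeat
            .iter()
            .filter_map(|(host, seen)| {
                if now.duration_since(*seen) <= timeout {
                    Some(host.clone())
                } else {
                    None
                }
            })
            .collect();
        hosts.sort();
        hosts
    }
}
```

With the 10 second default from `default_heartbeat_timeout_seconds`, a host that last sent a heartbeat 12 seconds ago is reported as offline, while one seen 4 seconds ago stays connected.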
--- a/dashboard/src/ui/mod.rs
+++ b/dashboard/src/ui/mod.rs
@@ -1,5 +1,5 @@
 use anyhow::Result;
-use crossterm::event::{Event, KeyCode, KeyModifiers};
+use crossterm::event::{Event, KeyCode};
 use ratatui::{
    layout::{Constraint, Direction, Layout, Rect},
    style::Style,
@@ -9,6 +9,7 @@ use ratatui::{
 use std::collections::HashMap;
 use std::time::Instant;
 use tracing::info;
+use wake_on_lan::MagicPacket;

 pub mod theme;
 pub mod widgets;
@@ -37,15 +38,6 @@ pub enum CommandType {
 }

 /// Panel types for focus management
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum PanelType {
-    System,
-    Services,
-    Backup,
-}
-
-impl PanelType {
-}

 /// Widget states for a specific host
 #[derive(Clone)]
@@ -92,8 +84,6 @@ pub struct TuiApp {
    available_hosts: Vec<String>,
    /// Host index for navigation
    host_index: usize,
-    /// Currently focused panel
-    focused_panel: PanelType,
    /// Should quit application
    should_quit: bool,
    /// Track if user manually navigated away from localhost
@@ -104,16 +94,25 @@ pub struct TuiApp {

 impl TuiApp {
    pub fn new(config: DashboardConfig) -> Self {
-        Self {
+        let mut app = Self {
            host_widgets: HashMap::new(),
            current_host: None,
-            available_hosts: Vec::new(),
+            available_hosts: config.hosts.keys().cloned().collect(),
            host_index: 0,
-            focused_panel: PanelType::System, // Start with System panel focused
            should_quit: false,
            user_navigated_away: false,
            config,
+        };
+        
+        // Sort configured hosts
+        app.available_hosts.sort();
+        
+        // Initialize with first host if available
+        if !app.available_hosts.is_empty() {
+            app.current_host = Some(app.available_hosts[0].clone());
        }
+        
+        app
    }

    /// Get or create host widgets for the given hostname
@@ -132,31 +131,31 @@ impl TuiApp {
            // Only update widgets if we have metrics for this host
            let all_metrics = metric_store.get_metrics_for_host(&hostname);
            if !all_metrics.is_empty() {
-                // Get metrics first while hostname is borrowed
-                let cpu_metrics: Vec<&Metric> = all_metrics
-                    .iter()
-                    .filter(|m| {
-                        m.name.starts_with("cpu_")
-                            || m.name.contains("c_state_")
-                            || m.name.starts_with("process_top_")
-                    })
-                    .copied()
-                    .collect();
-                let memory_metrics: Vec<&Metric> = all_metrics
-                    .iter()
-                    .filter(|m| m.name.starts_with("memory_") || m.name.starts_with("disk_tmp_"))
-                    .copied()
-                    .collect();
-                let service_metrics: Vec<&Metric> = all_metrics
-                    .iter()
-                    .filter(|m| m.name.starts_with("service_"))
-                    .copied()
-                    .collect();
-                let all_backup_metrics: Vec<&Metric> = all_metrics
-                    .iter()
-                    .filter(|m| m.name.starts_with("backup_"))
-                    .copied()
-                    .collect();
+                // Single pass metric categorization for better performance
+                let mut cpu_metrics = Vec::new();
+                let mut memory_metrics = Vec::new();
+                let mut service_metrics = Vec::new();
+                let mut backup_metrics = Vec::new();
+                let mut nixos_metrics = Vec::new();
+                let mut disk_metrics = Vec::new();
+                
+                for metric in all_metrics {
+                    if metric.name.starts_with("cpu_") 
+                        || metric.name.contains("c_state_") 
+                        || metric.name.starts_with("process_top_") {
+                        cpu_metrics.push(metric);
+                    } else if metric.name.starts_with("memory_") || metric.name.starts_with("disk_tmp_") {
+                        memory_metrics.push(metric);
+                    } else if metric.name.starts_with("service_") {
+                        service_metrics.push(metric);
+                    } else if metric.name.starts_with("backup_") {
+                        backup_metrics.push(metric);
+                    } else if metric.name == "system_nixos_build" || metric.name == "system_active_users" || metric.name == "agent_version" {
+                        nixos_metrics.push(metric);
+                    } else if metric.name.starts_with("disk_") {
+                        disk_metrics.push(metric);
+                    }
+                }

                // Clear completed transitions first
                self.clear_completed_transitions(&hostname, &service_metrics);
@@ -167,21 +166,7 @@ impl TuiApp {
                // Collect all system metrics (CPU, memory, NixOS, disk/storage)
                let mut system_metrics = cpu_metrics;
                system_metrics.extend(memory_metrics);
-                
-                // Add NixOS metrics - using exact matching for build display fix
-                let nixos_metrics: Vec<&Metric> = all_metrics
-                    .iter()
-                    .filter(|m| m.name == "system_nixos_build" || m.name == "system_active_users" || m.name == "agent_version")
-                    .copied()
-                    .collect();
                system_metrics.extend(nixos_metrics);
-                
-                // Add disk/storage metrics
-                let disk_metrics: Vec<&Metric> = all_metrics
-                    .iter()
-                    .filter(|m| m.name.starts_with("disk_"))
-                    .copied()
-                    .collect();
                system_metrics.extend(disk_metrics);

                host_widgets.system_widget.update_from_metrics(&system_metrics);
@@ -190,7 +175,7 @@ impl TuiApp {
                    .update_from_metrics(&service_metrics);
                host_widgets
                    .backup_widget
-                    .update_from_metrics(&all_backup_metrics);
+                    .update_from_metrics(&backup_metrics);

                host_widgets.last_update = Some(Instant::now());
            }
@@ -198,21 +183,28 @@ impl TuiApp {
    }

    /// Update available hosts with localhost prioritization
-    pub fn update_hosts(&mut self, hosts: Vec<String>) {
-        // Sort hosts alphabetically
-        let mut sorted_hosts = hosts.clone();
+    pub fn update_hosts(&mut self, discovered_hosts: Vec<String>) {
+        // Start with configured hosts (always visible)
+        let mut all_hosts: Vec<String> = self.config.hosts.keys().cloned().collect();
+        
+        // Add any discovered hosts that aren't already configured
+        for host in discovered_hosts {
+            if !all_hosts.contains(&host) {
+                all_hosts.push(host);
+            }
+        }
        
        // Keep hosts that have pending transitions even if they're offline
        for (hostname, host_widgets) in &self.host_widgets {
            if !host_widgets.pending_service_transitions.is_empty() {
-                if !sorted_hosts.contains(hostname) {
-                    sorted_hosts.push(hostname.clone());
+                if !all_hosts.contains(hostname) {
+                    all_hosts.push(hostname.clone());
                }
            }
        }
        
-        sorted_hosts.sort();
-        self.available_hosts = sorted_hosts;
+        all_hosts.sort();
+        self.available_hosts = all_hosts;
        
        // Get the current hostname (localhost) for auto-selection
        let localhost = gethostname::gethostname().to_string_lossy().to_string();
@@ -256,69 +248,141 @@ impl TuiApp {
                KeyCode::Char('r') => {
                    // System rebuild command - works on any panel for current host
                    if let Some(hostname) = self.current_host.clone() {
-                        // Launch tmux popup with SSH using config values
-                        let ssh_command = format!(
-                            "ssh -tt {}@{} 'bash -ic {}'",
+                        // Create command that shows logo, rebuilds, and waits for user input
+                        let logo_and_rebuild = format!(
+                            "bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {}\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'",
+                            hostname,
                            self.config.ssh.rebuild_user,
                            hostname,
                            self.config.ssh.rebuild_alias
                        );
+                        
                        std::process::Command::new("tmux")
-                            .arg("display-popup")
-                            .arg(&ssh_command)
+                            .arg("split-window")
+                            .arg("-v")
+                            .arg("-p")
+                            .arg("30")
+                            .arg(&logo_and_rebuild)
                            .spawn()
                            .ok(); // Ignore errors, tmux will handle them
                    }
                }
                KeyCode::Char('s') => {
-                    if self.focused_panel == PanelType::Services {
-                        // Service start command
-                        if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
-                            if self.start_command(&hostname, CommandType::ServiceStart, service_name.clone()) {
-                                return Ok(Some(UiCommand::ServiceStart { hostname, service_name }));
-                            }
+                    // Service start command
+                    if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+                        if self.start_command(&hostname, CommandType::ServiceStart, service_name.clone()) {
+                            return Ok(Some(UiCommand::ServiceStart { hostname, service_name }));
                        }
                    }
                }
                KeyCode::Char('S') => {
-                    if self.focused_panel == PanelType::Services {
-                        // Service stop command
-                        if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
-                            if self.start_command(&hostname, CommandType::ServiceStop, service_name.clone()) {
-                                return Ok(Some(UiCommand::ServiceStop { hostname, service_name }));
+                    // Service stop command
+                    if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+                        if self.start_command(&hostname, CommandType::ServiceStop, service_name.clone()) {
+                            return Ok(Some(UiCommand::ServiceStop { hostname, service_name }));
+                        }
+                    }
+                }
+                KeyCode::Char('J') => {
+                    // Show service logs via journalctl in tmux split window
+                    if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+                        let journalctl_command = format!(
+                            "bash -c \"ssh -tt {}@{} 'sudo journalctl -u {}.service -f --no-pager -n 50'; exit\"",
+                            self.config.ssh.rebuild_user,
+                            hostname,
+                            service_name
+                        );
+                        
+                        std::process::Command::new("tmux")
+                            .arg("split-window")
+                            .arg("-v")
+                            .arg("-p")
+                            .arg("30")
+                            .arg(&journalctl_command)
+                            .spawn()
+                            .ok(); // Ignore errors, tmux will handle them
+                    }
+                }
+                KeyCode::Char('L') => {
+                    // Show custom service log file in tmux split window
+                    if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+                        // Check if this service has a custom log file configured
+                        if let Some(host_logs) = self.config.service_logs.get(&hostname) {
+                            if let Some(log_config) = host_logs.iter().find(|config| config.service_name == service_name) {
+                                let tail_command = format!(
+                                    "bash -c \"ssh -tt {}@{} 'sudo tail -n 50 -f {}'; exit\"",
+                                    self.config.ssh.rebuild_user,
+                                    hostname,
+                                    log_config.log_file_path
+                                );
+                                
+                                std::process::Command::new("tmux")
+                                    .arg("split-window")
+                                    .arg("-v")
+                                    .arg("-p")
+                                    .arg("30")
+                                    .arg(&tail_command)
+                                    .spawn()
+                                    .ok(); // Ignore errors, tmux will handle them
                            }
                        }
                    }
                }
                KeyCode::Char('b') => {
-                    if self.focused_panel == PanelType::Backup {
-                        // Trigger backup
-                        if let Some(hostname) = self.current_host.clone() {
-                            self.start_command(&hostname, CommandType::BackupTrigger, hostname.clone());
-                            return Ok(Some(UiCommand::TriggerBackup { hostname }));
+                    // Trigger backup
+                    if let Some(hostname) = self.current_host.clone() {
+                        self.start_command(&hostname, CommandType::BackupTrigger, hostname.clone());
+                        return Ok(Some(UiCommand::TriggerBackup { hostname }));
+                    }
+                }
+                KeyCode::Char('w') => {
+                    // Wake on LAN for offline hosts
+                    if let Some(hostname) = self.current_host.clone() {
+                        // Check if host has MAC address configured
+                        if let Some(host_details) = self.config.hosts.get(&hostname) {
+                            if let Some(mac_address) = &host_details.mac_address {
+                                // Parse MAC address and send WoL packet
+                                let mac_bytes = Self::parse_mac_address(mac_address);
+                                match mac_bytes {
+                                    Ok(mac) => {
+                                        match MagicPacket::new(&mac).send() {
+                                            Ok(_) => {
+                                                info!("WakeOnLAN packet sent successfully to {} ({})", hostname, mac_address);
+                                            }
+                                            Err(e) => {
+                                                tracing::error!("Failed to send WakeOnLAN packet to {}: {}", hostname, e);
+                                            }
+                                        }
+                                    }
+                                    Err(_) => {
+                                        tracing::error!("Invalid MAC address format for {}: {}", hostname, mac_address);
+                                    }
+                                }
+                            }
                        }
                    }
                }
                KeyCode::Tab => {
-                    if key.modifiers.contains(KeyModifiers::SHIFT) {
-                        // Shift+Tab cycles through panels
-                        self.next_panel();
-                    } else {
-                        // Tab cycles to next host
-                        self.navigate_host(1);
+                    // Tab cycles to next host
+                    self.navigate_host(1);
+                }
+                KeyCode::Up | KeyCode::Char('k') => {
+                    // Move service selection up
+                    if let Some(hostname) = self.current_host.clone() {
+                        let host_widgets = self.get_or_create_host_widgets(&hostname);
+                        host_widgets.services_widget.select_previous();
                    }
                }
-                KeyCode::BackTab => {
-                    // BackTab (Shift+Tab on some terminals) also cycles panels
-                    self.next_panel();
-                }
-                KeyCode::Up => {
-                    // Scroll up in focused panel
-                    self.scroll_focused_panel(-1);
-                }
-                KeyCode::Down => {
-                    // Scroll down in focused panel
-                    self.scroll_focused_panel(1);
+                KeyCode::Down | KeyCode::Char('j') => {
+                    // Move service selection down
+                    if let Some(hostname) = self.current_host.clone() {
+                        let total_services = {
+                            let host_widgets = self.get_or_create_host_widgets(&hostname);
+                            host_widgets.services_widget.get_total_services_count()
+                        };
+                        let host_widgets = self.get_or_create_host_widgets(&hostname);
+                        host_widgets.services_widget.select_next(total_services);
+                    }
                }
                _ => {}
            }
@@ -359,25 +423,6 @@ impl TuiApp {
    }


-    /// Switch to next panel (Shift+Tab) - only cycles through visible panels
-    pub fn next_panel(&mut self) {
-        let visible_panels = self.get_visible_panels();
-        if visible_panels.len() <= 1 {
-            return; // Can't switch if only one or no panels visible
-        }
-        
-        // Find current panel index in visible panels
-        if let Some(current_index) = visible_panels.iter().position(|&p| p == self.focused_panel) {
-            // Move to next visible panel
-            let next_index = (current_index + 1) % visible_panels.len();
-            self.focused_panel = visible_panels[next_index];
-        } else {
-            // Current panel not visible, switch to first visible panel
-            self.focused_panel = visible_panels[0];
-        }
-        
-        info!("Switched to panel: {:?}", self.focused_panel);
-    }



@@ -478,61 +523,8 @@ impl TuiApp {
        }
    }

-    /// Scroll the focused panel up or down
-    pub fn scroll_focused_panel(&mut self, direction: i32) {
-        if let Some(hostname) = self.current_host.clone() {
-            let focused_panel = self.focused_panel; // Get the value before borrowing
-            let host_widgets = self.get_or_create_host_widgets(&hostname);
-            
-            match focused_panel {
-                PanelType::System => {
-                    if direction > 0 {
-                        host_widgets.system_scroll_offset = host_widgets.system_scroll_offset.saturating_add(1);
-                    } else {
-                        host_widgets.system_scroll_offset = host_widgets.system_scroll_offset.saturating_sub(1);
-                    }
-                    info!("System panel scroll offset: {}", host_widgets.system_scroll_offset);
-                }
-                PanelType::Services => {
-                    // For services panel, Up/Down moves selection cursor, not scroll
-                    let total_services = host_widgets.services_widget.get_total_services_count();
-                    
-                    if direction > 0 {
-                        host_widgets.services_widget.select_next(total_services);
-                        info!("Services selection moved down");
-                    } else {
-                        host_widgets.services_widget.select_previous();
-                        info!("Services selection moved up");
-                    }
-                }
-                PanelType::Backup => {
-                    if direction > 0 {
-                        host_widgets.backup_scroll_offset = host_widgets.backup_scroll_offset.saturating_add(1);
-                    } else {
-                        host_widgets.backup_scroll_offset = host_widgets.backup_scroll_offset.saturating_sub(1);
-                    }
-                    info!("Backup panel scroll offset: {}", host_widgets.backup_scroll_offset);
-                }
-            }
-        }
-    }


-    /// Get list of currently visible panels
-    fn get_visible_panels(&self) -> Vec<PanelType> {
-        let mut visible_panels = vec![PanelType::System, PanelType::Services];
-        
-        // Check if backup panel should be shown
-        if let Some(hostname) = &self.current_host {
-            if let Some(host_widgets) = self.host_widgets.get(hostname) {
-                if host_widgets.backup_widget.has_data() {
-                    visible_panels.push(PanelType::Backup);
-                }
-            }
-        }
-        
-        visible_panels
-    }

    /// Render the dashboard (real btop-style multi-panel layout)
    pub fn render(&mut self, frame: &mut Frame, metric_store: &MetricStore) {
@@ -601,7 +593,7 @@ impl TuiApp {

        // Render services widget for current host
        if let Some(hostname) = self.current_host.clone() {
-            let is_focused = self.focused_panel == PanelType::Services;
+            let is_focused = true; // Always show service selection
            let (scroll_offset, pending_transitions) = {
                let host_widgets = self.get_or_create_host_widgets(&hostname);
                (host_widgets.services_scroll_offset, host_widgets.pending_service_transitions.clone())
@@ -625,48 +617,90 @@ impl TuiApp {

        if self.available_hosts.is_empty() {
            let title_text = "cm-dashboard • no hosts discovered";
-            let title = Paragraph::new(title_text).style(Typography::title());
+            let title = Paragraph::new(title_text)
+                .style(Style::default().fg(Theme::background()).bg(Theme::status_color(Status::Unknown)));
            frame.render_widget(title, area);
            return;
        }

-        // Create spans for each host with status indicators
-        let mut spans = vec![Span::styled("cm-dashboard • ", Typography::title())];
+        // Calculate worst-case status across all hosts (excluding offline)
+        let mut worst_status = Status::Ok;
+        for host in &self.available_hosts {
+            let host_status = self.calculate_host_status(host, metric_store);
+            // Don't include offline hosts in status aggregation
+            if host_status != Status::Offline {
+                worst_status = Status::aggregate(&[worst_status, host_status]);
+            }
+        }

+        // Use the worst status color as background
+        let background_color = Theme::status_color(worst_status);
+
+        // Split the title bar into left and right sections
+        let chunks = Layout::default()
+            .direction(Direction::Horizontal)
+            .constraints([Constraint::Length(15), Constraint::Min(0)])
+            .split(area);
+
+        // Left side: "cm-dashboard" text
+        let left_span = Span::styled(
+            " cm-dashboard", 
+            Style::default().fg(Theme::background()).bg(background_color).add_modifier(Modifier::BOLD)
+        );
+        let left_title = Paragraph::new(Line::from(vec![left_span]))
+            .style(Style::default().bg(background_color));
+        frame.render_widget(left_title, chunks[0]);
+
+        // Right side: hosts with status indicators
+        let mut host_spans = Vec::new();
+        
        for (i, host) in self.available_hosts.iter().enumerate() {
            if i > 0 {
-                spans.push(Span::styled(" ", Typography::title()));
+                host_spans.push(Span::styled(
+                    " ", 
+                    Style::default().fg(Theme::background()).bg(background_color)
+                ));
            }

            // Always show normal status icon based on metrics (no command status at host level)
            let host_status = self.calculate_host_status(host, metric_store);
-            let (status_icon, status_color) = (StatusIcons::get_icon(host_status), Theme::status_color(host_status));
+            let status_icon = StatusIcons::get_icon(host_status);

-            // Add status icon
-            spans.push(Span::styled(
+            // Add status icon with background color as foreground against status background
+            host_spans.push(Span::styled(
                format!("{} ", status_icon),
-                Style::default().fg(status_color),
+                Style::default().fg(Theme::background()).bg(background_color),
            ));

            if Some(host) == self.current_host.as_ref() {
-                // Selected host in bold bright white
-                spans.push(Span::styled(
+                // Selected host in bold background color against status background
+                host_spans.push(Span::styled(
                    host.clone(),
-                    Typography::title().add_modifier(Modifier::BOLD),
+                    Style::default()
+                        .fg(Theme::background())
+                        .bg(background_color)
+                        .add_modifier(Modifier::BOLD),
                ));
            } else {
-                // Other hosts in normal style with status color
-                spans.push(Span::styled(
+                // Other hosts in normal background color against status background
+                host_spans.push(Span::styled(
                    host.clone(),
-                    Style::default().fg(status_color),
+                    Style::default().fg(Theme::background()).bg(background_color),
                ));
            }
        }

-        let title_line = Line::from(spans);
-        let title = Paragraph::new(vec![title_line]);
+        // Add right padding
+        host_spans.push(Span::styled(
+            " ", 
+            Style::default().fg(Theme::background()).bg(background_color)
+        ));

-        frame.render_widget(title, area);
+        let host_line = Line::from(host_spans);
+        let host_title = Paragraph::new(vec![host_line])
+            .style(Style::default().bg(background_color))
+            .alignment(ratatui::layout::Alignment::Right);
+        frame.render_widget(host_title, chunks[1]);
    }

    /// Calculate overall status for a host based on its metrics
@@ -674,7 +708,7 @@ impl TuiApp {
        let metrics = metric_store.get_metrics_for_host(hostname);

        if metrics.is_empty() {
-            return Status::Unknown;
+            return Status::Offline;
        }

        // First check if we have the aggregated host status summary from the agent
@@ -694,7 +728,8 @@ impl TuiApp {
                Status::Warning => has_warning = true,
                Status::Pending => has_pending = true,
                Status::Ok => ok_count += 1,
-                Status::Unknown => {} // Ignore unknown for aggregation
+                Status::Unknown => {}, // Ignore unknown for aggregation
+                Status::Offline => {}, // Ignore offline for aggregation
            }
        }

@@ -729,39 +764,22 @@ impl TuiApp {
        let mut shortcuts = Vec::new();
        
        // Global shortcuts
-        shortcuts.push("Tab: Switch Host".to_string());
-        shortcuts.push("Shift+Tab: Switch Panel".to_string());
-        
-        // Scroll shortcuts (always available)
-        shortcuts.push("↑↓: Scroll".to_string());
-        
-        // Global rebuild shortcut (works on any panel)
-        shortcuts.push("R: Rebuild Host".to_string());
-        
-        // Panel-specific shortcuts
-        match self.focused_panel {
-            PanelType::Services => {
-                shortcuts.push("S: Start".to_string());
-                shortcuts.push("Shift+S: Stop".to_string());
-            }
-            PanelType::Backup => {
-                shortcuts.push("B: Trigger Backup".to_string());
-            }
-            _ => {}
-        }
+        shortcuts.push("Tab: Host".to_string());
+        shortcuts.push("↑↓/jk: Select".to_string());
+        shortcuts.push("r: Rebuild".to_string());
+        shortcuts.push("s/S: Start/Stop".to_string());
+        shortcuts.push("J: Logs".to_string());
+        shortcuts.push("L: Custom".to_string());
+        shortcuts.push("w: Wake".to_string());
        
        // Always show quit
-        shortcuts.push("Q: Quit".to_string());
+        shortcuts.push("q: Quit".to_string());
        
        shortcuts
    }

    fn render_system_panel(&mut self, frame: &mut Frame, area: Rect, _metric_store: &MetricStore) {
-        let system_block = if self.focused_panel == PanelType::System {
-            Components::focused_widget_block("system")
-        } else {
-            Components::widget_block("system")
-        };
+        let system_block = Components::widget_block("system");
        let inner_area = system_block.inner(area);
        frame.render_widget(system_block, area);
        // Get current host widgets, create if none exist
@@ -771,16 +789,12 @@ impl TuiApp {
                host_widgets.system_scroll_offset
            };
            let host_widgets = self.get_or_create_host_widgets(&hostname);
-            host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset);
+            host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset, &hostname);
        }
    }

    fn render_backup_panel(&mut self, frame: &mut Frame, area: Rect) {
-        let backup_block = if self.focused_panel == PanelType::Backup {
-            Components::focused_widget_block("backup")
-        } else {
-            Components::widget_block("backup")
-        };
+        let backup_block = Components::widget_block("backup");
        let inner_area = backup_block.inner(area);
        frame.render_widget(backup_block, area);

@@ -795,5 +809,20 @@ impl TuiApp {
        }
    }

+    /// Parse MAC address string (e.g., "AA:BB:CC:DD:EE:FF") to [u8; 6]
+    fn parse_mac_address(mac_str: &str) -> Result<[u8; 6], &'static str> {
+        let parts: Vec<&str> = mac_str.split(':').collect();
+        if parts.len() != 6 {
+            return Err("MAC address must have 6 parts separated by colons");
+        }

+        let mut mac = [0u8; 6];
+        for (i, part) in parts.iter().enumerate() {
+            match u8::from_str_radix(part, 16) {
+                Ok(byte) => mac[i] = byte,
+                Err(_) => return Err("Invalid hexadecimal byte in MAC address"),
+            }
+        }
+        Ok(mac)
+    }
 }
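
Note on the `w` shortcut: the sketch below shows, in standalone form, the two steps that Wake-on-LAN depends on. `parse_mac` mirrors `parse_mac_address` from the hunk above. `magic_packet_payload` is a hypothetical helper added only for illustration: it builds the standard Wake-on-LAN payload (6 bytes of `0xFF` followed by the target MAC repeated 16 times) and is not the API of the `MagicPacket` type the dashboard actually calls.

```rust
/// Parse "AA:BB:CC:DD:EE:FF" into six bytes (same logic as the diff above).
fn parse_mac(mac_str: &str) -> Result<[u8; 6], &'static str> {
    let parts: Vec<&str> = mac_str.split(':').collect();
    if parts.len() != 6 {
        return Err("MAC address must have 6 parts separated by colons");
    }

    let mut mac = [0u8; 6];
    for (i, part) in parts.iter().enumerate() {
        // Each part must be a hexadecimal value that fits in one byte.
        mac[i] = u8::from_str_radix(part, 16)
            .map_err(|_| "Invalid hexadecimal byte in MAC address")?;
    }
    Ok(mac)
}

/// Hypothetical helper (not part of the diff): build the standard
/// 102-byte Wake-on-LAN payload for a parsed MAC address.
fn magic_packet_payload(mac: &[u8; 6]) -> Vec<u8> {
    // Synchronization stream: six 0xFF bytes.
    let mut payload = vec![0xFF; 6];
    // Followed by the target MAC repeated sixteen times.
    for _ in 0..16 {
        payload.extend_from_slice(mac);
    }
    payload
}
```

Because `u8::from_str_radix` accepts any hexadecimal value that fits in a byte, a single-digit part such as `A` also parses; stricter validation (exactly two hex digits per part) would need an extra length check.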
--- a/dashboard/src/ui/theme.rs
+++ b/dashboard/src/ui/theme.rs
@@ -147,6 +147,7 @@ impl Theme {
            Status::Warning => Self::warning(),
            Status::Critical => Self::error(),
            Status::Unknown => Self::muted_text(),
+            Status::Offline => Self::muted_text(), // Dark gray for offline
        }
    }

@@ -244,8 +245,9 @@ impl StatusIcons {
            Status::Ok => "●",
            Status::Pending => "◉", // Hollow circle for pending
            Status::Warning => "◐",
-            Status::Critical => "◯",
+            Status::Critical => "!",
            Status::Unknown => "?",
+            Status::Offline => "○", // Empty circle for offline
        }
    }

@@ -258,6 +260,7 @@ impl StatusIcons {
            Status::Warning => Theme::warning(),    // Yellow
            Status::Critical => Theme::error(),     // Red
            Status::Unknown => Theme::muted_text(), // Gray
+            Status::Offline => Theme::muted_text(), // Dark gray for offline
        };

        vec![
@@ -289,27 +292,9 @@ impl Components {
            )
    }

-    /// Widget block with focus indicator (blue border)
-    pub fn focused_widget_block(title: &str) -> Block<'_> {
-        Block::default()
-            .title(title)
-            .borders(Borders::ALL)
-            .style(Style::default().fg(Theme::highlight()).bg(Theme::background())) // Blue border for focus
-            .title_style(
-                Style::default()
-                    .fg(Theme::highlight()) // Blue title for focus
-                    .bg(Theme::background()),
-            )
-    }
 }

 impl Typography {
-    /// Main title style (dashboard header)
-    pub fn title() -> Style {
-        Style::default()
-            .fg(Theme::primary_text())
-            .bg(Theme::background())
-    }

    /// Widget title style (panel headers) - bold bright white
    pub fn widget_title() -> Style {
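
Note on `Status::Offline`: the title bar now takes its background colour from the worst status across all hosts while skipping offline ones. The sketch below illustrates that aggregation in isolation. The `Status` enum is redeclared locally, and the `severity` ranking and `worst_online_status` function are assumptions made for the example; the real `Status::aggregate` is not shown in this diff and may order the variants differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
    Offline,
}

impl Status {
    /// Assumed severity ranking (hypothetical): higher is worse.
    /// Unknown and Offline rank lowest so they never override real data.
    fn severity(self) -> u8 {
        match self {
            Status::Offline | Status::Unknown => 0,
            Status::Ok => 1,
            Status::Pending => 2,
            Status::Warning => 3,
            Status::Critical => 4,
        }
    }
}

/// Worst status across hosts, ignoring hosts that are offline.
/// Starts from Ok, as the title-bar code in the diff does.
fn worst_online_status(host_statuses: &[Status]) -> Status {
    host_statuses
        .iter()
        .copied()
        .filter(|status| *status != Status::Offline)
        .fold(Status::Ok, |worst, status| {
            if status.severity() > worst.severity() {
                status
            } else {
                worst
            }
        })
}
```

With this shape, a dashboard where every host is offline still renders with the `Ok` colour, which matches the loop in the diff: offline hosts are excluded and `worst_status` keeps its initial value.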
--- a/dashboard/src/ui/widgets/services.rs
+++ b/dashboard/src/ui/widgets/services.rs
@@ -113,13 +113,10 @@ impl ServicesWidget {
            name.to_string()
        };

-        // Parent services always show active/inactive status
+        // Parent services always show actual systemctl status
        let status_str = match info.widget_status {
-            Status::Ok => "active".to_string(),
            Status::Pending => "pending".to_string(),
-            Status::Warning => "inactive".to_string(),
-            Status::Critical => "failed".to_string(),
-            Status::Unknown => "unknown".to_string(),
+            _ => info.status.clone(), // Use actual status from agent (active/inactive/failed)
        };

        format!(
@@ -149,6 +146,7 @@ impl ServicesWidget {
            Status::Warning => Theme::warning(),
            Status::Critical => Theme::error(),
            Status::Unknown => Theme::muted_text(),
+            Status::Offline => Theme::muted_text(),
        };
        
        (icon.to_string(), info.status.clone(), status_color)
@@ -443,11 +441,7 @@ impl ServicesWidget {

    /// Render with focus, scroll, and pending transitions for visual feedback
    pub fn render_with_transitions(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>) {
-        let services_block = if is_focused {
-            Components::focused_widget_block("services")
-        } else {
-            Components::widget_block("services")
-        };
+        let services_block = Components::widget_block("services");
        let inner_area = services_block.inner(area);
        frame.render_widget(services_block, area);

@@ -583,14 +577,16 @@ impl ServicesWidget {
                    }
                };
                
-                // Apply selection highlighting to parent services only, preserving status icon color
+                // Apply selection highlighting to parent services only, making icons background color when selected
                // Only show selection when Services panel is focused
                // Show selection highlighting even when transitional icons are present
                if is_selected && !*is_sub && is_focused {
                    for (i, span) in spans.iter_mut().enumerate() {
                        if i == 0 {
-                            // First span is the status icon - preserve its color
-                            span.style = span.style.bg(Theme::highlight());
+                            // First span is the status icon - use background color for visibility against blue selection
+                            span.style = span.style
+                                .bg(Theme::highlight())
+                                .fg(Theme::background());
                        } else {
                            // Other spans (text) get full selection highlighting
                            span.style = span.style
--- a/dashboard/src/ui/widgets/system.rs
+++ b/dashboard/src/ui/widgets/system.rs
@@ -230,9 +230,30 @@ impl SystemWidget {

    /// Extract pool name from disk metric name
    fn extract_pool_name(&self, metric_name: &str) -> Option<String> {
-        if let Some(captures) = metric_name.strip_prefix("disk_") {
-            if let Some(pos) = captures.find('_') {
-                return Some(captures[..pos].to_string());
+        // Pattern: disk_{pool_name}_{drive_name}_{metric_type}
+        // Since pool_name can contain underscores, work backwards from known metric suffixes
+        if metric_name.starts_with("disk_") {
+            // First try drive-specific metrics that have device names
+            if let Some(suffix_pos) = metric_name.rfind("_temperature")
+                .or_else(|| metric_name.rfind("_wear_percent"))
+                .or_else(|| metric_name.rfind("_health")) {
+                // Find the second-to-last underscore to get pool name
+                let before_suffix = &metric_name[..suffix_pos];
+                if let Some(drive_start) = before_suffix.rfind('_') {
+                    return Some(metric_name[5..drive_start].to_string()); // Skip "disk_"
+                }
+            }
+            // For pool-level metrics (usage_percent, used_gb, total_gb), take everything before the metric suffix
+            else if let Some(suffix_pos) = metric_name.rfind("_usage_percent")
+                .or_else(|| metric_name.rfind("_used_gb"))
+                .or_else(|| metric_name.rfind("_total_gb")) {
+                return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
+            }
+            // Fallback to old behavior for unknown patterns
+            else if let Some(captures) = metric_name.strip_prefix("disk_") {
+                if let Some(pos) = captures.find('_') {
+                    return Some(captures[..pos].to_string());
+                }
            }
        }
        None
@@ -240,10 +261,18 @@ impl SystemWidget {

    /// Extract drive name from disk metric name  
    fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
-        // Pattern: disk_pool_drive_metric
-        let parts: Vec<&str> = metric_name.split('_').collect();
-        if parts.len() >= 3 && parts[0] == "disk" {
-            return Some(parts[2].to_string());
+        // Pattern: disk_{pool_name}_{drive_name}_{metric_type}
+        // Since pool_name can contain underscores, work backwards from known metric suffixes
+        if metric_name.starts_with("disk_") {
+            if let Some(suffix_pos) = metric_name.rfind("_temperature")
+                .or_else(|| metric_name.rfind("_wear_percent"))
+                .or_else(|| metric_name.rfind("_health")) {
+                // Find the second-to-last underscore to get the drive name
+                let before_suffix = &metric_name[..suffix_pos];
+                if let Some(drive_start) = before_suffix.rfind('_') {
+                    return Some(before_suffix[drive_start + 1..].to_string());
+                }
+            }
        }
        None
    }
@@ -410,12 +439,12 @@ impl Widget for SystemWidget {

 impl SystemWidget {
    /// Render with scroll offset support
-    pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize) {
+    pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize, hostname: &str) {
        let mut lines = Vec::new();

        // NixOS section
        lines.push(Line::from(vec![
-            Span::styled("NixOS:", Typography::widget_title())
+            Span::styled(format!("NixOS {}:", hostname), Typography::widget_title())
        ]));
        
        let build_text = self.nixos_build.as_deref().unwrap_or("unknown");
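The suffix-first parsing in the `extract_drive_name` hunk above can be sketched as one standalone function. This is a minimal sketch, assuming the `disk_{pool_name}_{drive_name}_{metric_type}` layout stated in the diff comment; the helper `split_disk_metric`, the `DRIVE_SUFFIXES` constant, and the tuple return type are hypothetical and not part of the commit. It works backwards from the known drive-level suffixes so that pool names containing underscores stay intact, and it returns `None` for names that do not fit the pattern.

```rust
/// Drive-level metric suffixes named in the diff above.
const DRIVE_SUFFIXES: [&str; 3] = ["_temperature", "_wear_percent", "_health"];

/// Hypothetical helper (not in the commit): split
/// `disk_{pool}_{drive}_{metric}` into `(pool, drive)`.
/// Parsing starts from the known suffix, so the pool name may contain underscores.
fn split_disk_metric(metric_name: &str) -> Option<(String, String)> {
    // Anything that is not a disk metric is rejected up front.
    let rest = metric_name.strip_prefix("disk_")?;

    // Locate the metric suffix; everything before it is "{pool}_{drive}".
    let suffix_pos = DRIVE_SUFFIXES.iter().find_map(|s| rest.rfind(*s))?;
    let before_suffix = &rest[..suffix_pos];

    // The last underscore separates the pool name from the drive name.
    let (pool, drive) = before_suffix.rsplit_once('_')?;
    if pool.is_empty() || drive.is_empty() {
        return None;
    }
    Some((pool.to_string(), drive.to_string()))
}
```

With this sketch, a name such as `disk_mnt_steampool_sdb1_temperature` splits into the pool `mnt_steampool` and the drive `sdb1`, while a name with no pool segment, such as `disk_sda_temperature`, yields `None`.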
--- a/hardcoded_values_removed.md
+++ b/hardcoded_values_removed.md
@@ -1,88 +0,0 @@
-# Hardcoded Values Removed - Configuration Summary
-
-## ✅ All Hardcoded Values Converted to Configuration
-
-### **1. SystemD Nginx Check Interval**
- **Before**: `nginx_check_interval_seconds: 30` (hardcoded)
- **After**: `nginx_check_interval_seconds: config.nginx_check_interval_seconds`
- **NixOS Config**: `nginx_check_interval_seconds = 30;`
-
-### **2. ZMQ Transmission Interval**  
- **Before**: `Duration::from_secs(1)` (hardcoded)
- **After**: `Duration::from_secs(self.config.zmq.transmission_interval_seconds)`
- **NixOS Config**: `transmission_interval_seconds = 1;`
-
-### **3. HTTP Timeouts in SystemD Collector**
- **Before**: 
-  ```rust
-  .timeout(Duration::from_secs(10))
-  .connect_timeout(Duration::from_secs(10))
-  ```
- **After**:
-  ```rust
-  .timeout(Duration::from_secs(self.config.http_timeout_seconds))
-  .connect_timeout(Duration::from_secs(self.config.http_connect_timeout_seconds))
-  ```
- **NixOS Config**: 
-  ```nix
-  http_timeout_seconds = 10;
-  http_connect_timeout_seconds = 10;
-  ```
-
-## **Configuration Structure Changes**
-
-### **SystemdConfig** (agent/src/config/mod.rs)
-```rust
-pub struct SystemdConfig {
-    // ... existing fields ...
-    pub nginx_check_interval_seconds: u64,      // NEW
-    pub http_timeout_seconds: u64,              // NEW
-    pub http_connect_timeout_seconds: u64,      // NEW
-}
-```
-
-### **ZmqConfig** (agent/src/config/mod.rs)
-```rust
-pub struct ZmqConfig {
-    // ... existing fields ...
-    pub transmission_interval_seconds: u64,     // NEW
-}
-```
-
-## **NixOS Configuration Updates**
-
-### **ZMQ Section** (hosts/common/cm-dashboard.nix)
-```nix
-zmq = {
-  # ... existing fields ...
-  transmission_interval_seconds = 1;           # NEW
-};
-```
-
-### **SystemD Section** (hosts/common/cm-dashboard.nix)
-```nix
-systemd = {
-  # ... existing fields ...
-  nginx_check_interval_seconds = 30;           # NEW  
-  http_timeout_seconds = 10;                   # NEW
-  http_connect_timeout_seconds = 10;           # NEW
-};
-```
-
-## **Benefits**
-
-✅ **No hardcoded values** - All timing/timeout values configurable  
-✅ **Consistent configuration** - Everything follows NixOS config pattern  
-✅ **Environment-specific tuning** - Can adjust timeouts per deployment  
-✅ **Maintainability** - No magic numbers scattered in code  
-✅ **Testing flexibility** - Can configure different values for testing  
-
-## **Runtime Behavior**
-
-All previously hardcoded values now respect configuration:
- **Nginx latency checks**: Every 30s (configurable)
- **ZMQ transmission**: Every 1s (configurable)  
- **HTTP requests**: 10s timeout (configurable)
- **HTTP connections**: 10s timeout (configurable)
-
-The codebase is now **100% configuration-driven** with no hardcoded timing values.
--- a/shared/Cargo.toml
+++ b/shared/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "cm-dashboard-shared"
-version = "0.1.31"
+version = "0.1.58"
 edition = "2021"

 [dependencies]
--- a/shared/src/metrics.rs
+++ b/shared/src/metrics.rs
@@ -87,6 +87,7 @@ pub enum Status {
    Warning,
    Critical,
    Unknown,
+    Offline,
 }

 impl Status {
@@ -190,6 +191,16 @@ impl HysteresisThresholds {
                    Status::Ok
                }
            }
+            Status::Offline => {
+                // Host coming back online, use normal thresholds like first measurement
+                if value >= self.critical_high {
+                    Status::Critical
+                } else if value >= self.warning_high {
+                    Status::Warning
+                } else {
+                    Status::Ok
+                }
+            }
        }
    }
 }
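The `Status::Offline` arm above treats a host that comes back online like a first measurement. The following is a minimal sketch of that idea, not the code from `shared/src/metrics.rs`: the reduced `Status` enum, the `HysteresisSketch` struct, the `warning_low` and `critical_low` fields, and the behavior of the `Warning` and `Critical` arms are assumptions made for illustration. Only `warning_high`, `critical_high`, and the rising-threshold check for `Offline` are taken from the diff.

```rust
/// Reduced status enum for illustration; the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Ok,
    Warning,
    Critical,
    Unknown,
    Offline,
}

/// Hypothetical threshold set; the `*_low` fields are assumed.
struct HysteresisSketch {
    warning_high: f32,
    warning_low: f32,
    critical_high: f32,
    critical_low: f32,
}

impl HysteresisSketch {
    /// Rising thresholds only, as in the `Status::Offline` arm of the diff.
    fn first_measurement(&self, value: f32) -> Status {
        if value >= self.critical_high {
            Status::Critical
        } else if value >= self.warning_high {
            Status::Warning
        } else {
            Status::Ok
        }
    }

    /// Pick the next status from the new value and the previous status.
    fn evaluate(&self, value: f32, previous: Status) -> Status {
        match previous {
            // No usable history: fall back to the plain rising thresholds.
            Status::Ok | Status::Unknown | Status::Offline => self.first_measurement(value),
            // Assumed: leave Warning only after dropping below warning_low.
            Status::Warning => {
                if value >= self.critical_high {
                    Status::Critical
                } else if value < self.warning_low {
                    Status::Ok
                } else {
                    Status::Warning
                }
            }
            // Assumed: leave Critical only after dropping below critical_low.
            Status::Critical => {
                if value >= self.critical_low {
                    Status::Critical
                } else if value >= self.warning_low {
                    Status::Warning
                } else {
                    Status::Ok
                }
            }
        }
    }
}
```

The point of the sketch is that `Offline` carries no measurement history, so it shares a match arm with `Unknown` instead of applying the falling thresholds.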
--- a/test_intervals.sh
+++ b/test_intervals.sh
@@ -1,42 +0,0 @@
-#!/bin/bash
-
-# Test script to verify collector intervals are working correctly
-# Expected behavior:
-# - CPU/Memory: Every 2 seconds
-# - Systemd/Network: Every 10 seconds  
-# - Backup/NixOS: Every 60 seconds
-# - Disk: Every 300 seconds (5 minutes)
-
-echo "=== Testing Collector Interval Implementation ==="
-echo "Expected intervals from NixOS config:"
-echo "  CPU: 2s, Memory: 2s"
-echo "  Systemd: 10s, Network: 10s" 
-echo "  Backup: 60s, NixOS: 60s"
-echo "  Disk: 300s (5m)"
-echo ""
-
-# Note: Cannot run actual agent without proper config, but we can verify the code logic
-echo "✅ Code Implementation Status:"
-echo "  - TimedCollector struct with interval tracking: IMPLEMENTED"
-echo "  - Individual collector intervals from config: IMPLEMENTED"  
-echo "  - collect_metrics_timed() respects intervals: IMPLEMENTED"
-echo "  - Debug logging shows interval compliance: IMPLEMENTED"
-echo ""
-
-echo "🔍 Key Implementation Details:"
-echo "  - MetricCollectionManager now tracks last_collection time per collector"
-echo "  - Each collector gets Duration::from_secs(config.{collector}.interval_seconds)"
-echo "  - Only collectors with elapsed >= interval are called"
-echo "  - Debug logs show actual collection with interval info"
-echo ""
-
-echo "📊 Expected Runtime Behavior:"
-echo "  At 0s:  All collectors run (startup)"
-echo "  At 2s:  CPU, Memory run"
-echo "  At 4s:  CPU, Memory run"  
-echo "  At 10s: CPU, Memory, Systemd, Network run"
-echo "  At 60s: CPU, Memory, Systemd, Network, Backup, NixOS run"
-echo "  At 300s: All collectors run including Disk"
-echo ""
-
-echo "✅ CONCLUSION: Codebase now follows NixOS configuration intervals correctly!"
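The interval behavior described by the removed script (each collector runs only once its own interval has elapsed) can be sketched as follows. This is a minimal sketch under stated assumptions: `TimedCollector` and `last_collection` are names mentioned in the script, but the field layout, the `is_due` method, and the `collect_due` function are hypothetical. The current time is passed in explicitly so the logic can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical shape of a collector with its own interval.
struct TimedCollector {
    name: &'static str,
    interval: Duration,
    /// `None` until the collector has run once (startup).
    last_collection: Option<Instant>,
}

impl TimedCollector {
    fn new(name: &'static str, interval_seconds: u64) -> Self {
        Self {
            name,
            interval: Duration::from_secs(interval_seconds),
            last_collection: None,
        }
    }

    /// A collector is due on startup, or once `elapsed >= interval`.
    fn is_due(&self, now: Instant) -> bool {
        match self.last_collection {
            None => true,
            Some(last) => now.saturating_duration_since(last) >= self.interval,
        }
    }
}

/// Mark every due collector as run and return the names of those that ran.
fn collect_due(collectors: &mut [TimedCollector], now: Instant) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for collector in collectors.iter_mut() {
        if collector.is_due(now) {
            collector.last_collection = Some(now);
            ran.push(collector.name);
        }
    }
    ran
}
```

Called with a 2-second `cpu` collector and a 10-second `systemd` collector, both run at startup, only `cpu` runs at 2s and 4s, and both run again at 10s, which matches the runtime behavior the script lists.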
--- a/test_tmux_check.rs
+++ b/test_tmux_check.rs
@@ -1,32 +0,0 @@
-#!/usr/bin/env rust-script
-
-use std::process;
-
-/// Check if running inside tmux session
-fn check_tmux_session() {
-    // Check for TMUX environment variable which is set when inside a tmux session
-    if std::env::var("TMUX").is_err() {
-        eprintln!("╭─────────────────────────────────────────────────────────────╮");
-        eprintln!("│                        ⚠️  TMUX REQUIRED                      │");
-        eprintln!("├─────────────────────────────────────────────────────────────┤");
-        eprintln!("│  CM Dashboard must be run inside a tmux session for proper   │");
-        eprintln!("│  terminal handling and remote operation functionality.       │");
-        eprintln!("│                                                             │");
-        eprintln!("│  Please start a tmux session first:                        │");
-        eprintln!("│    tmux new-session -d -s dashboard cm-dashboard           │");
-        eprintln!("│    tmux attach-session -t dashboard                        │");
-        eprintln!("│                                                             │");
-        eprintln!("│  Or simply:                                                 │");
-        eprintln!("│    tmux                                                     │");
-        eprintln!("│    cm-dashboard                                             │");
-        eprintln!("╰─────────────────────────────────────────────────────────────╯");
-        process::exit(1);
-    } else {
-        println!("✅ Running inside tmux session - OK");
-    }
-}
-
-fn main() {
-    println!("Testing tmux check function...");
-    check_tmux_session();
-}
--- a/test_tmux_simulation.sh
+++ b/test_tmux_simulation.sh
@@ -1,53 +0,0 @@
-#!/bin/bash
-
-echo "=== TMUX Check Implementation Test ==="
-echo ""
-
-echo "📋 Testing tmux check logic:"
-echo ""
-
-echo "1. Current environment:"
-if [ -n "$TMUX" ]; then
-    echo "   ✅ Running inside tmux session"
-    echo "   TMUX variable: $TMUX"
-else
-    echo "   ❌ NOT running inside tmux session"
-    echo "   TMUX variable: (not set)"
-fi
-echo ""
-
-echo "2. Simulating dashboard tmux check logic:"
-echo ""
-
-# Simulate the Rust check logic
-if [ -z "$TMUX" ]; then
-    echo "   Dashboard would show:"
-    echo "   ╭─────────────────────────────────────────────────────────────╮"
-    echo "   │                        ⚠️  TMUX REQUIRED                      │"
-    echo "   ├─────────────────────────────────────────────────────────────┤"
-    echo "   │  CM Dashboard must be run inside a tmux session for proper   │"
-    echo "   │  terminal handling and remote operation functionality.       │"
-    echo "   │                                                             │"
-    echo "   │  Please start a tmux session first:                        │"
-    echo "   │    tmux new-session -d -s dashboard cm-dashboard           │"
-    echo "   │    tmux attach-session -t dashboard                        │"
-    echo "   │                                                             │"
-    echo "   │  Or simply:                                                 │"
-    echo "   │    tmux                                                     │"
-    echo "   │    cm-dashboard                                             │"
-    echo "   ╰─────────────────────────────────────────────────────────────╯"
-    echo "   Then exit with code 1"
-else
-    echo "   ✅ Dashboard tmux check would PASS - continuing normally"
-fi
-echo ""
-
-echo "3. Implementation status:"
-echo "   ✅ check_tmux_session() function added to dashboard/src/main.rs"
-echo "   ✅ Called early in main() but only for TUI mode (not headless)"
-echo "   ✅ Uses std::env::var(\"TMUX\") to detect tmux session"
-echo "   ✅ Shows helpful error message with usage instructions"
-echo "   ✅ Exits with code 1 if not in tmux"
-echo ""
-
-echo "✅ TMUX check implementation complete!"
Author	SHA1	Message	Date
Christoffer Martinsson	f874264e13	Optimize dashboard performance for responsive Tab key navigation All checks were successful Build and Release / build-and-release (push) Successful in 1m32s Details - Replace 6 separate filter operations with single-pass metric categorization in update_metrics - Reduce CPU overhead from 6x to 1x work per metric update cycle - Fix Tab key sluggishness caused by competing expensive filtering operations - Maintain exact same functionality with significantly better performance - Improve UI responsiveness for host switching and navigation - Bump version to 0.1.58	2025-11-06 11:18:39 +01:00
Christoffer Martinsson	5f6e47ece5	Implement heartbeat-based host connectivity detection All checks were successful Build and Release / build-and-release (push) Successful in 2m8s Details - Add agent_heartbeat metric to agent transmission for reliable host detection - Update dashboard to track heartbeat timestamps per host instead of general metrics - Add configurable heartbeat_timeout_seconds to dashboard ZMQ config (default 10s) - Remove unused timeout_ms from agent config and revert to non-blocking command reception - Remove unused heartbeat_interval_ms from agent configuration - Host disconnect detection now uses dedicated heartbeat metrics for improved reliability - Bump version to 0.1.57	2025-11-06 11:04:01 +01:00
Christoffer Martinsson	0e7cf24dbb	Add exclude_email_metrics configuration option All checks were successful Build and Release / build-and-release (push) Successful in 2m34s Details - Add exclude_email_metrics field to AgentConfig for filtering email notifications - Metrics matching excluded names skip notification processing but still appear in dashboard - Optional field with serde(default) for backward compatibility - Bump version to 0.1.56	2025-11-06 10:31:25 +01:00
Christoffer Martinsson	2d080a2f51	Implement WakeOnLAN functionality and offline status handling All checks were successful Build and Release / build-and-release (push) Successful in 1m35s Details - Add WakeOnLAN support for offline hosts using 'w' key - Configure MAC addresses for all infrastructure hosts - Implement Status::Offline for disconnected hosts - Exclude offline hosts from status aggregation to prevent false alerts - Update versions to 0.1.55	2025-10-31 09:28:31 +01:00
Christoffer Martinsson	6179bd51a7	Implement WakeOnLAN functionality with simplified configuration All checks were successful Build and Release / build-and-release (push) Successful in 2m32s Details - Add Status::Offline enum variant for disconnected hosts - All configured hosts now always visible showing offline status when disconnected - Add WakeOnLAN support using wake-on-lan Rust crate - Implement w key binding to wake offline hosts with MAC addresses - Simplify configuration to single [hosts] section with MAC addresses only - Change critical status icon from ◯ to ! for better visibility - Add proper MAC address parsing and error handling - Silent WakeOnLAN operation with logging for success/failure Configuration format: [hosts] hostname = { mac_address = "AA:BB:CC:DD:EE:FF" }	2025-10-31 09:03:01 +01:00
Christoffer Martinsson	57de4c366a	Bump version to 0.1.53 All checks were successful Build and Release / build-and-release (push) Successful in 2m10s Details	2025-10-30 17:00:39 +01:00
Christoffer Martinsson	e18778e962	Fix string syntax error in rebuild command - Replace raw string with escaped string to fix compilation error - Maintain same functionality with proper string formatting	2025-10-30 16:59:41 +01:00
Christoffer Martinsson	e4469a0ebf	Replace tmux popups with split windows for better log navigation Some checks failed Build and Release / build-and-release (push) Failing after 1m9s Details - Change J/L log commands from popups to split windows for scrolling support - Change rebuild command from popup to split window with consistent 30% height - Add auto-close behavior with bash -c "command; exit" wrapper for logs - Add "press any key to close" prompt with visual separators for rebuild - Enable proper tmux copy mode and navigation in all split windows Users can now scroll through logs, copy text, and resize windows while maintaining clean auto-close behavior for all operations.	2025-10-30 15:30:58 +01:00
Christoffer Martinsson	6fedf4c7fc	Add sudo support and line count to log viewing commands All checks were successful Build and Release / build-and-release (push) Successful in 1m12s Details - Add sudo to journalctl command for proper systemd log access - Add sudo to tail command for system log file access - Add -n 50 to tail command to match journalctl behavior - Both J and L keys now show last 50 lines before following Ensures consistent behavior and proper permissions for all log viewing.	2025-10-30 13:26:04 +01:00
Christoffer Martinsson	3f6dffa66e	Add custom service log file support with L key All checks were successful Build and Release / build-and-release (push) Successful in 2m7s Details - Add ServiceLogConfig structure for per-host service log paths - Implement L key handler for custom log file viewing via tmux popup - Update dashboard config to support service_logs HashMap - Add tail -f command execution over SSH for real-time log streaming - Update status line to show L: Custom shortcut - Document configuration format in CLAUDE.md Each service can now have custom log file paths configured per host, accessible via L key with same tmux popup interface as journalctl.	2025-10-30 13:12:36 +01:00
Christoffer Martinsson	1b64fbde3d	Fix tmux popup title flag for service logs feature All checks were successful Build and Release / build-and-release (push) Successful in 1m47s Details Fix journalctl popup that was failing with 'can't find session' error: Issue Resolution: - Change tmux display-popup flag from -t to -T for setting popup title - -t flag was incorrectly trying to target a session named 'Logs: servicename' - -T flag correctly sets the popup window title The J key (Shift+j) service logs feature now works properly, opening an 80% tmux popup with journalctl -f for real-time log viewing. Bump version to v0.1.49	2025-10-30 12:42:58 +01:00
Christoffer Martinsson	4f4c3b0d6e	Improve notification behavior during startup and recovery All checks were successful Build and Release / build-and-release (push) Successful in 2m9s Details Fix notification issues for better operational experience: Startup Notification Suppression: - Suppress notifications for transitions from Status::Unknown during agent/server startup - Prevents notification spam when services transition from Unknown to Warning/Critical on restart - Only real status changes (not initial discovery) trigger notifications - Maintains alerting for actual service state changes after startup Recovery Notification Refinement: - Recovery notifications only sent when ALL services reach OK status - Individual service recoveries suppressed if other services still have problems - Ensures recovery notifications indicate complete system health restoration - Prevents premature celebration when partial recoveries occur Result: Clean startup experience without false alerts and meaningful recovery notifications that truly indicate full system health restoration. Bump version to v0.1.48	2025-10-30 12:35:23 +01:00
Christoffer Martinsson	bd20f0cae1	Fix user-stopped flag timing and service transition handling All checks were successful Build and Release / build-and-release (push) Successful in 2m9s Details Correct user-stopped service behavior during startup transitions: User-Stopped Flag Timing Fix: - Clear user-stopped flag only when service actually becomes active, not when start command succeeds - Remove premature flag clearing from service control handler - Add automatic flag clearing when service status metrics show active state - Services retain user-stopped status during activating/transitioning states Service Transition Handling: - User-stopped services in activating state now report Status::OK instead of Status::Pending - Prevents host warnings during legitimate service startup transitions - Maintains accurate status reporting throughout service lifecycle - Failed service starts preserve user-stopped flags correctly Journalctl Popup Fix: - Fix terminal corruption when using J key for service logs - Correct command quoting to prevent tmux popup interference - Stable popup display without dashboard interface corruption Result: Clean service startup experience with no false warnings and proper user-stopped tracking throughout the entire service lifecycle. Bump version to v0.1.47	2025-10-30 12:05:54 +01:00
Christoffer Martinsson	11c9a5f9d2	Add service logs feature and improve tmux popup sizing All checks were successful Build and Release / build-and-release (push) Successful in 2m9s Details New Features: - Add journalctl service logs viewer via Shift+J key - Opens tmux popup with real-time log streaming using journalctl -f - Shows last 50 lines and follows new log entries for selected service - Popup titled 'Logs: service.service' for clear context Improvements: - Increase tmux popup size to 80% width and height for better readability - Applies to both rebuild (R) and logs (J) popups - Compact status line text to fit new J: Logs shortcut - Updated documentation with new key binding Navigation Updates: - J: Show service logs (journalctl in tmux popup) - Status line: Tab: Host • ↑↓/jk: Select • r: Rebuild • s/S: Start/Stop • J: Logs • q: Quit Bump version to v0.1.46	2025-10-30 11:21:14 +01:00
Christoffer Martinsson	aeae60146d	Fix user-stopped service display and flag timing issues All checks were successful Build and Release / build-and-release (push) Successful in 2m10s Details Improve user-stopped service tracking behavior: Service Display Fix: - Services widget now shows actual systemctl status (active/inactive) - Use info.status instead of hardcoded text based on widget_status - User-stopped services correctly display 'inactive' with green OK icon - Prevents misleading 'active' display for stopped services User-Stopped Flag Timing Fix: - Clear user-stopped flag AFTER successful service start, not when command sent - Prevents warnings during service startup transition period - Service remains Status::OK during 'activating' state for user-stopped services - Flag only cleared when systemctl start command actually succeeds - Failed start attempts preserve user-stopped flag Result: Clean service state tracking with accurate display and no false alerts during intentional user operations. Bump version to v0.1.45	2025-10-30 11:11:39 +01:00
Christoffer Martinsson	a82c81e8e3	Fix service control by adding .service suffix to systemctl commands All checks were successful Build and Release / build-and-release (push) Successful in 2m8s Details Service stop/start operations were failing because systemctl commands were missing the .service suffix. This caused the new user-stopped tracking feature to mark services but not actually control them. Changes: - Add .service suffix to systemctl commands in service control handler - Matches pattern used throughout systemd collector - Fixes service start/stop functionality via dashboard Clean up legacy documentation: - Remove outdated TODO.md, AGENTS.md, and test files - Update CLAUDE.md with current architecture and rules only - Comprehensive README.md rewrite with technical documentation - Document user-stopped service tracking feature Bump version to v0.1.44	2025-10-30 11:00:36 +01:00
Christoffer Martinsson	c56e9d7be2	Implement user-stopped service tracking system All checks were successful Build and Release / build-and-release (push) Successful in 2m34s Details Add comprehensive tracking for services stopped via dashboard to prevent false alerts when users intentionally stop services. Features: - User-stopped services report Status::Ok instead of Warning - Persistent storage survives agent restarts - Dashboard sends UserStart/UserStop commands - Agent tracks and syncs user-stopped state globally - Systemd collector respects user-stopped flags Implementation: - New service_tracker module with persistent JSON storage - Enhanced ServiceAction enum with UserStart/UserStop variants - Global singleton tracker accessible by collectors - Service status logic updated to check user-stopped flag - Dashboard version now uses CARGO_PKG_VERSION automatically Bump version to v0.1.43	2025-10-30 10:42:56 +01:00
Christoffer Martinsson	c8f800a1e5	Implement git commit hash tracking for build display All checks were successful Build and Release / build-and-release (push) Successful in 1m24s Details - Add get_git_commit() method to read /var/lib/cm-dashboard/git-commit - Replace NixOS build version with actual git commit hash - Show deployed commit hash as 'Build:' value for accurate tracking - Enable verification of which exact commit is deployed per host - Update version to 0.1.42	2025-10-29 15:29:02 +01:00
Christoffer Martinsson	fc6b3424cf	Add hostname to NixOS title and make dashboard title bold All checks were successful Build and Release / build-and-release (push) Successful in 2m46s Details - Change system panel title from 'NixOS:' to 'NixOS hostname:' - Make main dashboard title 'cm-dashboard' bold in top bar - Remove unused Typography::title() function to fix warnings - Update SystemWidget::render_with_scroll to accept hostname parameter - Update version to 0.1.41 in all Cargo.toml files and dashboard code	2025-10-29 14:24:17 +01:00
Christoffer Martinsson	35e06c6734	Implement clean NixOS rebuild tmux popup All checks were successful Build and Release / build-and-release (push) Successful in 1m22s Details - Replace complex ASCII logo with simple text header - Remove extra blank lines for compact display - Left-align text for clean appearance - Add spacing after target line for readability - Simplify heredoc format for better maintainability	2025-10-28 23:59:05 +01:00
Christoffer Martinsson	783d233319	Add CM Dashboard ASCII logo to rebuild tmux popup All checks were successful Build and Release / build-and-release (push) Successful in 2m9s Details - Display branded CM Dashboard ASCII logo in green when rebuild starts - Shows logo immediately when tmux popup opens for better UX - Includes rebuild target hostname and visual separator - Enhances rebuild process with professional branding - Bump version to v0.1.39	2025-10-28 23:12:09 +01:00
Christoffer Martinsson	6509a2b91a	Make nginx site latency thresholds configurable and simplify status logic All checks were successful Build and Release / build-and-release (push) Successful in 4m25s Details - Replace hardcoded 500ms/2000ms thresholds with configurable nginx_latency_critical_ms - Simplify status logic to only OK or Critical (no Warning status) - Add validation for nginx latency threshold configuration - Re-enable nginx site collection with configurable thresholds - Resolves issue where sites showed critical at 2000ms despite 30s timeout setting - Bump version to v0.1.38	2025-10-28 21:24:34 +01:00
Christoffer Martinsson	52f8c40b86	Fix title bar layout constraints to prevent text disappearing All checks were successful Build and Release / build-and-release (push) Successful in 2m12s Details - Set fixed width (15 chars) for left side to prevent chunk collapse - Resolves issue where "cm-dashboard" text would flash and disappear - Ensures consistent visibility of title text in dynamic status bar - Bump version to v0.1.37	2025-10-28 18:56:12 +01:00
Christoffer Martinsson	a86b5ba8f9	Implement dynamic status-based title bar with infrastructure health indicator All checks were successful Build and Release / build-and-release (push) Successful in 1m15s Details - Title bar background now dynamically changes based on worst-case status across all hosts - Green: all OK, Yellow: warnings present, Red: critical issues, Blue: pending, Gray: unknown - Provides immediate visual feedback of overall infrastructure health - Added 1-character padding on both sides of title bar - Maintains dark text for visibility against all status background colors - Bump version to v0.1.36	2025-10-28 18:47:02 +01:00
Christoffer Martinsson	1b964545be	Fix storage display parsing and improve title bar UI All checks were successful Build and Release / build-and-release (push) Successful in 1m14s Details - Fix disk drive name extraction for mount points with underscores (e.g., /mnt/steampool) - Replace confusing "1" and "2" drive names with proper device names like "sda1", "sda2" - Update title bar with blue background and dark text styling - Right-align host list in title bar while keeping "cm-dashboard" on left - Bump version to v0.1.35	2025-10-28 18:32:12 +01:00
Christoffer Martinsson	97aa1708c2	Improve service selection UI and help text All checks were successful Build and Release / build-and-release (push) Successful in 2m11s Details - Fix service icons to use background color when selected for better visibility against blue selection background - Combine start/stop service help text entries into single "s/S: Start/Stop Service" - Change help text keys to lowercase (r: Rebuild Host, q: Quit) - Bump version to v0.1.34	2025-10-28 18:17:15 +01:00
Christoffer Martinsson	d12689f3b5	Update CLAUDE.md to reflect simplified navigation and current status Updated documentation to reflect major UI improvements: - Documented simplified navigation system (no more panel switching) - Updated current status to October 28, 2025 with v0.1.33 - Described complete service discovery and visibility features - Added vi-style j/k navigation documentation - Removed outdated panel-focused navigation descriptions - Updated visual feedback documentation for transitional icons - Consolidated service discovery achievements and current working state	2025-10-28 17:00:40 +01:00
Christoffer Martinsson	f22e3ee95e	Simplify navigation and add vi-style keys All checks were successful Build and Release / build-and-release (push) Successful in 1m12s Details Major UI simplification and navigation improvements: Changes: - Removed panel selection concept entirely (no more Shift+Tab) - Service selection always visible with blue highlighting - Up/Down arrows now directly control service selection - Added j/k vi-style navigation keys as alternatives to arrow keys - Removed panel focus borders - all panels look uniform - Service commands (s/S) work without panel focus requirements - Updated keyboard shortcuts to reflect simplified navigation Navigation: - Tab: Switch hosts - ↑↓/jk: Select service (always works) - R: Rebuild host - s: Start service - S: Stop service - q: Quit The interface is now much simpler and more intuitive with direct service control.	2025-10-28 16:31:35 +01:00
Christoffer Martinsson	e890c5e810	Fix service status detection with combined discovery and status approach All checks were successful Build and Release / build-and-release (push) Successful in 2m9s Details Enhanced service discovery to properly show status for all services: Changes: - Use systemctl list-unit-files for complete service discovery (finds all services) - Use systemctl list-units --all for batch runtime status fetching - Combine both datasets to get comprehensive service list with correct status - Services found in unit-files but not runtime are marked as inactive (Warning status) - Eliminates 'unknown' status issue while maintaining complete service visibility Now inactive services show as Warning (yellow ◐) and active services show as Ok (green ●) instead of all services showing as unknown (? icon).	2025-10-28 15:56:47 +01:00