Fix service control by adding .service suffix to systemctl commands
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s

Service stop/start operations were failing because systemctl commands
were missing the .service suffix. This caused the new user-stopped
tracking feature to mark services but not actually control them.

Changes:
- Add .service suffix to systemctl commands in service control handler
- Matches pattern used throughout systemd collector
- Fixes service start/stop functionality via dashboard

Clean up legacy documentation:
- Remove outdated TODO.md, AGENTS.md, and test files
- Update CLAUDE.md with current architecture and rules only
- Comprehensive README.md rewrite with technical documentation
- Document user-stopped service tracking feature

Bump version to v0.1.44
Christoffer Martinsson 2025-10-30 11:00:36 +01:00
parent c56e9d7be2
commit a82c81e8e3
12 changed files with 332 additions and 927 deletions

AGENTS.md

@ -1,3 +0,0 @@
# Agent Guide
Agents working in this repo must follow the instructions in `CLAUDE.md`.

CLAUDE.md

@ -2,277 +2,57 @@
## Overview
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.
## Current Features
### Core Functionality
- **Real-time Monitoring**: CPU, RAM, Storage, and Service status
- **Service Management**: Start/stop services with user-stopped tracking
- **Multi-host Support**: Monitor multiple servers from single dashboard
- **NixOS Integration**: System rebuild via SSH + tmux popup
- **Backup Monitoring**: Borgbackup status and scheduling
### User-Stopped Service Tracking
- Services stopped via dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
- Persistent storage survives agent restarts
- Automatic flag clearing when services are restarted via dashboard
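A minimal sketch of how such a tracker could work, assuming a `once_cell`/`serde_json`-based global set; the function and module names here are hypothetical, not the actual implementation:

```rust
use std::collections::HashSet;
use std::fs;
use std::path::Path;
use std::sync::Mutex;

use once_cell::sync::Lazy; // assumed dependency for the global singleton

/// Hypothetical tracker: the set of service names the user stopped via the dashboard.
static USER_STOPPED: Lazy<Mutex<HashSet<String>>> = Lazy::new(|| {
    let path = Path::new("/var/lib/cm-dashboard/user-stopped-services.json");
    let set = fs::read_to_string(path)
        .ok()
        .and_then(|s| serde_json::from_str(&s).ok())
        .unwrap_or_default();
    Mutex::new(set)
});

pub fn mark_user_stopped(service: &str) {
    let mut set = USER_STOPPED.lock().unwrap();
    set.insert(service.to_string());
    persist(&set);
}

pub fn clear_user_stopped(service: &str) {
    let mut set = USER_STOPPED.lock().unwrap();
    set.remove(service);
    persist(&set);
}

pub fn is_user_stopped(service: &str) -> bool {
    USER_STOPPED.lock().unwrap().contains(service)
}

fn persist(set: &HashSet<String>) {
    // Best-effort save so the flags survive agent restarts.
    if let Ok(json) = serde_json::to_string(set) {
        let _ = fs::write("/var/lib/cm-dashboard/user-stopped-services.json", json);
    }
}
```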
### Service Management
- **Direct Control**: Arrow keys (↑↓) or vim keys (j/k) navigate services
- **Service Actions**:
  - `s` - Start service (sends UserStart command)
  - `S` - Stop service (sends UserStop command)
  - `R` - Rebuild current host
- **Visual Status**: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
- **Transitional Icons**: Blue arrows during operations
### Navigation
- **Tab**: Switch between hosts
- **↑↓ or j/k**: Select services

**System Panel Enhancement - COMPLETED** ✅

All system panel features successfully implemented:
- ✅ **NixOS Collector**: Created collector for version and active users
- ✅ **System Widget**: Unified widget combining NixOS, CPU, RAM, and Storage
- ✅ **Build Display**: Shows NixOS build information without codename
- ✅ **Active Users**: Displays currently logged in users
- ✅ **Tmpfs Monitoring**: Added /tmp usage to RAM section
- ✅ **Agent Deployment**: NixOS collector working in production

**Simplified Navigation and Service Management - COMPLETED** ✅

All navigation and service management features successfully implemented:
- ✅ **Direct Service Control**: Up/Down (or j/k) arrows directly control service selection
- ✅ **Always Visible Selection**: Service selection highlighting always visible (no panel focus needed)
- ✅ **Complete Service Discovery**: All configured services visible regardless of state
- ✅ **Transitional Visual Feedback**: Service operations show directional arrows (↑ ↓ ↻)
- ✅ **Simplified Interface**: Removed panel switching complexity, uniform appearance
- ✅ **Vi-style Navigation**: Added j/k keys for vim users alongside arrow keys
**Current Status - October 28, 2025:**
- All service discovery and display features working correctly ✅
- Simplified navigation system implemented ✅
- Service selection always visible with direct control ✅
- Complete service visibility (all configured services show regardless of state) ✅
- Transitional service icons working with proper color handling ✅
- Build display working: "Build: 25.05.20251004.3bcc93c" ✅
- Agent version display working: "Agent: v0.1.33" ✅
- Cross-host version comparison implemented ✅
- Automated binary release system working ✅
- SMART data consolidated into disk collector ✅
**RESOLVED - Remote Rebuild Functionality:**
- ✅ **System Rebuild**: Now uses simple SSH + tmux popup approach
- ✅ **Process Isolation**: Rebuild runs independently via SSH, survives agent/dashboard restarts
- ✅ **Configuration**: SSH user and rebuild alias configurable in dashboard config
- ✅ **Service Control**: Works correctly for start/stop/restart of services
**Solution Implemented:**
- Replaced complex SystemRebuild command infrastructure with direct tmux popup
- Uses `tmux display-popup "ssh -tt {user}@{hostname} 'bash -ic {alias}'"`
- Configurable SSH user and rebuild alias in dashboard config
- Eliminates all agent crashes during rebuilds
- Simple, reliable, and follows standard tmux interface patterns
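A hedged sketch of how the dashboard side might spawn that popup (parameter names are assumptions; the command string follows the pattern above):

```rust
use std::process::Command;

/// Launch the rebuild in a tmux popup over SSH (sketch only).
fn open_rebuild_popup(ssh_user: &str, hostname: &str, rebuild_alias: &str) -> std::io::Result<()> {
    // Matches the documented pattern:
    // tmux display-popup "ssh -tt {user}@{hostname} 'bash -ic {alias}'"
    let ssh_cmd = format!("ssh -tt {ssh_user}@{hostname} 'bash -ic {rebuild_alias}'");
    Command::new("tmux")
        .args(["display-popup", &ssh_cmd])
        .status()?;
    Ok(())
}
```

Because the rebuild runs inside the SSH session, it keeps running even if the agent or dashboard restarts, which is the process-isolation property described above.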
**Current Layout:**
```
NixOS:
Build: 25.05.20251004.3bcc93c
Agent: v0.1.17 # Shows agent version from Cargo.toml
Active users: cm, simon
CPU:
● Load: 0.02 0.31 0.86 • 3000MHz
RAM:
● Usage: 33% 2.6GB/7.6GB
● /tmp: 0% 0B/2.0GB
Storage:
● root (Single):
├─ ● nvme0n1 W: 1%
└─ ● 18% 167.4GB/928.2GB
```
**System panel layout fully implemented with blue tree symbols ✅**
**Tree symbols now use consistent blue theming across all panels ✅**
**Overflow handling restored for all widgets ("... and X more") ✅**
**Agent version display working correctly ✅**
**Cross-host version comparison logging warnings ✅**
**Backup panel visibility fixed - only shows when meaningful data exists ✅**
**SSH-based rebuild system fully implemented and working ✅**
### Current Simplified Navigation Implementation
**Navigation Controls:**
- **Tab**: Switch between hosts (cmbox, srv01, srv02, steambox, etc.)
- **↑↓ or j/k**: Move service selection cursor (always works)
- **q**: Quit dashboard

**Service Control:**
- **s**: Start selected service
- **S**: Stop selected service
- **R**: Rebuild current host (works from any context)
**Visual Features:**
- **Service Selection**: Always visible blue background highlighting current service
- **Status Icons**: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed), ? (unknown)
- **Transitional Icons**: Blue ↑ (starting), ↓ (stopping), ↻ (restarting) when not selected
- **Transitional Icons**: Dark gray arrows when service is selected (for visibility)
- **Uniform Interface**: All panels have consistent appearance (no focus borders)
### Service Discovery and Display - WORKING ✅
**All Issues Resolved (as of 2025-10-28):**
- ✅ **Complete Service Discovery**: Uses `systemctl list-unit-files` + `list-units --all` for comprehensive service detection
- ✅ **All Services Visible**: Shows all configured services regardless of current state (active/inactive)
- ✅ **Proper Status Display**: Active services show green ●, inactive show yellow ◐, failed show red ◯
- ✅ **Transitional Icons**: Visual feedback during service operations with proper color handling
- ✅ **Simplified Navigation**: Removed panel complexity, direct service control always available
- ✅ **Service Control**: Start (s) and Stop (S) commands work from anywhere
- ✅ **System Rebuild**: SSH + tmux popup approach for reliable remote rebuilds
### Terminal Popup for Real-time Output - IMPLEMENTED ✅
**Status (as of 2025-10-26):**
- ✅ **Terminal Popup UI**: 80% screen coverage with terminal styling and color-coded output
- ✅ **ZMQ Streaming Protocol**: CommandOutputMessage for real-time output transmission
- ✅ **Keyboard Controls**: ESC/Q to close, ↑↓ to scroll, manual close (no auto-close)
- ✅ **Real-time Display**: Live streaming of command output as it happens
- ✅ **Version-based Agent Reporting**: Shows "Agent: v0.1.13" instead of nix store hash
**Current Implementation Issues:**
- ❌ **Agent Process Crashes**: Agent dies during nixos-rebuild execution
- ❌ **Inconsistent Output**: Different outputs each time 'R' is pressed
- ❌ **Limited Output Visibility**: Not capturing all nixos-rebuild progress
**PLANNED SOLUTION - Systemd Service Approach:**
**Problem**: Direct nixos-rebuild execution in agent causes process crashes and inconsistent output.
**Solution**: Create dedicated systemd service for rebuild operations.
**Implementation Plan:**
1. **NixOS Systemd Service**:
```nix
systemd.services.cm-rebuild = {
description = "CM Dashboard NixOS Rebuild";
serviceConfig = {
Type = "oneshot";
ExecStart = "${pkgs.nixos-rebuild}/bin/nixos-rebuild switch --flake . --option sandbox false";
WorkingDirectory = "/var/lib/cm-dashboard/nixos-config";
User = "root";
StandardOutput = "journal";
StandardError = "journal";
};
};
```
2. **Agent Modification**:
- Replace direct nixos-rebuild execution with: `systemctl start cm-rebuild`
   - Stream output via: `journalctl -u cm-rebuild -f --no-pager` (see the sketch after this plan)
- Monitor service status for completion detection
3. **Benefits**:
- **Process Isolation**: Service runs independently, won't crash agent
- **Consistent Output**: Always same deterministic rebuild process
- **Proper Logging**: systemd journal handles all output management
- **Resource Management**: systemd manages cleanup and resource limits
- **Status Tracking**: Can query service status (running/failed/success)
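A minimal sketch of the agent-side change described in step 2, assuming `systemctl start --no-block` so the journal can be followed while the oneshot unit runs; names and error handling are illustrative:

```rust
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;

/// Sketch: start the isolated rebuild unit, then follow its journal output.
async fn run_rebuild_via_systemd() -> anyhow::Result<()> {
    // Kick off the oneshot service without waiting for it to finish.
    let status = Command::new("systemctl")
        .args(["start", "--no-block", "cm-rebuild"])
        .status()
        .await?;
    anyhow::ensure!(status.success(), "failed to start cm-rebuild");

    // Stream its output line by line; each line would be forwarded to the
    // dashboard (e.g. as a CommandOutputMessage) in the real implementation.
    let mut child = Command::new("journalctl")
        .args(["-u", "cm-rebuild", "-f", "--no-pager"])
        .stdout(std::process::Stdio::piped())
        .spawn()?;
    let stdout = child.stdout.take().expect("piped stdout");
    let mut lines = BufReader::new(stdout).lines();
    while let Some(line) = lines.next_line().await? {
        println!("{line}"); // placeholder: stream to dashboard, stop when the unit completes
    }
    Ok(())
}
```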
**Next Priority**: Implement systemd service approach for reliable rebuild operations.
**Keyboard Controls Status:**
- **Services Panel**:
- R (restart) ✅ Working
- s (start) ✅ Working
- S (stop) ✅ Working
- **System Panel**: R (nixos-rebuild) ✅ Working with --option sandbox false
- **Backup Panel**: B (trigger backup) ❓ Not implemented
**Visual Feedback Implementation - IN PROGRESS:**
Context-appropriate progress indicators for each panel:
**Services Panel** (Service status transitions):
```
● nginx active → ⏳ nginx restarting → ● nginx active
● docker active → ⏳ docker stopping → ● docker inactive
```
**System Panel** (Build progress in NixOS section):
```
NixOS:
Build: 25.05.20251004.3bcc93c → Build: [████████████ ] 65%
Active users: cm, simon Active users: cm, simon
```
**Backup Panel** (OnGoing status with progress):
```
Latest backup: → Latest backup:
● 2024-10-23 14:32:15 ● OnGoing
└─ Duration: 1.3m └─ [██████ ] 60%
```
**Critical Configuration Hash Fix - HIGH PRIORITY:**
**Problem:** Configuration hash currently shows git commit hash instead of actual deployed system hash.
**Current (incorrect):**
- Shows git hash: `db11f82` (source repository commit)
- Not accurate - doesn't reflect what's actually deployed
**Target (correct):**
- Show nix store hash: `d8ivwiar` (first 8 chars from deployed system)
- Source: `/nix/store/d8ivwiarhwhgqzskj6q2482r58z46qjf-nixos-system-cmbox-25.05.20251004.3bcc93c`
- Pattern: Extract hash from `/nix/store/HASH-nixos-system-HOSTNAME-VERSION`
**Benefits:**
1. **Deployment Verification:** Confirms rebuild actually succeeded
2. **Accurate Status:** Shows what's truly running, not just source
3. **Rebuild Completion Detection:** Hash change = rebuild completed
4. **Rollback Tracking:** Each deployment has unique identifier
**Implementation Required:**
1. Agent extracts nix store hash from `ls -la /run/current-system`
2. Reports this as `system_config_hash` metric instead of git hash
3. Dashboard displays first 8 characters: `Config: d8ivwiar`
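A minimal sketch of the extraction step, assuming the agent simply resolves the `/run/current-system` symlink (the helper name is hypothetical):

```rust
use std::fs;

/// Read the deployed system's nix store hash.
/// /run/current-system points to /nix/store/<hash>-nixos-system-<hostname>-<version>.
fn system_config_hash() -> Option<String> {
    let target = fs::read_link("/run/current-system").ok()?;
    let name = target.file_name()?.to_str()?;   // "<hash>-nixos-system-..."
    let hash = name.split('-').next()?;         // full store hash
    Some(hash.chars().take(8).collect())        // e.g. "d8ivwiar"
}
```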
**Next Session Priority Tasks:**
**Remaining Features:**
1. **Fix Configuration Hash Display (CRITICAL)**:
- Use nix store hash instead of git commit hash
- Extract from `/run/current-system` -> `/nix/store/HASH-nixos-system-*`
- Enables proper rebuild completion detection
2. **Command Response Protocol**:
- Agent sends command completion/failure back to dashboard via ZMQ
- Dashboard updates UI status from ⏳ to ● when commands complete
- Clear success/failure status after timeout
3. **Backup Panel Features**:
- Implement backup trigger functionality (B key)
- Complete visual feedback for backup operations
- Add backup progress indicators
**Enhancement Tasks:**
- Add confirmation dialogs for destructive actions (stop/restart/rebuild)
- Implement command history/logging
- Add keyboard shortcuts help overlay
**Future Enhanced Navigation:**
- Add Page Up/Down for faster scrolling through long service lists
- Implement search/filter functionality for services
- Add jump-to-service shortcuts (first letter navigation)
**Future Advanced Features:**
- Service dependency visualization
- Historical service status tracking
- Real-time log viewing integration
## Core Architecture Principles - CRITICAL
### Individual Metrics Philosophy
Agent collects individual metrics; the dashboard composes widgets from those metrics.
- Each metric is collected, transmitted, and stored individually
- Agent calculates status for each metric using thresholds
- Dashboard aggregates individual metric statuses for widget status
### Maintenance Mode
**Purpose:**
- Suppress email notifications during planned maintenance or backups
- Prevents false alerts when services are intentionally stopped
**Implementation:**
- Agent checks for `/tmp/cm-maintenance` file before sending notifications
- File presence suppresses all email notifications while continuing monitoring
- Dashboard continues to show real status, only notifications are blocked
**Usage:**
```bash
# Enable maintenance mode
touch /tmp/cm-maintenance
# Run maintenance tasks (backups, service restarts, etc.)
systemctl stop service
# ... maintenance work ...
systemctl start service
rm /tmp/cm-maintenance
```
**NixOS Integration:**
- Borgbackup script automatically creates/removes maintenance file
- Automatic cleanup via trap ensures maintenance mode doesn't stick
- All configuration shall be done from the NixOS config
**ARCHITECTURE ENFORCEMENT**:
- **ZERO legacy code reuse** - Fresh implementation following ARCHITECT.md exactly
- **Individual metrics only** - NO grouped metric structures
- **Reference-only legacy** - Study old functionality, implement new architecture
- **Clean slate mindset** - Build as if legacy codebase never existed
**Implementation Rules**:
1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
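A small illustrative sketch of rule 4, worst-case status aggregation; the severity ranking of `Unknown` is an assumption here, not the project's documented rule:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Ok,
    Pending,
    Unknown,
    Warning,
    Critical,
}

fn severity(s: Status) -> u8 {
    match s {
        Status::Ok => 0,
        Status::Pending => 1,
        Status::Unknown => 2, // assumed ranking
        Status::Warning => 3,
        Status::Critical => 4,
    }
}

/// The widget status is the worst of its individual metric statuses.
fn aggregate(statuses: &[Status]) -> Status {
    statuses
        .iter()
        .copied()
        .max_by_key(|s| severity(*s))
        .unwrap_or(Status::Unknown)
}
```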
**Testing & Building**:
- **Workspace builds**: `cargo build --workspace` for all testing
- **Clean compilation**: Remove `target/` between architecture changes
- **ZMQ testing**: Test agent-dashboard communication independently
- **Widget testing**: Verify UI layout matches legacy appearance exactly
**NEVER in New Implementation**:
- Copy/paste ANY code from legacy backup
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
# Important Communication Guidelines
NEVER write that you have "successfully implemented" something or generate extensive summary text without first verifying with the user that the implementation is correct. This wastes tokens. Keep responses concise.
NEVER implement code without first getting explicit user agreement on the approach. Always ask for confirmation before proceeding with implementation.
## Commit Message Guidelines
**NEVER mention:**
- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation
**ALWAYS:**
- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer
**Examples:**
- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
## Development and Deployment Architecture
**CRITICAL:** Development and deployment paths are completely separate:
### Development Path
- **Location:** `~/projects/cm-dashboard`
- **Purpose:** Development workflow only - for committing new code
- **Access:** Only for developers to commit changes
- **Code Access:** Running cm-dashboard code shall NEVER access this path
### Deployment Path
- **Location:** `/var/lib/cm-dashboard/nixos-config`
- **Purpose:** Production deployment only - agent clones/pulls from git
- **Access:** Only cm-dashboard agent for deployment operations
- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild
### Git Flow
```
Development: ~/projects/cm-dashboard → git commit → git push
Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
```
## Automated Binary Release System
CM Dashboard uses automated binary releases instead of source builds.
### Creating New Releases
1. **Automated Release Creation**
- Gitea Actions workflow builds static binaries on tag push
- Creates release with `cm-dashboard-linux-x86_64.tar.gz` tarball
- No manual intervention required for binary generation
2. **Creating New Releases**
```bash
cd ~/projects/cm-dashboard
git tag v0.1.X
@ -388,7 +95,7 @@ Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
- Creates GitHub-style release with tarball
- Uploads binaries via Gitea API
### NixOS Configuration Updates
Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
```nix
@ -399,7 +106,7 @@ Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
};
```
### Get Release Hash
```bash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
@ -408,18 +115,53 @@ Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
}' 2>&1 | grep "got:"
```
5. **Commit and Deploy**
```bash
cd ~/projects/nixosbox
git add hosts/common/cm-dashboard.nix
git commit -m "Update cm-dashboard to v0.1.X with static binaries"
git push
```
### Benefits
- **No compilation overhead** on each host
- **Consistent static binaries** across all hosts
- **Faster deployments** - download vs compile
- **No library dependency issues** - static linking
- **Automated pipeline** - tag push triggers everything

### Building
**Testing & Building:**
- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
- **Clean compilation**: Remove `target/` between major changes

## Important Communication Guidelines
Keep responses concise and focused. Avoid extensive implementation summaries unless requested.

## Commit Message Guidelines
**NEVER mention:**
- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation
**ALWAYS:**
- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer
**Examples:**
- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
- ✅ "Implement maintenance mode for backup operations"
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
## Implementation Rules
1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
**NEVER:**
- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
- Create files unless absolutely necessary for achieving goals
- Create documentation files unless explicitly requested
**ALWAYS:**
- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices

README.md

@ -1,88 +1,105 @@
# CM Dashboard
A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.
## Features
### Core Monitoring
- **Real-time metrics**: CPU, RAM, Storage, and Service status
- **Multi-host support**: Monitor multiple servers from single dashboard
- **Service management**: Start/stop services with intelligent status tracking
- **NixOS integration**: System rebuild via SSH + tmux popup
- **Backup monitoring**: Borgbackup status and scheduling
- **Email notifications**: Intelligent batching prevents spam
### User-Stopped Service Tracking
Services stopped via the dashboard are intelligently tracked to prevent false alerts:
- **Smart status reporting**: User-stopped services show as Status::OK instead of Warning
- **Persistent storage**: Tracking survives agent restarts via JSON storage
- **Automatic management**: Flags cleared when services restarted via dashboard
- **Maintenance friendly**: No false alerts during intentional service operations
## Architecture
### Individual Metrics Philosophy
- **Agent**: Collects individual metrics, calculates status using thresholds
- **Dashboard**: Subscribes to specific metrics, composes widgets from individual data
- **ZMQ Communication**: Efficient real-time metric transmission
- **Status Aggregation**: Host-level status calculated from all service metrics
### Components
```
┌─────────────────┐ ZMQ ┌─────────────────┐
│ │◄──────────►│ │
│ Agent │ Metrics │ Dashboard │
│ - Collectors │ │ - TUI │
│ - Status │ │ - Widgets │
│ - Tracking │ │ - Commands │
│ │ │ │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ JSON Storage │ │ SSH + tmux │
│ - User-stopped │ │ - Remote rebuild│
│ - Cache │ │ - Process │
│ - State │ │ isolation │
└─────────────────┘ └─────────────────┘
```
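A hedged sketch of the dashboard's metric consumption over ZMQ, assuming the `zmq` crate and JSON-serialized `MetricMessage` payloads (the exact wire format is not shown in this README):

```rust
/// Minimal metric consumer sketch: subscribe to one agent and report message sizes.
fn subscribe(host: &str) -> Result<(), Box<dyn std::error::Error>> {
    let ctx = zmq::Context::new();
    let socket = ctx.socket(zmq::SUB)?;
    socket.connect(&format!("tcp://{host}:6130"))?; // publisher_port from agent.toml
    socket.set_subscribe(b"")?;                     // receive all topics

    loop {
        let bytes = socket.recv_bytes(0)?;
        // Assumes JSON; the real dashboard deserializes into MetricMessage.
        let msg: serde_json::Value = serde_json::from_slice(&bytes)?;
        let count = msg["metrics"].as_array().map_or(0, |m| m.len());
        println!("{host}: received {count} metrics");
    }
}
```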
### Service Control Flow
1. **User Action**: Dashboard sends `UserStart`/`UserStop` commands
2. **Agent Processing**:
- Marks service as user-stopped (if stopping)
- Executes `systemctl start/stop service`
- Syncs state to global tracker
3. **Status Calculation**:
- Systemd collector checks user-stopped flag
- Reports Status::OK for user-stopped inactive services
- Normal Warning status for system failures
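A minimal sketch of the agent-side handler for this flow; `mark_user_stopped`/`clear_user_stopped` are hypothetical helpers, and appending the `.service` suffix matches the fix made in this commit:

```rust
/// Sketch of handling a ServiceControl command from the dashboard.
async fn handle_service_control(service: &str, action: ServiceAction) -> anyhow::Result<()> {
    let verb = match action {
        ServiceAction::UserStop => {
            mark_user_stopped(service); // flag before stopping so status stays OK
            "stop"
        }
        ServiceAction::UserStart => {
            clear_user_stopped(service); // re-arm normal alerting
            "start"
        }
        ServiceAction::Start => "start",
        ServiceAction::Stop => "stop",
        ServiceAction::Status => "status",
    };

    // Append the .service suffix so systemctl targets the intended unit.
    let unit = format!("{service}.service");
    let output = tokio::process::Command::new("sudo")
        .args(["systemctl", verb, &unit])
        .output()
        .await?;
    anyhow::ensure!(output.status.success(), "systemctl {verb} {unit} failed");
    Ok(())
}
```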
## Interface
```
cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
│NixOS:                              ││Service:              Status:       RAM:   Disk:  │
│Build: 25.05.20251004.3bcc93c       ││● docker              active        27M    496MB  │
│Agent: v0.1.43                      ││● gitea               active        579M   2.6GB  │
│Active users: cm, simon             ││● nginx               active        28M    24MB   │
│CPU:                                ││ ├─ ● gitea.cmtec.se  51ms                        │
│● Load: 0.10 0.52 0.88 • 3000MHz    ││ ├─ ● photos.cmtec.se 41ms                        │
│RAM:                                ││● postgresql          active        112M   357MB  │
│● Usage: 33% 2.6GB/7.6GB            ││● redis-immich        user-stopped                │
│● /tmp: 0% 0B/2.0GB                 ││● sshd                active        2M     0      │
│Storage:                            ││● unifi               active        594M   495MB  │
│● root (Single):                    ││                                                  │
│ ├─ ● nvme0n1 W: 1%                 ││                                                  │
│ └─ ● 18% 167.4GB/928.2GB           ││                                                  │
└────────────────────────────────────┘└─────────────────────────────────────────────────┘
```
### Navigation
- **Tab**: Switch between hosts
- **↑↓ or j/k**: Navigate services
- **s**: Start selected service (UserStart)
- **S**: Stop selected service (UserStop)
- **R**: Rebuild current host
- **q**: Quit
### Status Indicators
- **Green ●**: Active service
- **Yellow ◐**: Inactive service (system issue)
- **Red ◯**: Failed service
- **Blue arrows**: Service transitioning (↑ starting, ↓ stopping, ↻ restarting)
- **"user-stopped"**: Service stopped via dashboard (Status::OK)

## Features
- **Real-time monitoring** - Dashboard updates every 1-2 seconds
- **Individual metric collection** - Granular data for flexible dashboard composition
- **Intelligent status aggregation** - Host-level status calculated from all services
- **Smart email notifications** - Batched, detailed alerts with service groupings
- **Persistent state** - Prevents false notifications on restarts
- **ZMQ communication** - Efficient agent-to-dashboard messaging
- **Clean TUI** - Terminal-based dashboard with color-coded status indicators
## Architecture
### Core Components
- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
- **Shared** (`cm-dashboard-shared`) - Common types and protocol
- **Status Aggregation** - Intelligent batching and notification management
- **Persistent Cache** - Maintains state across restarts
### Status Levels
- **🟢 Ok** - Service running normally
- **🔵 Pending** - Service starting/stopping/reloading
- **🟡 Warning** - Service issues (high load, memory, disk usage)
- **🔴 Critical** - Service failed or critical thresholds exceeded
- **❓ Unknown** - Service state cannot be determined
## Quick Start
### Building
```bash
# With Nix (recommended)
@ -93,21 +110,20 @@ sudo apt install libssl-dev pkg-config # Ubuntu/Debian
cargo build --workspace
```
### Running
```bash
# Start agent (requires configuration)
./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml

# Start dashboard (inside tmux session)
tmux
./target/debug/cm-dashboard --config /etc/cm-dashboard/dashboard.toml
```
## Configuration
### Agent Configuration (`agent.toml`)
The agent requires a comprehensive TOML configuration file:
```toml
collection_interval_seconds = 2
@ -116,50 +132,27 @@ collection_interval_seconds = 2
publisher_port = 6130
command_port = 6131
bind_address = "0.0.0.0"
timeout_ms = 5000
transmission_interval_seconds = 2
heartbeat_interval_ms = 30000
[collectors.cpu]
enabled = true
interval_seconds = 2
load_warning_threshold = 9.0
load_warning_threshold = 5.0
load_critical_threshold = 10.0
temperature_warning_threshold = 100.0
temperature_critical_threshold = 110.0
[collectors.memory]
enabled = true
interval_seconds = 2
usage_warning_percent = 80.0
usage_critical_percent = 95.0
[collectors.disk]
enabled = true
interval_seconds = 300
usage_warning_percent = 80.0
usage_critical_percent = 90.0
[[collectors.disk.filesystems]]
name = "root"
uuid = "4cade5ce-85a5-4a03-83c8-dfd1d3888d79"
mount_point = "/"
fs_type = "ext4"
monitor = true
[collectors.systemd]
enabled = true
interval_seconds = 10
memory_warning_mb = 1000.0
memory_critical_mb = 2000.0
service_name_filters = [
    "nginx*", "postgresql*", "redis*", "docker*", "sshd*",
    "gitea*", "immich*", "haasp*", "mosquitto*", "mysql*",
    "unifi*", "vaultwarden*"
]
excluded_services = [
    "nginx-config-reload", "sshd-keygen", "systemd-",
    "getty@", "user@", "dbus-", "NetworkManager-"
]
service_name_filters = ["nginx*", "postgresql*", "docker*", "sshd*"]
excluded_services = ["nginx-config-reload", "systemd-", "getty@"]
nginx_latency_critical_ms = 1000.0
http_timeout_seconds = 10
[notifications]
enabled = true
@ -167,251 +160,202 @@ smtp_host = "localhost"
smtp_port = 25
from_email = "{hostname}@example.com"
to_email = "admin@example.com"
rate_limit_minutes = 0
aggregation_interval_seconds = 30
trigger_on_warnings = true
trigger_on_failures = true
recovery_requires_all_ok = true
suppress_individual_recoveries = true
[status_aggregation]
enabled = true
aggregation_method = "worst_case"
notification_interval_seconds = 30
[cache]
persist_path = "/var/lib/cm-dashboard/cache.json"
```
### Dashboard Configuration (`dashboard.toml`)
```toml
[zmq]
subscriber_ports = [6130]

[hosts]
predefined_hosts = ["cmbox", "srv01", "srv02"]

[ui]
ssh_user = "cm"
rebuild_alias = "nixos-rebuild-cmtec"
```
## Technical Implementation
### Collectors
#### Systemd Collector
- **Service Discovery**: Uses `systemctl list-unit-files` + `list-units --all`
- **Status Calculation**: Checks user-stopped flag before assigning Warning status
- **Memory Tracking**: Per-service memory usage via `systemctl show`
- **Sub-services**: Nginx site latency, Docker containers
- **User-stopped Integration**: `UserStoppedServiceTracker::is_service_user_stopped()`
#### User-Stopped Service Tracker
- **Storage**: `/var/lib/cm-dashboard/user-stopped-services.json`
- **Thread Safety**: Global singleton with `Arc<Mutex<>>`
- **Persistence**: Automatic save on state changes
- **Global Access**: Static methods for collector integration
#### Other Collectors
- **CPU**: Load average, temperature, frequency monitoring
- **Memory**: RAM/swap usage, tmpfs monitoring
- **Disk**: Filesystem usage, SMART health data
- **NixOS**: Build version, active users, agent version
- **Backup**: Borgbackup repository status and metrics
### ZMQ Protocol
```rust
// Metric Message
#[derive(Serialize, Deserialize)]
pub struct MetricMessage {
    pub hostname: String,
    pub timestamp: u64,
    pub metrics: Vec<Metric>,
}

// Service Commands
pub enum AgentCommand {
    ServiceControl {
        service_name: String,
        action: ServiceAction,
    },
    SystemRebuild { /* SSH config */ },
    CollectNow,
}

pub enum ServiceAction {
    Start,     // System-initiated
    Stop,      // System-initiated
    UserStart, // User via dashboard (clears user-stopped)
    UserStop,  // User via dashboard (marks user-stopped)
    Status,
}
```
### Maintenance Mode
Suppress notifications during planned maintenance:
```bash
# Enable maintenance mode
touch /tmp/cm-maintenance

# Perform maintenance
systemctl stop service
# ... work ...
systemctl start service

# Disable maintenance mode
rm /tmp/cm-maintenance
```

## Collectors
The agent implements several specialized collectors:
### CPU Collector (`cpu.rs`)
- Load average (1, 5, 15 minute)
- CPU temperature monitoring
- Real-time process monitoring (top CPU consumers)
- Status calculation with configurable thresholds
### Memory Collector (`memory.rs`)
- RAM usage (total, used, available)
- Swap monitoring
- Real-time process monitoring (top RAM consumers)
- Memory pressure detection
### Disk Collector (`disk.rs`)
- Filesystem usage per mount point
- SMART health monitoring
- Temperature and wear tracking
- Configurable filesystem monitoring
### Systemd Collector (`systemd.rs`)
- Service status monitoring (`active`, `inactive`, `failed`)
- Memory usage per service
- Service filtering and exclusions
- Handles transitional states (`Status::Pending`)
### Backup Collector (`backup.rs`)
- Reads TOML status files from backup systems
- Archive age verification
- Disk usage tracking
- Repository health monitoring
## Email Notifications
### Intelligent Batching
The system implements smart notification batching to prevent email spam:
- **Real-time dashboard**: Immediate status updates
- **Batched emails**: Aggregated every 30 seconds
- **Smart grouping**: Services organized by severity
- **Recovery suppression**: Reduces notification spam
### Example Alert
```
Subject: Status Alert: 1 critical, 2 warnings, 0 recoveries

Status Summary (30s duration)
Host Status: Ok → Warning

🔴 CRITICAL ISSUES (1):
postgresql: Ok → Critical (memory usage 95%)

🟡 WARNINGS (2):
nginx: Ok → Warning (high load 8.5)
redis: user-stopped → Warning (restarted by system)

✅ RECOVERIES (0):

--
CM Dashboard Agent v0.1.43
```
## Individual Metrics Architecture
The system follows a **metrics-first architecture**:
### Agent Side
```rust
// Agent collects individual metrics
vec![
Metric::new("cpu_load_1min".to_string(), MetricValue::Float(2.5), Status::Ok),
Metric::new("memory_usage_percent".to_string(), MetricValue::Float(78.5), Status::Warning),
Metric::new("service_nginx_status".to_string(), MetricValue::String("active".to_string()), Status::Ok),
]
```
### Dashboard Side
```rust
// Widgets subscribe to specific metrics
impl Widget for CpuWidget {
fn update_from_metrics(&mut self, metrics: &[&Metric]) {
for metric in metrics {
match metric.name.as_str() {
"cpu_load_1min" => self.load_1min = metric.value.as_f32(),
"cpu_load_5min" => self.load_5min = metric.value.as_f32(),
"cpu_temperature_celsius" => self.temperature = metric.value.as_f32(),
_ => {}
}
}
}
}
```
## Persistent Cache
The cache system prevents false notifications:
- **Automatic saving** - Saves when service status changes
- **Persistent storage** - Maintains state across agent restarts
- **Simple design** - No complex TTL or cleanup logic
- **Status preservation** - Prevents duplicate notifications
## Development
### Project Structure
```
cm-dashboard/
├── agent/                     # Metrics collection agent
│   ├── src/
│   │   ├── collectors/        # CPU, memory, disk, systemd, backup, nixos
│   │   ├── service_tracker.rs # User-stopped service tracking
│   │   ├── status/            # Status aggregation and notifications
│   │   ├── config/            # TOML configuration loading
│   │   └── communication/     # ZMQ message handling
├── dashboard/                 # TUI dashboard application
│   ├── src/
│   │   ├── ui/widgets/        # CPU, memory, services, backup, system
│   │   ├── communication/     # ZMQ consumption and commands
│   │   └── app.rs             # Main application loop
├── shared/                    # Shared types and utilities
│   └── src/
│       ├── metrics.rs         # Metric, Status, StatusTracker types
│       ├── protocol.rs        # ZMQ message format
│       └── cache.rs           # Cache configuration
└── CLAUDE.md                  # Development guidelines and rules
```
### Testing
```bash
# Build and test
nix-shell -p openssl pkg-config --run "cargo build --workspace"
nix-shell -p openssl pkg-config --run "cargo test --workspace"

# Code quality
cargo fmt --all
cargo clippy --workspace -- -D warnings
```
### Dependencies
- **tokio** - Async runtime
- **zmq** - Message passing between agent and dashboard
- **ratatui** - Terminal user interface
- **serde** - Serialization for metrics and config
- **anyhow/thiserror** - Error handling
- **tracing** - Structured logging
- **lettre** - SMTP email notifications
- **clap** - Command-line argument parsing
- **toml** - Configuration file parsing

## NixOS Integration
This project is designed for declarative deployment via NixOS:
### Configuration Generation
The NixOS module automatically generates the agent configuration:
```nix
# hosts/common/cm-dashboard.nix
services.cm-dashboard-agent = {
  enable = true;
  port = 6130;
};
```
### Deployment
```bash
# Update NixOS configuration
git add hosts/common/cm-dashboard.nix
git commit -m "Update cm-dashboard configuration"
git push

# Rebuild system (user-performed)
sudo nixos-rebuild switch --flake .
```

## Deployment
### Automated Binary Releases
```bash
# Create new release
cd ~/projects/cm-dashboard
git tag v0.1.X
git push origin v0.1.X
```
This triggers automated:
- Static binary compilation with `RUSTFLAGS="-C target-feature=+crt-static"`
- GitHub-style release creation
- Tarball upload to Gitea
### NixOS Integration
Update `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
```nix
version = "v0.1.43";
src = pkgs.fetchurl {
  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
  sha256 = "sha256-HASH";
};
```
Get hash via:
```bash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
  url = "URL_HERE";
  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
}' 2>&1 | grep "got:"
```
## Monitoring Intervals
- **Metrics Collection**: 2 seconds (CPU, memory, services)
- **Metric Transmission**: 2 seconds (ZMQ publish)
- **Dashboard Updates**: 1 second (UI refresh)
- **Email Notifications**: 30 seconds (batched)
- **Disk Monitoring**: 300 seconds (5 minutes)
- **Service Discovery**: 300 seconds (5 minutes cache)
## License
MIT License - see LICENSE file for details.

TODO.md

@ -1,63 +0,0 @@
# TODO
## Systemd filtering (agent)
- Remove user systemd collection
- Reduce the number of systemctl calls
- Change so only services in the include list are detected
- Filter on exact name
- Add support for "*" in filtering
## System panel (agent/dashboard)
use following layout:
'''
NixOS:
Build: xxxxxx
Agent: xxxxxx
CPU:
● Load: 0.02 0.31 0.86
└─ Freq: 3000MHz
RAM:
● Usage: 33% 2.6GB/7.6GB
└─ ● /tmp: 0% 0B/2.0GB
Storage:
● /:
├─ ● nvme0n1 T: 40C • W: 4%
└─ ● 8% 75.0GB/906.2GB
'''
- Add support to show login/active users
- Add support to show timestamp/version for latest nixos rebuild
## Backup panel (dashboard)
use following layout:
'''
Latest backup:
<timestamp>
└─ Duration: 1.3m
Disk:
● Samsung SSD 870 QVO 1TB
├─ S/N: S5RRNF0W800639Y
└─ Usage: 50.5GB/915.8GB
Repos:
● gitea (4) 5.1GB
● immich (4) 45.0GB
● kryddorten (4) 67.8MB
● mariehall2 (4) 322.7MB
● nixosbox (4) 5.5MB
● unifi (4) 5.7MB
● vaultwarden (4) 508kB
'''
## Keyboard navigation and scrolling (dashboard)
- Add keyboard navigation between panels "Shift-Tab"
- Add a lower status bar with dynamically updated shortcuts when switching between panels
## Remote execution (agent/dashboard)
- Add support for sending a command via dashboard to agent to do a nixos rebuild
- Add support for navigating services in dashboard and triggering start/stop/restart
- Add support for trigger backup

agent/Cargo.toml

@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-agent"
version = "0.1.43"
version = "0.1.44"
edition = "2021"

[dependencies]

View File

@ -314,7 +314,7 @@ impl Agent {
let output = tokio::process::Command::new("sudo")
    .arg("systemctl")
    .arg(action_str)
    .arg(service_name)                           // removed by this commit
    .arg(format!("{}.service", service_name))    // added by this commit
    .output()
    .await?;

dashboard/Cargo.toml

@ -1,6 +1,6 @@
[package]
name = "cm-dashboard"
version = "0.1.43"
version = "0.1.44"
edition = "2021"

[dependencies]

View File

@ -1,88 +0,0 @@
# Hardcoded Values Removed - Configuration Summary
## ✅ All Hardcoded Values Converted to Configuration
### **1. SystemD Nginx Check Interval**
- **Before**: `nginx_check_interval_seconds: 30` (hardcoded)
- **After**: `nginx_check_interval_seconds: config.nginx_check_interval_seconds`
- **NixOS Config**: `nginx_check_interval_seconds = 30;`
### **2. ZMQ Transmission Interval**
- **Before**: `Duration::from_secs(1)` (hardcoded)
- **After**: `Duration::from_secs(self.config.zmq.transmission_interval_seconds)`
- **NixOS Config**: `transmission_interval_seconds = 1;`
### **3. HTTP Timeouts in SystemD Collector**
- **Before**:
```rust
.timeout(Duration::from_secs(10))
.connect_timeout(Duration::from_secs(10))
```
- **After**:
```rust
.timeout(Duration::from_secs(self.config.http_timeout_seconds))
.connect_timeout(Duration::from_secs(self.config.http_connect_timeout_seconds))
```
- **NixOS Config**:
```nix
http_timeout_seconds = 10;
http_connect_timeout_seconds = 10;
```
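A short sketch of the resulting client construction, assuming the HTTP client is `reqwest` (the config struct here is a stand-in containing only the two new fields):

```rust
use std::time::Duration;

/// Minimal stand-in for the systemd collector config described below.
struct SystemdConfig {
    http_timeout_seconds: u64,
    http_connect_timeout_seconds: u64,
}

/// Build the HTTP client from configuration instead of hardcoded values.
fn build_http_client(cfg: &SystemdConfig) -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(cfg.http_timeout_seconds))
        .connect_timeout(Duration::from_secs(cfg.http_connect_timeout_seconds))
        .build()
}
```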
## **Configuration Structure Changes**
### **SystemdConfig** (agent/src/config/mod.rs)
```rust
pub struct SystemdConfig {
// ... existing fields ...
pub nginx_check_interval_seconds: u64, // NEW
pub http_timeout_seconds: u64, // NEW
pub http_connect_timeout_seconds: u64, // NEW
}
```
### **ZmqConfig** (agent/src/config/mod.rs)
```rust
pub struct ZmqConfig {
// ... existing fields ...
pub transmission_interval_seconds: u64, // NEW
}
```
## **NixOS Configuration Updates**
### **ZMQ Section** (hosts/common/cm-dashboard.nix)
```nix
zmq = {
# ... existing fields ...
transmission_interval_seconds = 1; # NEW
};
```
### **SystemD Section** (hosts/common/cm-dashboard.nix)
```nix
systemd = {
# ... existing fields ...
nginx_check_interval_seconds = 30; # NEW
http_timeout_seconds = 10; # NEW
http_connect_timeout_seconds = 10; # NEW
};
```
## **Benefits**
**No hardcoded values** - All timing/timeout values configurable
**Consistent configuration** - Everything follows NixOS config pattern
**Environment-specific tuning** - Can adjust timeouts per deployment
**Maintainability** - No magic numbers scattered in code
**Testing flexibility** - Can configure different values for testing
## **Runtime Behavior**
All previously hardcoded values now respect configuration:
- **Nginx latency checks**: Every 30s (configurable)
- **ZMQ transmission**: Every 1s (configurable)
- **HTTP requests**: 10s timeout (configurable)
- **HTTP connections**: 10s timeout (configurable)
The codebase is now **100% configuration-driven** with no hardcoded timing values.

shared/Cargo.toml

@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-shared"
version = "0.1.43"
version = "0.1.44"
edition = "2021"

[dependencies]

View File

@ -1,42 +0,0 @@
#!/bin/bash
# Test script to verify collector intervals are working correctly
# Expected behavior:
# - CPU/Memory: Every 2 seconds
# - Systemd/Network: Every 10 seconds
# - Backup/NixOS: Every 60 seconds
# - Disk: Every 300 seconds (5 minutes)
echo "=== Testing Collector Interval Implementation ==="
echo "Expected intervals from NixOS config:"
echo " CPU: 2s, Memory: 2s"
echo " Systemd: 10s, Network: 10s"
echo " Backup: 60s, NixOS: 60s"
echo " Disk: 300s (5m)"
echo ""
# Note: Cannot run actual agent without proper config, but we can verify the code logic
echo "✅ Code Implementation Status:"
echo " - TimedCollector struct with interval tracking: IMPLEMENTED"
echo " - Individual collector intervals from config: IMPLEMENTED"
echo " - collect_metrics_timed() respects intervals: IMPLEMENTED"
echo " - Debug logging shows interval compliance: IMPLEMENTED"
echo ""
echo "🔍 Key Implementation Details:"
echo " - MetricCollectionManager now tracks last_collection time per collector"
echo " - Each collector gets Duration::from_secs(config.{collector}.interval_seconds)"
echo " - Only collectors with elapsed >= interval are called"
echo " - Debug logs show actual collection with interval info"
echo ""
echo "📊 Expected Runtime Behavior:"
echo " At 0s: All collectors run (startup)"
echo " At 2s: CPU, Memory run"
echo " At 4s: CPU, Memory run"
echo " At 10s: CPU, Memory, Systemd, Network run"
echo " At 60s: CPU, Memory, Systemd, Network, Backup, NixOS run"
echo " At 300s: All collectors run including Disk"
echo ""
echo "✅ CONCLUSION: Codebase now follows NixOS configuration intervals correctly!"

View File

@ -1,32 +0,0 @@
#!/usr/bin/env rust-script
use std::process;
/// Check if running inside tmux session
fn check_tmux_session() {
// Check for TMUX environment variable which is set when inside a tmux session
if std::env::var("TMUX").is_err() {
eprintln!("╭─────────────────────────────────────────────────────────────╮");
eprintln!("│ ⚠️ TMUX REQUIRED │");
eprintln!("├─────────────────────────────────────────────────────────────┤");
eprintln!("│ CM Dashboard must be run inside a tmux session for proper │");
eprintln!("│ terminal handling and remote operation functionality. │");
eprintln!("│ │");
eprintln!("│ Please start a tmux session first: │");
eprintln!("│ tmux new-session -d -s dashboard cm-dashboard │");
eprintln!("│ tmux attach-session -t dashboard │");
eprintln!("│ │");
eprintln!("│ Or simply: │");
eprintln!("│ tmux │");
eprintln!("│ cm-dashboard │");
eprintln!("╰─────────────────────────────────────────────────────────────╯");
process::exit(1);
} else {
println!("✅ Running inside tmux session - OK");
}
}
fn main() {
println!("Testing tmux check function...");
check_tmux_session();
}

View File

@ -1,53 +0,0 @@
#!/bin/bash
echo "=== TMUX Check Implementation Test ==="
echo ""
echo "📋 Testing tmux check logic:"
echo ""
echo "1. Current environment:"
if [ -n "$TMUX" ]; then
echo " ✅ Running inside tmux session"
echo " TMUX variable: $TMUX"
else
echo " ❌ NOT running inside tmux session"
echo " TMUX variable: (not set)"
fi
echo ""
echo "2. Simulating dashboard tmux check logic:"
echo ""
# Simulate the Rust check logic
if [ -z "$TMUX" ]; then
echo " Dashboard would show:"
echo " ╭─────────────────────────────────────────────────────────────╮"
echo " │ ⚠️ TMUX REQUIRED │"
echo " ├─────────────────────────────────────────────────────────────┤"
echo " │ CM Dashboard must be run inside a tmux session for proper │"
echo " │ terminal handling and remote operation functionality. │"
echo " │ │"
echo " │ Please start a tmux session first: │"
echo " │ tmux new-session -d -s dashboard cm-dashboard │"
echo " │ tmux attach-session -t dashboard │"
echo " │ │"
echo " │ Or simply: │"
echo " │ tmux │"
echo " │ cm-dashboard │"
echo " ╰─────────────────────────────────────────────────────────────╯"
echo " Then exit with code 1"
else
echo " ✅ Dashboard tmux check would PASS - continuing normally"
fi
echo ""
echo "3. Implementation status:"
echo " ✅ check_tmux_session() function added to dashboard/src/main.rs"
echo " ✅ Called early in main() but only for TUI mode (not headless)"
echo " ✅ Uses std::env::var(\"TMUX\") to detect tmux session"
echo " ✅ Shows helpful error message with usage instructions"
echo " ✅ Exits with code 1 if not in tmux"
echo ""
echo "✅ TMUX check implementation complete!"