Compare commits

34 Commits: 5f6e47ece5, 0e7cf24dbb, 2d080a2f51, 6179bd51a7, 57de4c366a, e18778e962, e4469a0ebf, 6fedf4c7fc, 3f6dffa66e, 1b64fbde3d, 4f4c3b0d6e, bd20f0cae1, 11c9a5f9d2, aeae60146d, a82c81e8e3, c56e9d7be2, c8f800a1e5, fc6b3424cf, 35e06c6734, 783d233319, 6509a2b91a, 52f8c40b86, a86b5ba8f9, 1b964545be, 97aa1708c2, d12689f3b5, f22e3ee95e, e890c5e810, 078c30a592, a847674004, 2618f6b62f, c3fc5a181d, 3f45a172b3, 5b12c12228
@@ -1,3 +0,0 @@
-# Agent Guide
-
-Agents working in this repo must follow the instructions in `CLAUDE.md`.
472 CLAUDE.md
@@ -2,276 +2,76 @@
 ## Overview
 
-A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for our specific monitoring needs and ZMQ-based metric collection.
+A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.
 
-## Implementation Strategy
+## Current Features
 
-### Current Implementation Status
+### Core Functionality
+- **Real-time Monitoring**: CPU, RAM, Storage, and Service status
+- **Service Management**: Start/stop services with user-stopped tracking
+- **Multi-host Support**: Monitor multiple servers from single dashboard
+- **NixOS Integration**: System rebuild via SSH + tmux popup
+- **Backup Monitoring**: Borgbackup status and scheduling
 
-**System Panel Enhancement - COMPLETED** ✅
+### User-Stopped Service Tracking
+- Services stopped via dashboard are marked as "user-stopped"
+- User-stopped services report Status::OK instead of Warning
+- Prevents false alerts during intentional maintenance
+- Persistent storage survives agent restarts
+- Automatic flag clearing when services are restarted via dashboard
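The user-stopped tracking added above is the kind of state that must survive agent restarts. A minimal sketch of such a tracker (type and method names here are assumptions, not the repo's actual API; requires the `serde_json` crate):

```rust
use std::collections::HashSet;
use std::fs;

/// Hypothetical tracker for services stopped by the user via the dashboard.
/// Persists to JSON so the flag survives agent restarts.
pub struct UserStoppedServiceTracker {
    path: String,
    stopped: HashSet<String>,
}

impl UserStoppedServiceTracker {
    pub fn load(path: &str) -> Self {
        // A missing or unreadable file simply means "no services user-stopped".
        let stopped = fs::read_to_string(path)
            .ok()
            .and_then(|s| serde_json::from_str(&s).ok())
            .unwrap_or_default();
        Self { path: path.to_string(), stopped }
    }

    pub fn mark_stopped(&mut self, service: &str) {
        self.stopped.insert(service.to_string());
        self.save();
    }

    pub fn clear(&mut self, service: &str) {
        self.stopped.remove(service);
        self.save();
    }

    pub fn is_user_stopped(&self, service: &str) -> bool {
        self.stopped.contains(service)
    }

    fn save(&self) {
        // Persist on every state change; the set is tiny.
        if let Ok(json) = serde_json::to_string(&self.stopped) {
            let _ = fs::write(&self.path, json);
        }
    }
}
```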
 
-All system panel features successfully implemented:
+### Custom Service Logs
-- ✅ **NixOS Collector**: Created collector for version and active users
+- Configure service-specific log file paths per host in dashboard config
-- ✅ **System Widget**: Unified widget combining NixOS, CPU, RAM, and Storage
+- Press `L` on any service to view custom log files via `tail -f`
-- ✅ **Build Display**: Shows NixOS build information without codename
+- Configuration format in dashboard config:
-- ✅ **Active Users**: Displays currently logged in users
+```toml
-- ✅ **Tmpfs Monitoring**: Added /tmp usage to RAM section
+[service_logs]
-- ✅ **Agent Deployment**: NixOS collector working in production
+hostname1 = [
+  { service_name = "nginx", log_file_path = "/var/log/nginx/access.log" },
-**Keyboard Navigation and Service Management - COMPLETED** ✅
+  { service_name = "app", log_file_path = "/var/log/myapp/app.log" }
+]
-All keyboard navigation and service selection features successfully implemented:
+hostname2 = [
-- ✅ **Panel Navigation**: Shift+Tab cycles through visible panels only (System → Services → Backup)
+  { service_name = "database", log_file_path = "/var/log/postgres/postgres.log" }
-- ✅ **Service Selection**: Up/Down arrows navigate through parent services with visual cursor
+]
-- ✅ **Focus Management**: Selection highlighting only visible when Services panel focused
-- ✅ **Status Preservation**: Service health colors maintained during selection (green/red icons)
-- ✅ **Smart Panel Switching**: Only cycles through panels with data (backup panel conditional)
-- ✅ **Scroll Support**: All panels support content scrolling with proper overflow indicators
 
-**Current Status - October 27, 2025:**
-- All keyboard navigation features working correctly ✅
-- Service selection cursor implemented with focus-aware highlighting ✅
-- Panel scrolling fixed for System, Services, and Backup panels ✅
-- Build display working: "Build: 25.05.20251004.3bcc93c" ✅
-- Agent version display working: "Agent: v0.1.17" ✅
-- Cross-host version comparison implemented ✅
-- Automated binary release system working ✅
-- SMART data consolidated into disk collector ✅
 
-**RESOLVED - Remote Rebuild Functionality:**
-- ✅ **System Rebuild**: Now uses simple SSH + tmux popup approach
-- ✅ **Process Isolation**: Rebuild runs independently via SSH, survives agent/dashboard restarts
-- ✅ **Configuration**: SSH user and rebuild alias configurable in dashboard config
-- ✅ **Service Control**: Works correctly for start/stop/restart of services
 
-**Solution Implemented:**
-- Replaced complex SystemRebuild command infrastructure with direct tmux popup
-- Uses `tmux display-popup "ssh -tt {user}@{hostname} 'bash -ic {alias}'"`
-- Configurable SSH user and rebuild alias in dashboard config
-- Eliminates all agent crashes during rebuilds
-- Simple, reliable, and follows standard tmux interface patterns
 
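For illustration, the popup approach removed above boils down to one detached `tmux` invocation; a sketch in Rust (the function name and parameter plumbing are assumptions, the command template is the one quoted in the bullets):

```rust
use std::process::Command;

/// Launch a NixOS rebuild in a tmux popup over SSH, detached from the
/// dashboard process so the rebuild survives dashboard restarts.
fn open_rebuild_popup(user: &str, hostname: &str, alias: &str) -> std::io::Result<()> {
    // Mirrors the template from the notes:
    // tmux display-popup "ssh -tt {user}@{hostname} 'bash -ic {alias}'"
    let inner = format!("ssh -tt {user}@{hostname} 'bash -ic {alias}'");
    Command::new("tmux")
        .args(["display-popup", &inner])
        .spawn()?; // fire and forget; tmux owns the popup's lifetime
    Ok(())
}
```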
-**Current Layout:**
-```
-NixOS:
-Build: 25.05.20251004.3bcc93c
-Agent: v0.1.17              # Shows agent version from Cargo.toml
-Active users: cm, simon
-CPU:
-● Load: 0.02 0.31 0.86 • 3000MHz
-RAM:
-● Usage: 33% 2.6GB/7.6GB
-● /tmp: 0% 0B/2.0GB
-Storage:
-● root (Single):
- ├─ ● nvme0n1 W: 1%
- └─ ● 18% 167.4GB/928.2GB
 ```
 
-**System panel layout fully implemented with blue tree symbols ✅**
+### Service Management
-**Tree symbols now use consistent blue theming across all panels ✅**
+- **Direct Control**: Arrow keys (↑↓) or vim keys (j/k) navigate services
-**Overflow handling restored for all widgets ("... and X more") ✅**
+- **Service Actions**:
-**Agent version display working correctly ✅**
+  - `s` - Start service (sends UserStart command)
-**Cross-host version comparison logging warnings ✅**
+  - `S` - Stop service (sends UserStop command)
-**Backup panel visibility fixed - only shows when meaningful data exists ✅**
+  - `J` - Show service logs (journalctl in tmux popup)
-**SSH-based rebuild system fully implemented and working ✅**
+  - `L` - Show custom log files (tail -f custom paths in tmux popup)
+  - `R` - Rebuild current host
+- **Visual Status**: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed)
+- **Transitional Icons**: Blue arrows during operations
 
-### Current Keyboard Navigation Implementation
+### Navigation
+- **Tab**: Switch between hosts
-**Navigation Controls:**
+- **↑↓ or j/k**: Select services
-- **Tab**: Switch between hosts (cmbox, srv01, srv02, steambox, etc.)
+- **J**: Show service logs (journalctl)
-- **Shift+Tab**: Cycle through visible panels (System → Services → Backup → System)
+- **L**: Show custom log files
-- **Up/Down (System/Backup)**: Scroll through panel content
-- **Up/Down (Services)**: Move service selection cursor between parent services
 - **q**: Quit dashboard
 
-**Panel-Specific Features:**
+## Core Architecture Principles
-- **System Panel**: Scrollable content with CPU, RAM, Storage details
-- **Services Panel**: Service selection cursor for parent services only (docker, nginx, postgresql, etc.)
-- **Backup Panel**: Scrollable repository list with proper overflow handling
 
-**Visual Feedback:**
-- **Focused Panel**: Blue border and title highlighting
-- **Service Selection**: Blue background with preserved status icon colors (green ● for active, red ● for failed)
-- **Focus-Aware Selection**: Selection highlighting only visible when Services panel focused
-- **Dynamic Statusbar**: Context-aware shortcuts based on focused panel
 
-### Remote Command Execution - WORKING ✅
 
-**All Issues Resolved (as of 2025-10-24):**
-- ✅ **ZMQ Command Protocol**: Extended with ServiceControl and SystemRebuild variants
-- ✅ **Agent Handlers**: systemctl and nixos-rebuild execution with maintenance mode
-- ✅ **Dashboard Integration**: Keyboard shortcuts execute commands
-- ✅ **Service Control**: Fixed toggle logic - replaced with separate 's' (start) and 'S' (stop)
-- ✅ **System Rebuild**: Fixed permission issues and sandboxing problems
-- ✅ **Git Clone Approach**: Implemented for nixos-rebuild to avoid directory permissions
-- ✅ **Visual Feedback**: Directional arrows for service status (↑ starting, ↓ stopping, ↻ restarting)
 
-### Terminal Popup for Real-time Output - IMPLEMENTED ✅
 
-**Status (as of 2025-10-26):**
-- ✅ **Terminal Popup UI**: 80% screen coverage with terminal styling and color-coded output
-- ✅ **ZMQ Streaming Protocol**: CommandOutputMessage for real-time output transmission
-- ✅ **Keyboard Controls**: ESC/Q to close, ↑↓ to scroll, manual close (no auto-close)
-- ✅ **Real-time Display**: Live streaming of command output as it happens
-- ✅ **Version-based Agent Reporting**: Shows "Agent: v0.1.13" instead of nix store hash
 
-**Current Implementation Issues:**
-- ❌ **Agent Process Crashes**: Agent dies during nixos-rebuild execution
-- ❌ **Inconsistent Output**: Different outputs each time 'R' is pressed
-- ❌ **Limited Output Visibility**: Not capturing all nixos-rebuild progress
 
-**PLANNED SOLUTION - Systemd Service Approach:**
 
-**Problem**: Direct nixos-rebuild execution in agent causes process crashes and inconsistent output.
 
-**Solution**: Create dedicated systemd service for rebuild operations.
 
-**Implementation Plan:**
-1. **NixOS Systemd Service**:
-   ```nix
-   systemd.services.cm-rebuild = {
-     description = "CM Dashboard NixOS Rebuild";
-     serviceConfig = {
-       Type = "oneshot";
-       ExecStart = "${pkgs.nixos-rebuild}/bin/nixos-rebuild switch --flake . --option sandbox false";
-       WorkingDirectory = "/var/lib/cm-dashboard/nixos-config";
-       User = "root";
-       StandardOutput = "journal";
-       StandardError = "journal";
-     };
-   };
-   ```
 
-2. **Agent Modification**:
-   - Replace direct nixos-rebuild execution with: `systemctl start cm-rebuild`
-   - Stream output via: `journalctl -u cm-rebuild -f --no-pager`
-   - Monitor service status for completion detection
 
-3. **Benefits**:
-   - **Process Isolation**: Service runs independently, won't crash agent
-   - **Consistent Output**: Always same deterministic rebuild process
-   - **Proper Logging**: systemd journal handles all output management
-   - **Resource Management**: systemd manages cleanup and resource limits
-   - **Status Tracking**: Can query service status (running/failed/success)
 
-**Next Priority**: Implement systemd service approach for reliable rebuild operations.
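A sketch of the agent modification proposed in step 2 of that (since-removed) plan — start the unit, then follow its journal (names are assumptions; `cm-rebuild` is the unit defined in step 1):

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

/// Start the dedicated rebuild unit and stream its journal lines to a callback.
/// In a real agent this would run on its own task and stop once the unit exits.
fn run_rebuild(mut on_line: impl FnMut(&str)) -> std::io::Result<()> {
    // Kick off the oneshot unit; systemd owns the rebuild process.
    Command::new("systemctl")
        .args(["start", "--no-block", "cm-rebuild"])
        .status()?;

    // Follow the unit's journal for live output.
    let mut child = Command::new("journalctl")
        .args(["-u", "cm-rebuild", "-f", "--no-pager"])
        .stdout(Stdio::piped())
        .spawn()?;
    let stdout = child.stdout.take().expect("piped stdout");
    for line in BufReader::new(stdout).lines() {
        on_line(&line?); // e.g. forward to the dashboard over ZMQ
    }
    Ok(())
}
```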
 
-**Keyboard Controls Status:**
-- **Services Panel**:
-  - R (restart) ✅ Working
-  - s (start) ✅ Working
-  - S (stop) ✅ Working
-- **System Panel**: R (nixos-rebuild) ✅ Working with --option sandbox false
-- **Backup Panel**: B (trigger backup) ❓ Not implemented
 
-**Visual Feedback Implementation - IN PROGRESS:**
 
-Context-appropriate progress indicators for each panel:
 
-**Services Panel** (Service status transitions):
-```
-● nginx active → ⏳ nginx restarting → ● nginx active
-● docker active → ⏳ docker stopping → ● docker inactive
-```
 
-**System Panel** (Build progress in NixOS section):
-```
-NixOS:
-Build: 25.05.20251004.3bcc93c → Build: [████████████        ] 65%
-Active users: cm, simon          Active users: cm, simon
-```
 
-**Backup Panel** (OnGoing status with progress):
-```
-Latest backup:           →  Latest backup:
-● 2024-10-23 14:32:15       ● OnGoing
-└─ Duration: 1.3m           └─ [██████      ] 60%
-```
 
-**Critical Configuration Hash Fix - HIGH PRIORITY:**
 
-**Problem:** Configuration hash currently shows git commit hash instead of actual deployed system hash.
 
-**Current (incorrect):**
-- Shows git hash: `db11f82` (source repository commit)
-- Not accurate - doesn't reflect what's actually deployed
 
-**Target (correct):**
-- Show nix store hash: `d8ivwiar` (first 8 chars from deployed system)
-- Source: `/nix/store/d8ivwiarhwhgqzskj6q2482r58z46qjf-nixos-system-cmbox-25.05.20251004.3bcc93c`
-- Pattern: Extract hash from `/nix/store/HASH-nixos-system-HOSTNAME-VERSION`
 
-**Benefits:**
-1. **Deployment Verification:** Confirms rebuild actually succeeded
-2. **Accurate Status:** Shows what's truly running, not just source
-3. **Rebuild Completion Detection:** Hash change = rebuild completed
-4. **Rollback Tracking:** Each deployment has unique identifier
 
-**Implementation Required:**
-1. Agent extracts nix store hash from `ls -la /run/current-system`
-2. Reports this as `system_config_hash` metric instead of git hash
-3. Dashboard displays first 8 characters: `Config: d8ivwiar`
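A sketch of the extraction step 1 of that plan calls for, assuming that reading the `/run/current-system` symlink target is sufficient (the function name is illustrative):

```rust
use std::fs;

/// Read the deployed system's nix store hash from the /run/current-system
/// symlink, e.g. /nix/store/d8ivwiar...-nixos-system-cmbox-25.05... -> "d8ivwiar".
fn deployed_config_hash() -> Option<String> {
    let target = fs::read_link("/run/current-system").ok()?;
    let name = target.file_name()?.to_str()?; // "HASH-nixos-system-HOST-VERSION"
    let hash = name.split('-').next()?;       // full store hash
    Some(hash.chars().take(8).collect())      // dashboard shows first 8 chars
}
```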
 
-**Next Session Priority Tasks:**
 
-**Remaining Features:**
-1. **Fix Configuration Hash Display (CRITICAL)**:
-   - Use nix store hash instead of git commit hash
-   - Extract from `/run/current-system` -> `/nix/store/HASH-nixos-system-*`
-   - Enables proper rebuild completion detection
 
-2. **Command Response Protocol**:
-   - Agent sends command completion/failure back to dashboard via ZMQ
-   - Dashboard updates UI status from ⏳ to ● when commands complete
-   - Clear success/failure status after timeout
 
-3. **Backup Panel Features**:
-   - Implement backup trigger functionality (B key)
-   - Complete visual feedback for backup operations
-   - Add backup progress indicators
 
-**Enhancement Tasks:**
-- Add confirmation dialogs for destructive actions (stop/restart/rebuild)
-- Implement command history/logging
-- Add keyboard shortcuts help overlay
 
-**Future Enhanced Navigation:**
-- Add Page Up/Down for faster scrolling through long service lists
-- Implement search/filter functionality for services
-- Add jump-to-service shortcuts (first letter navigation)
 
-**Future Advanced Features:**
-- Service dependency visualization
-- Historical service status tracking
-- Real-time log viewing integration
 
-## Core Architecture Principles - CRITICAL
 
 ### Individual Metrics Philosophy
-**NEW ARCHITECTURE**: Agent collects individual metrics, dashboard composes widgets from those metrics.
+- Agent collects individual metrics, dashboard composes widgets
+- Each metric collected, transmitted, and stored individually
+- Agent calculates status for each metric using thresholds
+- Dashboard aggregates individual metric statuses for widget status
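To make the split concrete, a sketch of both halves (the types are illustrative stand-ins for the ones in `cm-dashboard-shared`):

```rust
// Declaration order defines severity for `max`: later variants are worse.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status { Ok, Pending, Unknown, Warning, Critical }

/// Agent side: each metric gets its own status from its own thresholds.
fn status_from_thresholds(value: f32, warn: f32, crit: f32) -> Status {
    if value >= crit {
        Status::Critical
    } else if value >= warn {
        Status::Warning
    } else {
        Status::Ok
    }
}

/// Dashboard side: a widget never recomputes thresholds; its status is
/// just the worst of the statuses its subscribed metrics already carry.
fn widget_status(metric_statuses: &[Status]) -> Status {
    metric_statuses.iter().copied().max().unwrap_or(Status::Unknown)
}
```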
 
 ### Maintenance Mode
 
-**Purpose:**
 
-- Suppress email notifications during planned maintenance or backups
-- Prevents false alerts when services are intentionally stopped
 
-**Implementation:**
 
 - Agent checks for `/tmp/cm-maintenance` file before sending notifications
 - File presence suppresses all email notifications while continuing monitoring
 - Dashboard continues to show real status, only notifications are blocked
 
-**Usage:**
+Usage:
 
 ```bash
 # Enable maintenance mode
 touch /tmp/cm-maintenance
 
-# Run maintenance tasks (backups, service restarts, etc.)
+# Run maintenance tasks
 systemctl stop service
 # ... maintenance work ...
 systemctl start service
@@ -280,61 +80,84 @@ systemctl start service
 rm /tmp/cm-maintenance
 ```
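The check itself is small; a sketch of how the agent could gate notifications on the flag file (the function name is an assumption):

```rust
use std::path::Path;

/// Gate email notifications on the maintenance flag file described above.
/// Monitoring itself keeps running; only the outbound email is skipped.
fn notify_if_not_in_maintenance(send_email: impl FnOnce()) {
    if Path::new("/tmp/cm-maintenance").exists() {
        return; // maintenance mode: suppress notifications, keep collecting
    }
    send_email();
}
```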
 
-**NixOS Integration:**
+## Development and Deployment Architecture
 
-- Borgbackup script automatically creates/removes maintenance file
+### Development Path
-- Automatic cleanup via trap ensures maintenance mode doesn't stick
+- **Location:** `~/projects/cm-dashboard`
-- All configuration shall be done from the NixOS config
+- **Purpose:** Development workflow only - for committing new code
+- **Access:** Only for developers to commit changes
 
-**ARCHITECTURE ENFORCEMENT**:
+### Deployment Path
+- **Location:** `/var/lib/cm-dashboard/nixos-config`
+- **Purpose:** Production deployment only - agent clones/pulls from git
+- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild
 
-- **ZERO legacy code reuse** - Fresh implementation following ARCHITECT.md exactly
+### Git Flow
-- **Individual metrics only** - NO grouped metric structures
+```
-- **Reference-only legacy** - Study old functionality, implement new architecture
+Development: ~/projects/cm-dashboard → git commit → git push
-- **Clean slate mindset** - Build as if legacy codebase never existed
+Deployment:  git pull → /var/lib/cm-dashboard/nixos-config → rebuild
+```
 
-**Implementation Rules**:
+## Automated Binary Release System
 
-1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
+CM Dashboard uses automated binary releases instead of source builds.
-2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
-3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
-4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
 
-**Testing & Building**:
 
-- **Workspace builds**: `cargo build --workspace` for all testing
+### Creating New Releases
-- **Clean compilation**: Remove `target/` between architecture changes
+```bash
-- **ZMQ testing**: Test agent-dashboard communication independently
+cd ~/projects/cm-dashboard
-- **Widget testing**: Verify UI layout matches legacy appearance exactly
+git tag v0.1.X
+git push origin v0.1.X
+```
 
-**NEVER in New Implementation**:
+This automatically:
+- Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
+- Creates GitHub-style release with tarball
+- Uploads binaries via Gitea API
 
-- Copy/paste ANY code from legacy backup
+### NixOS Configuration Updates
-- Calculate status in dashboard widgets
+Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
-- Hardcode metric names in widgets (use const arrays)
 
-# Important Communication Guidelines
+```nix
+version = "v0.1.X";
+src = pkgs.fetchurl {
+  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
+  sha256 = "sha256-NEW_HASH_HERE";
+};
+```
 
-NEVER write that you have "successfully implemented" something or generate extensive summary text without first verifying with the user that the implementation is correct. This wastes tokens. Keep responses concise.
+### Get Release Hash
+```bash
+cd ~/projects/nixosbox
+nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
+  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
+  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
+}' 2>&1 | grep "got:"
+```
 
-NEVER implement code without first getting explicit user agreement on the approach. Always ask for confirmation before proceeding with implementation.
+### Building
 
+**Testing & Building:**
+- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
+- **Clean compilation**: Remove `target/` between major changes
 
+## Important Communication Guidelines
 
+Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
 
 ## Commit Message Guidelines
 
 **NEVER mention:**
 
 - Claude or any AI assistant names
 - Automation or AI-generated content
 - Any reference to automated code generation
 
 **ALWAYS:**
 
 - Focus purely on technical changes and their purpose
 - Use standard software development commit message format
 - Describe what was changed and why, not how it was created
 - Write from the perspective of a human developer
 
 **Examples:**
 
 - ❌ "Generated with Claude Code"
 - ❌ "AI-assisted implementation"
 - ❌ "Automated refactoring"
@@ -342,83 +165,22 @@ NEVER implement code without first getting explicit user agreement on the approach
 - ✅ "Restructure storage widget with improved layout"
 - ✅ "Update CPU thresholds to production values"
 
-## Development and Deployment Architecture
+## Implementation Rules
 
-**CRITICAL:** Development and deployment paths are completely separate:
+1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
+2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
+3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
+4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
 
-### Development Path
+**NEVER:**
-- **Location:** `~/projects/nixosbox`
+- Copy/paste ANY code from legacy implementations
-- **Purpose:** Development workflow only - for committing new cm-dashboard code
+- Calculate status in dashboard widgets
-- **Access:** Only for developers to commit changes
+- Hardcode metric names in widgets (use const arrays)
-- **Code Access:** Running cm-dashboard code shall NEVER access this path
+- Create files unless absolutely necessary for achieving goals
+- Create documentation files unless explicitly requested
 
-### Deployment Path
+**ALWAYS:**
-- **Location:** `/var/lib/cm-dashboard/nixos-config`
+- Prefer editing existing files to creating new ones
-- **Purpose:** Production deployment only - agent clones/pulls from git
+- Follow existing code conventions and patterns
-- **Access:** Only cm-dashboard agent for deployment operations
+- Use existing libraries and utilities
-- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild
+- Follow security best practices
 
-### Git Flow
-```
-Development: ~/projects/nixosbox → git commit → git push
-Deployment:  git pull → /var/lib/cm-dashboard/nixos-config → rebuild
-```
 
-## Automated Binary Release System
 
-**IMPLEMENTED:** cm-dashboard now uses automated binary releases instead of source builds.
 
-### Release Workflow
 
-1. **Automated Release Creation**
-   - Gitea Actions workflow builds static binaries on tag push
-   - Creates release with `cm-dashboard-linux-x86_64.tar.gz` tarball
-   - No manual intervention required for binary generation
 
-2. **Creating New Releases**
-   ```bash
-   cd ~/projects/cm-dashboard
-   git tag v0.1.X
-   git push origin v0.1.X
-   ```
 
-   This automatically:
-   - Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
-   - Creates GitHub-style release with tarball
-   - Uploads binaries via Gitea API
 
-3. **NixOS Configuration Updates**
-   Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
-   ```nix
-   version = "v0.1.X";
-   src = pkgs.fetchurl {
-     url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
-     sha256 = "sha256-NEW_HASH_HERE";
-   };
-   ```
 
-4. **Get Release Hash**
-   ```bash
-   cd ~/projects/nixosbox
-   nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
-     url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
-     sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
-   }' 2>&1 | grep "got:"
-   ```
 
-5. **Commit and Deploy**
-   ```bash
-   cd ~/projects/nixosbox
-   git add hosts/common/cm-dashboard.nix
-   git commit -m "Update cm-dashboard to v0.1.X with static binaries"
-   git push
-   ```
 
-### Benefits
 
-- **No compilation overhead** on each host
-- **Consistent static binaries** across all hosts
-- **Faster deployments** - download vs compile
-- **No library dependency issues** - static linking
-- **Automated pipeline** - tag push triggers everything
13 Cargo.lock (generated)

@@ -270,7 +270,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
 
 [[package]]
 name = "cm-dashboard"
-version = "0.1.24"
+version = "0.1.56"
 dependencies = [
  "anyhow",
  "chrono",
@@ -286,12 +286,13 @@ dependencies = [
  "toml",
  "tracing",
  "tracing-subscriber",
+ "wake-on-lan",
  "zmq",
 ]
 
 [[package]]
 name = "cm-dashboard-agent"
-version = "0.1.24"
+version = "0.1.56"
 dependencies = [
  "anyhow",
  "async-trait",
@@ -314,7 +315,7 @@ dependencies = [
 
 [[package]]
 name = "cm-dashboard-shared"
-version = "0.1.24"
+version = "0.1.56"
 dependencies = [
  "chrono",
  "serde",
@@ -2064,6 +2065,12 @@ version = "0.9.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
 
+[[package]]
+name = "wake-on-lan"
+version = "0.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1ccf60b60ad7e5b1b37372c5134cbcab4db0706c231d212e0c643a077462bc8f"
+
 [[package]]
 name = "walkdir"
 version = "2.5.0"
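The new `wake-on-lan` dependency above suggests host wake-up support is being added. For orientation, the wire protocol itself is just a UDP broadcast of six `0xFF` bytes followed by the target MAC repeated sixteen times; a dependency-free sketch (the crate's own API may differ):

```rust
use std::net::UdpSocket;

/// Send a wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times.
fn send_magic_packet(mac: [u8; 6]) -> std::io::Result<()> {
    let mut packet = vec![0xFFu8; 6];
    for _ in 0..16 {
        packet.extend_from_slice(&mac);
    }
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_broadcast(true)?;
    // Port 9 ("discard") is the conventional WOL target port.
    socket.send_to(&packet, ("255.255.255.255", 9))?;
    Ok(())
}
```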
513 README.md

@@ -1,88 +1,106 @@
 # CM Dashboard
 
-A real-time infrastructure monitoring system with intelligent status aggregation and email notifications, built with Rust and ZMQ.
+A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built with ZMQ-based metric collection and individual metrics architecture.
 
-## Current Implementation
+## Features
 
-This is a complete rewrite implementing an **individual metrics architecture** where:
+### Core Monitoring
+- **Real-time metrics**: CPU, RAM, Storage, and Service status
+- **Multi-host support**: Monitor multiple servers from single dashboard
+- **Service management**: Start/stop services with intelligent status tracking
+- **NixOS integration**: System rebuild via SSH + tmux popup
+- **Backup monitoring**: Borgbackup status and scheduling
+- **Email notifications**: Intelligent batching prevents spam
 
-- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
-- **Dashboard** subscribes to specific metrics and composes widgets
-- **Status Aggregation** provides intelligent email notifications with batching
-- **Persistent Cache** prevents false notifications on restart
 
-## Dashboard Interface
+### User-Stopped Service Tracking
+Services stopped via the dashboard are intelligently tracked to prevent false alerts:
 
+- **Smart status reporting**: User-stopped services show as Status::OK instead of Warning
+- **Persistent storage**: Tracking survives agent restarts via JSON storage
+- **Automatic management**: Flags cleared when services restarted via dashboard
+- **Maintenance friendly**: No false alerts during intentional service operations
 
+## Architecture
 
+### Individual Metrics Philosophy
+- **Agent**: Collects individual metrics, calculates status using thresholds
+- **Dashboard**: Subscribes to specific metrics, composes widgets from individual data
+- **ZMQ Communication**: Efficient real-time metric transmission
+- **Status Aggregation**: Host-level status calculated from all service metrics
 
+### Components
 
+```
+┌─────────────────┐    ZMQ     ┌─────────────────┐
+│                 │◄──────────►│                 │
+│     Agent       │  Metrics   │   Dashboard     │
+│  - Collectors   │            │  - TUI          │
+│  - Status       │            │  - Widgets     │
+│  - Tracking     │            │  - Commands     │
+│                 │            │                 │
+└─────────────────┘            └─────────────────┘
+        │                              │
+        ▼                              ▼
+┌─────────────────┐            ┌─────────────────┐
+│  JSON Storage   │            │   SSH + tmux    │
+│  - User-stopped │            │  - Remote rebuild│
+│  - Cache        │            │  - Process      │
+│  - State        │            │    isolation    │
+└─────────────────┘            └─────────────────┘
+```
 
+### Service Control Flow
 
+1. **User Action**: Dashboard sends `UserStart`/`UserStop` commands
+2. **Agent Processing**:
+   - Marks service as user-stopped (if stopping)
+   - Executes `systemctl start/stop service`
+   - Syncs state to global tracker
+3. **Status Calculation**:
+   - Systemd collector checks user-stopped flag
+   - Reports Status::OK for user-stopped inactive services
+   - Normal Warning status for system failures
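A sketch of the agent side of this flow (the enum mirrors the protocol section below; the `HashSet` stands in for the persistent tracker, so names and details are illustrative):

```rust
use std::collections::HashSet;
use std::process::Command;

pub enum ServiceAction { Start, Stop, UserStart, UserStop, Status }

/// Handle a dashboard-issued service command. Only the user-initiated
/// variants touch the user-stopped set.
fn handle_service_control(
    user_stopped: &mut HashSet<String>,
    service: &str,
    action: ServiceAction,
) -> std::io::Result<()> {
    match action {
        ServiceAction::UserStop => {
            // Flag first, so the next status pass already reports Status::OK.
            user_stopped.insert(service.to_string());
            Command::new("systemctl").args(["stop", service]).status()?;
        }
        ServiceAction::UserStart => {
            // User restarted it: normal warning rules apply again.
            user_stopped.remove(service);
            Command::new("systemctl").args(["start", service]).status()?;
        }
        _ => {} // system-initiated actions don't change the flag
    }
    Ok(())
}
```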
 
+## Interface
 
 ```
 cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
 ┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
-│CPU:                                ││Service:               Status:   RAM:    Disk:   │
+│NixOS:                              ││Service:               Status:   RAM:    Disk:   │
-│● Load: 0.10 0.52 0.88 • 400.0 MHz  ││● docker               active    27M     496MB   │
+│Build: 25.05.20251004.3bcc93c       ││● docker               active    27M     496MB   │
-│RAM:                                ││● docker-registry      active    19M     496MB   │
+│Agent: v0.1.43                      ││● gitea                active    579M    2.6GB   │
-│● Used: 30% 2.3GB/7.6GB             ││● gitea                active    579M    2.6GB   │
+│Active users: cm, simon             ││● nginx                active    28M     24MB    │
-│● tmp: 0.0% 0B/2.0GB                ││● gitea-runner-default active    11M     2.6GB   │
+│CPU:                                ││ ├─ ● gitea.cmtec.se             51ms            │
-│Disk nvme0n1:                       ││● haasp-core           active    9M      1MB     │
+│● Load: 0.10 0.52 0.88 • 3000MHz    ││ ├─ ● photos.cmtec.se            41ms            │
-│● Health: PASSED                    ││● haasp-mqtt           active    3M      1MB     │
+│RAM:                                ││● postgresql           active    112M    357MB   │
-│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid        active    10M     1MB     │
+│● Usage: 33% 2.6GB/7.6GB            ││● redis-immich         user-stopped              │
-│● Usage @boot: 5.9% • 0.1/1.0 GB    ││● immich-server        active    240M    45.1GB  │
+│● /tmp: 0% 0B/2.0GB                 ││● sshd                 active    2M      0       │
-│                                    ││● mosquitto            active    1M      1MB     │
+│Storage:                            ││● unifi                active    594M    495MB   │
-│                                    ││● mysql                active    38M     225MB   │
+│● root (Single):                    ││                                                 │
-│                                    ││● nginx                active    28M     24MB    │
+│ ├─ ● nvme0n1 W: 1%                 ││                                                 │
-│                                    ││ ├─ ● gitea.cmtec.se             51ms            │
+│ └─ ● 18% 167.4GB/928.2GB           ││                                                 │
-│                                    ││ ├─ ● haasp.cmtec.se             43ms            │
-│                                    ││ ├─ ● haasp.net                  43ms            │
-│                                    ││ ├─ ● pages.cmtec.se             45ms            │
-└────────────────────────────────────┘│ ├─ ● photos.cmtec.se            41ms            │
-┌backup──────────────────────────────┐│ ├─ ● unifi.cmtec.se             46ms            │
-│Latest backup:                      ││ ├─ ● vault.cmtec.se             47ms            │
-│● Status: OK                        ││ ├─ ● www.kryddorten.se          81ms            │
-│Duration: 54s • Last: 4h ago        ││ ├─ ● www.mariehall2.se          86ms            │
-│Disk usage: 48.2GB/915.8GB          ││● postgresql           active    112M    357MB   │
-│P/N: Samsung SSD 870 QVO 1TB        ││● redis-immich         active    8M      45.1GB  │
-│S/N: S5RRNF0W800639Y                ││● sshd                 active    2M      0       │
-│● gitea       2 archives   2.7GB    ││● unifi                active    594M    495MB   │
-│● immich      2 archives   45.0GB   ││● vaultwarden          active    12M     1MB     │
-│● kryddorten  2 archives   67.6MB   ││                                                 │
-│● mariehall2  2 archives   321.8MB  ││                                                 │
-│● nixosbox    2 archives   4.5MB    ││                                                 │
-│● unifi       2 archives   2.9MB    ││                                                 │
-│● vaultwarden 2 archives   305kB    ││                                                 │
 └────────────────────────────────────┘└─────────────────────────────────────────────────┘
 ```
 
-**Navigation**: `←→` switch hosts, `r` refresh, `q` quit
+### Navigation
+- **Tab**: Switch between hosts
+- **↑↓ or j/k**: Navigate services
+- **s**: Start selected service (UserStart)
+- **S**: Stop selected service (UserStop)
+- **J**: Show service logs (journalctl in tmux popup)
+- **R**: Rebuild current host
+- **q**: Quit
 
-## Features
+### Status Indicators
+- **Green ●**: Active service
-- **Real-time monitoring** - Dashboard updates every 1-2 seconds
+- **Yellow ◐**: Inactive service (system issue)
-- **Individual metric collection** - Granular data for flexible dashboard composition
+- **Red ◯**: Failed service
-- **Intelligent status aggregation** - Host-level status calculated from all services
+- **Blue arrows**: Service transitioning (↑ starting, ↓ stopping, ↻ restarting)
-- **Smart email notifications** - Batched, detailed alerts with service groupings
+- **"user-stopped"**: Service stopped via dashboard (Status::OK)
-- **Persistent state** - Prevents false notifications on restarts
-- **ZMQ communication** - Efficient agent-to-dashboard messaging
-- **Clean TUI** - Terminal-based dashboard with color-coded status indicators
 
-## Architecture
 
-### Core Components
 
-- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
-- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
-- **Shared** (`cm-dashboard-shared`) - Common types and protocol
-- **Status Aggregation** - Intelligent batching and notification management
-- **Persistent Cache** - Maintains state across restarts
 
-### Status Levels
 
-- **🟢 Ok** - Service running normally
-- **🔵 Pending** - Service starting/stopping/reloading
-- **🟡 Warning** - Service issues (high load, memory, disk usage)
-- **🔴 Critical** - Service failed or critical thresholds exceeded
-- **❓ Unknown** - Service state cannot be determined
 
 ## Quick Start
 
-### Build
+### Building
 
 ```bash
 # With Nix (recommended)
@@ -93,21 +111,20 @@ sudo apt install libssl-dev pkg-config  # Ubuntu/Debian
 cargo build --workspace
 ```
 
-### Run
+### Running
 
 ```bash
-# Start agent (requires configuration file)
+# Start agent (requires configuration)
 ./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml
 
-# Start dashboard
+# Start dashboard (inside tmux session)
-./target/debug/cm-dashboard --config /path/to/dashboard.toml
+tmux
+./target/debug/cm-dashboard --config /etc/cm-dashboard/dashboard.toml
 ```
 
 ## Configuration
 
-### Agent Configuration (`agent.toml`)
+### Agent Configuration
 
-The agent requires a comprehensive TOML configuration file:
 
 ```toml
 collection_interval_seconds = 2
@@ -116,50 +133,27 @@ collection_interval_seconds = 2
 publisher_port = 6130
 command_port = 6131
 bind_address = "0.0.0.0"
-timeout_ms = 5000
+transmission_interval_seconds = 2
-heartbeat_interval_ms = 30000
 
 [collectors.cpu]
 enabled = true
 interval_seconds = 2
-load_warning_threshold = 9.0
+load_warning_threshold = 5.0
 load_critical_threshold = 10.0
-temperature_warning_threshold = 100.0
-temperature_critical_threshold = 110.0
 
 [collectors.memory]
 enabled = true
 interval_seconds = 2
 usage_warning_percent = 80.0
-usage_critical_percent = 95.0
 
-[collectors.disk]
-enabled = true
-interval_seconds = 300
-usage_warning_percent = 80.0
 usage_critical_percent = 90.0
 
-[[collectors.disk.filesystems]]
-name = "root"
-uuid = "4cade5ce-85a5-4a03-83c8-dfd1d3888d79"
-mount_point = "/"
-fs_type = "ext4"
-monitor = true
 
 [collectors.systemd]
 enabled = true
 interval_seconds = 10
-memory_warning_mb = 1000.0
+service_name_filters = ["nginx*", "postgresql*", "docker*", "sshd*"]
-memory_critical_mb = 2000.0
+excluded_services = ["nginx-config-reload", "systemd-", "getty@"]
-service_name_filters = [
+nginx_latency_critical_ms = 1000.0
-  "nginx*", "postgresql*", "redis*", "docker*", "sshd*",
+http_timeout_seconds = 10
-  "gitea*", "immich*", "haasp*", "mosquitto*", "mysql*",
-  "unifi*", "vaultwarden*"
-]
-excluded_services = [
-  "nginx-config-reload", "sshd-keygen", "systemd-",
-  "getty@", "user@", "dbus-", "NetworkManager-"
-]
 
 [notifications]
 enabled = true
@@ -167,251 +161,202 @@ smtp_host = "localhost"
 smtp_port = 25
 from_email = "{hostname}@example.com"
 to_email = "admin@example.com"
-rate_limit_minutes = 0
+aggregation_interval_seconds = 30
-trigger_on_warnings = true
-trigger_on_failures = true
-recovery_requires_all_ok = true
-suppress_individual_recoveries = true
 
-[status_aggregation]
-enabled = true
-aggregation_method = "worst_case"
-notification_interval_seconds = 30
 
-[cache]
-persist_path = "/var/lib/cm-dashboard/cache.json"
 ```
 
-### Dashboard Configuration (`dashboard.toml`)
+### Dashboard Configuration
 
 ```toml
 [zmq]
-hosts = [
+subscriber_ports = [6130]
-  { name = "server1", address = "192.168.1.100", port = 6130 },
 
-  { name = "server2", address = "192.168.1.101", port = 6130 }
+[hosts]
-]
+predefined_hosts = ["cmbox", "srv01", "srv02"]
-connection_timeout_ms = 5000
-reconnect_interval_ms = 10000
 
 [ui]
-refresh_interval_ms = 1000
+ssh_user = "cm"
-theme = "dark"
+rebuild_alias = "nixos-rebuild-cmtec"
 ```
 
-## Collectors
+## Technical Implementation
 
-The agent implements several specialized collectors:
+### Collectors
 
-### CPU Collector (`cpu.rs`)
+#### Systemd Collector
+- **Service Discovery**: Uses `systemctl list-unit-files` + `list-units --all`
+- **Status Calculation**: Checks user-stopped flag before assigning Warning status
+- **Memory Tracking**: Per-service memory usage via `systemctl show`
+- **Sub-services**: Nginx site latency, Docker containers
+- **User-stopped Integration**: `UserStoppedServiceTracker::is_service_user_stopped()`
 
-- Load average (1, 5, 15 minute)
+#### User-Stopped Service Tracker
-- CPU temperature monitoring
+- **Storage**: `/var/lib/cm-dashboard/user-stopped-services.json`
-- Real-time process monitoring (top CPU consumers)
+- **Thread Safety**: Global singleton with `Arc<Mutex<>>`
-- Status calculation with configurable thresholds
+- **Persistence**: Automatic save on state changes
+- **Global Access**: Static methods for collector integration
 
-### Memory Collector (`memory.rs`)
+#### Other Collectors
+- **CPU**: Load average, temperature, frequency monitoring
+- **Memory**: RAM/swap usage, tmpfs monitoring
+- **Disk**: Filesystem usage, SMART health data
+- **NixOS**: Build version, active users, agent version
+- **Backup**: Borgbackup repository status and metrics
 
-- RAM usage (total, used, available)
+### ZMQ Protocol
-- Swap monitoring
-- Real-time process monitoring (top RAM consumers)
-- Memory pressure detection
 
-### Disk Collector (`disk.rs`)
+```rust
+// Metric Message
+#[derive(Serialize, Deserialize)]
+pub struct MetricMessage {
+    pub hostname: String,
+    pub timestamp: u64,
+    pub metrics: Vec<Metric>,
+}
 
-- Filesystem usage per mount point
+// Service Commands
-- SMART health monitoring
+pub enum AgentCommand {
-- Temperature and wear tracking
+    ServiceControl {
-- Configurable filesystem monitoring
+        service_name: String,
+        action: ServiceAction,
+    },
+    SystemRebuild { /* SSH config */ },
+    CollectNow,
+}
 
-### Systemd Collector (`systemd.rs`)
+pub enum ServiceAction {
+    Start,     // System-initiated
+    Stop,      // System-initiated
+    UserStart, // User via dashboard (clears user-stopped)
+    UserStop,  // User via dashboard (marks user-stopped)
+    Status,
+}
+```
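A sketch of the transport underneath this protocol, using the `zmq` crate's PUB/SUB sockets with the ports from this README (serialization elided; the structure is illustrative):

```rust
use anyhow::Result;

/// Agent side: publish a serialized MetricMessage on the metrics port.
fn publish_metrics(payload: &[u8]) -> Result<()> {
    let ctx = zmq::Context::new();
    let publisher = ctx.socket(zmq::PUB)?;
    publisher.bind("tcp://0.0.0.0:6130")?;
    publisher.send(payload, 0)?;
    Ok(())
}

/// Dashboard side: subscribe to everything a host publishes.
fn subscribe_metrics(host: &str) -> Result<Vec<u8>> {
    let ctx = zmq::Context::new();
    let subscriber = ctx.socket(zmq::SUB)?;
    subscriber.connect(&format!("tcp://{host}:6130"))?;
    subscriber.set_subscribe(b"")?; // all topics
    Ok(subscriber.recv_bytes(0)?)   // deserialize into MetricMessage upstream
}
```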
 
-- Service status monitoring (`active`, `inactive`, `failed`)
+### Maintenance Mode
-- Memory usage per service
-- Service filtering and exclusions
-- Handles transitional states (`Status::Pending`)
 
-### Backup Collector (`backup.rs`)
+Suppress notifications during planned maintenance:
 
-- Reads TOML status files from backup systems
+```bash
-- Archive age verification
+# Enable maintenance mode
-- Disk usage tracking
+touch /tmp/cm-maintenance
-- Repository health monitoring
+
+# Perform maintenance
+systemctl stop service
+# ... work ...
+systemctl start service
+
+# Disable maintenance mode
+rm /tmp/cm-maintenance
+```
 
 ## Email Notifications
 
 ### Intelligent Batching
+- **Real-time dashboard**: Immediate status updates
+- **Batched emails**: Aggregated every 30 seconds
+- **Smart grouping**: Services organized by severity
+- **Recovery suppression**: Reduces notification spam
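A sketch of the batching idea — queue status changes and flush them as one email per interval (types and field names are illustrative, not the repo's actual notification manager):

```rust
use std::time::{Duration, Instant};

/// Collects status changes and sends at most one email per interval.
struct NotificationBatcher {
    pending: Vec<String>, // formatted "service: Old → New" lines
    last_flush: Instant,
    interval: Duration,   // e.g. Duration::from_secs(30)
}

impl NotificationBatcher {
    fn record(&mut self, change: String) {
        self.pending.push(change); // dashboard still updates immediately
    }

    /// Called from the agent's main loop.
    fn maybe_flush(&mut self, send_email: impl FnOnce(&[String])) {
        if self.last_flush.elapsed() >= self.interval && !self.pending.is_empty() {
            send_email(&self.pending);
            self.pending.clear();
            self.last_flush = Instant::now();
        }
    }
}
```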
 
-The system implements smart notification batching to prevent email spam:
+### Example Alert
 
-- **Real-time dashboard updates** - Status changes appear immediately
-- **Batched email notifications** - Aggregated every 30 seconds
-- **Detailed groupings** - Services organized by severity
 
-### Example Alert Email
 
 ```
-Subject: Status Alert: 2 critical, 1 warning, 15 started
+Subject: Status Alert: 1 critical, 2 warnings, 0 recoveries
 
 Status Summary (30s duration)
 Host Status: Ok → Warning
 
-🔴 CRITICAL ISSUES (2):
+🔴 CRITICAL ISSUES (1):
-postgresql: Ok → Critical
+postgresql: Ok → Critical (memory usage 95%)
-nginx: Warning → Critical
 
-🟡 WARNINGS (1):
+🟡 WARNINGS (2):
-redis: Ok → Warning (memory usage 85%)
+nginx: Ok → Warning (high load 8.5)
+redis: user-stopped → Warning (restarted by system)
 
 ✅ RECOVERIES (0):
 
-🟢 SERVICE STARTUPS (15):
-docker: Unknown → Ok
-sshd: Unknown → Ok
-...
 
 --
-CM Dashboard Agent
+CM Dashboard Agent v0.1.43
-Generated at 2025-10-21 19:42:42 CET
 ```
 
-## Individual Metrics Architecture
 
-The system follows a **metrics-first architecture**:
 
-### Agent Side
 
-```rust
-// Agent collects individual metrics
-vec![
-    Metric::new("cpu_load_1min".to_string(), MetricValue::Float(2.5), Status::Ok),
-    Metric::new("memory_usage_percent".to_string(), MetricValue::Float(78.5), Status::Warning),
-    Metric::new("service_nginx_status".to_string(), MetricValue::String("active".to_string()), Status::Ok),
-]
-```
 
-### Dashboard Side
 
-```rust
-// Widgets subscribe to specific metrics
-impl Widget for CpuWidget {
-    fn update_from_metrics(&mut self, metrics: &[&Metric]) {
-        for metric in metrics {
-            match metric.name.as_str() {
-                "cpu_load_1min" => self.load_1min = metric.value.as_f32(),
-                "cpu_load_5min" => self.load_5min = metric.value.as_f32(),
-                "cpu_temperature_celsius" => self.temperature = metric.value.as_f32(),
-                _ => {}
-            }
-        }
-    }
-}
-```
 
-## Persistent Cache
 
-The cache system prevents false notifications:
 
-- **Automatic saving** - Saves when service status changes
-- **Persistent storage** - Maintains state across agent restarts
-- **Simple design** - No complex TTL or cleanup logic
-- **Status preservation** - Prevents duplicate notifications
 
 ## Development
 
 ### Project Structure
 
 ```
 cm-dashboard/
 ├── agent/                    # Metrics collection agent
 │   ├── src/
-│   │   ├── collectors/      # CPU, memory, disk, systemd, backup
+│   │   ├── collectors/      # CPU, memory, disk, systemd, backup, nixos
-│   │   ├── status/          # Status aggregation and notifications
+│   │   ├── service_tracker.rs # User-stopped service tracking
-│   │   ├── cache/           # Persistent metric caching
+│   │   ├── status/          # Status aggregation and notifications
 │   │   ├── config/          # TOML configuration loading
-│   │   └── notifications/   # Email notification system
+│   │   └── communication/   # ZMQ message handling
 ├── dashboard/                # TUI dashboard application
 │   ├── src/
-│   │   ├── ui/widgets/      # CPU, memory, services, backup widgets
+│   │   ├── ui/widgets/      # CPU, memory, services, backup, system
-│   │   ├── metrics/         # Metric storage and filtering
+│   │   ├── communication/   # ZMQ consumption and commands
-│   │   └── communication/   # ZMQ metric consumption
+│   │   └── app.rs           # Main application loop
 ├── shared/                   # Shared types and utilities
 │   └── src/
-│       ├── metrics.rs       # Metric, Status, and Value types
+│       ├── metrics.rs       # Metric, Status, StatusTracker types
 │       ├── protocol.rs      # ZMQ message format
 │       └── cache.rs         # Cache configuration
-└── README.md                # This file
+└── CLAUDE.md                # Development guidelines and rules
 ```
 
-### Building
+### Testing
 
 ```bash
-# Debug build
+# Build and test
-cargo build --workspace
+nix-shell -p openssl pkg-config --run "cargo build --workspace"
+nix-shell -p openssl pkg-config --run "cargo test --workspace"
 
-# Release build
+# Code quality
-cargo build --workspace --release
+cargo fmt --all
 
-# Run tests
-cargo test --workspace
 
-# Check code formatting
-cargo fmt --all -- --check
 
-# Run clippy linter
 cargo clippy --workspace -- -D warnings
 ```
 
-### Dependencies
+## Deployment
 
-- **tokio** - Async runtime
+### Automated Binary Releases
-- **zmq** - Message passing between agent and dashboard
+```bash
-- **ratatui** - Terminal user interface
+# Create new release
-- **serde** - Serialization for metrics and config
+cd ~/projects/cm-dashboard
-- **anyhow/thiserror** - Error handling
+git tag v0.1.X
-- **tracing** - Structured logging
+git push origin v0.1.X
-- **lettre** - SMTP email notifications
+```
-- **clap** - Command-line argument parsing
-- **toml** - Configuration file parsing
 
-## NixOS Integration
+This triggers automated:
+- Static binary compilation with `RUSTFLAGS="-C target-feature=+crt-static"`
+- GitHub-style release creation
+- Tarball upload to Gitea
 
-This project is designed for declarative deployment via NixOS:
+### NixOS Integration
+Update `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
-### Configuration Generation
 
-The NixOS module automatically generates the agent configuration:
 
 ```nix
-# hosts/common/cm-dashboard.nix
+version = "v0.1.43";
-services.cm-dashboard-agent = {
+src = pkgs.fetchurl {
-  enable = true;
+  url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
-  port = 6130;
+  sha256 = "sha256-HASH";
 };
 ```
 
-### Deployment
+Get hash via:
 
 ```bash
-# Update NixOS configuration
+cd ~/projects/nixosbox
-git add hosts/common/cm-dashboard.nix
+nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
-git commit -m "Update cm-dashboard configuration"
+  url = "URL_HERE";
-git push
+  sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
+}' 2>&1 | grep "got:"
-# Rebuild system (user-performed)
-sudo nixos-rebuild switch --flake .
 ```
 
 ## Monitoring Intervals
 
-- **CPU/Memory**: 2 seconds (real-time monitoring)
+- **Metrics Collection**: 2 seconds (CPU, memory, services)
-- **Disk usage**: 300 seconds (5 minutes)
+- **Metric Transmission**: 2 seconds (ZMQ publish)
-- **Systemd services**: 10 seconds
+- **Dashboard Updates**: 1 second (UI refresh)
-- **SMART health**: 600 seconds (10 minutes)
+- **Email Notifications**: 30 seconds (batched)
-- **Backup status**: 60 seconds (1 minute)
+- **Disk Monitoring**: 300 seconds (5 minutes)
-- **Email notifications**: 30 seconds (batched)
+- **Service Discovery**: 300 seconds (5 minutes cache)
-- **Dashboard updates**: 1 second (real-time display)
 
 ## License
 
-MIT License - see LICENSE file for details
+MIT License - see LICENSE file for details.
TODO.md (deleted, 63 lines)
@@ -1,63 +0,0 @@

# TODO

## Systemd filtering (agent)

- Remove user systemd collection
- Reduce the number of systemctl calls
- Change so only services in the include list are detected
- Filter on exact name
- Add support for "*" in filtering

## System panel (agent/dashboard)

Use the following layout:

'''
NixOS:
Build: xxxxxx
Agent: xxxxxx
CPU:
● Load: 0.02 0.31 0.86
└─ Freq: 3000MHz
RAM:
● Usage: 33% 2.6GB/7.6GB
└─ ● /tmp: 0% 0B/2.0GB
Storage:
● /:
├─ ● nvme0n1 T: 40C • W: 4%
└─ ● 8% 75.0GB/906.2GB
'''

- Add support to show logged-in/active users
- Add support to show timestamp/version for the latest NixOS rebuild

## Backup panel (dashboard)

Use the following layout:

'''
Latest backup:
● <timestamp>
└─ Duration: 1.3m
Disk:
● Samsung SSD 870 QVO 1TB
├─ S/N: S5RRNF0W800639Y
└─ Usage: 50.5GB/915.8GB
Repos:
● gitea (4) 5.1GB
● immich (4) 45.0GB
● kryddorten (4) 67.8MB
● mariehall2 (4) 322.7MB
● nixosbox (4) 5.5MB
● unifi (4) 5.7MB
● vaultwarden (4) 508kB
'''

## Keyboard navigation and scrolling (dashboard)

- Add keyboard navigation between panels ("Shift-Tab")
- Add a lower status bar with dynamically updated shortcuts when switching between panels

## Remote execution (agent/dashboard)

- Add support for sending a command via the dashboard to the agent to do a NixOS rebuild
- Add support for navigating services in the dashboard and triggering start/stop/restart
- Add support for triggering a backup
Cargo.toml (cm-dashboard-agent)
@@ -1,6 +1,6 @@
 [package]
 name = "cm-dashboard-agent"
-version = "0.1.25"
+version = "0.1.57"
 edition = "2021"

 [dependencies]
@@ -8,6 +8,7 @@ use crate::communication::{AgentCommand, ServiceAction, ZmqHandler};
 use crate::config::AgentConfig;
 use crate::metrics::MetricCollectionManager;
 use crate::notifications::NotificationManager;
+use crate::service_tracker::UserStoppedServiceTracker;
 use crate::status::HostStatusManager;
 use cm_dashboard_shared::{Metric, MetricMessage, MetricValue, Status};

@@ -18,6 +19,7 @@ pub struct Agent {
     metric_manager: MetricCollectionManager,
     notification_manager: NotificationManager,
     host_status_manager: HostStatusManager,
+    service_tracker: UserStoppedServiceTracker,
 }

 impl Agent {
@@ -50,6 +52,10 @@ impl Agent {
         let host_status_manager = HostStatusManager::new(config.status_aggregation.clone());
         info!("Host status manager initialized");

+        // Initialize user-stopped service tracker
+        let service_tracker = UserStoppedServiceTracker::init_global()?;
+        info!("User-stopped service tracker initialized");
+
         Ok(Self {
             hostname,
             config,
@@ -57,6 +63,7 @@ impl Agent {
             metric_manager,
             notification_manager,
             host_status_manager,
+            service_tracker,
         })
     }

@@ -173,6 +180,13 @@ impl Agent {
         let version_metric = self.get_agent_version_metric();
         metrics.push(version_metric);

+        // Add heartbeat metric for host connectivity detection
+        let heartbeat_metric = self.get_heartbeat_metric();
+        metrics.push(heartbeat_metric);
+
+        // Check for user-stopped services that are now active and clear their flags
+        self.clear_user_stopped_flags_for_active_services(&metrics);
+
         if metrics.is_empty() {
             debug!("No metrics to broadcast");
             return Ok(());
@@ -191,6 +205,12 @@ impl Agent {
     async fn process_metrics(&mut self, metrics: &[Metric]) -> bool {
         let mut status_changed = false;
         for metric in metrics {
+            // Filter excluded metrics from email notification processing only
+            if self.config.exclude_email_metrics.contains(&metric.name) {
+                debug!("Excluding metric '{}' from email notification processing", metric.name);
+                continue;
+            }
+
             if self.host_status_manager.process_metric(metric, &mut self.notification_manager).await {
                 status_changed = true;
             }
@@ -216,6 +236,22 @@ impl Agent {
         format!("v{}", env!("CARGO_PKG_VERSION"))
     }

+    /// Create heartbeat metric for host connectivity detection
+    fn get_heartbeat_metric(&self) -> Metric {
+        use std::time::{SystemTime, UNIX_EPOCH};
+
+        let timestamp = SystemTime::now()
+            .duration_since(UNIX_EPOCH)
+            .unwrap()
+            .as_secs();
+
+        Metric::new(
+            "agent_heartbeat".to_string(),
+            MetricValue::Integer(timestamp as i64),
+            Status::Ok,
+        )
+    }
+
     async fn handle_commands(&mut self) -> Result<()> {
         // Try to receive commands (non-blocking)
         match self.zmq_handler.try_receive_command() {
@@ -271,19 +307,34 @@ impl Agent {

     /// Handle systemd service control commands
     async fn handle_service_control(&mut self, service_name: &str, action: &ServiceAction) -> Result<()> {
-        let action_str = match action {
-            ServiceAction::Start => "start",
-            ServiceAction::Stop => "stop",
-            ServiceAction::Restart => "restart",
-            ServiceAction::Status => "status",
+        let (action_str, is_user_action) = match action {
+            ServiceAction::Start => ("start", false),
+            ServiceAction::Stop => ("stop", false),
+            ServiceAction::Status => ("status", false),
+            ServiceAction::UserStart => ("start", true),
+            ServiceAction::UserStop => ("stop", true),
         };

-        info!("Executing systemctl {} {}", action_str, service_name);
+        info!("Executing systemctl {} {} (user action: {})", action_str, service_name, is_user_action);
+
+        // Handle user-stopped service tracking before systemctl execution (stop only)
+        match action {
+            ServiceAction::UserStop => {
+                info!("Marking service '{}' as user-stopped", service_name);
+                if let Err(e) = self.service_tracker.mark_user_stopped(service_name) {
+                    error!("Failed to mark service as user-stopped: {}", e);
+                } else {
+                    // Sync to global tracker
+                    UserStoppedServiceTracker::update_global(&self.service_tracker);
+                }
+            }
+            _ => {}
+        }

         let output = tokio::process::Command::new("sudo")
             .arg("systemctl")
             .arg(action_str)
-            .arg(service_name)
+            .arg(format!("{}.service", service_name))
             .output()
             .await?;

@@ -292,6 +343,9 @@ impl Agent {
         if !output.stdout.is_empty() {
             debug!("stdout: {}", String::from_utf8_lossy(&output.stdout));
         }
+
+        // Note: User-stopped flag will be cleared by systemd collector
+        // when service actually reaches 'active' state, not here
     } else {
         let stderr = String::from_utf8_lossy(&output.stderr);
         error!("Service {} {} failed: {}", service_name, action_str, stderr);
@@ -299,7 +353,7 @@ impl Agent {
     }

     // Force refresh metrics after service control to update service status
-    if matches!(action, ServiceAction::Start | ServiceAction::Stop | ServiceAction::Restart) {
+    if matches!(action, ServiceAction::Start | ServiceAction::Stop | ServiceAction::UserStart | ServiceAction::UserStop) {
         info!("Triggering immediate metric refresh after service control");
         if let Err(e) = self.collect_metrics_only().await {
             error!("Failed to refresh metrics after service control: {}", e);
@@ -311,4 +365,33 @@ impl Agent {
         Ok(())
     }

+    /// Check metrics for user-stopped services that are now active and clear their flags
+    fn clear_user_stopped_flags_for_active_services(&mut self, metrics: &[Metric]) {
+        for metric in metrics {
+            // Look for service status metrics that are active
+            if metric.name.starts_with("service_") && metric.name.ends_with("_status") {
+                if let MetricValue::String(status) = &metric.value {
+                    if status == "active" {
+                        // Extract service name from metric name (service_nginx_status -> nginx)
+                        let service_name = metric.name
+                            .strip_prefix("service_")
+                            .and_then(|s| s.strip_suffix("_status"))
+                            .unwrap_or("");
+
+                        if !service_name.is_empty() && UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                            info!("Service '{}' is now active - clearing user-stopped flag", service_name);
+                            if let Err(e) = self.service_tracker.clear_user_stopped(service_name) {
+                                error!("Failed to clear user-stopped flag for '{}': {}", service_name, e);
+                            } else {
+                                // Sync to global tracker
+                                UserStoppedServiceTracker::update_global(&self.service_tracker);
+                                debug!("Cleared user-stopped flag for service '{}'", service_name);
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    }
 }
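The flag-clearing pass keys off the `service_<name>_status` naming convention. A self-contained sketch of just the name-extraction step; the helper name is ours, not the repository's:

```rust
/// Extract a service name from a metric name of the form
/// `service_<name>_status`; returns None for anything else.
fn service_name_from_metric(metric_name: &str) -> Option<&str> {
    metric_name
        .strip_prefix("service_")
        .and_then(|s| s.strip_suffix("_status"))
        .filter(|s| !s.is_empty())
}

fn main() {
    assert_eq!(service_name_from_metric("service_nginx_status"), Some("nginx"));
    // Underscores inside the service name survive intact.
    assert_eq!(service_name_from_metric("service_my_app_status"), Some("my_app"));
    // Non-service metrics are ignored.
    assert_eq!(service_name_from_metric("agent_heartbeat"), None);
    assert_eq!(service_name_from_metric("service__status"), None);
}
```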
@@ -140,6 +140,7 @@ impl Collector for BackupCollector {
         Status::Warning => "warning".to_string(),
         Status::Critical => "critical".to_string(),
         Status::Unknown => "unknown".to_string(),
+        Status::Offline => "offline".to_string(),
     }),
     status: overall_status,
     timestamp,
@@ -202,6 +203,7 @@ impl Collector for BackupCollector {
         Status::Warning => "warning".to_string(),
         Status::Critical => "critical".to_string(),
         Status::Unknown => "unknown".to_string(),
+        Status::Offline => "offline".to_string(),
     }),
     status: service_status,
     timestamp,
@@ -37,6 +37,22 @@ impl NixOSCollector {
     }

+    /// Get git commit hash from rebuild process
+    fn get_git_commit(&self) -> Result<String, Box<dyn std::error::Error>> {
+        let commit_file = "/var/lib/cm-dashboard/git-commit";
+        match std::fs::read_to_string(commit_file) {
+            Ok(content) => {
+                let commit_hash = content.trim();
+                if commit_hash.len() >= 7 {
+                    Ok(commit_hash.to_string())
+                } else {
+                    Err("Git commit hash too short".into())
+                }
+            }
+            Err(e) => Err(format!("Failed to read git commit file: {}", e).into()),
+        }
+    }
+
     /// Get configuration hash from deployed nix store system
     fn get_config_hash(&self) -> Result<String, Box<dyn std::error::Error>> {
         // Read the symlink target of /run/current-system to get nix store path
         let output = Command::new("readlink")
@@ -74,25 +90,25 @@ impl Collector for NixOSCollector {
     let mut metrics = Vec::new();
     let timestamp = chrono::Utc::now().timestamp() as u64;

-    // Collect NixOS build information (config hash)
-    match self.get_config_hash() {
-        Ok(config_hash) => {
+    // Collect git commit information (shows what's actually deployed)
+    match self.get_git_commit() {
+        Ok(git_commit) => {
             metrics.push(Metric {
                 name: "system_nixos_build".to_string(),
-                value: MetricValue::String(config_hash),
+                value: MetricValue::String(git_commit),
                 unit: None,
-                description: Some("NixOS deployed configuration hash".to_string()),
+                description: Some("Git commit hash of deployed configuration".to_string()),
                 status: Status::Ok,
                 timestamp,
             });
         }
         Err(e) => {
-            debug!("Failed to get config hash: {}", e);
+            debug!("Failed to get git commit: {}", e);
             metrics.push(Metric {
                 name: "system_nixos_build".to_string(),
                 value: MetricValue::String("unknown".to_string()),
                 unit: None,
-                description: Some("NixOS config hash (failed to detect)".to_string()),
+                description: Some("Git commit hash (failed to detect)".to_string()),
                 status: Status::Unknown,
                 timestamp,
             });
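The read-and-validate step is small enough to distill into a standalone sketch; `read_git_commit` is a hypothetical name and the fallback string mirrors the collector's "unknown" path:

```rust
use std::fs;

/// Read the commit hash written by the rebuild pipeline, if present and
/// at least 7 characters long (a short git hash).
fn read_git_commit(path: &str) -> Option<String> {
    let content = fs::read_to_string(path).ok()?;
    let hash = content.trim();
    (hash.len() >= 7).then(|| hash.to_string())
}

fn main() {
    // Falls back to a placeholder when the file is missing or malformed.
    let build = read_git_commit("/var/lib/cm-dashboard/git-commit")
        .unwrap_or_else(|| "unknown".to_string());
    println!("system_nixos_build = {build}");
}
```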
@@ -8,6 +8,7 @@ use tracing::debug;

 use super::{Collector, CollectorError};
 use crate::config::SystemdConfig;
+use crate::service_tracker::UserStoppedServiceTracker;

 /// Systemd collector for monitoring systemd services
 pub struct SystemdCollector {
@@ -136,8 +137,21 @@ impl SystemdCollector {
     /// Auto-discover interesting services to monitor (internal version that doesn't update state)
     fn discover_services_internal(&self) -> Result<(Vec<String>, std::collections::HashMap<String, ServiceStatusInfo>)> {
         debug!("Starting systemd service discovery with status caching");
-        // Get all services (includes inactive, running, failed - everything)
-        let units_output = Command::new("systemctl")
+        // First: Get all service unit files (includes services that have never been started)
+        let unit_files_output = Command::new("systemctl")
+            .arg("list-unit-files")
+            .arg("--type=service")
+            .arg("--no-pager")
+            .arg("--plain")
+            .output()?;
+
+        if !unit_files_output.status.success() {
+            return Err(anyhow::anyhow!("systemctl list-unit-files command failed"));
+        }
+
+        // Second: Get runtime status of all units
+        let units_status_output = Command::new("systemctl")
             .arg("list-units")
             .arg("--type=service")
             .arg("--all")
@@ -145,22 +159,33 @@ impl SystemdCollector {
             .arg("--plain")
             .output()?;

-        if !units_output.status.success() {
-            return Err(anyhow::anyhow!("systemctl system command failed"));
+        if !units_status_output.status.success() {
+            return Err(anyhow::anyhow!("systemctl list-units command failed"));
         }

-        let units_str = String::from_utf8(units_output.stdout)?;
+        let unit_files_str = String::from_utf8(unit_files_output.stdout)?;
+        let units_status_str = String::from_utf8(units_status_output.stdout)?;
         let mut services = Vec::new();

         // Use configuration instead of hardcoded values
         let excluded_services = &self.config.excluded_services;
         let service_name_filters = &self.config.service_name_filters;

-        // Parse all services and cache their status information
+        // Parse all service unit files to get complete service list
         let mut all_service_names = std::collections::HashSet::new();
-        let mut status_cache = std::collections::HashMap::new();

-        for line in units_str.lines() {
+        for line in unit_files_str.lines() {
+            let fields: Vec<&str> = line.split_whitespace().collect();
+            if fields.len() >= 2 && fields[0].ends_with(".service") {
+                let service_name = fields[0].trim_end_matches(".service");
+                all_service_names.insert(service_name.to_string());
+                debug!("Found service unit file: {}", service_name);
+            }
+        }
+
+        // Parse runtime status for all units
+        let mut status_cache = std::collections::HashMap::new();
+        for line in units_status_str.lines() {
             let fields: Vec<&str> = line.split_whitespace().collect();
             if fields.len() >= 4 && fields[0].ends_with(".service") {
                 let service_name = fields[0].trim_end_matches(".service");
@@ -177,8 +202,19 @@ impl SystemdCollector {
                     sub_state: sub_state.clone(),
                 });

-                all_service_names.insert(service_name.to_string());
-                debug!("Parsed service: {} (load:{}, active:{}, sub:{})", service_name, load_state, active_state, sub_state);
+                debug!("Got runtime status for service: {} (load:{}, active:{}, sub:{})", service_name, load_state, active_state, sub_state);
+            }
+        }
+
+        // For services found in unit files but not in runtime status, set default inactive status
+        for service_name in &all_service_names {
+            if !status_cache.contains_key(service_name) {
+                status_cache.insert(service_name.to_string(), ServiceStatusInfo {
+                    load_state: "not-loaded".to_string(),
+                    active_state: "inactive".to_string(),
+                    sub_state: "dead".to_string(),
+                });
+                debug!("Service {} found in unit files but not runtime - marked as inactive", service_name);
             }
         }
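The two-pass shape (unit files as the authoritative list, runtime status as an overlay, inactive as the default) can be stated as one pure function. A sketch with a locally redeclared `ServiceStatusInfo`, so it runs outside the collector:

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
struct ServiceStatusInfo {
    active_state: String,
}

/// Merge the unit-file service list with runtime status; services that have
/// never been started fall back to "inactive".
fn merge_status(
    unit_files: &HashSet<String>,
    runtime: &HashMap<String, ServiceStatusInfo>,
) -> HashMap<String, ServiceStatusInfo> {
    unit_files
        .iter()
        .map(|name| {
            let status = runtime.get(name).cloned().unwrap_or(ServiceStatusInfo {
                active_state: "inactive".to_string(),
            });
            (name.clone(), status)
        })
        .collect()
}

fn main() {
    let unit_files: HashSet<String> =
        ["nginx".to_string(), "never-started".to_string()].into_iter().collect();
    let mut runtime = HashMap::new();
    runtime.insert("nginx".to_string(), ServiceStatusInfo { active_state: "active".into() });

    let merged = merge_status(&unit_files, &runtime);
    assert_eq!(merged["never-started"].active_state, "inactive");
    assert_eq!(merged["nginx"].active_state, "active");
}
```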
@@ -318,13 +354,37 @@ impl SystemdCollector {
         Ok((active_status, detailed_info))
     }

-    /// Calculate service status
-    fn calculate_service_status(&self, active_status: &str) -> Status {
+    /// Calculate service status, taking user-stopped services into account
+    fn calculate_service_status(&self, service_name: &str, active_status: &str) -> Status {
         match active_status.to_lowercase().as_str() {
-            "active" => Status::Ok,
-            "inactive" | "dead" => Status::Warning,
+            "active" => {
+                // If service is now active and was marked as user-stopped, clear the flag
+                if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                    debug!("Service '{}' is now active - clearing user-stopped flag", service_name);
+                    // Note: We can't directly clear here because this is a read-only context
+                    // The agent will need to handle this differently
+                }
+                Status::Ok
+            },
+            "inactive" | "dead" => {
+                // Check if this service was stopped by user action
+                if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                    debug!("Service '{}' is inactive but marked as user-stopped - treating as OK", service_name);
+                    Status::Ok
+                } else {
+                    Status::Warning
+                }
+            },
             "failed" | "error" => Status::Critical,
-            "activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => Status::Pending,
+            "activating" | "deactivating" | "reloading" | "start" | "stop" | "restart" => {
+                // For user-stopped services that are transitioning, keep them as OK during transition
+                if UserStoppedServiceTracker::is_service_user_stopped(service_name) {
+                    debug!("Service '{}' is transitioning but was user-stopped - treating as OK", service_name);
+                    Status::Ok
+                } else {
+                    Status::Pending
+                }
+            },
             _ => Status::Unknown,
         }
     }
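Written out as a table-driven test, the mapping is easier to audit. `Status` and the user-stopped predicate are stubbed locally so the sketch stands alone; note that `failed` escalates regardless of the flag:

```rust
#[derive(Debug, PartialEq)]
enum Status { Ok, Warning, Critical, Pending, Unknown }

/// Stub of the status mapping above; `user_stopped` stands in for the
/// global tracker lookup.
fn status_for(active_state: &str, user_stopped: bool) -> Status {
    match active_state {
        "active" => Status::Ok,
        "inactive" | "dead" if user_stopped => Status::Ok,
        "inactive" | "dead" => Status::Warning,
        "failed" | "error" => Status::Critical,
        "activating" | "deactivating" | "reloading" if user_stopped => Status::Ok,
        "activating" | "deactivating" | "reloading" => Status::Pending,
        _ => Status::Unknown,
    }
}

fn main() {
    // A user-stopped service no longer raises a Warning...
    assert_eq!(status_for("inactive", true), Status::Ok);
    assert_eq!(status_for("inactive", false), Status::Warning);
    // ...but a crash still escalates regardless of the flag.
    assert_eq!(status_for("failed", true), Status::Critical);
}
```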
@@ -445,7 +505,7 @@ impl Collector for SystemdCollector {
     for service in &monitored_services {
         match self.get_service_status(service) {
             Ok((active_status, _detailed_info)) => {
-                let status = self.calculate_service_status(&active_status);
+                let status = self.calculate_service_status(service, &active_status);

                 // Individual service status metric
                 metrics.push(Metric {
@@ -520,10 +580,8 @@ impl SystemdCollector {
     for (site_name, url) in &sites {
         match self.check_site_latency(url) {
             Ok(latency_ms) => {
-                let status = if latency_ms < 500.0 {
+                let status = if latency_ms < self.config.nginx_latency_critical_ms {
                     Status::Ok
-                } else if latency_ms < 2000.0 {
-                    Status::Warning
                 } else {
                     Status::Critical
                 };
@@ -66,8 +66,6 @@ impl ZmqHandler {
     }

-    /// Send heartbeat (placeholder for future use)
-
     /// Try to receive a command (non-blocking)
     pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> {
         match self.command_receiver.recv_bytes(zmq::DONTWAIT) {
@@ -112,6 +110,7 @@ pub enum AgentCommand {
 pub enum ServiceAction {
     Start,
     Stop,
-    Restart,
     Status,
+    UserStart, // User-initiated start (clears user-stopped flag)
+    UserStop,  // User-initiated stop (marks as user-stopped)
 }
@@ -17,6 +17,9 @@ pub struct AgentConfig {
     pub notifications: NotificationConfig,
     pub status_aggregation: HostStatusConfig,
     pub collection_interval_seconds: u64,
+    /// List of metric names to exclude from email notifications
+    #[serde(default)]
+    pub exclude_email_metrics: Vec<String>,
 }

 /// ZMQ communication configuration
@@ -25,8 +28,6 @@ pub struct ZmqConfig {
     pub publisher_port: u16,
     pub command_port: u16,
     pub bind_address: String,
-    pub timeout_ms: u64,
-    pub heartbeat_interval_ms: u64,
     pub transmission_interval_seconds: u64,
 }

@@ -108,6 +109,7 @@ pub struct SystemdConfig {
     pub nginx_check_interval_seconds: u64,
     pub http_timeout_seconds: u64,
     pub http_connect_timeout_seconds: u64,
+    pub nginx_latency_critical_ms: f32,
 }
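The `#[serde(default)]` attribute is what keeps existing agent configs valid: a TOML file without `exclude_email_metrics` deserializes to an empty list. A minimal demonstration with a pared-down struct (not the full `AgentConfig`):

```rust
use serde::Deserialize;

// Pared-down stand-in for AgentConfig, just to show the default behaviour.
#[derive(Debug, Deserialize)]
struct Config {
    collection_interval_seconds: u64,
    #[serde(default)]
    exclude_email_metrics: Vec<String>,
}

fn main() {
    // Old config file: the field is absent entirely.
    let cfg: Config = toml::from_str("collection_interval_seconds = 2").unwrap();
    assert!(cfg.exclude_email_metrics.is_empty());

    // New config file: the exclusion list is honoured.
    let cfg: Config = toml::from_str(
        "collection_interval_seconds = 2\nexclude_email_metrics = [\"agent_heartbeat\"]",
    )
    .unwrap();
    assert_eq!(cfg.exclude_email_metrics, vec!["agent_heartbeat"]);
}
```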
@@ -19,10 +19,6 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
         bail!("ZMQ bind address cannot be empty");
     }

-    if config.zmq.timeout_ms == 0 {
-        bail!("ZMQ timeout cannot be 0");
-    }
-
     // Validate collection interval
     if config.collection_interval_seconds == 0 {
         bail!("Collection interval cannot be 0");
@@ -83,6 +79,13 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
         }
     }

+    // Validate systemd configuration
+    if config.collectors.systemd.enabled {
+        if config.collectors.systemd.nginx_latency_critical_ms <= 0.0 {
+            bail!("Nginx latency critical threshold must be positive");
+        }
+    }
+
     // Validate SMTP configuration
     if config.notifications.enabled {
         if config.notifications.smtp_host.is_empty() {
@@ -9,6 +9,7 @@ mod communication;
 mod config;
 mod metrics;
 mod notifications;
+mod service_tracker;
 mod status;

 use agent::Agent;
agent/src/service_tracker.rs (new file, 172 lines)
@@ -0,0 +1,172 @@
use anyhow::Result;
use serde::{Deserialize, Serialize};
use std::collections::HashSet;
use std::fs;
use std::path::Path;
use std::sync::{Arc, Mutex, OnceLock};
use tracing::{debug, info, warn};

/// Shared instance for global access
static GLOBAL_TRACKER: OnceLock<Arc<Mutex<UserStoppedServiceTracker>>> = OnceLock::new();

/// Tracks services that have been stopped by user action
/// These services should be treated as OK status instead of Warning
#[derive(Debug)]
pub struct UserStoppedServiceTracker {
    /// Set of services stopped by user action
    user_stopped_services: HashSet<String>,
    /// Path to persistent storage file
    storage_path: String,
}

/// Serializable data structure for persistence
#[derive(Debug, Serialize, Deserialize)]
struct UserStoppedData {
    services: Vec<String>,
}

impl UserStoppedServiceTracker {
    /// Create new tracker with default storage path
    pub fn new() -> Self {
        Self::with_storage_path("/var/lib/cm-dashboard/user-stopped-services.json")
    }

    /// Initialize global instance (called by agent)
    pub fn init_global() -> Result<Self> {
        let tracker = Self::new();

        // Set global instance
        let global_instance = Arc::new(Mutex::new(tracker));
        if GLOBAL_TRACKER.set(global_instance).is_err() {
            warn!("Global service tracker was already initialized");
        }

        // Return a new instance for the agent to use
        Ok(Self::new())
    }

    /// Check if a service is user-stopped (global access for collectors)
    pub fn is_service_user_stopped(service_name: &str) -> bool {
        if let Some(global) = GLOBAL_TRACKER.get() {
            if let Ok(tracker) = global.lock() {
                tracker.is_user_stopped(service_name)
            } else {
                debug!("Failed to lock global service tracker");
                false
            }
        } else {
            debug!("Global service tracker not initialized");
            false
        }
    }

    /// Update global tracker (called by agent when tracker state changes)
    pub fn update_global(updated_tracker: &UserStoppedServiceTracker) {
        if let Some(global) = GLOBAL_TRACKER.get() {
            if let Ok(mut tracker) = global.lock() {
                tracker.user_stopped_services = updated_tracker.user_stopped_services.clone();
            } else {
                debug!("Failed to lock global service tracker for update");
            }
        } else {
            debug!("Global service tracker not initialized for update");
        }
    }

    /// Create new tracker with custom storage path
    pub fn with_storage_path<P: AsRef<Path>>(storage_path: P) -> Self {
        let storage_path = storage_path.as_ref().to_string_lossy().to_string();
        let mut tracker = Self {
            user_stopped_services: HashSet::new(),
            storage_path,
        };

        // Load existing data from storage
        if let Err(e) = tracker.load_from_storage() {
            warn!("Failed to load user-stopped services from storage: {}", e);
            info!("Starting with empty user-stopped services list");
        }

        tracker
    }

    /// Mark a service as user-stopped
    pub fn mark_user_stopped(&mut self, service_name: &str) -> Result<()> {
        info!("Marking service '{}' as user-stopped", service_name);
        self.user_stopped_services.insert(service_name.to_string());
        self.save_to_storage()?;
        debug!("Service '{}' marked as user-stopped and saved to storage", service_name);
        Ok(())
    }

    /// Clear user-stopped flag for a service (when user starts it)
    pub fn clear_user_stopped(&mut self, service_name: &str) -> Result<()> {
        if self.user_stopped_services.remove(service_name) {
            info!("Cleared user-stopped flag for service '{}'", service_name);
            self.save_to_storage()?;
            debug!("Service '{}' user-stopped flag cleared and saved to storage", service_name);
        } else {
            debug!("Service '{}' was not marked as user-stopped", service_name);
        }
        Ok(())
    }

    /// Check if a service is marked as user-stopped
    pub fn is_user_stopped(&self, service_name: &str) -> bool {
        let is_stopped = self.user_stopped_services.contains(service_name);
        debug!("Service '{}' user-stopped status: {}", service_name, is_stopped);
        is_stopped
    }

    /// Save current state to persistent storage
    fn save_to_storage(&self) -> Result<()> {
        // Create parent directory if it doesn't exist
        if let Some(parent_dir) = Path::new(&self.storage_path).parent() {
            if !parent_dir.exists() {
                fs::create_dir_all(parent_dir)?;
                debug!("Created parent directory: {}", parent_dir.display());
            }
        }

        let data = UserStoppedData {
            services: self.user_stopped_services.iter().cloned().collect(),
        };

        let json_data = serde_json::to_string_pretty(&data)?;
        fs::write(&self.storage_path, json_data)?;

        debug!(
            "Saved {} user-stopped services to {}",
            data.services.len(),
            self.storage_path
        );
        Ok(())
    }

    /// Load state from persistent storage
    fn load_from_storage(&mut self) -> Result<()> {
        if !Path::new(&self.storage_path).exists() {
            debug!("Storage file {} does not exist, starting fresh", self.storage_path);
            return Ok(());
        }

        let json_data = fs::read_to_string(&self.storage_path)?;
        let data: UserStoppedData = serde_json::from_str(&json_data)?;

        self.user_stopped_services = data.services.into_iter().collect();

        info!(
            "Loaded {} user-stopped services from {}",
            self.user_stopped_services.len(),
            self.storage_path
        );

        if !self.user_stopped_services.is_empty() {
            debug!("User-stopped services: {:?}", self.user_stopped_services);
        }

        Ok(())
    }
}
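Note the two-copy design: the agent mutates its private instance and mirrors it into the `OnceLock` global that collectors read; a missed `update_global` call would leave collectors seeing stale flags. The intended call pattern, sketched (this assumes it runs inside the agent crate and that the process can write the `/var/lib` storage path, which normally requires the agent's privileges):

```rust
// Sketch of the tracker lifecycle as the agent drives it.
fn example() -> anyhow::Result<()> {
    // Agent startup: seeds the global copy and keeps a private one.
    let mut tracker = UserStoppedServiceTracker::init_global()?;

    // User presses stop in the dashboard -> mark, then mirror to the global.
    tracker.mark_user_stopped("nginx")?;
    UserStoppedServiceTracker::update_global(&tracker);

    // Collectors consult the global copy when scoring service status.
    assert!(UserStoppedServiceTracker::is_service_user_stopped("nginx"));

    // The service comes back up -> the agent clears and re-syncs the flag.
    tracker.clear_user_stopped("nginx")?;
    UserStoppedServiceTracker::update_global(&tracker);
    assert!(!UserStoppedServiceTracker::is_service_user_stopped("nginx"));
    Ok(())
}
```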
@@ -272,11 +272,13 @@ impl HostStatusManager {
     /// Check if a status change is significant enough for notification
     fn is_significant_change(&self, old_status: Status, new_status: Status) -> bool {
         match (old_status, new_status) {
-            // Always notify on problems
+            // Don't notify on transitions from Unknown (startup/restart scenario)
+            (Status::Unknown, _) => false,
+            // Always notify on problems (but not from Unknown)
             (_, Status::Warning) | (_, Status::Critical) => true,
             // Only notify on recovery if it's from a problem state to OK and all services are OK
             (Status::Warning | Status::Critical, Status::Ok) => self.current_host_status == Status::Ok,
-            // Don't notify on startup or other transitions
+            // Don't notify on other transitions
             _ => false,
         }
     }
@@ -374,8 +376,8 @@ impl HostStatusManager {
         details.push('\n');
     }

-    // Show recoveries
-    if !recovery_changes.is_empty() {
+    // Show recoveries only if host status is now OK (all services recovered)
+    if !recovery_changes.is_empty() && aggregated.host_status_final == Status::Ok {
         details.push_str(&format!("✅ RECOVERIES ({}):\n", recovery_changes.len()));
         for change in recovery_changes {
             details.push_str(&format!(" {}\n", change));
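The match arms are order-sensitive: the `Unknown` guard must come first, or the catch-all problem arm would fire on every agent startup. A standalone restatement with the cases enumerated (`host_ok` stands in for `self.current_host_status == Status::Ok`):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Ok, Warning, Critical, Unknown }

/// Stub of is_significant_change from the hunk above.
fn should_notify(old: Status, new: Status, host_ok: bool) -> bool {
    match (old, new) {
        (Status::Unknown, _) => false, // startup/restart: stay quiet
        (_, Status::Warning) | (_, Status::Critical) => true,
        (Status::Warning | Status::Critical, Status::Ok) => host_ok,
        _ => false,
    }
}

fn main() {
    // Agent restart: Unknown -> Critical is suppressed.
    assert!(!should_notify(Status::Unknown, Status::Critical, true));
    // Genuine degradation still alerts.
    assert!(should_notify(Status::Ok, Status::Warning, false));
    // Recovery emails wait until the whole host is back to OK.
    assert!(!should_notify(Status::Critical, Status::Ok, false));
    assert!(should_notify(Status::Critical, Status::Ok, true));
}
```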
Cargo.toml (cm-dashboard)
@@ -1,6 +1,6 @@
 [package]
 name = "cm-dashboard"
-version = "0.1.25"
+version = "0.1.57"
 edition = "2021"

 [dependencies]
@@ -18,4 +18,5 @@ tracing-subscriber = { workspace = true }
 ratatui = { workspace = true }
 crossterm = { workspace = true }
 toml = { workspace = true }
 gethostname = { workspace = true }
+wake-on-lan = "0.2"
@@ -22,7 +22,7 @@ pub struct Dashboard {
     terminal: Option<Terminal<CrosstermBackend<io::Stdout>>>,
     headless: bool,
     initial_commands_sent: std::collections::HashSet<String>,
-    _config: DashboardConfig,
+    config: DashboardConfig,
 }

 impl Dashboard {
@@ -67,8 +67,8 @@ impl Dashboard {
         }
     };

-    // Connect to predefined hosts from configuration
-    let hosts = config.hosts.predefined_hosts.clone();
+    // Connect to configured hosts from configuration
+    let hosts: Vec<String> = config.hosts.keys().cloned().collect();

     // Try to connect to hosts but don't fail if none are available
     match zmq_consumer.connect_to_predefined_hosts(&hosts).await {
@@ -133,7 +133,7 @@ impl Dashboard {
         terminal,
         headless,
         initial_commands_sent: std::collections::HashSet::new(),
-        _config: config,
+        config,
     })
 }

@@ -247,7 +247,7 @@ impl Dashboard {
     if let Some(ref mut tui_app) = self.tui_app {
         let connected_hosts = self
             .metric_store
-            .get_connected_hosts(Duration::from_secs(30));
+            .get_connected_hosts(Duration::from_secs(self.config.zmq.heartbeat_timeout_seconds));

         tui_app.update_hosts(connected_hosts);
@@ -294,27 +294,19 @@ impl Dashboard {
     /// Execute a UI command by sending it to the appropriate agent
     async fn execute_ui_command(&self, command: UiCommand) -> Result<()> {
         match command {
-            UiCommand::ServiceRestart { hostname, service_name } => {
-                info!("Sending restart command for service {} on {}", service_name, hostname);
-                let agent_command = AgentCommand::ServiceControl {
-                    service_name,
-                    action: ServiceAction::Restart,
-                };
-                self.zmq_command_sender.send_command(&hostname, agent_command).await?;
-            }
             UiCommand::ServiceStart { hostname, service_name } => {
-                info!("Sending start command for service {} on {}", service_name, hostname);
+                info!("Sending user start command for service {} on {}", service_name, hostname);
                 let agent_command = AgentCommand::ServiceControl {
                     service_name: service_name.clone(),
-                    action: ServiceAction::Start,
+                    action: ServiceAction::UserStart,
                 };
                 self.zmq_command_sender.send_command(&hostname, agent_command).await?;
             }
             UiCommand::ServiceStop { hostname, service_name } => {
-                info!("Sending stop command for service {} on {}", service_name, hostname);
+                info!("Sending user stop command for service {} on {}", service_name, hostname);
                 let agent_command = AgentCommand::ServiceControl {
                     service_name: service_name.clone(),
-                    action: ServiceAction::Stop,
+                    action: ServiceAction::UserStop,
                 };
                 self.zmq_command_sender.send_command(&hostname, agent_command).await?;
             }
@@ -35,8 +35,9 @@ pub enum AgentCommand {
 pub enum ServiceAction {
     Start,
     Stop,
-    Restart,
     Status,
+    UserStart, // User-initiated start (clears user-stopped flag)
+    UserStop,  // User-initiated stop (marks as user-stopped)
 }

 /// ZMQ consumer for receiving metrics from agents
@@ -140,9 +141,9 @@ impl ZmqConsumer {
         }
     }

-    /// Receive metrics from any connected agent (non-blocking)
+    /// Receive metrics from any connected agent (with timeout)
     pub async fn receive_metrics(&mut self) -> Result<Option<MetricMessage>> {
-        match self.subscriber.recv_bytes(zmq::DONTWAIT) {
+        match self.subscriber.recv_bytes(0) {
             Ok(data) => {
                 debug!("Received {} bytes from ZMQ", data.len());
@@ -6,21 +6,29 @@ use std::path::Path;
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct DashboardConfig {
     pub zmq: ZmqConfig,
-    pub hosts: HostsConfig,
+    pub hosts: std::collections::HashMap<String, HostDetails>,
     pub system: SystemConfig,
     pub ssh: SshConfig,
+    pub service_logs: std::collections::HashMap<String, Vec<ServiceLogConfig>>,
 }

 /// ZMQ consumer configuration
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct ZmqConfig {
     pub subscriber_ports: Vec<u16>,
+    /// Heartbeat timeout in seconds - hosts considered offline if no heartbeat received within this time
+    #[serde(default = "default_heartbeat_timeout_seconds")]
+    pub heartbeat_timeout_seconds: u64,
 }

-/// Hosts configuration
+fn default_heartbeat_timeout_seconds() -> u64 {
+    10 // Default to 10 seconds - allows for multiple missed heartbeats
+}
+
+/// Individual host configuration details
 #[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct HostsConfig {
-    pub predefined_hosts: Vec<String>,
+pub struct HostDetails {
+    pub mac_address: Option<String>,
 }

 /// System configuration
@@ -39,6 +47,13 @@ pub struct SshConfig {
     pub rebuild_alias: String,
 }

+/// Service log file configuration per host
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ServiceLogConfig {
+    pub service_name: String,
+    pub log_file_path: String,
+}
+
 impl DashboardConfig {
     pub fn load_from_file<P: AsRef<Path>>(path: P) -> Result<Self> {
         let path = path.as_ref();
@@ -60,8 +75,3 @@ impl Default for ZmqConfig {
     }
 }
-
-impl Default for HostsConfig {
-    fn default() -> Self {
-        panic!("Dashboard configuration must be loaded from file - no hardcoded defaults allowed")
-    }
-}
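With `service_logs` keyed by hostname, resolving the log path for a selected service is a two-level map lookup. A sketch; the free function and the example values are ours, and the real lookup lives wherever the dashboard handles the key binding:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
struct ServiceLogConfig {
    service_name: String,
    log_file_path: String,
}

/// Resolve the configured log file for (host, service), if any.
fn log_path_for<'a>(
    service_logs: &'a HashMap<String, Vec<ServiceLogConfig>>,
    hostname: &str,
    service: &str,
) -> Option<&'a str> {
    service_logs
        .get(hostname)?
        .iter()
        .find(|c| c.service_name == service)
        .map(|c| c.log_file_path.as_str())
}

fn main() {
    let mut service_logs = HashMap::new();
    service_logs.insert(
        "alpha".to_string(),
        vec![ServiceLogConfig {
            service_name: "gitea".into(),
            log_file_path: "/var/log/gitea/gitea.log".into(),
        }],
    );

    assert_eq!(
        log_path_for(&service_logs, "alpha", "gitea"),
        Some("/var/log/gitea/gitea.log")
    );
    assert_eq!(log_path_for(&service_logs, "alpha", "unconfigured"), None);
}
```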
@@ -12,10 +12,6 @@ mod ui;

 use app::Dashboard;

-/// Get hardcoded version
-fn get_version() -> &'static str {
-    "v0.1.25"
-}
-
 /// Check if running inside tmux session
 fn check_tmux_session() {
@@ -42,7 +38,7 @@ fn check_tmux_session() {
 #[derive(Parser)]
 #[command(name = "cm-dashboard")]
 #[command(about = "CM Dashboard TUI with individual metric consumption")]
-#[command(version = get_version())]
+#[command(version)]
 struct Cli {
     /// Increase logging verbosity (-v, -vv)
     #[arg(short, long, action = clap::ArgAction::Count)]
@@ -11,8 +11,8 @@ pub struct MetricStore {
     current_metrics: HashMap<String, HashMap<String, Metric>>,
     /// Historical metrics for trending
     historical_metrics: HashMap<String, Vec<MetricDataPoint>>,
-    /// Last update timestamp per host
-    last_update: HashMap<String, Instant>,
+    /// Last heartbeat timestamp per host
+    last_heartbeat: HashMap<String, Instant>,
     /// Configuration
     max_metrics_per_host: usize,
     history_retention: Duration,
@@ -23,7 +23,7 @@ impl MetricStore {
     Self {
         current_metrics: HashMap::new(),
         historical_metrics: HashMap::new(),
-        last_update: HashMap::new(),
+        last_heartbeat: HashMap::new(),
         max_metrics_per_host,
         history_retention: Duration::from_secs(history_retention_hours * 3600),
     }
@@ -56,10 +56,13 @@ impl MetricStore {

         // Add to history
         host_history.push(MetricDataPoint { received_at: now });
     }

-    // Update last update timestamp
-    self.last_update.insert(hostname.to_string(), now);
+    // Track heartbeat metrics for connectivity detection
+    if metric_name == "agent_heartbeat" {
+        self.last_heartbeat.insert(hostname.to_string(), now);
+        debug!("Updated heartbeat for host {}", hostname);
+    }

     // Get metrics count before cleanup
     let metrics_count = host_metrics.len();
@@ -88,16 +91,18 @@ impl MetricStore {
         }
     }

-    /// Get connected hosts (hosts with recent updates)
+    /// Get connected hosts (hosts with recent heartbeats)
     pub fn get_connected_hosts(&self, timeout: Duration) -> Vec<String> {
         let now = Instant::now();

-        self.last_update
+        self.last_heartbeat
             .iter()
-            .filter_map(|(hostname, &last_update)| {
-                if now.duration_since(last_update) <= timeout {
+            .filter_map(|(hostname, &last_heartbeat)| {
+                if now.duration_since(last_heartbeat) <= timeout {
                     Some(hostname.clone())
                 } else {
+                    debug!("Host {} considered offline - last heartbeat was {:?} ago",
+                        hostname, now.duration_since(last_heartbeat));
                     None
                 }
             })
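End to end: the agent publishes `agent_heartbeat` every transmission cycle (2 seconds), the store records the receipt time with `Instant::now()`, and a host drops off the connected list once the configured timeout (default 10 seconds) passes without one. The liveness test, compressed into a runnable sketch:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal liveness check mirroring get_connected_hosts: compare each
/// host's last heartbeat receipt against a timeout.
fn connected(last_heartbeat: &HashMap<String, Instant>, timeout: Duration) -> Vec<String> {
    let now = Instant::now();
    last_heartbeat
        .iter()
        .filter(|(_, &at)| now.duration_since(at) <= timeout)
        .map(|(host, _)| host.clone())
        .collect()
}

fn main() {
    let mut beats = HashMap::new();
    beats.insert("alpha".to_string(), Instant::now());
    // A 2 s publish interval with a 10 s timeout tolerates several missed
    // heartbeats before a host is declared offline.
    assert_eq!(connected(&beats, Duration::from_secs(10)), vec!["alpha"]);
}
```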
@@ -1,5 +1,5 @@
 use anyhow::Result;
-use crossterm::event::{Event, KeyCode, KeyModifiers};
+use crossterm::event::{Event, KeyCode};
 use ratatui::{
     layout::{Constraint, Direction, Layout, Rect},
     style::Style,
@@ -9,6 +9,7 @@ use ratatui::{
 use std::collections::HashMap;
 use std::time::Instant;
 use tracing::info;
+use wake_on_lan::MagicPacket;

 pub mod theme;
 pub mod widgets;
@@ -22,7 +23,6 @@ use widgets::{BackupWidget, ServicesWidget, SystemWidget, Widget};
 /// Commands that can be triggered from the UI
 #[derive(Debug, Clone)]
 pub enum UiCommand {
-    ServiceRestart { hostname: String, service_name: String },
     ServiceStart { hostname: String, service_name: String },
     ServiceStop { hostname: String, service_name: String },
     TriggerBackup { hostname: String },
@@ -32,22 +32,12 @@ pub enum UiCommand {
 /// Types of commands for status tracking
 #[derive(Debug, Clone)]
 pub enum CommandType {
-    ServiceRestart,
     ServiceStart,
     ServiceStop,
     BackupTrigger,
 }

-/// Panel types for focus management
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum PanelType {
-    System,
-    Services,
-    Backup,
-}
-
-impl PanelType {
-}
-
 /// Widget states for a specific host
 #[derive(Clone)]
@@ -65,7 +55,7 @@ pub struct HostWidgets {
     /// Last update time for this host
     pub last_update: Option<Instant>,
     /// Pending service transitions for immediate visual feedback
-    pub pending_service_transitions: HashMap<String, (CommandType, String)>, // service_name -> (command_type, original_status)
+    pub pending_service_transitions: HashMap<String, (CommandType, String, Instant)>, // service_name -> (command_type, original_status, start_time)
 }

 impl HostWidgets {
@@ -94,8 +84,6 @@ pub struct TuiApp {
     available_hosts: Vec<String>,
     /// Host index for navigation
     host_index: usize,
-    /// Currently focused panel
-    focused_panel: PanelType,
     /// Should quit application
     should_quit: bool,
     /// Track if user manually navigated away from localhost
@@ -106,16 +94,25 @@ pub struct TuiApp {

 impl TuiApp {
     pub fn new(config: DashboardConfig) -> Self {
-        Self {
+        let mut app = Self {
             host_widgets: HashMap::new(),
             current_host: None,
-            available_hosts: Vec::new(),
+            available_hosts: config.hosts.keys().cloned().collect(),
             host_index: 0,
-            focused_panel: PanelType::System, // Start with System panel focused
             should_quit: false,
             user_navigated_away: false,
             config,
-        }
+        };
+
+        // Sort predefined hosts
+        app.available_hosts.sort();
+
+        // Initialize with first host if available
+        if !app.available_hosts.is_empty() {
+            app.current_host = Some(app.available_hosts[0].clone());
+        }
+
+        app
     }

     /// Get or create host widgets for the given hostname
@@ -200,21 +197,28 @@ impl TuiApp {
     }

     /// Update available hosts with localhost prioritization
-    pub fn update_hosts(&mut self, hosts: Vec<String>) {
-        // Sort hosts alphabetically
-        let mut sorted_hosts = hosts.clone();
+    pub fn update_hosts(&mut self, discovered_hosts: Vec<String>) {
+        // Start with configured hosts (always visible)
+        let mut all_hosts: Vec<String> = self.config.hosts.keys().cloned().collect();
+
+        // Add any discovered hosts that aren't already configured
+        for host in discovered_hosts {
+            if !all_hosts.contains(&host) {
+                all_hosts.push(host);
+            }
+        }

         // Keep hosts that have pending transitions even if they're offline
         for (hostname, host_widgets) in &self.host_widgets {
             if !host_widgets.pending_service_transitions.is_empty() {
-                if !sorted_hosts.contains(hostname) {
-                    sorted_hosts.push(hostname.clone());
+                if !all_hosts.contains(hostname) {
+                    all_hosts.push(hostname.clone());
                 }
             }
         }

-        sorted_hosts.sort();
-        self.available_hosts = sorted_hosts;
+        all_hosts.sort();
+        self.available_hosts = all_hosts;

         // Get the current hostname (localhost) for auto-selection
         let localhost = gethostname::gethostname().to_string_lossy().to_string();
@@ -256,86 +260,143 @@ impl TuiApp {
         self.navigate_host(1);
     }
     KeyCode::Char('r') => {
-        match self.focused_panel {
-            PanelType::System => {
-                // Simple tmux popup with SSH rebuild using configured user and alias
-                if let Some(hostname) = self.current_host.clone() {
-                    // Launch tmux popup with SSH using config values
-                    let ssh_command = format!(
-                        "ssh -tt {}@{} 'bash -ic {}'",
-                        self.config.ssh.rebuild_user,
-                        hostname,
-                        self.config.ssh.rebuild_alias
-                    );
-                    std::process::Command::new("tmux")
-                        .arg("display-popup")
-                        .arg(&ssh_command)
-                        .spawn()
-                        .ok(); // Ignore errors, tmux will handle them
-                }
-            }
-            PanelType::Services => {
-                // Service restart command
-                if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
-                    if self.start_command(&hostname, CommandType::ServiceRestart, service_name.clone()) {
-                        return Ok(Some(UiCommand::ServiceRestart { hostname, service_name }));
-                    }
-                }
-            }
-            _ => {
-                info!("Manual refresh requested");
-            }
-        }
+        // System rebuild command - works on any panel for current host
+        if let Some(hostname) = self.current_host.clone() {
+            // Create command that shows logo, rebuilds, and waits for user input
+            let logo_and_rebuild = format!(
+                "bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {}\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'",
+                hostname,
+                self.config.ssh.rebuild_user,
+                hostname,
+                self.config.ssh.rebuild_alias
+            );

+            std::process::Command::new("tmux")
+                .arg("split-window")
+                .arg("-v")
+                .arg("-p")
+                .arg("30")
+                .arg(&logo_and_rebuild)
+                .spawn()
+                .ok(); // Ignore errors, tmux will handle them
+        }
     }
     KeyCode::Char('s') => {
-        if self.focused_panel == PanelType::Services {
-            // Service start command
-            if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
-                if self.start_command(&hostname, CommandType::ServiceStart, service_name.clone()) {
-                    return Ok(Some(UiCommand::ServiceStart { hostname, service_name }));
-                }
+        // Service start command
+        if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+            if self.start_command(&hostname, CommandType::ServiceStart, service_name.clone()) {
+                return Ok(Some(UiCommand::ServiceStart { hostname, service_name }));
             }
         }
     }
|
||||||
}
|
}
|
||||||
KeyCode::Char('S') => {
|
KeyCode::Char('S') => {
|
||||||
if self.focused_panel == PanelType::Services {
|
// Service stop command
|
||||||
// Service stop command
|
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
|
||||||
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
|
if self.start_command(&hostname, CommandType::ServiceStop, service_name.clone()) {
|
||||||
if self.start_command(&hostname, CommandType::ServiceStop, service_name.clone()) {
|
return Ok(Some(UiCommand::ServiceStop { hostname, service_name }));
|
||||||
return Ok(Some(UiCommand::ServiceStop { hostname, service_name }));
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
KeyCode::Char('J') => {
|
||||||
|
// Show service logs via journalctl in tmux split window
|
||||||
|
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
|
||||||
|
let journalctl_command = format!(
|
||||||
|
"bash -c \"ssh -tt {}@{} 'sudo journalctl -u {}.service -f --no-pager -n 50'; exit\"",
|
||||||
|
self.config.ssh.rebuild_user,
|
||||||
|
hostname,
|
||||||
|
service_name
|
||||||
|
);
|
||||||
|
|
||||||
|
std::process::Command::new("tmux")
|
||||||
|
.arg("split-window")
|
||||||
|
.arg("-v")
|
||||||
|
.arg("-p")
|
||||||
|
.arg("30")
|
||||||
|
.arg(&journalctl_command)
|
||||||
|
.spawn()
|
||||||
|
.ok(); // Ignore errors, tmux will handle them
|
||||||
|
}
|
||||||
|
}
|
||||||
|
KeyCode::Char('L') => {
|
||||||
|
// Show custom service log file in tmux split window
|
||||||
|
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
|
||||||
|
// Check if this service has a custom log file configured
|
||||||
|
if let Some(host_logs) = self.config.service_logs.get(&hostname) {
|
||||||
|
if let Some(log_config) = host_logs.iter().find(|config| config.service_name == service_name) {
|
||||||
|
let tail_command = format!(
|
||||||
|
"bash -c \"ssh -tt {}@{} 'sudo tail -n 50 -f {}'; exit\"",
|
||||||
|
self.config.ssh.rebuild_user,
|
||||||
|
hostname,
|
||||||
|
log_config.log_file_path
|
||||||
|
);
|
||||||
|
|
||||||
|
std::process::Command::new("tmux")
|
||||||
|
.arg("split-window")
|
||||||
|
.arg("-v")
|
||||||
|
.arg("-p")
|
||||||
|
.arg("30")
|
||||||
|
.arg(&tail_command)
|
||||||
|
.spawn()
|
||||||
|
.ok(); // Ignore errors, tmux will handle them
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
KeyCode::Char('b') => {
|
KeyCode::Char('b') => {
|
||||||
if self.focused_panel == PanelType::Backup {
|
// Trigger backup
|
||||||
// Trigger backup
|
if let Some(hostname) = self.current_host.clone() {
|
||||||
if let Some(hostname) = self.current_host.clone() {
|
self.start_command(&hostname, CommandType::BackupTrigger, hostname.clone());
|
||||||
self.start_command(&hostname, CommandType::BackupTrigger, hostname.clone());
|
return Ok(Some(UiCommand::TriggerBackup { hostname }));
|
||||||
return Ok(Some(UiCommand::TriggerBackup { hostname }));
|
}
|
||||||
|
}
|
||||||
|
KeyCode::Char('w') => {
|
||||||
|
// Wake on LAN for offline hosts
|
||||||
|
if let Some(hostname) = self.current_host.clone() {
|
||||||
|
// Check if host has MAC address configured
|
||||||
|
if let Some(host_details) = self.config.hosts.get(&hostname) {
|
||||||
|
if let Some(mac_address) = &host_details.mac_address {
|
||||||
|
// Parse MAC address and send WoL packet
|
||||||
|
let mac_bytes = Self::parse_mac_address(mac_address);
|
||||||
|
match mac_bytes {
|
||||||
|
Ok(mac) => {
|
||||||
|
match MagicPacket::new(&mac).send() {
|
||||||
|
Ok(_) => {
|
||||||
|
info!("WakeOnLAN packet sent successfully to {} ({})", hostname, mac_address);
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
tracing::error!("Failed to send WakeOnLAN packet to {}: {}", hostname, e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
Err(_) => {
|
||||||
|
tracing::error!("Invalid MAC address format for {}: {}", hostname, mac_address);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
KeyCode::Tab => {
|
KeyCode::Tab => {
|
||||||
if key.modifiers.contains(KeyModifiers::SHIFT) {
|
// Tab cycles to next host
|
||||||
// Shift+Tab cycles through panels
|
self.navigate_host(1);
|
||||||
self.next_panel();
|
}
|
||||||
} else {
|
KeyCode::Up | KeyCode::Char('k') => {
|
||||||
// Tab cycles to next host
|
// Move service selection up
|
||||||
self.navigate_host(1);
|
if let Some(hostname) = self.current_host.clone() {
|
||||||
|
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
||||||
|
host_widgets.services_widget.select_previous();
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
KeyCode::BackTab => {
|
KeyCode::Down | KeyCode::Char('j') => {
|
||||||
// BackTab (Shift+Tab on some terminals) also cycles panels
|
// Move service selection down
|
||||||
self.next_panel();
|
if let Some(hostname) = self.current_host.clone() {
|
||||||
}
|
let total_services = {
|
||||||
KeyCode::Up => {
|
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
||||||
// Scroll up in focused panel
|
host_widgets.services_widget.get_total_services_count()
|
||||||
self.scroll_focused_panel(-1);
|
};
|
||||||
}
|
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
||||||
KeyCode::Down => {
|
host_widgets.services_widget.select_next(total_services);
|
||||||
// Scroll down in focused panel
|
}
|
||||||
self.scroll_focused_panel(1);
|
|
||||||
}
|
}
|
||||||
_ => {}
|
_ => {}
|
||||||
}
|
}
|
||||||
@@ -376,25 +437,6 @@ impl TuiApp {
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/// Switch to next panel (Shift+Tab) - only cycles through visible panels
|
|
||||||
pub fn next_panel(&mut self) {
|
|
||||||
let visible_panels = self.get_visible_panels();
|
|
||||||
if visible_panels.len() <= 1 {
|
|
||||||
return; // Can't switch if only one or no panels visible
|
|
||||||
}
|
|
||||||
|
|
||||||
// Find current panel index in visible panels
|
|
||||||
if let Some(current_index) = visible_panels.iter().position(|&p| p == self.focused_panel) {
|
|
||||||
// Move to next visible panel
|
|
||||||
let next_index = (current_index + 1) % visible_panels.len();
|
|
||||||
self.focused_panel = visible_panels[next_index];
|
|
||||||
} else {
|
|
||||||
// Current panel not visible, switch to first visible panel
|
|
||||||
self.focused_panel = visible_panels[0];
|
|
||||||
}
|
|
||||||
|
|
||||||
info!("Switched to panel: {:?}", self.focused_panel);
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -431,7 +473,6 @@ impl TuiApp {
|
|||||||
let should_execute = match (&command_type, current_status.as_deref()) {
|
let should_execute = match (&command_type, current_status.as_deref()) {
|
||||||
(CommandType::ServiceStart, Some("inactive") | Some("failed") | Some("dead")) => true,
|
(CommandType::ServiceStart, Some("inactive") | Some("failed") | Some("dead")) => true,
|
||||||
(CommandType::ServiceStop, Some("active")) => true,
|
(CommandType::ServiceStop, Some("active")) => true,
|
||||||
(CommandType::ServiceRestart, Some("active") | Some("inactive") | Some("failed") | Some("dead")) => true,
|
|
||||||
(CommandType::ServiceStart, Some("active")) => {
|
(CommandType::ServiceStart, Some("active")) => {
|
||||||
// Already running - don't execute
|
// Already running - don't execute
|
||||||
false
|
false
|
||||||
@@ -447,26 +488,25 @@ impl TuiApp {
|
|||||||
_ => true, // Default: allow other combinations
|
_ => true, // Default: allow other combinations
|
||||||
};
|
};
|
||||||
|
|
||||||
if should_execute {
|
// ALWAYS store the pending transition for immediate visual feedback, even if we don't execute
|
||||||
if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
|
if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
|
||||||
// Store the pending transition for immediate visual feedback
|
host_widgets.pending_service_transitions.insert(
|
||||||
host_widgets.pending_service_transitions.insert(
|
target.clone(),
|
||||||
target.clone(),
|
(command_type, current_status.unwrap_or_else(|| "unknown".to_string()), Instant::now())
|
||||||
(command_type, current_status.unwrap_or_else(|| "unknown".to_string()))
|
);
|
||||||
);
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
should_execute
|
should_execute
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Clear pending transitions when real status updates arrive
|
/// Clear pending transitions when real status updates arrive or timeout
|
||||||
fn clear_completed_transitions(&mut self, hostname: &str, service_metrics: &[&Metric]) {
|
fn clear_completed_transitions(&mut self, hostname: &str, service_metrics: &[&Metric]) {
|
||||||
if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
|
if let Some(host_widgets) = self.host_widgets.get_mut(hostname) {
|
||||||
let mut completed_services = Vec::new();
|
let mut completed_services = Vec::new();
|
||||||
|
|
||||||
// Check each pending transition to see if real status has changed
|
// Check each pending transition to see if real status has changed
|
||||||
for (service_name, (command_type, original_status)) in &host_widgets.pending_service_transitions {
|
for (service_name, (command_type, original_status, _start_time)) in &host_widgets.pending_service_transitions {
|
||||||
|
|
||||||
// Look for status metric for this service
|
// Look for status metric for this service
|
||||||
for metric in service_metrics {
|
for metric in service_metrics {
|
||||||
if metric.name == format!("service_{}_status", service_name) {
|
if metric.name == format!("service_{}_status", service_name) {
|
||||||
@@ -478,7 +518,6 @@ impl TuiApp {
|
|||||||
let expected_change = match command_type {
|
let expected_change = match command_type {
|
||||||
CommandType::ServiceStart => &new_status == "active",
|
CommandType::ServiceStart => &new_status == "active",
|
||||||
CommandType::ServiceStop => &new_status != "active",
|
CommandType::ServiceStop => &new_status != "active",
|
||||||
CommandType::ServiceRestart => true, // Any change indicates restart completed
|
|
||||||
_ => false,
|
_ => false,
|
||||||
};
|
};
|
||||||
|
|
||||||
@@ -498,61 +537,8 @@ impl TuiApp {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Scroll the focused panel up or down
|
|
||||||
pub fn scroll_focused_panel(&mut self, direction: i32) {
|
|
||||||
if let Some(hostname) = self.current_host.clone() {
|
|
||||||
let focused_panel = self.focused_panel; // Get the value before borrowing
|
|
||||||
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
|
||||||
|
|
||||||
match focused_panel {
|
|
||||||
PanelType::System => {
|
|
||||||
if direction > 0 {
|
|
||||||
host_widgets.system_scroll_offset = host_widgets.system_scroll_offset.saturating_add(1);
|
|
||||||
} else {
|
|
||||||
host_widgets.system_scroll_offset = host_widgets.system_scroll_offset.saturating_sub(1);
|
|
||||||
}
|
|
||||||
info!("System panel scroll offset: {}", host_widgets.system_scroll_offset);
|
|
||||||
}
|
|
||||||
PanelType::Services => {
|
|
||||||
// For services panel, Up/Down moves selection cursor, not scroll
|
|
||||||
let total_services = host_widgets.services_widget.get_total_services_count();
|
|
||||||
|
|
||||||
if direction > 0 {
|
|
||||||
host_widgets.services_widget.select_next(total_services);
|
|
||||||
info!("Services selection moved down");
|
|
||||||
} else {
|
|
||||||
host_widgets.services_widget.select_previous();
|
|
||||||
info!("Services selection moved up");
|
|
||||||
}
|
|
||||||
}
|
|
||||||
PanelType::Backup => {
|
|
||||||
if direction > 0 {
|
|
||||||
host_widgets.backup_scroll_offset = host_widgets.backup_scroll_offset.saturating_add(1);
|
|
||||||
} else {
|
|
||||||
host_widgets.backup_scroll_offset = host_widgets.backup_scroll_offset.saturating_sub(1);
|
|
||||||
}
|
|
||||||
info!("Backup panel scroll offset: {}", host_widgets.backup_scroll_offset);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
/// Get list of currently visible panels
|
|
||||||
fn get_visible_panels(&self) -> Vec<PanelType> {
|
|
||||||
let mut visible_panels = vec![PanelType::System, PanelType::Services];
|
|
||||||
|
|
||||||
// Check if backup panel should be shown
|
|
||||||
if let Some(hostname) = &self.current_host {
|
|
||||||
if let Some(host_widgets) = self.host_widgets.get(hostname) {
|
|
||||||
if host_widgets.backup_widget.has_data() {
|
|
||||||
visible_panels.push(PanelType::Backup);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
visible_panels
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Render the dashboard (real btop-style multi-panel layout)
|
/// Render the dashboard (real btop-style multi-panel layout)
|
||||||
pub fn render(&mut self, frame: &mut Frame, metric_store: &MetricStore) {
|
pub fn render(&mut self, frame: &mut Frame, metric_store: &MetricStore) {
|
||||||
@@ -621,7 +607,7 @@ impl TuiApp {
|
|||||||
|
|
||||||
// Render services widget for current host
|
// Render services widget for current host
|
||||||
if let Some(hostname) = self.current_host.clone() {
|
if let Some(hostname) = self.current_host.clone() {
|
||||||
let is_focused = self.focused_panel == PanelType::Services;
|
let is_focused = true; // Always show service selection
|
||||||
let (scroll_offset, pending_transitions) = {
|
let (scroll_offset, pending_transitions) = {
|
||||||
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
||||||
(host_widgets.services_scroll_offset, host_widgets.pending_service_transitions.clone())
|
(host_widgets.services_scroll_offset, host_widgets.pending_service_transitions.clone())
|
||||||
@@ -645,48 +631,90 @@ impl TuiApp {
|
|||||||
|
|
||||||
if self.available_hosts.is_empty() {
|
if self.available_hosts.is_empty() {
|
||||||
let title_text = "cm-dashboard • no hosts discovered";
|
let title_text = "cm-dashboard • no hosts discovered";
|
||||||
let title = Paragraph::new(title_text).style(Typography::title());
|
let title = Paragraph::new(title_text)
|
||||||
|
.style(Style::default().fg(Theme::background()).bg(Theme::status_color(Status::Unknown)));
|
||||||
frame.render_widget(title, area);
|
frame.render_widget(title, area);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Create spans for each host with status indicators
|
// Calculate worst-case status across all hosts (excluding offline)
|
||||||
let mut spans = vec![Span::styled("cm-dashboard • ", Typography::title())];
|
let mut worst_status = Status::Ok;
|
||||||
|
for host in &self.available_hosts {
|
||||||
|
let host_status = self.calculate_host_status(host, metric_store);
|
||||||
|
// Don't include offline hosts in status aggregation
|
||||||
|
if host_status != Status::Offline {
|
||||||
|
worst_status = Status::aggregate(&[worst_status, host_status]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Use the worst status color as background
|
||||||
|
let background_color = Theme::status_color(worst_status);
|
||||||
|
|
||||||
|
// Split the title bar into left and right sections
|
||||||
|
let chunks = Layout::default()
|
||||||
|
.direction(Direction::Horizontal)
|
||||||
|
.constraints([Constraint::Length(15), Constraint::Min(0)])
|
||||||
|
.split(area);
|
||||||
|
|
||||||
|
// Left side: "cm-dashboard" text
|
||||||
|
let left_span = Span::styled(
|
||||||
|
" cm-dashboard",
|
||||||
|
Style::default().fg(Theme::background()).bg(background_color).add_modifier(Modifier::BOLD)
|
||||||
|
);
|
||||||
|
let left_title = Paragraph::new(Line::from(vec![left_span]))
|
||||||
|
.style(Style::default().bg(background_color));
|
||||||
|
frame.render_widget(left_title, chunks[0]);
|
||||||
|
|
||||||
|
// Right side: hosts with status indicators
|
||||||
|
let mut host_spans = Vec::new();
|
||||||
|
|
||||||
for (i, host) in self.available_hosts.iter().enumerate() {
|
for (i, host) in self.available_hosts.iter().enumerate() {
|
||||||
if i > 0 {
|
if i > 0 {
|
||||||
spans.push(Span::styled(" ", Typography::title()));
|
host_spans.push(Span::styled(
|
||||||
|
" ",
|
||||||
|
Style::default().fg(Theme::background()).bg(background_color)
|
||||||
|
));
|
||||||
}
|
}
|
||||||
|
|
||||||
// Always show normal status icon based on metrics (no command status at host level)
|
// Always show normal status icon based on metrics (no command status at host level)
|
||||||
let host_status = self.calculate_host_status(host, metric_store);
|
let host_status = self.calculate_host_status(host, metric_store);
|
||||||
let (status_icon, status_color) = (StatusIcons::get_icon(host_status), Theme::status_color(host_status));
|
let status_icon = StatusIcons::get_icon(host_status);
|
||||||
|
|
||||||
// Add status icon
|
// Add status icon with background color as foreground against status background
|
||||||
spans.push(Span::styled(
|
host_spans.push(Span::styled(
|
||||||
format!("{} ", status_icon),
|
format!("{} ", status_icon),
|
||||||
Style::default().fg(status_color),
|
Style::default().fg(Theme::background()).bg(background_color),
|
||||||
));
|
));
|
||||||
|
|
||||||
if Some(host) == self.current_host.as_ref() {
|
if Some(host) == self.current_host.as_ref() {
|
||||||
// Selected host in bold bright white
|
// Selected host in bold background color against status background
|
||||||
spans.push(Span::styled(
|
host_spans.push(Span::styled(
|
||||||
host.clone(),
|
host.clone(),
|
||||||
Typography::title().add_modifier(Modifier::BOLD),
|
Style::default()
|
||||||
|
.fg(Theme::background())
|
||||||
|
.bg(background_color)
|
||||||
|
.add_modifier(Modifier::BOLD),
|
||||||
));
|
));
|
||||||
} else {
|
} else {
|
||||||
// Other hosts in normal style with status color
|
// Other hosts in normal background color against status background
|
||||||
spans.push(Span::styled(
|
host_spans.push(Span::styled(
|
||||||
host.clone(),
|
host.clone(),
|
||||||
Style::default().fg(status_color),
|
Style::default().fg(Theme::background()).bg(background_color),
|
||||||
));
|
));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
let title_line = Line::from(spans);
|
// Add right padding
|
||||||
let title = Paragraph::new(vec![title_line]);
|
host_spans.push(Span::styled(
|
||||||
|
" ",
|
||||||
|
Style::default().fg(Theme::background()).bg(background_color)
|
||||||
|
));
|
||||||
|
|
||||||
frame.render_widget(title, area);
|
let host_line = Line::from(host_spans);
|
||||||
|
let host_title = Paragraph::new(vec![host_line])
|
||||||
|
.style(Style::default().bg(background_color))
|
||||||
|
.alignment(ratatui::layout::Alignment::Right);
|
||||||
|
frame.render_widget(host_title, chunks[1]);
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Calculate overall status for a host based on its metrics
|
/// Calculate overall status for a host based on its metrics
|
||||||
@@ -694,7 +722,7 @@ impl TuiApp {
|
|||||||
let metrics = metric_store.get_metrics_for_host(hostname);
|
let metrics = metric_store.get_metrics_for_host(hostname);
|
||||||
|
|
||||||
if metrics.is_empty() {
|
if metrics.is_empty() {
|
||||||
return Status::Unknown;
|
return Status::Offline;
|
||||||
}
|
}
|
||||||
|
|
||||||
// First check if we have the aggregated host status summary from the agent
|
// First check if we have the aggregated host status summary from the agent
|
||||||
@@ -714,7 +742,8 @@ impl TuiApp {
|
|||||||
Status::Warning => has_warning = true,
|
Status::Warning => has_warning = true,
|
||||||
Status::Pending => has_pending = true,
|
Status::Pending => has_pending = true,
|
||||||
Status::Ok => ok_count += 1,
|
Status::Ok => ok_count += 1,
|
||||||
Status::Unknown => {} // Ignore unknown for aggregation
|
Status::Unknown => {}, // Ignore unknown for aggregation
|
||||||
|
Status::Offline => {}, // Ignore offline for aggregation
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -749,39 +778,22 @@ impl TuiApp {
|
|||||||
let mut shortcuts = Vec::new();
|
let mut shortcuts = Vec::new();
|
||||||
|
|
||||||
// Global shortcuts
|
// Global shortcuts
|
||||||
shortcuts.push("Tab: Switch Host".to_string());
|
shortcuts.push("Tab: Host".to_string());
|
||||||
shortcuts.push("Shift+Tab: Switch Panel".to_string());
|
shortcuts.push("↑↓/jk: Select".to_string());
|
||||||
|
shortcuts.push("r: Rebuild".to_string());
|
||||||
// Scroll shortcuts (always available)
|
shortcuts.push("s/S: Start/Stop".to_string());
|
||||||
shortcuts.push("↑↓: Scroll".to_string());
|
shortcuts.push("J: Logs".to_string());
|
||||||
|
shortcuts.push("L: Custom".to_string());
|
||||||
// Panel-specific shortcuts
|
shortcuts.push("w: Wake".to_string());
|
||||||
match self.focused_panel {
|
|
||||||
PanelType::System => {
|
|
||||||
shortcuts.push("R: Rebuild".to_string());
|
|
||||||
}
|
|
||||||
PanelType::Services => {
|
|
||||||
shortcuts.push("S: Start".to_string());
|
|
||||||
shortcuts.push("Shift+S: Stop".to_string());
|
|
||||||
shortcuts.push("R: Restart".to_string());
|
|
||||||
}
|
|
||||||
PanelType::Backup => {
|
|
||||||
shortcuts.push("B: Trigger Backup".to_string());
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// Always show quit
|
// Always show quit
|
||||||
shortcuts.push("Q: Quit".to_string());
|
shortcuts.push("q: Quit".to_string());
|
||||||
|
|
||||||
shortcuts
|
shortcuts
|
||||||
}
|
}
|
||||||
|
|
||||||
fn render_system_panel(&mut self, frame: &mut Frame, area: Rect, _metric_store: &MetricStore) {
|
fn render_system_panel(&mut self, frame: &mut Frame, area: Rect, _metric_store: &MetricStore) {
|
||||||
let system_block = if self.focused_panel == PanelType::System {
|
let system_block = Components::widget_block("system");
|
||||||
Components::focused_widget_block("system")
|
|
||||||
} else {
|
|
||||||
Components::widget_block("system")
|
|
||||||
};
|
|
||||||
let inner_area = system_block.inner(area);
|
let inner_area = system_block.inner(area);
|
||||||
frame.render_widget(system_block, area);
|
frame.render_widget(system_block, area);
|
||||||
// Get current host widgets, create if none exist
|
// Get current host widgets, create if none exist
|
||||||
@@ -791,16 +803,12 @@ impl TuiApp {
|
|||||||
host_widgets.system_scroll_offset
|
host_widgets.system_scroll_offset
|
||||||
};
|
};
|
||||||
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
let host_widgets = self.get_or_create_host_widgets(&hostname);
|
||||||
host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset);
|
host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset, &hostname);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
fn render_backup_panel(&mut self, frame: &mut Frame, area: Rect) {
|
fn render_backup_panel(&mut self, frame: &mut Frame, area: Rect) {
|
||||||
let backup_block = if self.focused_panel == PanelType::Backup {
|
let backup_block = Components::widget_block("backup");
|
||||||
Components::focused_widget_block("backup")
|
|
||||||
} else {
|
|
||||||
Components::widget_block("backup")
|
|
||||||
};
|
|
||||||
let inner_area = backup_block.inner(area);
|
let inner_area = backup_block.inner(area);
|
||||||
frame.render_widget(backup_block, area);
|
frame.render_widget(backup_block, area);
|
||||||
|
|
||||||
@@ -815,5 +823,20 @@ impl TuiApp {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Parse MAC address string (e.g., "AA:BB:CC:DD:EE:FF") to [u8; 6]
|
||||||
|
fn parse_mac_address(mac_str: &str) -> Result<[u8; 6], &'static str> {
|
||||||
|
let parts: Vec<&str> = mac_str.split(':').collect();
|
||||||
|
if parts.len() != 6 {
|
||||||
|
return Err("MAC address must have 6 parts separated by colons");
|
||||||
|
}
|
||||||
|
|
||||||
|
let mut mac = [0u8; 6];
|
||||||
|
for (i, part) in parts.iter().enumerate() {
|
||||||
|
match u8::from_str_radix(part, 16) {
|
||||||
|
Ok(byte) => mac[i] = byte,
|
||||||
|
Err(_) => return Err("Invalid hexadecimal byte in MAC address"),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
Ok(mac)
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -147,6 +147,7 @@ impl Theme {
|
|||||||
Status::Warning => Self::warning(),
|
Status::Warning => Self::warning(),
|
||||||
Status::Critical => Self::error(),
|
Status::Critical => Self::error(),
|
||||||
Status::Unknown => Self::muted_text(),
|
Status::Unknown => Self::muted_text(),
|
||||||
|
Status::Offline => Self::muted_text(), // Dark gray for offline
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -244,8 +245,9 @@ impl StatusIcons {
|
|||||||
Status::Ok => "●",
|
Status::Ok => "●",
|
||||||
Status::Pending => "◉", // Hollow circle for pending
|
Status::Pending => "◉", // Hollow circle for pending
|
||||||
Status::Warning => "◐",
|
Status::Warning => "◐",
|
||||||
Status::Critical => "◯",
|
Status::Critical => "!",
|
||||||
Status::Unknown => "?",
|
Status::Unknown => "?",
|
||||||
|
Status::Offline => "○", // Empty circle for offline
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -258,6 +260,7 @@ impl StatusIcons {
|
|||||||
Status::Warning => Theme::warning(), // Yellow
|
Status::Warning => Theme::warning(), // Yellow
|
||||||
Status::Critical => Theme::error(), // Red
|
Status::Critical => Theme::error(), // Red
|
||||||
Status::Unknown => Theme::muted_text(), // Gray
|
Status::Unknown => Theme::muted_text(), // Gray
|
||||||
|
Status::Offline => Theme::muted_text(), // Dark gray for offline
|
||||||
};
|
};
|
||||||
|
|
||||||
vec![
|
vec![
|
||||||
@@ -289,27 +292,9 @@ impl Components {
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Widget block with focus indicator (blue border)
|
|
||||||
pub fn focused_widget_block(title: &str) -> Block<'_> {
|
|
||||||
Block::default()
|
|
||||||
.title(title)
|
|
||||||
.borders(Borders::ALL)
|
|
||||||
.style(Style::default().fg(Theme::highlight()).bg(Theme::background())) // Blue border for focus
|
|
||||||
.title_style(
|
|
||||||
Style::default()
|
|
||||||
.fg(Theme::highlight()) // Blue title for focus
|
|
||||||
.bg(Theme::background()),
|
|
||||||
)
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Typography {
|
impl Typography {
|
||||||
/// Main title style (dashboard header)
|
|
||||||
pub fn title() -> Style {
|
|
||||||
Style::default()
|
|
||||||
.fg(Theme::primary_text())
|
|
||||||
.bg(Theme::background())
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Widget title style (panel headers) - bold bright white
|
/// Widget title style (panel headers) - bold bright white
|
||||||
pub fn widget_title() -> Style {
|
pub fn widget_title() -> Style {
|
||||||
|
|||||||
@@ -113,13 +113,10 @@ impl ServicesWidget {
|
|||||||
name.to_string()
|
name.to_string()
|
||||||
};
|
};
|
||||||
|
|
||||||
// Parent services always show active/inactive status
|
// Parent services always show actual systemctl status
|
||||||
let status_str = match info.widget_status {
|
let status_str = match info.widget_status {
|
||||||
Status::Ok => "active".to_string(),
|
|
||||||
Status::Pending => "pending".to_string(),
|
Status::Pending => "pending".to_string(),
|
||||||
Status::Warning => "inactive".to_string(),
|
_ => info.status.clone(), // Use actual status from agent (active/inactive/failed)
|
||||||
Status::Critical => "failed".to_string(),
|
|
||||||
Status::Unknown => "unknown".to_string(),
|
|
||||||
};
|
};
|
||||||
|
|
||||||
format!(
|
format!(
|
||||||
@@ -129,12 +126,11 @@ impl ServicesWidget {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Get status icon for service, considering pending transitions for visual feedback
|
/// Get status icon for service, considering pending transitions for visual feedback
|
||||||
fn get_service_icon_and_status(&self, service_name: &str, info: &ServiceInfo, pending_transitions: &HashMap<String, (CommandType, String)>) -> (String, String, ratatui::prelude::Color) {
|
fn get_service_icon_and_status(&self, service_name: &str, info: &ServiceInfo, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>) -> (String, String, ratatui::prelude::Color) {
|
||||||
// Check if this service has a pending transition
|
// Check if this service has a pending transition
|
||||||
if let Some((command_type, _original_status)) = pending_transitions.get(service_name) {
|
if let Some((command_type, _original_status, _start_time)) = pending_transitions.get(service_name) {
|
||||||
// Show transitional icons for pending commands
|
// Show transitional icons for pending commands
|
||||||
let (icon, status_text) = match command_type {
|
let (icon, status_text) = match command_type {
|
||||||
CommandType::ServiceRestart => ("↻", "restarting"),
|
|
||||||
CommandType::ServiceStart => ("↑", "starting"),
|
CommandType::ServiceStart => ("↑", "starting"),
|
||||||
CommandType::ServiceStop => ("↓", "stopping"),
|
CommandType::ServiceStop => ("↓", "stopping"),
|
||||||
_ => return (StatusIcons::get_icon(info.widget_status).to_string(), info.status.clone(), Theme::status_color(info.widget_status)), // Not a service command
|
_ => return (StatusIcons::get_icon(info.widget_status).to_string(), info.status.clone(), Theme::status_color(info.widget_status)), // Not a service command
|
||||||
@@ -150,6 +146,7 @@ impl ServicesWidget {
|
|||||||
Status::Warning => Theme::warning(),
|
Status::Warning => Theme::warning(),
|
||||||
Status::Critical => Theme::error(),
|
Status::Critical => Theme::error(),
|
||||||
Status::Unknown => Theme::muted_text(),
|
Status::Unknown => Theme::muted_text(),
|
||||||
|
Status::Offline => Theme::muted_text(),
|
||||||
};
|
};
|
||||||
|
|
||||||
(icon.to_string(), info.status.clone(), status_color)
|
(icon.to_string(), info.status.clone(), status_color)
|
||||||
@@ -162,7 +159,7 @@ impl ServicesWidget {
|
|||||||
name: &str,
|
name: &str,
|
||||||
info: &ServiceInfo,
|
info: &ServiceInfo,
|
||||||
is_last: bool,
|
is_last: bool,
|
||||||
pending_transitions: &HashMap<String, (CommandType, String)>,
|
pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>,
|
||||||
) -> Vec<ratatui::text::Span<'static>> {
|
) -> Vec<ratatui::text::Span<'static>> {
|
||||||
// Truncate long sub-service names to fit layout (accounting for indentation)
|
// Truncate long sub-service names to fit layout (accounting for indentation)
|
||||||
let short_name = if name.len() > 18 {
|
let short_name = if name.len() > 18 {
|
||||||
@@ -233,13 +230,14 @@ impl ServicesWidget {
|
|||||||
/// Get currently selected service name (for actions)
|
/// Get currently selected service name (for actions)
|
||||||
pub fn get_selected_service(&self) -> Option<String> {
|
pub fn get_selected_service(&self) -> Option<String> {
|
||||||
// Build the same display list to find the selected service
|
// Build the same display list to find the selected service
|
||||||
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>)> = Vec::new();
|
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>, String)> = Vec::new();
|
||||||
|
|
||||||
let mut parent_services: Vec<_> = self.parent_services.iter().collect();
|
let mut parent_services: Vec<_> = self.parent_services.iter().collect();
|
||||||
parent_services.sort_by(|(a, _), (b, _)| a.cmp(b));
|
parent_services.sort_by(|(a, _), (b, _)| a.cmp(b));
|
||||||
|
|
||||||
for (parent_name, parent_info) in parent_services {
|
for (parent_name, parent_info) in parent_services {
|
||||||
display_lines.push((parent_name.clone(), parent_info.widget_status, false, None));
|
let parent_line = self.format_parent_service_line(parent_name, parent_info);
|
||||||
|
display_lines.push((parent_line, parent_info.widget_status, false, None, parent_name.clone()));
|
||||||
|
|
||||||
if let Some(sub_list) = self.sub_services.get(parent_name) {
|
if let Some(sub_list) = self.sub_services.get(parent_name) {
|
||||||
let mut sorted_subs = sub_list.clone();
|
let mut sorted_subs = sub_list.clone();
|
||||||
@@ -247,17 +245,19 @@ impl ServicesWidget {
|
|||||||
|
|
||||||
for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() {
|
for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() {
|
||||||
let is_last_sub = i == sorted_subs.len() - 1;
|
let is_last_sub = i == sorted_subs.len() - 1;
|
||||||
|
let full_sub_name = format!("{}_{}", parent_name, sub_name);
|
||||||
display_lines.push((
|
display_lines.push((
|
||||||
format!("{}_{}", parent_name, sub_name), // Use parent_sub format for sub-services
|
sub_name.clone(),
|
||||||
sub_info.widget_status,
|
sub_info.widget_status,
|
||||||
true,
|
true,
|
||||||
Some((sub_info.clone(), is_last_sub)),
|
Some((sub_info.clone(), is_last_sub)),
|
||||||
|
full_sub_name,
|
||||||
));
|
));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
display_lines.get(self.selected_index).map(|(name, _, _, _)| name.clone())
|
display_lines.get(self.selected_index).map(|(_, _, _, _, raw_name)| raw_name.clone())
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Get total count of selectable services (parent services only, not sub-services)
|
/// Get total count of selectable services (parent services only, not sub-services)
|
||||||
@@ -440,12 +440,8 @@ impl Widget for ServicesWidget {
|
|||||||
impl ServicesWidget {
|
impl ServicesWidget {
|
||||||
|
|
||||||
/// Render with focus, scroll, and pending transitions for visual feedback
|
/// Render with focus, scroll, and pending transitions for visual feedback
|
||||||
pub fn render_with_transitions(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, pending_transitions: &HashMap<String, (CommandType, String)>) {
|
pub fn render_with_transitions(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>) {
|
||||||
let services_block = if is_focused {
|
let services_block = Components::widget_block("services");
|
||||||
Components::focused_widget_block("services")
|
|
||||||
} else {
|
|
||||||
Components::widget_block("services")
|
|
||||||
};
|
|
||||||
let inner_area = services_block.inner(area);
|
let inner_area = services_block.inner(area);
|
||||||
frame.render_widget(services_block, area);
|
frame.render_widget(services_block, area);
|
||||||
|
|
||||||
@@ -474,9 +470,9 @@ impl ServicesWidget {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/// Render services list with pending transitions awareness
|
/// Render services list with pending transitions awareness
|
||||||
fn render_services_with_transitions(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, pending_transitions: &HashMap<String, (CommandType, String)>) {
|
fn render_services_with_transitions(&mut self, frame: &mut Frame, area: Rect, is_focused: bool, scroll_offset: usize, pending_transitions: &HashMap<String, (CommandType, String, std::time::Instant)>) {
|
||||||
// Build hierarchical service list for display (same as existing logic)
|
// Build hierarchical service list for display - include raw service name for pending transition lookups
|
||||||
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>)> = Vec::new();
|
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>, String)> = Vec::new(); // Added raw service name
|
||||||
|
|
||||||
// Sort parent services alphabetically for consistent order
|
// Sort parent services alphabetically for consistent order
|
||||||
let mut parent_services: Vec<_> = self.parent_services.iter().collect();
|
let mut parent_services: Vec<_> = self.parent_services.iter().collect();
|
||||||
@@ -485,7 +481,7 @@ impl ServicesWidget {
|
|||||||
for (parent_name, parent_info) in parent_services {
|
for (parent_name, parent_info) in parent_services {
|
||||||
// Add parent service line
|
// Add parent service line
|
||||||
let parent_line = self.format_parent_service_line(parent_name, parent_info);
|
let parent_line = self.format_parent_service_line(parent_name, parent_info);
|
||||||
display_lines.push((parent_line, parent_info.widget_status, false, None)); // false = not sub-service
|
display_lines.push((parent_line, parent_info.widget_status, false, None, parent_name.clone())); // Include raw name
|
||||||
|
|
||||||
// Add sub-services for this parent (if any)
|
// Add sub-services for this parent (if any)
|
||||||
if let Some(sub_list) = self.sub_services.get(parent_name) {
|
if let Some(sub_list) = self.sub_services.get(parent_name) {
|
||||||
@@ -495,12 +491,14 @@ impl ServicesWidget {
|
|||||||
|
|
||||||
for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() {
|
for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() {
|
||||||
let is_last_sub = i == sorted_subs.len() - 1;
|
let is_last_sub = i == sorted_subs.len() - 1;
|
||||||
|
let full_sub_name = format!("{}_{}", parent_name, sub_name);
|
||||||
// Store sub-service info for custom span rendering
|
// Store sub-service info for custom span rendering
|
||||||
display_lines.push((
|
display_lines.push((
|
||||||
sub_name.clone(),
|
sub_name.clone(),
|
||||||
sub_info.widget_status,
|
sub_info.widget_status,
|
||||||
true,
|
true,
|
||||||
Some((sub_info.clone(), is_last_sub)),
|
Some((sub_info.clone(), is_last_sub)),
|
||||||
|
full_sub_name, // Raw service name for pending transition lookup
|
||||||
)); // true = sub-service, with is_last info
|
)); // true = sub-service, with is_last info
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -533,7 +531,7 @@ impl ServicesWidget {
|
|||||||
.constraints(vec![Constraint::Length(1); lines_to_show])
|
.constraints(vec![Constraint::Length(1); lines_to_show])
|
||||||
.split(area);
|
.split(area);
|
||||||
|
|
||||||
for (i, (line_text, line_status, is_sub, sub_info)) in visible_lines.iter().enumerate()
|
for (i, (line_text, line_status, is_sub, sub_info, raw_service_name)) in visible_lines.iter().enumerate()
|
||||||
{
|
{
|
||||||
let actual_index = effective_scroll + i; // Real index in the full list
|
let actual_index = effective_scroll + i; // Real index in the full list
|
||||||
|
|
||||||
@@ -551,34 +549,44 @@ impl ServicesWidget {
|
|||||||
let (service_info, is_last) = sub_info.as_ref().unwrap();
|
let (service_info, is_last) = sub_info.as_ref().unwrap();
|
||||||
self.create_sub_service_spans_with_transitions(line_text, service_info, *is_last, pending_transitions)
|
self.create_sub_service_spans_with_transitions(line_text, service_info, *is_last, pending_transitions)
|
||||||
} else {
|
} else {
|
||||||
// Parent services - check if this parent service has a pending transition
|
// Parent services - check if this parent service has a pending transition using RAW service name
|
||||||
if pending_transitions.contains_key(line_text) {
|
if pending_transitions.contains_key(raw_service_name) {
|
||||||
// Create spans with transitional status
|
// Create spans with transitional status
|
||||||
let (icon, status_text, status_color) = self.get_service_icon_and_status(line_text, &ServiceInfo {
|
let (icon, status_text, _) = self.get_service_icon_and_status(raw_service_name, &ServiceInfo {
|
||||||
status: "".to_string(),
|
status: "".to_string(),
|
||||||
memory_mb: None,
|
memory_mb: None,
|
||||||
disk_gb: None,
|
disk_gb: None,
|
||||||
latency_ms: None,
|
latency_ms: None,
|
||||||
widget_status: *line_status
|
widget_status: *line_status
|
||||||
}, pending_transitions);
|
}, pending_transitions);
|
||||||
|
|
||||||
|
// Use blue for transitional icons when not selected, background color when selected
|
||||||
|
let icon_color = if is_selected && !*is_sub && is_focused {
|
||||||
|
Theme::background() // Dark background color for visibility against blue selection
|
||||||
|
} else {
|
||||||
|
Theme::highlight() // Blue for normal case
|
||||||
|
};
|
||||||
|
|
||||||
vec![
|
vec![
|
||||||
ratatui::text::Span::styled(format!("{} ", icon), Style::default().fg(status_color)),
|
ratatui::text::Span::styled(format!("{} ", icon), Style::default().fg(icon_color)),
|
||||||
ratatui::text::Span::styled(line_text.clone(), Style::default().fg(Theme::primary_text())),
|
ratatui::text::Span::styled(line_text.clone(), Style::default().fg(Theme::primary_text())),
|
||||||
ratatui::text::Span::styled(format!(" {}", status_text), Style::default().fg(status_color)),
|
ratatui::text::Span::styled(format!(" {}", status_text), Style::default().fg(icon_color)),
|
||||||
]
|
]
|
||||||
} else {
|
} else {
|
||||||
StatusIcons::create_status_spans(*line_status, line_text)
|
StatusIcons::create_status_spans(*line_status, line_text)
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
// Apply selection highlighting to parent services only, preserving status icon color
|
// Apply selection highlighting to parent services only, making icons background color when selected
|
||||||
// Only show selection when Services panel is focused
|
// Only show selection when Services panel is focused
|
||||||
// IMPORTANT: Don't override transitional icons that show pending commands
|
// Show selection highlighting even when transitional icons are present
|
||||||
if is_selected && !*is_sub && is_focused && !pending_transitions.contains_key(line_text) {
|
if is_selected && !*is_sub && is_focused {
|
||||||
for (i, span) in spans.iter_mut().enumerate() {
|
for (i, span) in spans.iter_mut().enumerate() {
|
||||||
if i == 0 {
|
if i == 0 {
|
||||||
// First span is the status icon - preserve its color
|
// First span is the status icon - use background color for visibility against blue selection
|
||||||
span.style = span.style.bg(Theme::highlight());
|
span.style = span.style
|
||||||
|
.bg(Theme::highlight())
|
||||||
|
.fg(Theme::background());
|
||||||
} else {
|
} else {
|
||||||
// Other spans (text) get full selection highlighting
|
// Other spans (text) get full selection highlighting
|
||||||
span.style = span.style
|
span.style = span.style
|
||||||
|
|||||||
@@ -230,9 +230,30 @@ impl SystemWidget {
|
|||||||
|
|
||||||
/// Extract pool name from disk metric name
|
/// Extract pool name from disk metric name
|
||||||
fn extract_pool_name(&self, metric_name: &str) -> Option<String> {
|
fn extract_pool_name(&self, metric_name: &str) -> Option<String> {
|
||||||
if let Some(captures) = metric_name.strip_prefix("disk_") {
|
// Pattern: disk_{pool_name}_{drive_name}_{metric_type}
|
||||||
if let Some(pos) = captures.find('_') {
|
// Since pool_name can contain underscores, work backwards from known metric suffixes
|
||||||
return Some(captures[..pos].to_string());
|
if metric_name.starts_with("disk_") {
|
||||||
|
// First try drive-specific metrics that have device names
|
||||||
|
if let Some(suffix_pos) = metric_name.rfind("_temperature")
|
||||||
|
.or_else(|| metric_name.rfind("_wear_percent"))
|
||||||
|
.or_else(|| metric_name.rfind("_health")) {
|
||||||
|
// Find the second-to-last underscore to get pool name
|
||||||
|
let before_suffix = &metric_name[..suffix_pos];
|
||||||
|
if let Some(drive_start) = before_suffix.rfind('_') {
|
||||||
|
return Some(metric_name[5..drive_start].to_string()); // Skip "disk_"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// For pool-level metrics (usage_percent, used_gb, total_gb), take everything before the metric suffix
|
||||||
|
else if let Some(suffix_pos) = metric_name.rfind("_usage_percent")
|
||||||
|
.or_else(|| metric_name.rfind("_used_gb"))
|
||||||
|
.or_else(|| metric_name.rfind("_total_gb")) {
|
||||||
|
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
|
||||||
|
}
|
||||||
|
// Fallback to old behavior for unknown patterns
|
||||||
|
else if let Some(captures) = metric_name.strip_prefix("disk_") {
|
||||||
|
if let Some(pos) = captures.find('_') {
|
||||||
|
return Some(captures[..pos].to_string());
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
None
|
None
|
||||||
@@ -240,10 +261,18 @@ impl SystemWidget {
|
|||||||
|
|
||||||
/// Extract drive name from disk metric name
|
/// Extract drive name from disk metric name
|
||||||
fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
|
fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
|
||||||
// Pattern: disk_pool_drive_metric
|
// Pattern: disk_{pool_name}_{drive_name}_{metric_type}
|
||||||
let parts: Vec<&str> = metric_name.split('_').collect();
|
// Since pool_name can contain underscores, work backwards from known metric suffixes
|
||||||
if parts.len() >= 3 && parts[0] == "disk" {
|
if metric_name.starts_with("disk_") {
|
||||||
return Some(parts[2].to_string());
|
if let Some(suffix_pos) = metric_name.rfind("_temperature")
|
||||||
|
.or_else(|| metric_name.rfind("_wear_percent"))
|
||||||
|
.or_else(|| metric_name.rfind("_health")) {
|
||||||
|
// Find the second-to-last underscore to get the drive name
|
||||||
|
let before_suffix = &metric_name[..suffix_pos];
|
||||||
|
if let Some(drive_start) = before_suffix.rfind('_') {
|
||||||
|
return Some(before_suffix[drive_start + 1..].to_string());
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
None
|
None
|
||||||
}
|
}
|
||||||
@@ -410,12 +439,12 @@ impl Widget for SystemWidget {
|
|||||||
|
|
||||||
impl SystemWidget {
|
impl SystemWidget {
|
||||||
/// Render with scroll offset support
|
/// Render with scroll offset support
|
||||||
pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize) {
|
pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize, hostname: &str) {
|
||||||
let mut lines = Vec::new();
|
let mut lines = Vec::new();
|
||||||
|
|
||||||
// NixOS section
|
// NixOS section
|
||||||
lines.push(Line::from(vec![
|
lines.push(Line::from(vec![
|
||||||
Span::styled("NixOS:", Typography::widget_title())
|
Span::styled(format!("NixOS {}:", hostname), Typography::widget_title())
|
||||||
]));
|
]));
|
||||||
|
|
||||||
let build_text = self.nixos_build.as_deref().unwrap_or("unknown");
|
let build_text = self.nixos_build.as_deref().unwrap_or("unknown");
|
||||||
|
|||||||
@@ -1,88 +0,0 @@
|
|||||||
# Hardcoded Values Removed - Configuration Summary
|
|
||||||
|
|
||||||
## ✅ All Hardcoded Values Converted to Configuration
|
|
||||||
|
|
||||||
### **1. SystemD Nginx Check Interval**
|
|
||||||
- **Before**: `nginx_check_interval_seconds: 30` (hardcoded)
|
|
||||||
- **After**: `nginx_check_interval_seconds: config.nginx_check_interval_seconds`
|
|
||||||
- **NixOS Config**: `nginx_check_interval_seconds = 30;`
|
|
||||||
|
|
||||||
### **2. ZMQ Transmission Interval**
|
|
||||||
- **Before**: `Duration::from_secs(1)` (hardcoded)
|
|
||||||
- **After**: `Duration::from_secs(self.config.zmq.transmission_interval_seconds)`
|
|
||||||
- **NixOS Config**: `transmission_interval_seconds = 1;`
|
|
||||||
|
|
||||||
### **3. HTTP Timeouts in SystemD Collector**
|
|
||||||
- **Before**:
|
|
||||||
```rust
|
|
||||||
.timeout(Duration::from_secs(10))
|
|
||||||
.connect_timeout(Duration::from_secs(10))
|
|
||||||
```
|
|
||||||
- **After**:
|
|
||||||
```rust
|
|
||||||
.timeout(Duration::from_secs(self.config.http_timeout_seconds))
|
|
||||||
.connect_timeout(Duration::from_secs(self.config.http_connect_timeout_seconds))
|
|
||||||
```
|
|
||||||
- **NixOS Config**:
|
|
||||||
```nix
|
|
||||||
http_timeout_seconds = 10;
|
|
||||||
http_connect_timeout_seconds = 10;
|
|
||||||
```
|
|
||||||
|
|
||||||
## **Configuration Structure Changes**
|
|
||||||
|
|
||||||
### **SystemdConfig** (agent/src/config/mod.rs)
|
|
||||||
```rust
|
|
||||||
pub struct SystemdConfig {
|
|
||||||
// ... existing fields ...
|
|
||||||
pub nginx_check_interval_seconds: u64, // NEW
|
|
||||||
pub http_timeout_seconds: u64, // NEW
|
|
||||||
pub http_connect_timeout_seconds: u64, // NEW
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### **ZmqConfig** (agent/src/config/mod.rs)
|
|
||||||
```rust
|
|
||||||
pub struct ZmqConfig {
|
|
||||||
// ... existing fields ...
|
|
||||||
pub transmission_interval_seconds: u64, // NEW
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## **NixOS Configuration Updates**
|
|
||||||
|
|
||||||
### **ZMQ Section** (hosts/common/cm-dashboard.nix)
|
|
||||||
```nix
|
|
||||||
zmq = {
|
|
||||||
# ... existing fields ...
|
|
||||||
transmission_interval_seconds = 1; # NEW
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
### **SystemD Section** (hosts/common/cm-dashboard.nix)
|
|
||||||
```nix
|
|
||||||
systemd = {
|
|
||||||
# ... existing fields ...
|
|
||||||
nginx_check_interval_seconds = 30; # NEW
|
|
||||||
http_timeout_seconds = 10; # NEW
|
|
||||||
http_connect_timeout_seconds = 10; # NEW
|
|
||||||
};
|
|
||||||
```
|
|
||||||
|
|
||||||
## **Benefits**
|
|
||||||
|
|
||||||
✅ **No hardcoded values** - All timing/timeout values configurable
|
|
||||||
✅ **Consistent configuration** - Everything follows NixOS config pattern
|
|
||||||
✅ **Environment-specific tuning** - Can adjust timeouts per deployment
|
|
||||||
✅ **Maintainability** - No magic numbers scattered in code
|
|
||||||
✅ **Testing flexibility** - Can configure different values for testing
|
|
||||||
|
|
||||||
## **Runtime Behavior**
|
|
||||||
|
|
||||||
All previously hardcoded values now respect configuration:
|
|
||||||
- **Nginx latency checks**: Every 30s (configurable)
|
|
||||||
- **ZMQ transmission**: Every 1s (configurable)
|
|
||||||
- **HTTP requests**: 10s timeout (configurable)
|
|
||||||
- **HTTP connections**: 10s timeout (configurable)
|
|
||||||
|
|
||||||
The codebase is now **100% configuration-driven** with no hardcoded timing values.
|
|
||||||
@@ -1,6 +1,6 @@
|
|||||||
[package]
|
[package]
|
||||||
name = "cm-dashboard-shared"
|
name = "cm-dashboard-shared"
|
||||||
version = "0.1.25"
|
version = "0.1.57"
|
||||||
edition = "2021"
|
edition = "2021"
|
||||||
|
|
||||||
[dependencies]
|
[dependencies]
|
||||||
|
|||||||
@@ -87,6 +87,7 @@ pub enum Status {
|
|||||||
Warning,
|
Warning,
|
||||||
Critical,
|
Critical,
|
||||||
Unknown,
|
Unknown,
|
||||||
|
Offline,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Status {
|
impl Status {
|
||||||
@@ -190,6 +191,16 @@ impl HysteresisThresholds {
|
|||||||
Status::Ok
|
Status::Ok
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
Status::Offline => {
|
||||||
|
// Host coming back online, use normal thresholds like first measurement
|
||||||
|
if value >= self.critical_high {
|
||||||
|
Status::Critical
|
||||||
|
} else if value >= self.warning_high {
|
||||||
|
Status::Warning
|
||||||
|
} else {
|
||||||
|
Status::Ok
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,42 +0,0 @@
|
|||||||
#!/bin/bash
|
|
||||||
|
|
||||||
# Test script to verify collector intervals are working correctly
|
|
||||||
# Expected behavior:
|
|
||||||
# - CPU/Memory: Every 2 seconds
|
|
||||||
# - Systemd/Network: Every 10 seconds
|
|
||||||
# - Backup/NixOS: Every 60 seconds
|
|
||||||
# - Disk: Every 300 seconds (5 minutes)
|
|
||||||
|
|
||||||
echo "=== Testing Collector Interval Implementation ==="
|
|
||||||
echo "Expected intervals from NixOS config:"
|
|
||||||
echo " CPU: 2s, Memory: 2s"
|
|
||||||
echo " Systemd: 10s, Network: 10s"
|
|
||||||
echo " Backup: 60s, NixOS: 60s"
|
|
||||||
echo " Disk: 300s (5m)"
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
# Note: Cannot run actual agent without proper config, but we can verify the code logic
|
|
||||||
echo "✅ Code Implementation Status:"
|
|
||||||
echo " - TimedCollector struct with interval tracking: IMPLEMENTED"
|
|
||||||
echo " - Individual collector intervals from config: IMPLEMENTED"
|
|
||||||
echo " - collect_metrics_timed() respects intervals: IMPLEMENTED"
|
|
||||||
echo " - Debug logging shows interval compliance: IMPLEMENTED"
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "🔍 Key Implementation Details:"
|
|
||||||
echo " - MetricCollectionManager now tracks last_collection time per collector"
|
|
||||||
echo " - Each collector gets Duration::from_secs(config.{collector}.interval_seconds)"
|
|
||||||
echo " - Only collectors with elapsed >= interval are called"
|
|
||||||
echo " - Debug logs show actual collection with interval info"
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "📊 Expected Runtime Behavior:"
|
|
||||||
echo " At 0s: All collectors run (startup)"
|
|
||||||
echo " At 2s: CPU, Memory run"
|
|
||||||
echo " At 4s: CPU, Memory run"
|
|
||||||
echo " At 10s: CPU, Memory, Systemd, Network run"
|
|
||||||
echo " At 60s: CPU, Memory, Systemd, Network, Backup, NixOS run"
|
|
||||||
echo " At 300s: All collectors run including Disk"
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "✅ CONCLUSION: Codebase now follows NixOS configuration intervals correctly!"
|
|
||||||
@@ -1,32 +0,0 @@
|
|||||||
#!/usr/bin/env rust-script
|
|
||||||
|
|
||||||
use std::process;
|
|
||||||
|
|
||||||
/// Check if running inside tmux session
|
|
||||||
fn check_tmux_session() {
|
|
||||||
// Check for TMUX environment variable which is set when inside a tmux session
|
|
||||||
if std::env::var("TMUX").is_err() {
|
|
||||||
eprintln!("╭─────────────────────────────────────────────────────────────╮");
|
|
||||||
eprintln!("│ ⚠️ TMUX REQUIRED │");
|
|
||||||
eprintln!("├─────────────────────────────────────────────────────────────┤");
|
|
||||||
eprintln!("│ CM Dashboard must be run inside a tmux session for proper │");
|
|
||||||
eprintln!("│ terminal handling and remote operation functionality. │");
|
|
||||||
eprintln!("│ │");
|
|
||||||
eprintln!("│ Please start a tmux session first: │");
|
|
||||||
eprintln!("│ tmux new-session -d -s dashboard cm-dashboard │");
|
|
||||||
eprintln!("│ tmux attach-session -t dashboard │");
|
|
||||||
eprintln!("│ │");
|
|
||||||
eprintln!("│ Or simply: │");
|
|
||||||
eprintln!("│ tmux │");
|
|
||||||
eprintln!("│ cm-dashboard │");
|
|
||||||
eprintln!("╰─────────────────────────────────────────────────────────────╯");
|
|
||||||
process::exit(1);
|
|
||||||
} else {
|
|
||||||
println!("✅ Running inside tmux session - OK");
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
fn main() {
|
|
||||||
println!("Testing tmux check function...");
|
|
||||||
check_tmux_session();
|
|
||||||
}
|
|
||||||
@@ -1,53 +0,0 @@
|
|||||||
#!/bin/bash
|
|
||||||
|
|
||||||
echo "=== TMUX Check Implementation Test ==="
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "📋 Testing tmux check logic:"
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "1. Current environment:"
|
|
||||||
if [ -n "$TMUX" ]; then
|
|
||||||
echo " ✅ Running inside tmux session"
|
|
||||||
echo " TMUX variable: $TMUX"
|
|
||||||
else
|
|
||||||
echo " ❌ NOT running inside tmux session"
|
|
||||||
echo " TMUX variable: (not set)"
|
|
||||||
fi
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "2. Simulating dashboard tmux check logic:"
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
# Simulate the Rust check logic
|
|
||||||
if [ -z "$TMUX" ]; then
|
|
||||||
echo " Dashboard would show:"
|
|
||||||
echo " ╭─────────────────────────────────────────────────────────────╮"
|
|
||||||
echo " │ ⚠️ TMUX REQUIRED │"
|
|
||||||
echo " ├─────────────────────────────────────────────────────────────┤"
|
|
||||||
echo " │ CM Dashboard must be run inside a tmux session for proper │"
|
|
||||||
echo " │ terminal handling and remote operation functionality. │"
|
|
||||||
echo " │ │"
|
|
||||||
echo " │ Please start a tmux session first: │"
|
|
||||||
echo " │ tmux new-session -d -s dashboard cm-dashboard │"
|
|
||||||
echo " │ tmux attach-session -t dashboard │"
|
|
||||||
echo " │ │"
|
|
||||||
echo " │ Or simply: │"
|
|
||||||
echo " │ tmux │"
|
|
||||||
echo " │ cm-dashboard │"
|
|
||||||
echo " ╰─────────────────────────────────────────────────────────────╯"
|
|
||||||
echo " Then exit with code 1"
|
|
||||||
else
|
|
||||||
echo " ✅ Dashboard tmux check would PASS - continuing normally"
|
|
||||||
fi
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "3. Implementation status:"
|
|
||||||
echo " ✅ check_tmux_session() function added to dashboard/src/main.rs"
|
|
||||||
echo " ✅ Called early in main() but only for TUI mode (not headless)"
|
|
||||||
echo " ✅ Uses std::env::var(\"TMUX\") to detect tmux session"
|
|
||||||
echo " ✅ Shows helpful error message with usage instructions"
|
|
||||||
echo " ✅ Exits with code 1 if not in tmux"
|
|
||||||
echo ""
|
|
||||||
|
|
||||||
echo "✅ TMUX check implementation complete!"
|
|
||||||
Reference in New Issue
Block a user