CM Dashboard - Infrastructure Monitoring TUI

Overview

A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for our specific monitoring needs and ZMQ-based metric collection.

Implementation Strategy

Current Implementation Status

System Panel Enhancement - COMPLETED

All system panel features successfully implemented:

  • NixOS Collector: Created collector for version and active users
  • System Widget: Unified widget combining NixOS, CPU, RAM, and Storage
  • Build Display: Shows NixOS build information without codename
  • Active Users: Displays currently logged in users
  • Tmpfs Monitoring: Added /tmp usage to RAM section
  • Agent Deployment: NixOS collector working in production

Simplified Navigation and Service Management - COMPLETED

All navigation and service management features successfully implemented:

  • Direct Service Control: Up/Down (or j/k) arrows directly control service selection
  • Always Visible Selection: Service selection highlighting always visible (no panel focus needed)
  • Complete Service Discovery: All configured services visible regardless of state
  • Transitional Visual Feedback: Service operations show directional arrows (↑ ↓ ↻)
  • Simplified Interface: Removed panel switching complexity, uniform appearance
  • Vi-style Navigation: Added j/k keys for vim users alongside arrow keys

Current Status - October 28, 2025:

  • All service discovery and display features working correctly
  • Simplified navigation system implemented
  • Service selection always visible with direct control
  • Complete service visibility (all configured services show regardless of state)
  • Transitional service icons working with proper color handling
  • Build display working: "Build: 25.05.20251004.3bcc93c"
  • Agent version display working: "Agent: v0.1.33"
  • Cross-host version comparison implemented
  • Automated binary release system working
  • SMART data consolidated into disk collector

RESOLVED - Remote Rebuild Functionality:

  • System Rebuild: Now uses simple SSH + tmux popup approach
  • Process Isolation: Rebuild runs independently via SSH, survives agent/dashboard restarts
  • Configuration: SSH user and rebuild alias configurable in dashboard config
  • Service Control: Works correctly for start/stop/restart of services

Solution Implemented:

  • Replaced complex SystemRebuild command infrastructure with direct tmux popup
  • Uses tmux display-popup "ssh -tt {user}@{hostname} 'bash -ic {alias}'"
  • Configurable SSH user and rebuild alias in dashboard config
  • Eliminates all agent crashes during rebuilds
  • Simple, reliable, and follows standard tmux interface patterns
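The popup invocation above can be sketched as a concrete command. This is a minimal sketch, assuming hypothetical placeholder values for the SSH user, hostname, and rebuild alias; the real values come from the dashboard config:

```shell
# Hypothetical values; the real ones are read from the dashboard config.
SSH_USER=cm
HOST=srv01
REBUILD_ALIAS=rebuild   # shell alias defined on the remote host

# -tt forces a TTY so the remote command can run interactively;
# `bash -ic` loads the interactive shell config where the alias lives.
CMD="ssh -tt ${SSH_USER}@${HOST} 'bash -ic ${REBUILD_ALIAS}'"
echo "$CMD"   # prints: ssh -tt cm@srv01 'bash -ic rebuild'

# Run inside an existing tmux session:
# tmux display-popup -E "$CMD"
```

Because the rebuild runs over SSH in a tmux popup, it survives restarts of both the agent and the dashboard.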

Current Layout:

NixOS:
Build: 25.05.20251004.3bcc93c
Agent: v0.1.33   # Shows agent version from Cargo.toml
Active users: cm, simon
CPU:
● Load: 0.02 0.31 0.86 • 3000MHz
RAM:
● Usage: 33% 2.6GB/7.6GB  
● /tmp: 0% 0B/2.0GB  
Storage:  
● root (Single):  
 ├─ ● nvme0n1 W: 1%
 └─ ● 18% 167.4GB/928.2GB

  • System panel layout fully implemented with blue tree symbols
  • Tree symbols now use consistent blue theming across all panels
  • Overflow handling restored for all widgets ("... and X more")
  • Agent version display working correctly
  • Cross-host version comparison logging warnings
  • Backup panel visibility fixed - only shows when meaningful data exists
  • SSH-based rebuild system fully implemented and working

Current Simplified Navigation Implementation

Navigation Controls:

  • Tab: Switch between hosts (cmbox, srv01, srv02, steambox, etc.)
  • ↑↓ or j/k: Move service selection cursor (always works)
  • q: Quit dashboard

Service Control:

  • s: Start selected service
  • S: Stop selected service
  • R: Rebuild current host (works from any context)

Visual Features:

  • Service Selection: Always visible blue background highlighting current service
  • Status Icons: Green ● (active), Yellow ◐ (inactive), Red ◯ (failed), ? (unknown)
  • Transitional Icons: Blue ↑ (starting), ↓ (stopping), ↻ (restarting) when not selected
  • Transitional Icons: Dark gray arrows when service is selected (for visibility)
  • Uniform Interface: All panels have consistent appearance (no focus borders)

Service Discovery and Display - WORKING

All Issues Resolved (as of 2025-10-28):

  • Complete Service Discovery: Uses systemctl list-unit-files + list-units --all for comprehensive service detection
  • All Services Visible: Shows all configured services regardless of current state (active/inactive)
  • Proper Status Display: Active services show green ●, inactive show yellow ◐, failed show red ◯
  • Transitional Icons: Visual feedback during service operations with proper color handling
  • Simplified Navigation: Removed panel complexity, direct service control always available
  • Service Control: Start (s) and Stop (S) commands work from anywhere
  • System Rebuild: SSH + tmux popup approach for reliable remote rebuilds
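The discovery approach can be sketched as a shell pipeline. Canned sample output stands in for the real systemctl calls here, so the unit names and the simplified column layout are illustrative only (real parsing must also skip systemctl's headers and footers):

```shell
# Stand-ins for `systemctl list-unit-files --type=service` and
# `systemctl list-units --type=service --all`.
list_unit_files() { printf 'nginx.service enabled\ndocker.service disabled\n'; }
list_units_all()  { printf 'nginx.service loaded active running\n'; }

# The union of both listings yields every configured service,
# including inactive ones that list-units alone would miss.
{ list_unit_files; list_units_all; } | awk '{print $1}' | sort -u
# prints:
#   docker.service
#   nginx.service
```

This is why inactive services still appear in the panel: `list-unit-files` reports them even when `list-units` does not.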

Terminal Popup for Real-time Output - IMPLEMENTED

Status (as of 2025-10-26):

  • Terminal Popup UI: 80% screen coverage with terminal styling and color-coded output
  • ZMQ Streaming Protocol: CommandOutputMessage for real-time output transmission
  • Keyboard Controls: ESC/Q to close, ↑↓ to scroll, manual close (no auto-close)
  • Real-time Display: Live streaming of command output as it happens
  • Version-based Agent Reporting: Shows "Agent: v0.1.13" instead of nix store hash

Current Implementation Issues:

  • Agent Process Crashes: Agent dies during nixos-rebuild execution
  • Inconsistent Output: Different outputs each time 'R' is pressed
  • Limited Output Visibility: Not capturing all nixos-rebuild progress

PLANNED SOLUTION - Systemd Service Approach:

Problem: Direct nixos-rebuild execution in agent causes process crashes and inconsistent output.

Solution: Create dedicated systemd service for rebuild operations.

Implementation Plan:

  1. NixOS Systemd Service:

    systemd.services.cm-rebuild = {
      description = "CM Dashboard NixOS Rebuild";
      serviceConfig = {
        Type = "oneshot";
        ExecStart = "${pkgs.nixos-rebuild}/bin/nixos-rebuild switch --flake . --option sandbox false";
        WorkingDirectory = "/var/lib/cm-dashboard/nixos-config";
        User = "root";
        StandardOutput = "journal";
        StandardError = "journal";
      };
    };
    
  2. Agent Modification:

    • Replace direct nixos-rebuild execution with: systemctl start cm-rebuild
    • Stream output via: journalctl -u cm-rebuild -f --no-pager
    • Monitor service status for completion detection
  3. Benefits:

    • Process Isolation: Service runs independently, won't crash agent
    • Consistent Output: Always same deterministic rebuild process
    • Proper Logging: systemd journal handles all output management
    • Resource Management: systemd manages cleanup and resource limits
    • Status Tracking: Can query service status (running/failed/success)

Next Priority: Implement systemd service approach for reliable rebuild operations.
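The agent-side flow of the planned approach might look like the sketch below. The `cm-rebuild` unit name comes from the plan above; the status-mapping helper is hypothetical:

```shell
# Kick off the rebuild as an isolated systemd unit and stream its log:
#   systemctl start cm-rebuild --no-block
#   journalctl -u cm-rebuild -f --no-pager
# Completion detection: poll `systemctl show -p Result --value cm-rebuild`.

# Hypothetical helper mapping the unit's Result value to a rebuild state.
rebuild_state() {
  case "$1" in
    success)          echo done ;;
    exit-code|failed) echo failed ;;
    *)                echo running ;;
  esac
}

rebuild_state success    # prints: done
rebuild_state exit-code  # prints: failed
```

Polling `Result` rather than parsing journal output keeps completion detection deterministic regardless of what nixos-rebuild prints.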

Keyboard Controls Status:

  • Services Panel:
    • R (restart) Working
    • s (start) Working
    • S (stop) Working
  • System Panel: R (nixos-rebuild) Working with --option sandbox false
  • Backup Panel: B (trigger backup) Not implemented

Visual Feedback Implementation - IN PROGRESS:

Context-appropriate progress indicators for each panel:

Services Panel (Service status transitions):

● nginx          active    →  ⏳ nginx      restarting  →  ● nginx          active
● docker         active    →  ⏳ docker     stopping    →  ● docker         inactive  

System Panel (Build progress in NixOS section):

NixOS:
Build: 25.05.20251004.3bcc93c    →    Build: [████████████     ] 65%
Active users: cm, simon               Active users: cm, simon

Backup Panel (OnGoing status with progress):

Latest backup:              →    Latest backup:
● 2024-10-23 14:32:15            ● OnGoing  
└─ Duration: 1.3m                 └─ [██████       ] 60%

Critical Configuration Hash Fix - HIGH PRIORITY:

Problem: Configuration hash currently shows git commit hash instead of actual deployed system hash.

Current (incorrect):

  • Shows git hash: db11f82 (source repository commit)
  • Not accurate - doesn't reflect what's actually deployed

Target (correct):

  • Show nix store hash: d8ivwiar (first 8 chars from deployed system)
  • Source: /nix/store/d8ivwiarhwhgqzskj6q2482r58z46qjf-nixos-system-cmbox-25.05.20251004.3bcc93c
  • Pattern: Extract hash from /nix/store/HASH-nixos-system-HOSTNAME-VERSION

Benefits:

  1. Deployment Verification: Confirms rebuild actually succeeded
  2. Accurate Status: Shows what's truly running, not just source
  3. Rebuild Completion Detection: Hash change = rebuild completed
  4. Rollback Tracking: Each deployment has unique identifier

Implementation Required:

  1. Agent extracts nix store hash from ls -la /run/current-system
  2. Reports this as system_config_hash metric instead of git hash
  3. Dashboard displays first 8 characters: Config: d8ivwiar
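A minimal sketch of the extraction, using the example store path quoted above (on a live host the path would come from resolving the `/run/current-system` symlink, e.g. with `readlink`):

```shell
# Example path from this doc; on a live system:
#   store_path=$(readlink -f /run/current-system)
store_path=/nix/store/d8ivwiarhwhgqzskj6q2482r58z46qjf-nixos-system-cmbox-25.05.20251004.3bcc93c

# /nix/store/HASH-nixos-system-HOSTNAME-VERSION → first 8 chars of HASH
hash=$(basename "$store_path" | cut -d- -f1 | cut -c1-8)
echo "Config: $hash"   # prints: Config: d8ivwiar
```

Since the store hash only changes when a new system generation is activated, comparing it before and after a rebuild doubles as completion detection.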

Next Session Priority Tasks:

Remaining Features:

  1. Fix Configuration Hash Display (CRITICAL):

    • Use nix store hash instead of git commit hash
    • Extract from /run/current-system -> /nix/store/HASH-nixos-system-*
    • Enables proper rebuild completion detection
  2. Command Response Protocol:

    • Agent sends command completion/failure back to dashboard via ZMQ
    • Dashboard updates UI status from transitional icons (↑ ↓ ↻) back to ● when commands complete
    • Clear success/failure status after timeout
  3. Backup Panel Features:

    • Implement backup trigger functionality (B key)
    • Complete visual feedback for backup operations
    • Add backup progress indicators

Enhancement Tasks:

  • Add confirmation dialogs for destructive actions (stop/restart/rebuild)
  • Implement command history/logging
  • Add keyboard shortcuts help overlay

Future Enhanced Navigation:

  • Add Page Up/Down for faster scrolling through long service lists
  • Implement search/filter functionality for services
  • Add jump-to-service shortcuts (first letter navigation)

Future Advanced Features:

  • Service dependency visualization
  • Historical service status tracking
  • Real-time log viewing integration

Core Architecture Principles - CRITICAL

Individual Metrics Philosophy

NEW ARCHITECTURE: Agent collects individual metrics, dashboard composes widgets from those metrics.

Maintenance Mode

Purpose:

  • Suppress email notifications during planned maintenance or backups
  • Prevents false alerts when services are intentionally stopped

Implementation:

  • Agent checks for /tmp/cm-maintenance file before sending notifications
  • File presence suppresses all email notifications while continuing monitoring
  • Dashboard continues to show real status, only notifications are blocked

Usage:

# Enable maintenance mode
touch /tmp/cm-maintenance

# Run maintenance tasks (backups, service restarts, etc.)
systemctl stop service
# ... maintenance work ...
systemctl start service

# Disable maintenance mode
rm /tmp/cm-maintenance

NixOS Integration:

  • Borgbackup script automatically creates/removes maintenance file
  • Automatic cleanup via trap ensures maintenance mode doesn't stick
  • All configuration shall be done from the NixOS config
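The trap-based cleanup can be sketched as follows (the maintenance file path is from this doc; the backup command itself is a placeholder):

```shell
#!/usr/bin/env bash
set -eu

# Enable maintenance mode; the EXIT trap guarantees cleanup even if
# the backup fails or the script is interrupted.
touch /tmp/cm-maintenance
trap 'rm -f /tmp/cm-maintenance' EXIT

# ... backup work, e.g. borg create ... (placeholder) ...

# The EXIT trap removes /tmp/cm-maintenance when the script ends.
```

The trap fires on normal exit, errors under `set -e`, and most signals, so maintenance mode cannot stick after the backup script terminates.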

ARCHITECTURE ENFORCEMENT:

  • ZERO legacy code reuse - Fresh implementation following ARCHITECT.md exactly
  • Individual metrics only - NO grouped metric structures
  • Reference-only legacy - Study old functionality, implement new architecture
  • Clean slate mindset - Build as if legacy codebase never existed

Implementation Rules:

  1. Individual Metrics: Each metric is collected, transmitted, and stored individually
  2. Agent Status Authority: Agent calculates status for each metric using thresholds
  3. Dashboard Composition: Dashboard widgets subscribe to specific metrics by name
  4. Status Aggregation: Dashboard aggregates individual metric statuses for widget status

Testing & Building:

  • Workspace builds: cargo build --workspace for all testing
  • Clean compilation: Remove target/ between architecture changes
  • ZMQ testing: Test agent-dashboard communication independently
  • Widget testing: Verify UI layout matches legacy appearance exactly

NEVER in New Implementation:

  • Copy/paste ANY code from legacy backup
  • Calculate status in dashboard widgets
  • Hardcode metric names in widgets (use const arrays)

Important Communication Guidelines

NEVER write that you have "successfully implemented" something or generate extensive summary text without first verifying with the user that the implementation is correct. This wastes tokens. Keep responses concise.

NEVER implement code without first getting explicit user agreement on the approach. Always ask for confirmation before proceeding with implementation.

Commit Message Guidelines

NEVER mention:

  • Claude or any AI assistant names
  • Automation or AI-generated content
  • Any reference to automated code generation

ALWAYS:

  • Focus purely on technical changes and their purpose
  • Use standard software development commit message format
  • Describe what was changed and why, not how it was created
  • Write from the perspective of a human developer

Examples to avoid:

  • "Generated with Claude Code"
  • "AI-assisted implementation"
  • "Automated refactoring"

Good examples:

  • "Implement maintenance mode for backup operations"
  • "Restructure storage widget with improved layout"
  • "Update CPU thresholds to production values"

Development and Deployment Architecture

CRITICAL: Development and deployment paths are completely separate:

Development Path

  • Location: ~/projects/nixosbox
  • Purpose: Development workflow only - for committing new cm-dashboard code
  • Access: Only for developers to commit changes
  • Code Access: Running cm-dashboard code shall NEVER access this path

Deployment Path

  • Location: /var/lib/cm-dashboard/nixos-config
  • Purpose: Production deployment only - agent clones/pulls from git
  • Access: Only cm-dashboard agent for deployment operations
  • Workflow: git pull → /var/lib/cm-dashboard/nixos-config → nixos-rebuild

Git Flow

Development: ~/projects/nixosbox → git commit → git push
Deployment:  git pull → /var/lib/cm-dashboard/nixos-config → rebuild

Automated Binary Release System

IMPLEMENTED: cm-dashboard now uses automated binary releases instead of source builds.

Release Workflow

  1. Automated Release Creation

    • Gitea Actions workflow builds static binaries on tag push
    • Creates release with cm-dashboard-linux-x86_64.tar.gz tarball
    • No manual intervention required for binary generation
  2. Creating New Releases

    cd ~/projects/cm-dashboard
    git tag v0.1.X
    git push origin v0.1.X
    

    This automatically:

    • Builds static binaries with RUSTFLAGS="-C target-feature=+crt-static"
    • Creates GitHub-style release with tarball
    • Uploads binaries via Gitea API
  3. NixOS Configuration Updates

    Edit ~/projects/nixosbox/hosts/common/cm-dashboard.nix:

    version = "v0.1.X";
    src = pkgs.fetchurl {
      url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/${version}/cm-dashboard-linux-x86_64.tar.gz";
      sha256 = "sha256-NEW_HASH_HERE";
    };
    
  4. Get Release Hash

    cd ~/projects/nixosbox
    nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
      url = "https://gitea.cmtec.se/cm/cm-dashboard/releases/download/v0.1.X/cm-dashboard-linux-x86_64.tar.gz";
      sha256 = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
    }' 2>&1 | grep "got:"
    
  5. Commit and Deploy

    cd ~/projects/nixosbox
    git add hosts/common/cm-dashboard.nix
    git commit -m "Update cm-dashboard to v0.1.X with static binaries"
    git push
    

Benefits

  • No compilation overhead on each host
  • Consistent static binaries across all hosts
  • Faster deployments - download vs compile
  • No library dependency issues - static linking
  • Automated pipeline - tag push triggers everything