Implement real-time process monitoring and fix UI hardcoded data
This commit addresses several key issues identified during development:

Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using the ps command
- Fix disk metrics permission issues in the systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information

Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"

Service Collection Optimization:
- Removed CPU monitoring from the systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified the services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy

UI and Dashboard Improvements:
- Reorganized dashboard layout with a btop-inspired multi-panel design
- Updated the system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface

Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details
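For orientation, a minimal sketch of the ps-based approach described above, assuming GNU ps. `TopProcess`, the argument list, and the parsing are illustrative stand-ins, not the actual code in agent/src/collectors/cpu.rs:

```rust
use std::process::Command;

/// The most CPU-hungry process reported by `ps`.
#[derive(Debug)]
pub struct TopProcess {
    pub pid: u32,
    pub name: String,
    pub cpu_percent: f32,
}

/// Run `ps` sorted by CPU usage and return the first row that is not the
/// `ps` invocation itself (the self-monitoring artifact filter).
pub fn collect_top_cpu_process() -> Option<TopProcess> {
    let output = Command::new("ps")
        .args(["-eo", "pid,comm,%cpu", "--sort=-%cpu", "--no-headers"])
        .output()
        .ok()?;
    let stdout = String::from_utf8_lossy(&output.stdout);
    for line in stdout.lines() {
        // Columns: PID, command name (assumed space-free here), CPU percent.
        let cols: Vec<&str> = line.split_whitespace().collect();
        if cols.len() < 3 || cols[1] == "ps" {
            continue;
        }
        let pid = cols[0].parse().ok()?;
        let cpu_percent = cols[2].parse().ok()?;
        return Some(TopProcess { pid, name: cols[1].to_string(), cpu_percent });
    }
    None
}
```

A `collect_top_ram_process()` counterpart would run the same query sorted by `-%mem`; the dashboard then formats the winner as, for example, "claude (PID 2974) 11.0%".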
CLAUDE.md
@@ -2,207 +2,270 @@
 ## Overview

-A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for our specific monitoring needs and API integrations.
+A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure. Built to replace Glance with a custom solution tailored for our specific monitoring needs and ZMQ-based metric collection.

-## Project Goals
+## CRITICAL: Architecture Redesign in Progress
+
+**LEGACY CODE DEPRECATION**: The current codebase is being completely rewritten with a new individual metrics architecture. ALL existing code will be moved to a backup folder for reference only.
+
+**NEW IMPLEMENTATION STRATEGY**:
+- **NO legacy code reuse** - Fresh implementation following ARCHITECT.md
+- **Clean slate approach** - Build entirely new codebase structure
+- **Reference-only legacy** - Current code preserved only for functionality reference
+
+## Implementation Strategy
+
+### Phase 1: Legacy Code Backup (IMMEDIATE)
+
+**Backup Current Implementation:**
+```bash
+# Create backup folder for reference
+mkdir -p backup/legacy-2025-10-16
+
+# Move all current source code to backup
+mv agent/ backup/legacy-2025-10-16/
+mv dashboard/ backup/legacy-2025-10-16/
+mv shared/ backup/legacy-2025-10-16/
+
+# Preserve configuration examples
+cp -r config/ backup/legacy-2025-10-16/
+
+# Keep important documentation
+cp CLAUDE.md backup/legacy-2025-10-16/CLAUDE-legacy.md
+cp README.md backup/legacy-2025-10-16/README-legacy.md
+```
+**Reference Usage Rules:**
+- Legacy code is **REFERENCE ONLY** - never copy/paste
+- Study existing functionality and UI layout patterns
+- Understand current widget behavior and status mapping
+- Reference notification logic and email formatting
+- NO legacy code in new implementation
+
+### Phase 2: Clean Slate Implementation
+
+**New Codebase Structure:**
+Following ARCHITECT.md precisely with zero legacy dependencies:
+
+```
+cm-dashboard/          # New clean repository root
+├── ARCHITECT.md       # Architecture documentation
+├── CLAUDE.md          # This file (updated)
+├── README.md          # New implementation documentation
+├── Cargo.toml         # Workspace configuration
+├── agent/             # New agent implementation
+│   ├── Cargo.toml
+│   └── src/ ... (per ARCHITECT.md)
+├── dashboard/         # New dashboard implementation
+│   ├── Cargo.toml
+│   └── src/ ... (per ARCHITECT.md)
+├── shared/            # New shared types
+│   ├── Cargo.toml
+│   └── src/ ... (per ARCHITECT.md)
+├── config/            # New configuration examples
+└── backup/            # Legacy code for reference
+    └── legacy-2025-10-16/
+```
+### Phase 3: Implementation Priorities
+
+**Agent Implementation (Priority 1):**
+1. Individual metrics collection system
+2. ZMQ communication protocol
+3. Basic collectors (CPU, memory, disk, services)
+4. Status calculation and thresholds
+5. Email notification system
+
+**Dashboard Implementation (Priority 2):**
+1. ZMQ metric consumer
+2. Metric storage and subscription system
+3. Base widget trait and framework
+4. Core widgets (CPU, memory, storage, services)
+5. Host management and navigation
+
+**Testing & Integration (Priority 3):**
+1. End-to-end metric flow validation
+2. Multi-host connection testing
+3. UI layout validation against legacy appearance
+4. Performance benchmarking
+## Project Goals (Updated)

 ### Core Objectives

 - **Real-time monitoring** of all infrastructure components
+- **Individual metric architecture** for maximum dashboard flexibility
 - **Multi-host support** for cmbox, labbox, simonbox, steambox, srv01
 - **Performance-focused** with minimal resource usage
-- **Keyboard-driven interface** for power users
-- **Integration** with existing monitoring APIs (ports 6127, 6128, 6129)
+- **Keyboard-driven interface** preserving current UI layout
+- **ZMQ-based communication** replacing HTTP API polling

 ### Key Features

 - **NVMe health monitoring** with wear prediction
 - **CPU / memory / GPU telemetry** with automatic thresholding
 - **Service resource monitoring** with per-service CPU and RAM usage
 - **Disk usage overview** for root filesystems
 - **Backup status** with detailed metrics and history
 - **Unified alert pipeline** summarising host health
 - **Historical data tracking** and trend analysis
+- **Granular metric collection** (cpu_load_1min, memory_usage_percent, etc.)
+- **Widget-based metric subscription** for flexible dashboard composition
+- **Preserved UI layout** maintaining current visual design
 - **Intelligent caching** for optimal performance
 - **Auto-discovery** of services and system components
 - **Email notifications** for status changes with rate limiting
 - **Maintenance mode** integration for planned downtime
-## Technical Architecture
+## New Technical Architecture

-### Technology Stack
+### Technology Stack (Updated)

 - **Language**: Rust 🦀
+- **Communication**: ZMQ (zeromq) for agent-dashboard messaging
 - **TUI Framework**: ratatui (modern tui-rs fork)
 - **Async Runtime**: tokio
-- **HTTP Client**: reqwest
-- **Serialization**: serde
+- **Serialization**: serde (JSON for metrics)
 - **CLI**: clap
-- **Error Handling**: anyhow
+- **Error Handling**: thiserror + anyhow
 - **Time**: chrono
 - **Email**: lettre (SMTP notifications)
-### Dependencies
+### New Dependencies

 ```toml
-[dependencies]
-ratatui = "0.24"       # Modern TUI framework
-crossterm = "0.27"     # Cross-platform terminal handling
-tokio = { version = "1.0", features = ["full"] }     # Async runtime
-reqwest = { version = "0.11", features = ["json"] }  # HTTP client
-serde = { version = "1.0", features = ["derive"] }   # JSON parsing
-clap = { version = "4.0", features = ["derive"] }    # CLI args
-anyhow = "1.0"         # Error handling
-chrono = "0.4"         # Time handling
+# Workspace Cargo.toml
+[workspace]
+members = ["agent", "dashboard", "shared"]
+
+# Agent dependencies
+[dependencies.agent]
+zmq = "0.10"           # ZMQ communication
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+tokio = { version = "1.0", features = ["full"] }
+clap = { version = "4.0", features = ["derive"] }
+thiserror = "1.0"
+anyhow = "1.0"
+chrono = { version = "0.4", features = ["serde"] }
+lettre = { version = "0.11", features = ["smtp-transport"] }
+gethostname = "0.4"
+
+# Dashboard dependencies
+[dependencies.dashboard]
+ratatui = "0.24"
+crossterm = "0.27"
+zmq = "0.10"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+tokio = { version = "1.0", features = ["full"] }
+clap = { version = "4.0", features = ["derive"] }
+thiserror = "1.0"
+anyhow = "1.0"
+chrono = { version = "0.4", features = ["serde"] }
+
+# Shared dependencies
+[dependencies.shared]
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+chrono = { version = "0.4", features = ["serde"] }
+thiserror = "1.0"
 ```
-## Project Structure
+## New Project Structure

-```
-cm-dashboard/
-├── Cargo.toml
-├── README.md
-├── CLAUDE.md            # This file
-├── src/
-│   ├── main.rs          # Entry point & CLI
-│   ├── app.rs           # Main application state
-│   ├── ui/
-│   │   ├── mod.rs
-│   │   ├── dashboard.rs # Main dashboard layout
-│   │   ├── nvme.rs      # NVMe health widget
-│   │   ├── services.rs  # Services status widget
-│   │   ├── memory.rs    # RAM optimization widget
-│   │   ├── backup.rs    # Backup status widget
-│   │   └── alerts.rs    # Alerts/notifications widget
-│   ├── api/
-│   │   ├── mod.rs
-│   │   ├── client.rs    # HTTP client wrapper
-│   │   ├── smart.rs     # Smart metrics API (port 6127)
-│   │   ├── service.rs   # Service metrics API (port 6128)
-│   │   └── backup.rs    # Backup metrics API (port 6129)
-│   ├── data/
-│   │   ├── mod.rs
-│   │   ├── metrics.rs   # Data structures
-│   │   ├── history.rs   # Historical data storage
-│   │   └── config.rs    # Host configuration
-│   └── config.rs        # Application configuration
-├── config/
-│   ├── hosts.toml       # Host definitions
-│   └── dashboard.toml   # Dashboard layout config
-└── docs/
-    ├── API.md           # API integration documentation
-    └── WIDGETS.md       # Widget development guide
-```
+**REFERENCE**: See ARCHITECT.md for complete folder structure specification.
-### Data Structures
+**Current Status**: Legacy code preserved in `backup/legacy-2025-10-16/` for reference only.
+
+**Implementation Progress**:
+- [x] Architecture documentation (ARCHITECT.md)
+- [x] Implementation strategy (CLAUDE.md updates)
+- [ ] Legacy code backup
+- [ ] New workspace setup
+- [ ] Shared types implementation
+- [ ] Agent implementation
+- [ ] Dashboard implementation
+- [ ] Integration testing
+
+### New Individual Metrics Architecture
+
+**REPLACED**: Legacy grouped structures (SmartMetrics, ServiceMetrics, etc.) are replaced with individual metrics.
+
+**New Approach**: See ARCHITECT.md for individual metric definitions:

 ```rust
-#[derive(Deserialize, Debug)]
-pub struct SmartMetrics {
-    pub status: String,
-    pub drives: Vec<DriveInfo>,
-    pub summary: DriveSummary,
-    pub issues: Vec<String>,
-}
+// Individual metrics examples:
+"cpu_load_1min" -> 2.5
+"cpu_temperature_celsius" -> 45.0
+"memory_usage_percent" -> 78.5
+"disk_nvme0_wear_percent" -> 12.3
+"service_ssh_status" -> "active"
+"backup_last_run_timestamp" -> 1697123456
 ```
+**Shared Types**: Located in `shared/src/metrics.rs`:

 ```rust
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Metric {
+    pub name: String,
+    pub value: MetricValue,
+    pub status: Status,
+    pub timestamp: u64,
+    pub description: Option<String>,
+    pub unit: Option<String>,
+}

-#[derive(Deserialize, Debug)]
-pub struct ServiceMetrics {
-    pub summary: ServiceSummary,
-    pub services: Vec<ServiceInfo>,
-    pub timestamp: u64,
-}
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub enum MetricValue {
+    Float(f32),
+    Integer(i64),
+    String(String),
+    Boolean(bool),
+}

-#[derive(Deserialize, Debug)]
-pub struct ServiceSummary {
-    pub healthy: usize,
-    pub degraded: usize,
-    pub failed: usize,
-    pub memory_used_mb: f32,
-    pub memory_quota_mb: f32,
-    pub system_memory_used_mb: f32,
-    pub system_memory_total_mb: f32,
-    pub disk_used_gb: f32,
-    pub disk_total_gb: f32,
-    pub cpu_load_1: f32,
-    pub cpu_load_5: f32,
-    pub cpu_load_15: f32,
-    pub cpu_freq_mhz: Option<f32>,
-    pub cpu_temp_c: Option<f32>,
-    pub gpu_load_percent: Option<f32>,
-    pub gpu_temp_c: Option<f32>,
-}

-#[derive(Deserialize, Debug)]
-pub struct BackupMetrics {
-    pub overall_status: String,
-    pub backup: BackupInfo,
-    pub service: BackupServiceInfo,
-    pub timestamp: u64,
-}
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub enum Status {
+    Ok,
+    Warning,
+    Critical,
+    Unknown,
+}
 ```
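To make the shared types concrete, here is a sketch of how an agent collector might assemble and serialize one metric for the wire. The helper is hypothetical; the thresholds mirror the production CPU load values quoted later in this document (warning ≥ 9.0, critical ≥ 10.0):

```rust
use std::time::{SystemTime, UNIX_EPOCH};
// Metric, MetricValue, and Status as defined in shared/src/metrics.rs above.

fn cpu_load_metric(load_1min: f32) -> Metric {
    // The agent is the status authority: thresholds live here, never in widgets.
    let status = if load_1min >= 10.0 {
        Status::Critical
    } else if load_1min >= 9.0 {
        Status::Warning
    } else {
        Status::Ok
    };
    Metric {
        name: "cpu_load_1min".to_string(),
        value: MetricValue::Float(load_1min),
        status,
        timestamp: SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before UNIX epoch")
            .as_secs(),
        description: Some("1-minute load average".to_string()),
        unit: None,
    }
}

// On the wire this is plain serde JSON:
// let payload = serde_json::to_string(&cpu_load_metric(2.18))?;
```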
-## Dashboard Layout Design
+## UI Layout Preservation

-### Main Dashboard View
+**CRITICAL**: The exact visual layout shown below is **PRESERVED** in the new implementation.

 ```
 ┌─────────────────────────────────────────────────────────────────────┐
 │ CM Dashboard • cmbox │
 ├─────────────────────────────────────────────────────────────────────┤
 │ Storage • ok:1 warn:0 crit:0 │ Services • ok:1 warn:0 fail:0 │
 │ ┌─────────────────────────────────┐ │ ┌─────────────────────────────── │ │
 │ │Drive Temp Wear Spare Hours │ │ │Service memory: 7.1/23899.7 MiB│ │
 │ │nvme0n1 28°C 1% 100% 14489 │ │ │Disk usage: — │ │
 │ │ Capacity Usage │ │ │ Service Memory Disk │ │
 │ │ 954G 77G (8%) │ │ │✔ sshd 7.1 MiB — │ │
 │ └─────────────────────────────────┘ │ └─────────────────────────────── │ │
 ├─────────────────────────────────────────────────────────────────────┤
 │ CPU / Memory • warn │ Backups │
 │ System memory: 5251.7/23899.7 MiB │ Host cmbox awaiting backup │ │
 │ CPU load (1/5/15): 2.18 2.66 2.56 │ metrics │ │
 │ CPU freq: 1100.1 MHz │ │ │
 │ CPU temp: 47.0°C │ │ │
 ├─────────────────────────────────────────────────────────────────────┤
 │ Alerts • ok:0 warn:3 fail:0 │ Status • ZMQ connected │
 │ cmbox: warning: CPU load 2.18 │ Monitoring • hosts: 3 │ │
 │ srv01: pending: awaiting metrics │ Data source: ZMQ – connected │ │
 │ labbox: pending: awaiting metrics │ Active host: cmbox (1/3) │ │
 └─────────────────────────────────────────────────────────────────────┘
 Keys: [←→] hosts [r]efresh [q]uit
 ```

+**Implementation Strategy**:
+- New widgets subscribe to individual metrics but render identically
+- Same positions, colors, borders, and keyboard shortcuts
+- Enhanced with flexible metric composition under the hood (see the status-aggregation sketch below)
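As an illustration of that composition, a sketch of how a dashboard widget could subscribe to metrics and derive its widget-level status. The metric list, `widget_status`, and the severity ordering are assumptions, not the shipped widget code:

```rust
use std::collections::HashMap;
// Metric and Status from shared/src/metrics.rs.

/// Subscribed metric names kept in a const array instead of strings
/// scattered through render code.
const CPU_MEMORY_METRICS: &[&str] = &[
    "cpu_load_1min",
    "memory_usage_percent",
];

/// Widget status = worst status among subscribed metrics; a metric that is
/// missing from the store counts as Unknown, never silently Ok.
fn widget_status(store: &HashMap<String, Metric>) -> Status {
    CPU_MEMORY_METRICS
        .iter()
        .map(|name| {
            store
                .get(*name)
                .map(|metric| metric.status.clone())
                .unwrap_or(Status::Unknown)
        })
        .fold(Status::Ok, worst_of)
}

/// Severity ordering is a design choice here: Critical > Warning > Unknown > Ok.
fn worst_of(a: Status, b: Status) -> Status {
    fn rank(s: &Status) -> u8 {
        match s {
            Status::Ok => 0,
            Status::Unknown => 1,
            Status::Warning => 2,
            Status::Critical => 3,
        }
    }
    if rank(&b) > rank(&a) { b } else { a }
}
```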
-### Multi-Host View
+**Reference**: Legacy widgets in `backup/legacy-2025-10-16/dashboard/src/ui/` show exact rendering logic to replicate.

-```
-┌─────────────────────────────────────────────────────────────────────┐
-│ 🖥️ CMTEC Host Overview │
-├─────────────────────────────────────────────────────────────────────┤
-│ Host │ NVMe Wear │ RAM Usage │ Services │ Last Alert │
-├─────────────────────────────────────────────────────────────────────┤
-│ srv01 │ 4% ✅ │ 32% ✅ │ 8/8 ✅ │ 04:00 Backup OK │
-│ cmbox │ 12% ✅ │ 45% ✅ │ 3/3 ✅ │ Yesterday Email test │
-│ labbox │ 8% ✅ │ 28% ✅ │ 2/2 ✅ │ 2h ago NVMe temp OK │
-│ simonbox │ 15% ✅ │ 67% ⚠️ │ 4/4 ✅ │ Gaming session active │
-│ steambox │ 23% ✅ │ 78% ⚠️ │ 2/2 ✅ │ High RAM usage │
-└─────────────────────────────────────────────────────────────────────┘
-Keys: [Enter] details [r]efresh [s]ort [f]ilter [q]uit
-```
-## Core Architecture Principles - CRITICAL
+## Architecture Principles - CRITICAL
+
+### Individual Metrics Philosophy

-### Agent-Dashboard Separation of Concerns
+**NEW ARCHITECTURE**: Agent collects individual metrics, dashboard composes widgets from those metrics.

-**AGENT IS SINGLE SOURCE OF TRUTH FOR ALL STATUS CALCULATIONS**
-- Agent calculates status ("ok"/"warning"/"critical"/"unknown") using defined thresholds
-- Agent sends status to dashboard via ZMQ
-- Dashboard NEVER calculates status - only displays what agent provides
+**Status Calculation**:
+- Agent calculates status for each individual metric
+- Agent sends individual metrics with status via ZMQ
+- Dashboard aggregates metric statuses for widget-level status
+- Dashboard NEVER calculates metric status - only displays and aggregates

 **Data Flow Architecture:**
 ```
-Agent (calculations + thresholds) → Status → Dashboard (display only) → TableBuilder (colors)
+Agent (individual metrics + status) → ZMQ → Dashboard (subscribe + display) → Widgets (compose + render)
 ```
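A minimal sketch of that flow with the `zmq` crate. The port, endpoint addresses, and one-JSON-message-per-metric framing are assumptions; ARCHITECT.md defines the actual protocol:

```rust
// Agent side: publish each metric as a standalone JSON message.
fn publish_metrics(metrics: &[Metric]) -> anyhow::Result<()> {
    let ctx = zmq::Context::new();
    let publisher = ctx.socket(zmq::PUB)?;
    publisher.bind("tcp://0.0.0.0:5556")?; // port chosen for illustration
    for metric in metrics {
        let payload = serde_json::to_string(metric)?;
        publisher.send(payload.as_bytes(), 0)?;
    }
    Ok(())
}

// Dashboard side: subscribe to everything; widgets filter by metric name.
fn consume_metrics() -> anyhow::Result<()> {
    let ctx = zmq::Context::new();
    let subscriber = ctx.socket(zmq::SUB)?;
    subscriber.connect("tcp://cmbox:5556")?;
    subscriber.set_subscribe(b"")?;
    loop {
        if let Ok(text) = subscriber.recv_string(0)? {
            let metric: Metric = serde_json::from_str(&text)?;
            // Hand off to the metric store / widget subscription layer here.
            let _ = metric;
        }
    }
}
```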
-**Status Handling Rules:**
-- Agent provides status → Dashboard uses agent status
-- Agent doesn't provide status → Dashboard shows "unknown" (NOT "ok")
-- Dashboard widgets NEVER contain hardcoded thresholds
-- TableBuilder converts status to colors for display
+### Migration from Legacy Architecture

+**OLD (DEPRECATED)**:
+```
+Agent → ServiceMetrics{summary, services} → Dashboard → Widget
+Agent → SmartMetrics{drives, summary} → Dashboard → Widget
+```

+**NEW (IMPLEMENTING)**:
+```
+Agent → ["cpu_load_1min", "memory_usage_percent", ...] → Dashboard → Widgets subscribe to needed metrics
+```

 ### Current Agent Thresholds (as of 2025-10-12)
@@ -295,6 +358,15 @@ Agent (calculations + thresholds) → Status → Dashboard (display only) → TableBuilder (colors)
 - [x] ZMQ broadcast mechanism ensuring continuous data delivery to dashboard
 - [x] Immich service quota detection fix (500GB instead of hardcoded 200GB)
 - [x] Service-to-directory mapping for accurate disk usage calculation
+- [x] **Real-time process monitoring implementation (2025-10-16)**
+  - [x] Fixed hardcoded top CPU/RAM process display with real data
+  - [x] Added top CPU and RAM process collection to CpuCollector
+  - [x] Implemented ps-based process monitoring with accurate percentages
+  - [x] Added intelligent filtering to avoid self-monitoring artifacts
+  - [x] Dashboard updated to display real-time top processes instead of placeholder text
+- [x] Fixed disk metrics permission issues in systemd collector
+- [x] Enhanced error logging for service directory access problems
+- [x] Optimized service collection focusing on status, memory, and disk metrics only

 **Production Configuration:**
 - CPU load thresholds: Warning ≥ 9.0, Critical ≥ 10.0
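For illustration, the same rule expressed as a small agent-side helper; `Thresholds` is hypothetical, with the production values above plugged in:

```rust
/// A warning/critical pair a collector can apply to any numeric metric.
struct Thresholds {
    warning: f32,
    critical: f32,
}

impl Thresholds {
    /// Checked worst-first so a load of 10.2 reports Critical, not Warning.
    fn status(&self, value: f32) -> Status {
        if value >= self.critical {
            Status::Critical
        } else if value >= self.warning {
            Status::Warning
        } else {
            Status::Ok
        }
    }
}

// CPU load per the production configuration:
// let cpu_load = Thresholds { warning: 9.0, critical: 10.0 };
// cpu_load.status(2.18) == Status::Ok
```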
@@ -332,86 +404,111 @@ rm /tmp/cm-maintenance
 - Borgbackup script automatically creates/removes maintenance file
 - Automatic cleanup via trap ensures maintenance mode doesn't stick
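A sketch of how the agent can gate notifications on that file; the path comes from the commands above, the function name is illustrative:

```rust
use std::path::Path;

/// Notifications are suppressed while the maintenance file exists; the
/// borgbackup script creates it on entry and its trap removes it on exit.
fn maintenance_mode_active() -> bool {
    Path::new("/tmp/cm-maintenance").exists()
}
```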
-### Smart Caching System
+### Configuration-Based Smart Caching System

 **Purpose:**
-- Reduce agent CPU usage from 9.5% to <2% through intelligent caching
-- Maintain dashboard responsiveness with tiered refresh strategies
-- Optimize for different data volatility characteristics
+- Reduce agent CPU usage from 10% to <1% through configuration-driven intelligent caching
+- Maintain dashboard responsiveness with configurable refresh strategies
+- Optimize for different data volatility characteristics via config files

-**Architecture:**
-```
-Cache Tiers:
-- RealTime (5s): CPU load, memory usage, quick-changing metrics
-- Fast (30s): Network stats, process lists, medium-volatility
-- Medium (5min): Service status, disk usage, slow-changing data
-- Slow (15min): SMART data, backup status, rarely-changing metrics
-- Static (1h): Hardware info, system capabilities, fixed data
-```
+**Configuration-Driven Architecture:**
+```toml
+# Cache tiers defined in agent.toml
+[cache.tiers.realtime]
+interval_seconds = 5
+description = "High-frequency metrics (CPU load, memory usage)"
+
+[cache.tiers.medium]
+interval_seconds = 300
+description = "Low-frequency metrics (service status, disk usage)"
+
+[cache.tiers.slow]
+interval_seconds = 900
+description = "Very low-frequency metrics (SMART data, backup status)"
+
+# Metric assignments via configuration
+[cache.metric_assignments]
+"cpu_load_*" = "realtime"
+"service_*_disk_gb" = "medium"
+"disk_*_temperature" = "slow"
+```
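A sketch of how those assignments could be resolved at runtime. `tier_for_metric` and the single-`*` glob are assumptions, sufficient for the patterns shown above:

```rust
/// Resolve a metric name to its cache tier; first matching pattern wins,
/// mirroring the order of [cache.metric_assignments].
fn tier_for_metric<'a>(assignments: &'a [(String, String)], metric: &str) -> Option<&'a str> {
    assignments
        .iter()
        .find(|(pattern, _)| glob_match(pattern, metric))
        .map(|(_, tier)| tier.as_str())
}

/// Minimal matcher: at most one '*' wildcard, enough for "cpu_load_*"
/// or "service_*_disk_gb".
fn glob_match(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        Some((prefix, suffix)) => {
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
        None => pattern == name,
    }
}

// tier_for_metric(&assignments, "cpu_load_1min") -> Some("realtime")
```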
 **Implementation:**
-- **SmartCache**: Central cache manager with RwLock for thread safety
-- **CachedCollector**: Wrapper adding caching to any collector
-- **CollectionScheduler**: Manages tier-based refresh timing
+- **ConfigurableCache**: Central cache manager reading tier config from files
+- **MetricCacheManager**: Assigns metrics to tiers based on configuration patterns
+- **TierScheduler**: Manages configurable tier-based refresh timing
 - **Cache warming**: Parallel startup population for instant responsiveness
-- **Background refresh**: Proactive updates to prevent cache misses
+- **Background refresh**: Proactive updates based on configured intervals (see the cache sketch below)
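A compact sketch of the cache shape this implies (an RwLock-guarded map with tier-interval freshness checks); the names are illustrative:

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// One cached metric plus the moment it was collected.
struct CacheEntry {
    metric: Metric, // shared/src/metrics.rs type
    collected_at: Instant,
}

/// Central cache keyed by metric name: many concurrent readers (the ZMQ
/// publisher), exclusive writers (the collectors).
struct ConfigurableCache {
    entries: RwLock<HashMap<String, CacheEntry>>,
}

impl ConfigurableCache {
    /// Serve the cached value unless it is older than its tier interval.
    fn get_fresh(&self, name: &str, tier_interval: Duration) -> Option<Metric> {
        let entries = self.entries.read().ok()?;
        let entry = entries.get(name)?;
        (entry.collected_at.elapsed() < tier_interval).then(|| entry.metric.clone())
    }

    fn insert(&self, metric: Metric) {
        let mut entries = self.entries.write().expect("cache lock poisoned");
        entries.insert(
            metric.name.clone(),
            CacheEntry { metric, collected_at: Instant::now() },
        );
    }
}
```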
-**Usage:**
-```bash
-# Start the agent with intelligent caching
-cm-dashboard-agent [-v]
-```
+**Configuration:**
+```toml
+[cache]
+enabled = true
+default_ttl_seconds = 30
+max_entries = 10000
+warming_timeout_seconds = 3
+background_refresh_enabled = true
+cleanup_interval_seconds = 1800
+```

 **Performance Benefits:**
-- CPU usage reduction: 9.5% → <2% expected
-- Instant dashboard startup through cache warming
-- Reduced disk I/O through intelligent du command caching
-- Network efficiency with selective refresh strategies
+- CPU usage reduction: 10% → <1% target through configuration optimization
+- Configurable cache intervals prevent expensive operations from running too frequently
+- Disk usage detection cached at 5-minute intervals instead of every 5 seconds
+- Selective metric refresh based on configured volatility patterns

-**Configuration:**
-- Cache warming timeout: 3 seconds
-- Background refresh: Enabled at 80% of tier interval
-- Cache cleanup: Every 30 minutes
-- Stale data threshold: 2x tier interval
+**Usage:**
+```bash
+# Start agent with config-based caching
+cm-dashboard-agent --config /etc/cm-dashboard/agent.toml [-v]
+```
 **Architecture:**
-- **Intelligent caching**: Tiered collection with optimal CPU usage
-- **Auto-discovery**: No configuration files required
+- **Configuration-driven caching**: Tiered collection with configurable intervals
+- **Config file management**: All cache behavior defined in TOML configuration
 - **Responsive design**: Cache warming for instant dashboard startup

-### Development Guidelines
+### New Implementation Guidelines - CRITICAL

-**When Adding New Metrics:**
-1. Agent calculates status with thresholds
-2. Agent adds `{metric}_status` field to JSON output
-3. Dashboard data structure adds `{metric}_status: Option<String>`
-4. Dashboard uses `status_level_from_agent_status()` for display
-5. Agent adds notification monitoring for status changes
+**ARCHITECTURE ENFORCEMENT**:
+- **ZERO legacy code reuse** - Fresh implementation following ARCHITECT.md exactly
+- **Individual metrics only** - NO grouped metric structures
+- **Reference-only legacy** - Study old functionality, implement new architecture
+- **Clean slate mindset** - Build as if legacy codebase never existed
-**Testing & Building:**
-- ALWAYS use `cargo build --workspace` to match NixOS build configuration
-- Test with OpenSSL environment variables when building locally:
-  ```bash
-  OPENSSL_DIR=/nix/store/.../openssl-dev \
-  OPENSSL_LIB_DIR=/nix/store/.../openssl/lib \
-  OPENSSL_INCLUDE_DIR=/nix/store/.../openssl-dev/include \
-  PKG_CONFIG_PATH=/nix/store/.../openssl-dev/lib/pkgconfig \
-  OPENSSL_NO_VENDOR=1 cargo build --workspace
-  ```
-- This prevents build failures that only appear in NixOS deployment
+**Implementation Rules**:
+1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
+2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
+3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
+4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
+5. **ZMQ Communication**: All metrics transmitted via ZMQ, no HTTP APIs

-**Notification System:**
-- Universal automatic detection of all `_status` fields across all collectors
-- Sends emails from `hostname@cmtec.se` to `cm@cmtec.se` for any status changes
-- Status stored in-memory: `HashMap<"component.metric", status>`
-- Recovery emails sent when status changes from warning/critical → ok
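A sketch of the status-change bookkeeping those rules describe; the message strings are illustrative, and actual delivery would go through lettre as listed in the technology stack:

```rust
use std::collections::HashMap;

/// In-memory state: "component.metric" -> last observed status string.
struct NotificationState {
    last: HashMap<String, String>,
}

impl NotificationState {
    /// Compare a freshly collected status against the stored one and
    /// describe the email to send, if any.
    fn on_status(&mut self, key: &str, status: &str) -> Option<String> {
        let previous = self.last.insert(key.to_string(), status.to_string());
        match previous.as_deref() {
            Some(prev) if prev == status => None, // unchanged: nothing to send
            Some(prev) => {
                let recovered = matches!(prev, "warning" | "critical") && status == "ok";
                Some(if recovered {
                    format!("RECOVERY: {key} {prev} -> {status}")
                } else {
                    format!("ALERT: {key} {prev} -> {status}")
                })
            }
            None => None, // first observation establishes the baseline
        }
    }
}
```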
+**When Adding New Metrics**:
+1. Define metric name in shared registry (e.g., "disk_nvme1_temperature_celsius")
+2. Implement collector that returns individual Metric struct
+3. Agent calculates status using configured thresholds
+4. Dashboard widgets subscribe to metric by name
+5. Notification system automatically detects status changes (see the collector sketch below)
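To ground those steps, a sketch of a collector shape that satisfies steps 1-3; the `Collector` trait and the example struct are illustrative, reusing the registry name from step 1:

```rust
/// Each collector yields individual Metric values; the agent loop publishes
/// them over ZMQ and feeds the notification system.
trait Collector {
    fn name(&self) -> &'static str;
    fn collect(&mut self) -> anyhow::Result<Vec<Metric>>;
}

struct DiskTemperatureCollector;

impl Collector for DiskTemperatureCollector {
    fn name(&self) -> &'static str {
        "disk_temperature"
    }

    fn collect(&mut self) -> anyhow::Result<Vec<Metric>> {
        let celsius = 38.0_f32; // stand-in for a real SMART read
        Ok(vec![Metric {
            name: "disk_nvme1_temperature_celsius".to_string(),
            value: MetricValue::Float(celsius),
            status: Status::Ok, // agent thresholds decide this in real code
            timestamp: 0,       // real code stamps UNIX seconds
            description: None,
            unit: Some("celsius".to_string()),
        }])
    }
}
```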
-**NEVER:**
-- Add hardcoded thresholds to dashboard widgets
-- Calculate status in dashboard with different thresholds than agent
-- Use "ok" as default when agent status is missing (use "unknown")
-- Calculate colors in widgets (TableBuilder's responsibility)
-- Use `cargo build` without `--workspace` for final testing
+**Testing & Building**:
+- **Workspace builds**: `cargo build --workspace` for all testing
+- **Clean compilation**: Remove `target/` between architecture changes
+- **ZMQ testing**: Test agent-dashboard communication independently
+- **Widget testing**: Verify UI layout matches legacy appearance exactly

+**NEVER in New Implementation**:
+- Copy/paste ANY code from legacy backup
+- Create grouped metric structures (SystemMetrics, etc.)
+- Calculate status in dashboard widgets
+- Hardcode metric names in widgets (use const arrays)
+- Skip individual metric architecture for "simplicity"

+**Legacy Reference Usage**:
+- Study UI layout and rendering logic only
+- Understand email notification formatting
+- Reference status color mapping
+- Learn host navigation patterns
+- NO code copying or structural influence

 # Important Communication Guidelines