CM Dashboard Agent Architecture
Overview
This document defines the architecture for the CM Dashboard Agent. The agent collects individual metrics and sends them to the dashboard via ZMQ. The dashboard decides which metrics to use in which widgets.
Core Philosophy
Individual Metrics Approach: The agent collects and transmits individual metrics (e.g., cpu_load_1min, memory_usage_percent, backup_last_run) rather than grouped metric structures. This provides maximum flexibility for dashboard widget composition.
Folder Structure
cm-dashboard/
├── agent/ # Agent application
│ ├── Cargo.toml
│ ├── src/
│ │ ├── main.rs # Entry point with CLI parsing
│ │ ├── agent.rs # Main Agent orchestrator
│ │ ├── config/
│ │ │ ├── mod.rs # Configuration module exports
│ │ │ ├── loader.rs # TOML configuration loading
│ │ │ ├── defaults.rs # Default configuration values
│ │ │ └── validation.rs # Configuration validation
│ │ ├── communication/
│ │ │ ├── mod.rs # Communication module exports
│ │ │ ├── zmq_config.rs # ZMQ configuration structures
│ │ │ ├── zmq_handler.rs # ZMQ socket management
│ │ │ ├── protocol.rs # Message format definitions
│ │ │ └── error.rs # Communication errors
│ │ ├── metrics/
│ │ │ ├── mod.rs # Metrics module exports
│ │ │ ├── registry.rs # Metric name registry and types
│ │ │ ├── value.rs # Metric value types and status
│ │ │ ├── cache.rs # Individual metric caching
│ │ │ └── collection.rs # Metric collection storage
│ │ ├── collectors/
│ │ │ ├── mod.rs # Collector trait definition
│ │ │ ├── cpu.rs # CPU-related metrics
│ │ │ ├── memory.rs # Memory-related metrics
│ │ │ ├── disk.rs # Disk usage metrics
│ │ │ ├── processes.rs # Process-related metrics
│ │ │ ├── systemd.rs # Systemd service metrics
│ │ │ ├── smart.rs # Storage SMART metrics
│ │ │ ├── backup.rs # Backup status metrics
│ │ │ ├── network.rs # Network metrics
│ │ │ └── error.rs # Collector errors
│ │ ├── notifications/
│ │ │ ├── mod.rs # Notification exports
│ │ │ ├── manager.rs # Status change detection
│ │ │ ├── email.rs # Email notification backend
│ │ │ └── status_tracker.rs # Individual metric status tracking
│ │ └── utils/
│ │ ├── mod.rs # Utility exports
│ │ ├── system.rs # System command utilities
│ │ ├── time.rs # Timestamp utilities
│ │ └── discovery.rs # Auto-discovery functions
│ ├── config/
│ │ ├── agent.example.toml # Example configuration
│ │ └── production.toml # Production template
│ └── tests/
│ ├── integration/ # Integration tests
│ ├── unit/ # Unit tests by module
│ └── fixtures/ # Test data and mocks
├── dashboard/ # Dashboard application
│ ├── Cargo.toml
│ ├── src/
│ │ ├── main.rs # Entry point with CLI parsing
│ │ ├── app.rs # Main Dashboard application state
│ │ ├── config/
│ │ │ ├── mod.rs # Configuration module exports
│ │ │ ├── loader.rs # TOML configuration loading
│ │ │ └── defaults.rs # Default configuration values
│ │ ├── communication/
│ │ │ ├── mod.rs # Communication module exports
│ │ │ ├── zmq_consumer.rs # ZMQ metric consumer
│ │ │ ├── protocol.rs # Shared message protocol
│ │ │ └── error.rs # Communication errors
│ │ ├── metrics/
│ │ │ ├── mod.rs # Metrics module exports
│ │ │ ├── store.rs # Metric storage and retrieval
│ │ │ ├── filter.rs # Metric filtering and selection
│ │ │ ├── history.rs # Historical metric storage
│ │ │ └── subscription.rs # Metric subscription management
│ │ ├── ui/
│ │ │ ├── mod.rs # UI module exports
│ │ │ ├── app.rs # Main UI application loop
│ │ │ ├── layout.rs # Layout management
│ │ │ ├── widgets/
│ │ │ │ ├── mod.rs # Widget exports
│ │ │ │ ├── base.rs # Base widget trait
│ │ │ │ ├── cpu.rs # CPU metrics widget
│ │ │ │ ├── memory.rs # Memory metrics widget
│ │ │ │ ├── storage.rs # Storage metrics widget
│ │ │ │ ├── services.rs # Services metrics widget
│ │ │ │ ├── backup.rs # Backup metrics widget
│ │ │ │ ├── hosts.rs # Host selection widget
│ │ │ │ └── alerts.rs # Alerts/status widget
│ │ │ ├── theme.rs # UI theming and colors
│ │ │ └── input.rs # Input handling
│ │ ├── hosts/
│ │ │ ├── mod.rs # Host management exports
│ │ │ ├── manager.rs # Host connection management
│ │ │ ├── discovery.rs # Host auto-discovery
│ │ │ └── connection.rs # Individual host connections
│ │ └── utils/
│ │ ├── mod.rs # Utility exports
│ │ ├── formatting.rs # Data formatting utilities
│ │ └── time.rs # Time formatting utilities
│ ├── config/
│ │ ├── dashboard.example.toml # Example configuration
│ │ └── hosts.example.toml # Example host configuration
│ └── tests/
│ ├── integration/ # Integration tests
│ ├── unit/ # Unit tests by module
│ └── fixtures/ # Test data and mocks
├── shared/ # Shared types and utilities
│ ├── Cargo.toml
│ ├── src/
│ │ ├── lib.rs # Shared library exports
│ │ ├── protocol.rs # Shared message protocol
│ │ ├── metrics.rs # Shared metric types
│ │ └── error.rs # Shared error types
└── tests/ # End-to-end tests
├── e2e/ # End-to-end test scenarios
└── fixtures/ # Shared test data
Architecture Principles
1. Individual Metrics Philosophy
No Grouped Structures: Instead of SystemMetrics or BackupMetrics, we collect individual metrics:
// Good - Individual metrics
"cpu_load_1min" -> 2.5
"cpu_load_5min" -> 2.8
"cpu_temperature_celsius" -> 45.0
"memory_usage_percent" -> 78.5
"memory_total_gb" -> 32.0
"disk_root_usage_percent" -> 15.2
"service_ssh_status" -> "active"
"backup_last_run_timestamp" -> 1697123456
// Bad - Grouped structures
SystemMetrics { cpu: {...}, memory: {...} }
Dashboard Flexibility: The dashboard consumes individual metrics and decides which ones to display in each widget.
2. Metric Definition
Each metric has:
- Name: Unique identifier (e.g., cpu_load_1min)
- Value: Typed value (f32, i64, String, bool)
- Status: Health status (ok, warning, critical, unknown)
- Timestamp: When the metric was collected
- Metadata: Optional description, units, etc.
3. Module Responsibilities
- Communication: ZMQ protocol and message handling
- Metrics: Value types, caching, and storage
- Collectors: Gather specific metrics from system
- Notifications: Track status changes across all metrics
- Config: Configuration loading and validation
4. Data Flow
Collectors → Individual Metrics → Cache → ZMQ → Dashboard
↓ ↓ ↓
Status Calc → Status Tracker → Notifications
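The Status Tracker stage in this flow only has to notice when a metric's status changes. A minimal sketch, assuming notifications fire on transitions and that the first status seen for a metric is recorded without notifying; `StatusTracker` and `observe` are hypothetical names, `Status` is a simplified copy of the shared enum, and rate limiting (`rate_limit_minutes`) is left out.

```rust
use std::collections::HashMap;

/// Simplified copy of the shared `Status` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
    Unknown,
}

/// A status change for one metric: (metric name, previous, current).
pub type Transition = (String, Status, Status);

/// Hypothetical tracker: remembers the last status seen per metric name.
#[derive(Default)]
pub struct StatusTracker {
    last: HashMap<String, Status>,
}

impl StatusTracker {
    /// Records `status` for `name` and returns a transition if it changed.
    /// Assumption: the first observation of a metric is stored silently.
    pub fn observe(&mut self, name: &str, status: Status) -> Option<Transition> {
        match self.last.insert(name.to_string(), status) {
            Some(prev) if prev != status => Some((name.to_string(), prev, status)),
            _ => None,
        }
    }
}
```

A notification backend would then only act on the `Some(..)` results.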
Metric Design Rules
1. Naming Convention
Metrics follow hierarchical naming:
{category}_{subcategory}_{property}_{unit}
Examples:
cpu_load_1min
cpu_temperature_celsius
memory_usage_percent
memory_total_gb
disk_root_usage_percent
disk_nvme0_temperature_celsius
service_ssh_status
service_ssh_memory_mb
backup_last_run_timestamp
backup_status
network_eth0_rx_bytes
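Names like these can be assembled from their segments and checked before a collector emits them. A minimal sketch, assuming names are restricted to lowercase ASCII letters, digits and single underscores (the document does not state character rules, so a service name containing a hyphen would be rejected here); `build_metric_name` and `is_valid_metric_name` are hypothetical helpers.

```rust
/// Hypothetical helper: joins segments following
/// `{category}_{subcategory}_{property}_{unit}`.
pub fn build_metric_name(segments: &[&str]) -> String {
    segments.join("_")
}

/// Hypothetical validator for the assumed character rules: starts with a
/// lowercase letter, only lowercase/digits/underscores, no empty segments.
pub fn is_valid_metric_name(name: &str) -> bool {
    let starts_with_letter = name.chars().next().map_or(false, |c| c.is_ascii_lowercase());
    let allowed_chars = name
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    starts_with_letter && allowed_chars && !name.ends_with('_') && !name.contains("__")
}
```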
2. Value Types
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum MetricValue {
Float(f32),
Integer(i64),
String(String),
Boolean(bool),
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum Status {
Ok,
Warning,
Critical,
Unknown,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Metric {
pub name: String,
pub value: MetricValue,
pub status: Status,
pub timestamp: u64,
pub description: Option<String>,
pub unit: Option<String>,
}
3. Collector Interface
Each collector provides individual metrics:
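use async_trait::async_trait;
// Result<T> is assumed to be the crate's error alias (e.g. from collectors/error.rs)
#[async_trait]
pub trait Collector: Send + Sync {
    // Send + Sync so boxed collectors can be driven from async tasks
    fn name(&self) -> &str;
    async fn collect(&self) -> Result<Vec<Metric>>;
}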
// Example CPU collector output:
vec![
Metric { name: "cpu_load_1min", value: Float(2.5), status: Ok, ... },
Metric { name: "cpu_load_5min", value: Float(2.8), status: Ok, ... },
Metric { name: "cpu_temperature_celsius", value: Float(45.0), status: Ok, ... },
]
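On Linux, the load values in the example output above can be read from `/proc/loadavg`. A minimal, synchronous sketch of the parsing step, assuming that file's standard format ("1min 5min 15min running/total last_pid"); `parse_loadavg` is a hypothetical helper, and a real collector would read the file inside its async `collect` and wrap each pair in a `Metric` with a calculated status.

```rust
/// Hypothetical helper: parses `/proc/loadavg` text, e.g.
/// "2.50 2.80 3.10 1/234 5678", into individual load metrics.
/// Returns None if the first three fields are missing or not numbers.
pub fn parse_loadavg(contents: &str) -> Option<Vec<(&'static str, f32)>> {
    let mut fields = contents.split_whitespace();
    let mut next_load = || fields.next()?.parse::<f32>().ok();
    let one = next_load()?;
    let five = next_load()?;
    let fifteen = next_load()?;
    Some(vec![
        ("cpu_load_1min", one),
        ("cpu_load_5min", five),
        ("cpu_load_15min", fifteen),
    ])
}
```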
Communication Protocol
ZMQ Message Format
#[derive(Debug, Serialize, Deserialize)]
pub struct MetricMessage {
pub hostname: String,
pub timestamp: u64,
pub metrics: Vec<Metric>,
}
ZMQ Configuration
#[derive(Debug, Deserialize)]
pub struct ZmqConfig {
pub publisher_port: u16, // Default: 6130
pub command_port: u16, // Default: 6131
pub bind_address: String, // Default: "0.0.0.0"
pub timeout_ms: u64, // Default: 5000
pub heartbeat_interval: u64, // Default: 30000
}
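The defaults noted in the comments can live in a single `Default` impl, so a TOML file only has to override what differs (with serde, the container attribute `#[serde(default)]` would fill missing fields from it). A std-only sketch with the serde derive omitted; treating `heartbeat_interval` as milliseconds and the `publisher_endpoint` helper are assumptions.

```rust
/// Same fields as `ZmqConfig` above; serde derive omitted to stay std-only.
#[derive(Debug, Clone, PartialEq)]
pub struct ZmqConfig {
    pub publisher_port: u16,
    pub command_port: u16,
    pub bind_address: String,
    pub timeout_ms: u64,
    pub heartbeat_interval: u64,
}

impl Default for ZmqConfig {
    fn default() -> Self {
        Self {
            publisher_port: 6130,
            command_port: 6131,
            bind_address: "0.0.0.0".to_string(),
            timeout_ms: 5000,
            // Assumed to be milliseconds, like timeout_ms.
            heartbeat_interval: 30000,
        }
    }
}

impl ZmqConfig {
    /// Hypothetical helper: the endpoint the publisher socket would bind to.
    pub fn publisher_endpoint(&self) -> String {
        format!("tcp://{}:{}", self.bind_address, self.publisher_port)
    }
}
```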
Caching Strategy
Configuration-Based Individual Metric Cache
use std::collections::HashMap;
use std::time::Instant;
use serde::Deserialize;
pub struct MetricCache {
    cache: HashMap<String, CachedMetric>,
    config: CacheConfig,
}
struct CachedMetric {
    metric: Metric,
    collected_at: Instant,
    access_count: u64,
    cache_tier: CacheTier,
}
#[derive(Debug, Deserialize)]
pub struct CacheConfig {
    pub enabled: bool,
    pub default_ttl_seconds: u64,
    pub max_entries: usize,
    // Tier definitions from [cache.tiers.<name>]
    pub tiers: HashMap<String, CacheTier>,
    // Metric-name pattern -> tier name, from [cache.metric_assignments]
    pub metric_assignments: HashMap<String, String>,
}
#[derive(Debug, Deserialize, Clone)]
pub struct CacheTier {
    pub interval_seconds: u64,
    pub description: String,
}
Configuration-Based Caching Rules:
- Each metric type has configurable cache intervals via config files
- Cache tiers defined in configuration, not hardcoded
- Individual metrics cached by name with tier-specific TTL
- Cache miss triggers single metric collection
- No grouped cache invalidation
- Performance target: under 2% agent CPU usage, achieved by caching each metric for its tier's interval instead of re-collecting it every cycle
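The per-metric TTL rule reduces to one freshness check per lookup. A minimal sketch, assuming a simplified `f64` value instead of the full `Metric` and a TTL already resolved from the metric's tier; the method names are hypothetical. Passing `now` explicitly keeps expiry deterministic and testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Simplified cache entry: the value plus when it was collected.
struct CachedValue {
    value: f64,
    collected_at: Instant,
}

/// Hypothetical per-metric cache keyed by metric name.
#[derive(Default)]
pub struct MetricCache {
    entries: HashMap<String, CachedValue>,
}

impl MetricCache {
    pub fn insert(&mut self, name: &str, value: f64, now: Instant) {
        self.entries
            .insert(name.to_string(), CachedValue { value, collected_at: now });
    }

    /// Returns the cached value if it is younger than `ttl`. `None` is a cache
    /// miss, which should trigger collection of this single metric.
    pub fn get_fresh(&self, name: &str, ttl: Duration, now: Instant) -> Option<f64> {
        let entry = self.entries.get(name)?;
        if now.saturating_duration_since(entry.collected_at) < ttl {
            Some(entry.value)
        } else {
            None
        }
    }
}
```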
Configuration System
Configuration Structure
[zmq]
publisher_port = 6130
command_port = 6131
bind_address = "0.0.0.0"
timeout_ms = 5000
[cache]
enabled = true
default_ttl_seconds = 30
max_entries = 10000
# Cache tiers for different metric types
[cache.tiers.realtime]
interval_seconds = 5
description = "High-frequency metrics (CPU load, memory usage)"
[cache.tiers.fast]
interval_seconds = 30
description = "Medium-frequency metrics (network stats, process lists)"
[cache.tiers.medium]
interval_seconds = 300
description = "Low-frequency metrics (service status, disk usage)"
[cache.tiers.slow]
interval_seconds = 900
description = "Very low-frequency metrics (SMART data, backup status)"
[cache.tiers.static]
interval_seconds = 3600
description = "Rarely changing metrics (hardware info, system capabilities)"
# Metric type to tier mapping
[cache.metric_assignments]
"cpu_load_*" = "realtime"
"memory_usage_*" = "realtime"
"service_*_cpu_percent" = "realtime"
"service_*_memory_mb" = "realtime"
"service_*_status" = "medium"
"service_*_disk_gb" = "medium"
"disk_*_temperature_celsius" = "slow"
"disk_*_wear_percent" = "slow"
"backup_*" = "slow"
"network_*" = "fast"
[collectors.cpu]
enabled = true
interval_seconds = 5
temperature_warning = 70.0
temperature_critical = 80.0
load_warning = 5.0
load_critical = 8.0
[collectors.memory]
enabled = true
interval_seconds = 5
usage_warning_percent = 80.0
usage_critical_percent = 95.0
[collectors.systemd]
enabled = true
interval_seconds = 30
services = ["ssh", "nginx", "docker", "gitea"]
[notifications]
enabled = true
smtp_host = "localhost"
smtp_port = 25
from_email = "{{hostname}}@cmtec.se"
to_email = "cm@cmtec.se"
rate_limit_minutes = 30
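The `[cache.metric_assignments]` keys are glob-like patterns. A minimal sketch of resolving a metric name to its tier, assuming `*` is the only wildcard (matching any run of characters) and that the longest matching pattern wins when several match, since the configuration does not define a precedence rule; `matches_pattern` and `tier_for` are hypothetical names.

```rust
/// Returns true if `name` matches `pattern`; each `*` matches any
/// (possibly empty) run of characters.
pub fn matches_pattern(pattern: &str, name: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == name; // no wildcard: exact match
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if name.len() < first.len() + last.len() || !name.starts_with(first) || !name.ends_with(last) {
        return false;
    }
    // Match the middle literal parts left to right in what remains.
    let mut rest = &name[first.len()..name.len() - last.len()];
    for part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(idx) => rest = &rest[idx + part.len()..],
            None => return false,
        }
    }
    true
}

/// Picks the tier for `name`; the longest (most specific) matching pattern
/// wins. None means the caller falls back to `default_ttl_seconds`.
pub fn tier_for<'a>(assignments: &[(&str, &'a str)], name: &str) -> Option<&'a str> {
    assignments
        .iter()
        .filter(|(pattern, _)| matches_pattern(pattern, name))
        .max_by_key(|(pattern, _)| pattern.len())
        .map(|(_, tier)| *tier)
}
```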
Implementation Guidelines
1. Adding New Metrics
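// 1. Define metric names in registry
pub const NETWORK_ETH0_RX_BYTES: &str = "network_eth0_rx_bytes";
pub const NETWORK_ETH0_TX_BYTES: &str = "network_eth0_tx_bytes";
// 2. Implement collector
pub struct NetworkCollector {
    config: NetworkConfig,
}
#[async_trait]
impl Collector for NetworkCollector {
    fn name(&self) -> &str {
        "network"
    }
    async fn collect(&self) -> Result<Vec<Metric>> {
        // read_eth0_counters() is a hypothetical helper returning (rx, tx) as i64
        let (rx_bytes, tx_bytes) = read_eth0_counters()?;
        Ok(vec![
            Metric {
                name: NETWORK_ETH0_RX_BYTES.to_string(),
                value: MetricValue::Integer(rx_bytes),
                status: Status::Ok,
                timestamp: now(),
                description: None,
                unit: Some("bytes".to_string()),
            },
            Metric {
                name: NETWORK_ETH0_TX_BYTES.to_string(),
                value: MetricValue::Integer(tx_bytes),
                status: Status::Ok,
                timestamp: now(),
                description: None,
                unit: Some("bytes".to_string()),
            },
        ])
    }
}
// 3. Register in agent
agent.register_collector(Box::new(NetworkCollector::new(config.network)));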
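The `rx_bytes` value in the collector above has to be read from the system; on Linux one option is `/proc/net/dev`, where, after the `iface:` prefix, field 0 is received bytes and field 8 is transmitted bytes. A minimal sketch, assuming that file as the source; `parse_net_dev` is a hypothetical helper that a collector would call on the file contents.

```rust
/// Hypothetical helper: extracts (rx_bytes, tx_bytes) for `iface` from the
/// text of `/proc/net/dev`. Header lines contain no ':' and are skipped.
pub fn parse_net_dev(contents: &str, iface: &str) -> Option<(i64, i64)> {
    for line in contents.lines() {
        let Some((name, counters)) = line.split_once(':') else { continue };
        if name.trim() != iface {
            continue;
        }
        let fields: Vec<&str> = counters.split_whitespace().collect();
        let rx = fields.first()?.parse::<i64>().ok()?;
        let tx = fields.get(8)?.parse::<i64>().ok()?;
        return Some((rx, tx));
    }
    None
}
```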
2. Status Calculation
Each collector calculates status for its metrics:
impl CpuCollector {
    // Thresholds come from temperature_warning / temperature_critical in [collectors.cpu]
    fn calculate_temperature_status(&self, temp: f32) -> Status {
        if temp >= self.config.temperature_critical {
            Status::Critical
        } else if temp >= self.config.temperature_warning {
            Status::Warning
        } else {
            Status::Ok
        }
    }
}
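The same pattern applies to the memory collector and the thresholds in `[collectors.memory]`. A sketch, assuming usage is derived from Linux `/proc/meminfo` as (MemTotal minus MemAvailable) divided by MemTotal; `memory_usage_percent` and `threshold_status` are hypothetical helpers, and `Status` is a simplified copy of the shared enum.

```rust
/// Simplified copy of the shared `Status` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status { Ok, Warning, Critical, Unknown }

/// Usage percentage from `/proc/meminfo` text:
/// (MemTotal - MemAvailable) / MemTotal * 100. None if a field is missing.
pub fn memory_usage_percent(meminfo: &str) -> Option<f32> {
    let field = |key: &str| -> Option<f32> {
        let line = meminfo.lines().find(|l| l.starts_with(key))?;
        line.split_whitespace().nth(1)?.parse().ok()
    };
    let total = field("MemTotal:")?;
    let available = field("MemAvailable:")?;
    if total <= 0.0 {
        return None;
    }
    Some((total - available) / total * 100.0)
}

/// Applies warning/critical thresholds, e.g. usage_warning_percent = 80.0
/// and usage_critical_percent = 95.0.
pub fn threshold_status(value: f32, warning: f32, critical: f32) -> Status {
    if value >= critical {
        Status::Critical
    } else if value >= warning {
        Status::Warning
    } else {
        Status::Ok
    }
}
```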
3. Dashboard Usage
Dashboard widgets subscribe to specific metrics:
// Dashboard CPU widget
let cpu_metrics = [
"cpu_load_1min",
"cpu_load_5min",
"cpu_load_15min",
"cpu_temperature_celsius",
];
// Dashboard memory widget
let memory_metrics = [
"memory_usage_percent",
"memory_total_gb",
"memory_available_gb",
];
Dashboard Architecture
Dashboard Principles
1. UI Layout Preservation
Current UI Layout Maintained: The existing dashboard UI layout is preserved and enhanced with the new metric-centric architecture. All current widgets remain in their established positions and functionality.
Widget Enhancement, Not Replacement: Widgets are enhanced to consume individual metrics rather than grouped structures, but maintain their visual appearance and user interaction patterns.
2. Metric-to-Widget Mapping
Each widget subscribes to specific individual metrics and composes them for display:
// CPU Widget Metrics
const CPU_WIDGET_METRICS: &[&str] = &[
"cpu_load_1min",
"cpu_load_5min",
"cpu_load_15min",
"cpu_temperature_celsius",
"cpu_frequency_mhz",
"cpu_usage_percent",
];
// Memory Widget Metrics
const MEMORY_WIDGET_METRICS: &[&str] = &[
"memory_usage_percent",
"memory_total_gb",
"memory_available_gb",
"memory_used_gb",
"memory_swap_total_gb",
"memory_swap_used_gb",
];
// Storage Widget Metrics
const STORAGE_WIDGET_METRICS: &[&str] = &[
"disk_nvme0_temperature_celsius",
"disk_nvme0_wear_percent",
"disk_nvme0_spare_percent",
"disk_nvme0_hours",
"disk_nvme0_capacity_gb",
"disk_nvme0_usage_gb",
"disk_nvme0_usage_percent",
];
// Services Widget Metrics
const SERVICES_WIDGET_METRICS: &[&str] = &[
"service_ssh_status",
"service_ssh_memory_mb",
"service_ssh_cpu_percent",
"service_nginx_status",
"service_nginx_memory_mb",
"service_docker_status",
// ... per discovered service
];
// Backup Widget Metrics
const BACKUP_WIDGET_METRICS: &[&str] = &[
"backup_last_run_timestamp",
"backup_status",
"backup_size_gb",
"backup_duration_minutes",
"backup_next_scheduled_timestamp",
];
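The `// ... per discovered service` entry implies the Services list is generated rather than hard-coded. A sketch, assuming every service exposes the same three suffixes shown in the Services widget list above and that the service names come from something like `services` in `[collectors.systemd]`; `service_metric_names` is a hypothetical helper.

```rust
/// Per-service suffixes, taken from the Services widget list.
const SERVICE_METRIC_SUFFIXES: &[&str] = &["status", "memory_mb", "cpu_percent"];

/// Hypothetical helper: expands service names into the individual metric
/// names the Services widget subscribes to.
pub fn service_metric_names(services: &[&str]) -> Vec<String> {
    services
        .iter()
        .flat_map(|svc| {
            SERVICE_METRIC_SUFFIXES
                .iter()
                .map(move |suffix| format!("service_{svc}_{suffix}"))
        })
        .collect()
}
```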
Dashboard Communication
ZMQ Consumer Architecture
// dashboard/src/communication/zmq_consumer.rs
pub struct ZmqConsumer {
subscriber: Socket,
config: ZmqConfig,
metric_filter: MetricFilter,
}
impl ZmqConsumer {
pub async fn subscribe_to_host(&mut self, hostname: &str) -> Result<()>
pub async fn receive_metrics(&mut self) -> Result<Vec<Metric>>
pub fn set_metric_filter(&mut self, filter: MetricFilter)
pub async fn request_metrics(&self, metric_names: &[String]) -> Result<()>
}
#[derive(Debug, Clone)]
pub struct MetricFilter {
pub include_patterns: Vec<String>,
pub exclude_patterns: Vec<String>,
pub hosts: Vec<String>,
}
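The filter's semantics are not spelled out above. A minimal sketch, assuming empty `hosts` or `include_patterns` mean "accept everything", exclusions always win, and patterns contain at most one `*`; the `accepts` method and `glob` helper are hypothetical.

```rust
/// Mirror of `MetricFilter` above.
#[derive(Debug, Clone, Default)]
pub struct MetricFilter {
    pub include_patterns: Vec<String>,
    pub exclude_patterns: Vec<String>,
    pub hosts: Vec<String>,
}

/// Hypothetical matcher supporting a single `*` wildcard.
fn glob(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == name,
        Some((prefix, suffix)) => {
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
    }
}

impl MetricFilter {
    /// Assumed semantics: host allowed, included, and not excluded.
    pub fn accepts(&self, host: &str, metric: &str) -> bool {
        let host_ok = self.hosts.is_empty() || self.hosts.iter().any(|h| h == host);
        let included = self.include_patterns.is_empty()
            || self.include_patterns.iter().any(|p| glob(p, metric));
        let excluded = self.exclude_patterns.iter().any(|p| glob(p, metric));
        host_ok && included && !excluded
    }
}
```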
Protocol Compatibility
The dashboard uses the same protocol as defined in the agent:
// shared/src/protocol.rs (shared between agent and dashboard)
#[derive(Debug, Serialize, Deserialize)]
pub struct MetricMessage {
pub hostname: String,
pub timestamp: u64,
pub metrics: Vec<Metric>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Metric {
pub name: String,
pub value: MetricValue,
pub status: Status,
pub timestamp: u64,
pub description: Option<String>,
pub unit: Option<String>,
}
Dashboard Metric Management
Metric Store
// dashboard/src/metrics/store.rs
pub struct MetricStore {
current_metrics: HashMap<String, HashMap<String, Metric>>, // host -> metric_name -> metric
historical_metrics: HistoricalStore,
subscriptions: SubscriptionManager,
}
impl MetricStore {
pub fn update_metrics(&mut self, hostname: &str, metrics: Vec<Metric>)
pub fn get_metric(&self, hostname: &str, metric_name: &str) -> Option<&Metric>
pub fn get_metrics_for_widget(&self, hostname: &str, widget: WidgetType) -> Vec<&Metric>
pub fn get_hosts(&self) -> Vec<String>
pub fn get_latest_timestamp(&self, hostname: &str) -> Option<u64>
}
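Because metrics arrive individually, an update only needs to overwrite the names present in the incoming batch. A sketch of the core store methods, assuming a simplified `Metric` with an `f64` value; the merge-by-name behaviour and sorted host list are assumptions, and historical storage and subscriptions are omitted.

```rust
use std::collections::HashMap;

/// Simplified metric: name, numeric value and collection timestamp.
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: f64,
    pub timestamp: u64,
}

/// host -> metric_name -> latest metric, as in `current_metrics` above.
#[derive(Default)]
pub struct MetricStore {
    current_metrics: HashMap<String, HashMap<String, Metric>>,
}

impl MetricStore {
    /// Overwrites each metric by name; metrics absent from the batch are kept.
    pub fn update_metrics(&mut self, hostname: &str, metrics: Vec<Metric>) {
        let host = self.current_metrics.entry(hostname.to_string()).or_default();
        for metric in metrics {
            host.insert(metric.name.clone(), metric);
        }
    }

    pub fn get_metric(&self, hostname: &str, metric_name: &str) -> Option<&Metric> {
        self.current_metrics.get(hostname)?.get(metric_name)
    }

    /// Newest timestamp among a host's metrics.
    pub fn get_latest_timestamp(&self, hostname: &str) -> Option<u64> {
        self.current_metrics.get(hostname)?.values().map(|m| m.timestamp).max()
    }

    pub fn get_hosts(&self) -> Vec<String> {
        let mut hosts: Vec<String> = self.current_metrics.keys().cloned().collect();
        hosts.sort();
        hosts
    }
}
```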
Metric Subscription Management
// dashboard/src/metrics/subscription.rs
pub struct SubscriptionManager {
widget_subscriptions: HashMap<WidgetType, Vec<String>>,
active_hosts: HashSet<String>,
metric_filters: HashMap<String, MetricFilter>,
}
impl SubscriptionManager {
pub fn subscribe_widget(&mut self, widget: WidgetType, metrics: &[String])
pub fn get_required_metrics(&self) -> Vec<String>
pub fn add_host(&mut self, hostname: String)
pub fn remove_host(&mut self, hostname: &str)
pub fn is_metric_needed(&self, metric_name: &str) -> bool
}
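The required-metrics list is the union of every widget's subscription. A sketch, assuming re-subscribing a widget replaces its previous list and that a sorted, deduplicated result is acceptable; `WidgetType` is trimmed to two variants and host/filter tracking is omitted.

```rust
use std::collections::{BTreeSet, HashMap};

/// Trimmed copy of `WidgetType` for the sketch.
#[derive(Debug, Clone, Copy, Hash, Eq, PartialEq)]
pub enum WidgetType { Cpu, Memory }

#[derive(Default)]
pub struct SubscriptionManager {
    widget_subscriptions: HashMap<WidgetType, Vec<String>>,
}

impl SubscriptionManager {
    /// Assumption: a new subscription replaces the widget's previous list.
    pub fn subscribe_widget(&mut self, widget: WidgetType, metrics: &[String]) {
        self.widget_subscriptions.insert(widget, metrics.to_vec());
    }

    /// Union of all widget subscriptions, deduplicated and sorted.
    pub fn get_required_metrics(&self) -> Vec<String> {
        let all: BTreeSet<&String> = self.widget_subscriptions.values().flatten().collect();
        all.into_iter().cloned().collect()
    }

    pub fn is_metric_needed(&self, metric_name: &str) -> bool {
        self.widget_subscriptions.values().flatten().any(|m| m == metric_name)
    }
}
```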
Widget Architecture
Base Widget Trait
// dashboard/src/ui/widgets/base.rs
pub trait Widget {
fn widget_type(&self) -> WidgetType;
fn required_metrics(&self) -> &[&str];
fn update_metrics(&mut self, metrics: &HashMap<String, Metric>);
fn render(&self, frame: &mut Frame, area: Rect);
fn handle_input(&mut self, event: &Event) -> bool;
fn get_status(&self) -> Status;
}
#[derive(Debug, Clone, Copy, Hash, Eq, PartialEq)]
pub enum WidgetType {
Cpu,
Memory,
Storage,
Services,
Backup,
Hosts,
Alerts,
}
Enhanced Widget Implementation
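// dashboard/src/ui/widgets/cpu.rs
pub struct CpuWidget {
    metrics: HashMap<String, Metric>,
    config: CpuWidgetConfig,
}
impl Widget for CpuWidget {
    fn widget_type(&self) -> WidgetType {
        WidgetType::Cpu
    }
    fn required_metrics(&self) -> &[&str] {
        CPU_WIDGET_METRICS
    }
    fn update_metrics(&mut self, metrics: &HashMap<String, Metric>) {
        // Update only the metrics this widget cares about. Iterate the const
        // list directly: looping over self.required_metrics() would keep `self`
        // borrowed while self.metrics is mutated.
        for &metric_name in CPU_WIDGET_METRICS {
            if let Some(metric) = metrics.get(metric_name) {
                self.metrics.insert(metric_name.to_string(), metric.clone());
            }
        }
    }
    fn render(&self, frame: &mut Frame, area: Rect) {
        // Extract specific metric values for display
        // (get_metric_value is assumed to be a widget helper returning Option<f32>)
        let load_1min = self.get_metric_value("cpu_load_1min").unwrap_or(0.0);
        let load_5min = self.get_metric_value("cpu_load_5min").unwrap_or(0.0);
        let temperature = self.get_metric_value("cpu_temperature_celsius");
        // Maintain existing UI layout and styling
        // ... render implementation preserving current appearance
    }
    fn handle_input(&mut self, _event: &Event) -> bool {
        false // assumption: the CPU widget does not consume input events
    }
    fn get_status(&self) -> Status {
        // Aggregate: the most severe individual status wins. Status does not
        // derive Ord, so rank it explicitly (Unknown < Ok < Warning < Critical).
        let severity = |s: &Status| match s {
            Status::Unknown => 0,
            Status::Ok => 1,
            Status::Warning => 2,
            Status::Critical => 3,
        };
        self.metrics
            .values()
            .map(|m| &m.status)
            .max_by_key(|s| severity(*s))
            .cloned()
            .unwrap_or(Status::Unknown)
    }
}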
Host Management
Multi-Host Connection Management
// dashboard/src/hosts/manager.rs
pub struct HostManager {
connections: HashMap<String, HostConnection>,
discovery: HostDiscovery,
active_host: Option<String>,
metric_store: Arc<Mutex<MetricStore>>,
}
impl HostManager {
pub async fn discover_hosts(&mut self) -> Result<Vec<String>>
pub async fn connect_to_host(&mut self, hostname: &str) -> Result<()>
pub fn disconnect_from_host(&mut self, hostname: &str)
pub fn set_active_host(&mut self, hostname: String)
pub fn get_active_host(&self) -> Option<&str>
pub fn get_connected_hosts(&self) -> Vec<&str>
pub async fn refresh_all_hosts(&mut self) -> Result<()>
}
// dashboard/src/hosts/connection.rs
pub struct HostConnection {
hostname: String,
zmq_consumer: ZmqConsumer,
last_seen: Instant,
connection_status: ConnectionStatus,
metric_buffer: VecDeque<Metric>,
}
#[derive(Debug, Clone)]
pub enum ConnectionStatus {
Connected,
Connecting,
Disconnected,
Error(String),
}
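`last_seen` and `connection_status` together allow a simple staleness check. A sketch, assuming a connected host is marked `Disconnected` once no message has arrived within `connection_timeout_ms` (15000 in the dashboard configuration); only the two relevant fields are kept, and `mark_seen` and `check_timeout` are hypothetical methods.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq)]
pub enum ConnectionStatus {
    Connected,
    Connecting,
    Disconnected,
    Error(String),
}

/// Trimmed stand-in for HostConnection with only the fields used here.
pub struct HostConnection {
    pub last_seen: Instant,
    pub connection_status: ConnectionStatus,
}

impl HostConnection {
    /// Records that a message arrived from this host.
    pub fn mark_seen(&mut self, now: Instant) {
        self.last_seen = now;
        self.connection_status = ConnectionStatus::Connected;
    }

    /// Marks a connected host as disconnected once it has been silent longer than `timeout`.
    pub fn check_timeout(&mut self, now: Instant, timeout: Duration) {
        if self.connection_status == ConnectionStatus::Connected
            && now.saturating_duration_since(self.last_seen) > timeout
        {
            self.connection_status = ConnectionStatus::Disconnected;
        }
    }
}
```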
Configuration Integration
Dashboard Configuration
# dashboard/config/dashboard.toml
[zmq]
subscriber_ports = [6130] # Agent publisher ports to connect to for metrics
connection_timeout_ms = 15000
reconnect_interval_ms = 5000
[ui]
refresh_rate_ms = 100
theme = "default"
preserve_layout = true
[hosts]
auto_discovery = true
predefined_hosts = ["cmbox", "labbox", "simonbox", "steambox", "srv01"]
default_host = "cmbox"
[metrics]
history_retention_hours = 24
max_metrics_per_host = 10000
[widgets.cpu]
enabled = true
metrics = [
"cpu_load_1min",
"cpu_load_5min",
"cpu_load_15min",
"cpu_temperature_celsius"
]
[widgets.memory]
enabled = true
metrics = [
"memory_usage_percent",
"memory_total_gb",
"memory_available_gb"
]
[widgets.storage]
enabled = true
metrics = [
"disk_nvme0_temperature_celsius",
"disk_nvme0_wear_percent",
"disk_nvme0_usage_percent"
]
UI Layout Preservation Rules
1. Maintain Current Widget Positions
- CPU widget: Top-left position preserved
- Memory widget: Top-right position preserved
- Storage widget: Left-center position preserved
- Services widget: Right-center position preserved
- Backup widget: Bottom-right position preserved
- Host navigation: Bottom status bar preserved
2. Preserve Visual Styling
- Colors: Existing status colors (green, yellow, red) maintained
- Borders: Current border styles and characters preserved
- Text formatting: Font styles, alignment, and spacing preserved
- Progress bars: Current progress bar implementations maintained
3. Maintain User Interactions
- Navigation keys: ← → for host switching preserved
- Refresh key: r for manual refresh preserved
- Quit key: q for exit preserved
- Additional keys: All current keyboard shortcuts maintained
4. Status Display Consistency
- Status aggregation: Widget-level status calculated from individual metric statuses
- Color mapping: Status enum maps to existing color scheme
- Status indicators: Current status display format preserved
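The aggregation and color rules above can be expressed as two small functions. A sketch, assuming the severity order Unknown < Ok < Warning < Critical and colors given as plain names rather than a specific UI library's color type; the gray for `Unknown` is an assumption, since only green, yellow and red are documented, and `aggregate` and `status_color` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status { Ok, Warning, Critical, Unknown }

/// Assumed ranking used for aggregation.
fn severity(status: Status) -> u8 {
    match status {
        Status::Unknown => 0,
        Status::Ok => 1,
        Status::Warning => 2,
        Status::Critical => 3,
    }
}

/// Widget-level status: the most severe of its metrics' statuses.
pub fn aggregate(statuses: &[Status]) -> Status {
    statuses
        .iter()
        .copied()
        .max_by_key(|s| severity(*s))
        .unwrap_or(Status::Unknown)
}

/// Maps a status onto the existing color scheme.
pub fn status_color(status: Status) -> &'static str {
    match status {
        Status::Ok => "green",
        Status::Warning => "yellow",
        Status::Critical => "red",
        Status::Unknown => "gray", // assumption: not specified in the document
    }
}
```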
Implementation Migration Strategy
Phase 1: Shared Types
- Create shared/ crate with common protocol and metric types
- Update both agent and dashboard to use shared types
Phase 2: Agent Migration
- Implement new agent architecture with individual metrics
- Maintain backward compatibility during transition
Phase 3: Dashboard Migration
- Update dashboard to consume individual metrics
- Preserve all existing UI layouts and interactions
- Enhance widgets with new metric subscription system
Phase 4: Integration Testing
- End-to-end testing with real multi-host scenarios
- Performance validation and optimization
- UI/UX validation to ensure no regressions
Benefits of This Architecture
- Maximum Flexibility: Dashboard can compose any widget from any metrics
- Easy Extension: Adding new metrics doesn't affect existing code
- Granular Caching: Cache individual metrics based on collection cost
- Simple Testing: Test individual metric collection in isolation
- Clear Separation: Agent collects, dashboard consumes and displays
- Efficient Updates: Only send changed metrics to dashboard
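Sending only changed metrics means remembering what was last sent per name. A minimal sketch, assuming exact value comparison (floats included) and no periodic resend; in practice the agent may still need to resend unchanged values occasionally so the dashboard can tell a quiet host from a dead one. `ChangeFilter` and `changed` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified copy of the shared `MetricValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum MetricValue { Float(f32), Integer(i64), String(String), Boolean(bool) }

/// Remembers the last value sent for each metric name.
#[derive(Default)]
pub struct ChangeFilter {
    last_sent: HashMap<String, MetricValue>,
}

impl ChangeFilter {
    /// Keeps only the pairs whose value differs from the last one sent,
    /// and records them as sent.
    pub fn changed(&mut self, batch: Vec<(String, MetricValue)>) -> Vec<(String, MetricValue)> {
        batch
            .into_iter()
            .filter(|(name, value)| {
                if self.last_sent.get(name) == Some(value) {
                    false
                } else {
                    self.last_sent.insert(name.clone(), value.clone());
                    true
                }
            })
            .collect()
    }
}
```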
Future Extensions
- Metric Filtering: Dashboard requests only needed metrics
- Historical Storage: Store metric history for trending
- Metric Aggregation: Calculate derived metrics from base metrics
- Dynamic Discovery: Auto-discover new metric sources
- Metric Validation: Validate metric values and ranges