Complete atomic migration to structured data architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 1m7s
Implements clean structured data collection, eliminating all string metric parsing bugs. Collectors now populate AgentData directly with type-safe field access.

Key improvements:
- Mount points preserved correctly (/ and /boot instead of root/boot)
- Tmpfs discovery added to memory collector
- Temperature data flows as typed f32 fields
- Zero string parsing overhead
- Complete removal of MetricCollectionManager bridge
- Direct ZMQ transmission of structured JSON

All functionality maintained: service tracking, notifications, status evaluation, and multi-host monitoring.
222
CLAUDE.md
@@ -7,6 +7,7 @@ A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure.
## Current Features

### Core Functionality

- **Real-time Monitoring**: CPU, RAM, Storage, and Service status
- **Service Management**: Start/stop services with user-stopped tracking
- **Multi-host Support**: Monitor multiple servers from single dashboard
@@ -14,6 +15,7 @@ A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure.
- **Backup Monitoring**: Borgbackup status and scheduling

### User-Stopped Service Tracking

- Services stopped via dashboard are marked as "user-stopped"
- User-stopped services report Status::OK instead of Warning
- Prevents false alerts during intentional maintenance
@@ -21,9 +23,11 @@ A high-performance Rust-based TUI dashboard for monitoring CMTEC infrastructure.
- Automatic flag clearing when services are restarted via dashboard
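
The user-stopped rule above can be illustrated with a minimal sketch. The names `ServiceState`, `evaluate_service`, `on_user_stop`, and `on_user_start` are hypothetical and not taken from the agent source; only the behaviour (a stopped service reports OK when it was stopped via the dashboard, Warning otherwise, and a dashboard restart clears the flag) comes from the list above. The variant is spelled `Status::Ok` here to follow Rust naming conventions.

```rust
/// Status levels; only OK and Warning are named in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Warning,
}

/// Hypothetical per-service snapshot, used only for this illustration.
#[derive(Debug, Clone)]
pub struct ServiceState {
    pub active: bool,
    pub user_stopped: bool,
}

/// A running service is OK; a stopped one is OK only if the user stopped it.
pub fn evaluate_service(svc: &ServiceState) -> Status {
    if svc.active || svc.user_stopped {
        Status::Ok
    } else {
        Status::Warning
    }
}

/// Stop requested from the dashboard: remember that it was intentional.
pub fn on_user_stop(svc: &mut ServiceState) {
    svc.active = false;
    svc.user_stopped = true;
}

/// Start requested from the dashboard: clear the user-stopped flag.
pub fn on_user_start(svc: &mut ServiceState) {
    svc.active = true;
    svc.user_stopped = false;
}
```

Keeping the flag next to the service state means the status rule stays a pure function of that state, which is one way to avoid false alerts during maintenance without special cases in the notification path.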

### Custom Service Logs

- Configure service-specific log file paths per host in dashboard config
- Press `L` on any service to view custom log files via `tail -f`
- Configuration format in dashboard config:

```toml
[service_logs]
hostname1 = [
@@ -36,8 +40,9 @@ hostname2 = [
```

### Service Management

- **Direct Control**: Arrow keys (↑↓) or vim keys (j/k) navigate services
- **Service Actions**:
  - `s` - Start service (sends UserStart command)
  - `S` - Stop service (sends UserStop command)
  - `J` - Show service logs (journalctl in tmux popup)
@@ -47,6 +52,7 @@ hostname2 = [
- **Transitional Icons**: Blue arrows during operations

### Navigation

- **Tab**: Switch between hosts
- **↑↓ or j/k**: Select services
- **s**: Start selected service (UserStart)
@@ -60,14 +66,17 @@ hostname2 = [
## Core Architecture Principles

### Structured Data Architecture (✅ IMPLEMENTED v0.1.131)

Complete migration from string-based metrics to structured JSON data. Eliminates all string parsing bugs and provides type-safe data access.

**Previous (String Metrics):**

- ❌ Agent sent individual metrics with string names like `disk_nvme0n1_temperature`
- ❌ Dashboard parsed metric names with underscore counting and string splitting
- ❌ Complex and error-prone metric filtering and extraction logic

**Current (Structured Data):**

```json
{
  "hostname": "cmbox",
@@ -75,7 +84,7 @@ Complete migration from string-based metrics to structured JSON data. Eliminates
  "timestamp": 1763926877,
  "system": {
    "cpu": {
      "load_1min": 3.50,
      "load_1min": 3.5,
      "load_5min": 3.57,
      "load_15min": 3.58,
      "frequency_mhz": 1500,
@@ -88,7 +97,12 @@ Complete migration from string-based metrics to structured JSON data. Eliminates
      "swap_total_gb": 10.7,
      "swap_used_gb": 0.99,
      "tmpfs": [
        {"mount": "/tmp", "usage_percent": 15.0, "used_gb": 0.3, "total_gb": 2.0}
        {
          "mount": "/tmp",
          "usage_percent": 15.0,
          "used_gb": 0.3,
          "total_gb": 2.0
        }
      ]
    },
    "storage": {
@@ -99,7 +113,12 @@ Complete migration from string-based metrics to structured JSON data. Eliminates
          "temperature_celsius": 29.0,
          "wear_percent": 1.0,
          "filesystems": [
            {"mount": "/", "usage_percent": 24.0, "used_gb": 224.9, "total_gb": 928.2}
            {
              "mount": "/",
              "usage_percent": 24.0,
              "used_gb": 224.9,
              "total_gb": 928.2
            }
          ]
        }
      ],
@@ -112,18 +131,14 @@ Complete migration from string-based metrics to structured JSON data. Eliminates
          "usage_percent": 63.0,
          "used_gb": 2355.2,
          "total_gb": 3686.4,
          "data_drives": [
            {"name": "sdb", "temperature_celsius": 24.0}
          ],
          "parity_drives": [
            {"name": "sdc", "temperature_celsius": 24.0}
          ]
          "data_drives": [{ "name": "sdb", "temperature_celsius": 24.0 }],
          "parity_drives": [{ "name": "sdc", "temperature_celsius": 24.0 }]
        }
      ]
    }
  },
  "services": [
    {"name": "sshd", "status": "active", "memory_mb": 4.5, "disk_gb": 0.0}
    { "name": "sshd", "status": "active", "memory_mb": 4.5, "disk_gb": 0.0 }
  ],
  "backup": {
    "status": "completed",
@@ -134,19 +149,21 @@ Complete migration from string-based metrics to structured JSON data. Eliminates
  }
}
```

- ✅ Agent sends structured JSON over ZMQ (no legacy support)
- ✅ Type-safe data access: `data.system.storage.drives[0].temperature_celsius`
- ✅ Complete metric coverage: CPU, memory, storage, services, backup
- ✅ Backward compatibility via bridge conversion to existing UI widgets
- ✅ All string parsing bugs eliminated
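
To make the type-safe access path concrete, here is a minimal sketch of structs that mirror part of the JSON above. It is a hand-written illustration, not the project's shared crate: the struct names other than `AgentData`, the `name` field on a drive, and the helper `hottest_drive` are assumptions, and the serde derives that the real types would carry are omitted so the snippet compiles with the standard library alone.

```rust
/// One mounted filesystem on a drive (fields taken from the JSON example).
#[derive(Debug, Clone, PartialEq)]
pub struct FilesystemData {
    pub mount: String,
    pub usage_percent: f32,
    pub used_gb: f32,
    pub total_gb: f32,
}

/// One physical drive; the `name` field is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct DriveData {
    pub name: String,
    pub temperature_celsius: f32,
    pub wear_percent: f32,
    pub filesystems: Vec<FilesystemData>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StorageData {
    pub drives: Vec<DriveData>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SystemData {
    pub storage: StorageData,
}

#[derive(Debug, Clone, PartialEq)]
pub struct AgentData {
    pub hostname: String,
    pub timestamp: u64,
    pub system: SystemData,
}

/// Builds a value matching the numbers shown in the JSON example.
pub fn sample() -> AgentData {
    AgentData {
        hostname: "cmbox".to_string(),
        timestamp: 1763926877,
        system: SystemData {
            storage: StorageData {
                drives: vec![DriveData {
                    name: "nvme0n1".to_string(),
                    temperature_celsius: 29.0,
                    wear_percent: 1.0,
                    filesystems: vec![FilesystemData {
                        mount: "/".to_string(),
                        usage_percent: 24.0,
                        used_gb: 224.9,
                        total_gb: 928.2,
                    }],
                }],
            },
        },
    }
}

/// Typed field access: no metric-name parsing is involved.
pub fn hottest_drive(data: &AgentData) -> Option<&DriveData> {
    data.system
        .storage
        .drives
        .iter()
        .max_by(|a, b| a.temperature_celsius.total_cmp(&b.temperature_celsius))
}
```

With types like these, `sample().system.storage.drives[0].temperature_celsius` is an `f32` checked by the compiler, and the mount point is carried as the literal string `/` rather than being encoded into a metric name.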

### Maintenance Mode

- Agent checks for `/tmp/cm-maintenance` file before sending notifications
- File presence suppresses all email notifications while continuing monitoring
- Dashboard continues to show real status, only notifications are blocked
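
A minimal sketch of the maintenance check described above, assuming the agent simply tests for the marker file before each notification. The function names and the `status_changed` parameter are hypothetical; only the path `/tmp/cm-maintenance` and the suppress-notifications-only behaviour come from the text.

```rust
use std::path::Path;

/// Marker file named in the documentation above.
pub const MAINTENANCE_FILE: &str = "/tmp/cm-maintenance";

/// Returns true while the marker file exists.
pub fn notifications_suppressed(marker: &Path) -> bool {
    marker.exists()
}

/// Monitoring continues regardless; only the send decision is gated.
pub fn should_send_notification(status_changed: bool, marker: &Path) -> bool {
    status_changed && !notifications_suppressed(marker)
}
```

The agent would call `should_send_notification(changed, Path::new(MAINTENANCE_FILE))`; taking the path as a parameter keeps the check testable without touching `/tmp`.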

Usage:

```bash
# Enable maintenance mode
touch /tmp/cm-maintenance
@@ -163,16 +180,19 @@ rm /tmp/cm-maintenance
## Development and Deployment Architecture

### Development Path

- **Location:** `~/projects/cm-dashboard`
- **Purpose:** Development workflow only - for committing new code
- **Access:** Only for developers to commit changes

### Deployment Path

- **Location:** `/var/lib/cm-dashboard/nixos-config`
- **Purpose:** Production deployment only - agent clones/pulls from git
- **Workflow:** git pull → `/var/lib/cm-dashboard/nixos-config` → nixos-rebuild

### Git Flow

```
Development: ~/projects/cm-dashboard → git commit → git push
Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
@@ -183,6 +203,7 @@ Deployment: git pull → /var/lib/cm-dashboard/nixos-config → rebuild
CM Dashboard uses automated binary releases instead of source builds.

### Creating New Releases

```bash
cd ~/projects/cm-dashboard
git tag v0.1.X
@@ -190,11 +211,13 @@ git push origin v0.1.X
```

This automatically:

- Builds static binaries with `RUSTFLAGS="-C target-feature=+crt-static"`
- Creates GitHub-style release with tarball
- Uploads binaries via Gitea API

### NixOS Configuration Updates

Edit `~/projects/nixosbox/hosts/services/cm-dashboard.nix`:

```nix
@@ -206,6 +229,7 @@ src = pkgs.fetchurl {
```

### Get Release Hash

```bash
cd ~/projects/nixosbox
nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
@@ -217,6 +241,7 @@ nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
### Building

**Testing & Building:**

- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
- **Clean compilation**: Remove `target/` between major changes
@@ -229,6 +254,7 @@ The dashboard uses automatic storage discovery to eliminate manual configuration
### Discovery Process

**At Agent Startup:**

1. Parse `/proc/mounts` to identify all mounted filesystems
2. Detect MergerFS pools by analyzing `fuse.mergerfs` mount sources
3. Identify member disks and potential parity relationships via heuristics
@@ -236,6 +262,7 @@ The dashboard uses automatic storage discovery to eliminate manual configuration
5. Generate pool-aware metrics with hierarchical relationships

**Continuous Monitoring:**

- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, wear
- Calculate pool-level health based on member drive status
@@ -244,11 +271,13 @@ The dashboard uses automatic storage discovery to eliminate manual configuration
### Supported Storage Types

**Single Disks:**

- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.

**MergerFS Pools:**

- Auto-detect from `/proc/mounts` fuse.mergerfs entries
- Parse source paths to identify member disks (e.g., "/mnt/disk1:/mnt/disk2")
- Heuristic parity disk detection (sequential device names, "parity" in path)
@@ -256,6 +285,7 @@ The dashboard uses automatic storage discovery to eliminate manual configuration
- Hierarchical tree display with data/parity disk grouping
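
The MergerFS detection steps above can be sketched as follows. This is an illustration under stated assumptions, not the agent's collector: `MergerfsPool`, `find_mergerfs_pools`, and `looks_like_parity` are hypothetical names, the input is the text of `/proc/mounts` passed in as a string, escaped characters in paths (such as `\040` for a space) are not decoded, and only the "parity in path" half of the heuristic is shown.

```rust
/// A discovered pool: where it is mounted and which branches back it.
#[derive(Debug, Clone, PartialEq)]
pub struct MergerfsPool {
    pub mount_point: String,
    pub members: Vec<String>,
}

/// Scans /proc/mounts-style text for `fuse.mergerfs` entries.
/// Each line has the fields: source, mount point, fs type, options, dump, pass.
pub fn find_mergerfs_pools(proc_mounts: &str) -> Vec<MergerfsPool> {
    proc_mounts
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let source = fields.next()?;
            let mount_point = fields.next()?;
            let fs_type = fields.next()?;
            if fs_type != "fuse.mergerfs" {
                return None;
            }
            // The source lists member paths separated by ':'.
            let members = source
                .split(':')
                .filter(|s| !s.is_empty())
                .map(str::to_string)
                .collect();
            Some(MergerfsPool {
                mount_point: mount_point.to_string(),
                members,
            })
        })
        .collect()
}

/// Path-based part of the parity heuristic: "parity" appears in the path.
pub fn looks_like_parity(path: &str) -> bool {
    path.to_ascii_lowercase().contains("parity")
}
```

Taking the file contents as a `&str` keeps the parser independent of the filesystem, so it can be exercised with fixed sample text; on a host the caller would pass the result of `std::fs::read_to_string("/proc/mounts")`.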

**Future Extensions Ready:**

- RAID arrays via `/proc/mdstat` parsing
- ZFS pools via `zpool status` integration
- LVM logical volumes via `lvs` discovery
@@ -274,76 +304,29 @@ exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
### Display Format

```
CPU:
● Load: 0.23 0.21 0.13
└─ Freq: 1048 MHz

RAM:
● Usage: 25% 5.8GB/23.3GB
├─ ● /tmp: 2% 0.5GB/2GB
└─ ● /var/tmp: 0% 0GB/1.0GB

Storage:
● /srv/media (mergerfs (2+1)):
├─ Pool Status: ● Healthy (3 drives)
● mergerfs (2+1):
├─ Total: ● 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ Parity: ● sdc T: 24°C
● /:
├─ ● nvme0n1 W: 13%
└─ ● 7% 14.5GB/218.5GB
│ ├─ ● sdb T: 24°C W: 5%
│ └─ ● sdd T: 27°C W: 5%
├─ Parity: ● sdc T: 24°C W: 5%
└─ Mount: /srv/media

● nvme0n1 T: 25C W: 4%
├─ ● /: 55% 250.5GB/456.4GB
└─ ● /boot: 26% 0.3GB/1.0GB
```

### Implementation Benefits

- **Zero Configuration**: No manual pool definitions required
- **Always Accurate**: Reflects actual system state automatically
- **Scales Automatically**: Handles any number of pools without config changes
- **Backwards Compatible**: Single disks continue working unchanged
- **Future Ready**: Easy extension for additional storage technologies

### Current Status (v0.1.100)

**✅ Completed:**
- Auto-discovery system implemented and deployed
- `/proc/mounts` parsing with smart heuristics for parity detection
- Storage topology stored at agent startup for efficient monitoring
- Universal zero-configuration for all hosts (cmbox, steambox, simonbox, srv01, srv02, srv03)
- Enhanced pool health calculation (healthy/degraded/critical)
- Hierarchical tree visualization with data/parity disk separation

**🔄 In Progress - Complete Disk Collector Rewrite:**

The current disk collector has grown complex with mixed legacy/auto-discovery approaches. Planning complete rewrite with clean, simple workflow supporting both physical drives and mergerfs pools.

**New Clean Architecture:**

**Discovery Workflow:**
1. **`lsblk`** to detect all mount points and backing devices
2. **`df`** to get filesystem usage for each mount point
3. **Group by physical drive** (nvme0n1, sda, etc.)
4. **Parse `/proc/mounts`** for mergerfs pools
5. **Generate unified metrics** for both storage types
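
Step 3 of the workflow above, grouping mount points by physical drive, might look like the following sketch. It assumes the device names and mount points have already been obtained (for example from `lsblk`), and it relies on common Linux naming only: `sdXN` style partitions drop their trailing digits, while `nvme` and `mmcblk` partitions carry a `pN` suffix. The function names are hypothetical and other device families are not handled.

```rust
use std::collections::BTreeMap;

/// Maps a partition name to its parent drive, e.g. nvme0n1p2 -> nvme0n1,
/// sda1 -> sda. Whole-disk names are returned unchanged.
pub fn parent_drive(dev: &str) -> &str {
    let trimmed = dev.trim_end_matches(|c: char| c.is_ascii_digit());
    if trimmed.len() == dev.len() {
        // No trailing digits, so this is already a whole disk such as "sdb".
        return dev;
    }
    if dev.starts_with("nvme") || dev.starts_with("mmcblk") {
        // Partitions look like "<disk>p<N>"; the disk name itself ends in a digit.
        match trimmed.strip_suffix('p') {
            Some(base) if base.ends_with(|c: char| c.is_ascii_digit()) => base,
            _ => dev,
        }
    } else {
        trimmed
    }
}

/// Groups (device, mount point) pairs under their parent drive.
pub fn group_by_drive(mounts: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (dev, mount) in mounts {
        groups
            .entry(parent_drive(dev).to_string())
            .or_default()
            .push(mount.to_string());
    }
    groups
}
```

A `BTreeMap` is used so drives come out in a stable, sorted order, which suits a tree display; the mount points keep the order in which they were supplied.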

**Physical Drive Display:**
```
● nvme0n1:
├─ ● Drive: T: 35°C W: 1%
├─ ● Total: 23% 218.0GB/928.2GB
├─ ● /boot: 11% 0.1GB/1.0GB
└─ ● /: 23% 214.9GB/928.2GB
```

**MergerFS Pool Display:**
```
● /srv/media (mergerfs):
├─ ● Pool: 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ ● sdc T: 24°C (parity)
```

**Implementation Benefits:**
- **Pure auto-discovery**: No configuration needed
- **Clean code paths**: Single workflow for all storage types
- **Consistent display**: Status icons on every line, no redundant text
- **Simple pipeline**: lsblk → df → group → metrics
- **Support for both**: Physical drives and mergerfs pools

## Important Communication Guidelines

Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
@@ -351,17 +334,20 @@ Keep responses concise and focused. Avoid extensive implementation summaries unl
## Commit Message Guidelines

**NEVER mention:**

- Claude or any AI assistant names
- Automation or AI-generated content
- Any reference to automated code generation

**ALWAYS:**

- Focus purely on technical changes and their purpose
- Use standard software development commit message format
- Describe what was changed and why, not how it was created
- Write from the perspective of a human developer

**Examples:**

- ❌ "Generated with Claude Code"
- ❌ "AI-assisted implementation"
- ❌ "Automated refactoring"
@@ -371,47 +357,53 @@ Keep responses concise and focused. Avoid extensive implementation summaries unl

## Completed Architecture Migration (v0.1.131)

### ✅ Phase 1: Structured Data Types (Shared Crate) - COMPLETED
- ✅ Created AgentData struct matching JSON structure
- ✅ Added complete type hierarchy: CPU, memory, storage, services, backup
- ✅ Implemented serde serialization/deserialization
- ✅ Updated ZMQ protocol for structured data transmission
## Agent Architecture Migration Plan (v0.1.139)

### ✅ Phase 2: Agent Refactor - COMPLETED
- ✅ Agent converts all metrics to structured AgentData
- ✅ Comprehensive metric parsing: storage (drives, temp, wear), services, backup
- ✅ Structured JSON transmission over ZMQ (no legacy support)
- ✅ Type-safe data flow throughout agent pipeline
**🎯 Goal: Eliminate String Metrics Bridge, Direct Structured Data Collection**

### ✅ Phase 3: Dashboard Refactor - COMPLETED
- ✅ Dashboard receives structured data and bridges to existing UI
- ✅ Bridge conversion maintains compatibility with current widgets
- ✅ All metric types converted: storage, services, backup, CPU, memory
- ✅ Foundation ready for direct structured data widget migration
### Current Architecture (v0.1.138)

### 🚀 Next Phase: Direct Widget Migration
- Replace metric bridge with direct structured data access in widgets
- Eliminate temporary conversion layer
- Full end-to-end type safety from agent to UI
**Current Flow:**
```
Collectors → String Metrics → MetricManager.cache
↘
process_metrics() → HostStatusManager → Notifications
↘
broadcast_all_metrics() → Bridge Conversion → AgentData → ZMQ
```

## Key Achievements (v0.1.131)
**Issues:**
- Bridge conversion loses mount point information (`/` becomes `root`, `/boot` becomes `boot`)
- Tmpfs mounts not properly displayed in RAM section
- Unnecessary string parsing complexity and potential bugs
- String-to-JSON conversion introduces data transformation errors

**✅ NVMe Temperature Issue SOLVED**
- Temperature data now flows as typed field: `agent_data.system.storage.drives[0].temperature_celsius: f32`
- Eliminates string parsing bugs: no more `"disk_nvme0n1_temperature"` extraction failures
- Type-safe access prevents all similar parsing issues across the system
### Target Architecture

**✅ Complete Structured Data Implementation**
- Agent: Collects metrics → structured JSON → ZMQ transmission
- Dashboard: Receives JSON → bridge conversion → existing UI widgets
- Full metric coverage: CPU, memory, storage (drives, pools), services, backup
- Zero legacy support - clean architecture with no compatibility cruft
**Target Flow:**
```
Collectors → AgentData → HostStatusManager → Notifications
↘
Direct ZMQ Transmission
```
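
The first stage of the target flow, collectors writing straight into `AgentData`, can be sketched as below. Everything here is illustrative: the `Collector` trait, the two collectors, the flattened `AgentData` fields, and the fixed sample values are assumptions made for a short example, and the status evaluation and ZMQ transmission stages are left out.

```rust
/// Heavily simplified stand-in for the shared AgentData type (assumption).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct AgentData {
    pub hostname: String,
    pub load_1min: f32,
    pub tmpfs_mounts: Vec<String>,
}

/// Hypothetical collector interface: each collector fills in its own fields.
pub trait Collector {
    fn collect(&self, data: &mut AgentData);
}

pub struct CpuCollector;
pub struct MemoryCollector;

impl Collector for CpuCollector {
    fn collect(&self, data: &mut AgentData) {
        // A real collector would read the load average; a fixed value is used here.
        data.load_1min = 3.5;
    }
}

impl Collector for MemoryCollector {
    fn collect(&self, data: &mut AgentData) {
        // Mount points are stored verbatim, so "/tmp" stays "/tmp".
        data.tmpfs_mounts.push("/tmp".to_string());
    }
}

/// Runs every collector against one AgentData value, with no string metrics in between.
pub fn collect_all(hostname: &str, collectors: &[Box<dyn Collector>]) -> AgentData {
    let mut data = AgentData {
        hostname: hostname.to_string(),
        ..Default::default()
    };
    for collector in collectors {
        collector.collect(&mut data);
    }
    data
}
```

The resulting value would then be handed to status evaluation and serialized for transmission, so the same typed structure is used from collection through to the wire.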

**✅ Foundation for Future Enhancements**
- Type-safe data structures enable easy feature additions
- Self-documenting JSON schema shows all available metrics
- Direct field access eliminates entire class of parsing bugs
- Ready for next phase: direct widget migration for ultimate performance
### Implementation Plan

#### Atomic Migration (v0.1.139) - Single Complete Rewrite
- **Complete removal** of string metrics system - no legacy support
- **Collectors output structured data directly** - populate `AgentData` with correct mount points
- **HostStatusManager operates on `AgentData`** - status evaluation on structured fields
- **Notifications process structured data** - preserve all notification logic
- **Direct ZMQ transmission** - no bridge conversion code
- **Service tracking preserved** - user-stopped flags, thresholds, all functionality intact
- **Zero backward compatibility** - clean break from string metric architecture

### Benefits
- **Correct Display**: `/` and `/boot` mount points, proper tmpfs in RAM section
- **Performance**: Eliminate string parsing overhead
- **Maintainability**: Type-safe data flow, no string parsing bugs
- **Functionality Preserved**: Status evaluation, notifications, service tracking intact
- **Clean Architecture**: NO legacy fallback code, complete migration to structured data

## Implementation Rules

@@ -420,6 +412,7 @@ Keep responses concise and focused. Avoid extensive implementation summaries unl
3. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status

**NEVER:**

- Copy/paste ANY code from legacy implementations
- Calculate status in dashboard widgets
- Hardcode metric names in widgets (use const arrays)
@@ -427,7 +420,8 @@ Keep responses concise and focused. Avoid extensive implementation summaries unl
- Create documentation files unless explicitly requested

**ALWAYS:**

- Prefer editing existing files to creating new ones
- Follow existing code conventions and patterns
- Use existing libraries and utilities
- Follow security best practices