Compare commits

47 Commits

Author SHA1 Message Date
adf3b0f51c Implement complete structured data architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
Replace fragile string-based metrics with type-safe JSON data structures.
Agent converts all metrics to structured data, dashboard processes typed fields.

Changes:
- Add AgentData struct with CPU, memory, storage, services, backup fields
- Replace string parsing with direct field access throughout system
- Maintain UI compatibility via temporary metric bridge conversion
- Fix NVMe temperature display and eliminate string parsing bugs
- Update protocol to support structured data transmission over ZMQ
- Comprehensive metric type coverage: CPU, memory, storage, services, backup

Version bump to 0.1.131
2025-11-23 21:32:00 +01:00
41ded0170c Add wear percentage display and NVMe temperature collection
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
- Display wear percentage in storage headers for single physical drives
- Remove redundant drive type indicators, show wear data instead
- Fix wear metric parsing for physical drives (underscore count issue)
- Add NVMe temperature parsing support (Temperature: format)
- Add raw metrics debugging functionality for troubleshooting
- Clean up physical drive display to remove redundant information
2025-11-23 20:29:24 +01:00
9b4191b2c3 Fix physical drive name and health status display
All checks were successful
Build and Release / build-and-release (push) Successful in 2m13s
- Display actual drive name (e.g., nvme0n1) instead of mount point for physical drives
- Fix health status parsing for physical drives to show proper status icons
- Update pool name extraction to handle disk_{drive}_health metrics correctly
- Improve storage widget rendering for physical drive identification
2025-11-23 19:25:45 +01:00
53dbb43352 Fix SnapRAID parity association using directory-based discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
- Replace blanket parity drive inclusion with smart relationship detection
- Only associate parity drives from same parent directory as data drives
- Prevent incorrect exclusion of nvme0n1 physical drives from grouping
- Maintain zero-configuration auto-discovery without hardcoded paths
2025-11-23 18:42:48 +01:00
ba03623110 Remove hardcoded pool mount point mappings for true auto-discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Eliminate hardcoded mappings like 'root' -> '/' and 'steampool' -> '/mnt/steampool'
- Use device names directly for physical drives
- Rely on mount_point metrics from agent for actual mount paths
- Implement zero-configuration architecture as specified in CLAUDE.md
2025-11-23 18:34:45 +01:00
f24c4ed650 Fix pool name extraction to prevent wrong physical drive naming
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
- Remove fallback logic that could extract incorrect pool names
- Simplify pool suffix matching to use explicit arrays
- Ensure only valid metric patterns create pools
2025-11-23 18:24:39 +01:00
86501fd486 Fix display format to match CLAUDE.md specification
All checks were successful
Build and Release / build-and-release (push) Successful in 1m17s
- Use actual device names (sdb, sdc) instead of data_0, parity_0
- Fix physical drive naming to show device names instead of mount points
- Update pool name extraction to handle new device-based naming
- Ensure Drive: line shows temperature and wear data for physical drives
2025-11-23 18:13:35 +01:00
192eea6e0c Integrate SnapRAID parity drives into mergerfs pools
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Add SnapRAID parity drive detection to mergerfs discovery
- Remove Pool Status health line as discussed
- Update drive display to always show wear data when available
- Include /mnt/parity drives as part of mergerfs pool structure
2025-11-23 18:05:19 +01:00
43fb838c9b Fix duplicate drive display in mergerfs pools
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
- Restructure storage rendering logic to prevent drive duplication
- Use specific mergerfs check instead of generic multi-drive condition
- Ensure drives only appear once under organized data/parity sections
2025-11-23 17:46:09 +01:00
54483653f9 Fix mergerfs drive metric parsing for proper pool consolidation
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
- Update extract_pool_name to handle data_/parity_ drive metrics correctly
- Fix extract_drive_name to parse mergerfs drive roles properly
- Prevent srv_media_data from being parsed as separate pool
2025-11-23 17:40:12 +01:00
e47803b705 Fix mergerfs pool consolidation and naming
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
- Improve pool name extraction in dashboard parsing
- Use consistent mergerfs pool naming in agent
- Add mount_point metric parsing to use actual mount paths
- Fix pool consolidation to prevent duplicate entries
2025-11-23 17:35:23 +01:00
439d0d9af6 Fix mergerfs numeric reference parsing for proper pool detection
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
Add support for numeric mergerfs references like "1:2" by mapping them
to actual mount points (/mnt/disk1, /mnt/disk2). This enables proper
mergerfs pool detection and hides individual member drives as intended.
2025-11-23 17:27:45 +01:00
2242b5ddfe Make mergerfs detection more robust to prevent discovery failures
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Skip mergerfs pools with numeric device references (e.g., "1:2")
instead of crashing. This allows regular drive detection to work
even when mergerfs uses non-standard mount formats.

Preserves existing functionality for standard mergerfs setups.
2025-11-23 17:19:15 +01:00
9d0f42d55c Fix filesystem usage_percent parsing and remove hardcoded status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
1. Add missing _fs_ filter to usage_percent parsing in dashboard
2. Fix agent to use calculated fs_status instead of hardcoded Status::Ok

This completes the disk collector auto-discovery by ensuring filesystem
usage percentages and status indicators display correctly.
2025-11-23 16:47:20 +01:00
1da7b5f6e7 Fix both pool-level and filesystem metric parsing bugs
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
1. Prevent filesystem _fs_ metrics from overwriting pool totals
2. Fix filesystem name extraction to properly parse boot/root names

This resolves both the pool total display (showing 0.1GB instead of 220GB)
and individual filesystem display (showing —% —GB/—GB).
2025-11-23 16:29:00 +01:00
006f27f7d9 Fix lsblk parsing for filesystem discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Remove unused debug code and fix device name parsing to properly
handle lsblk tree characters. This resolves the issue where only
/boot filesystem was discovered instead of both /boot and /.
2025-11-23 16:09:48 +01:00
07422cd0a7 Add debug logging for filesystem discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
2025-11-23 15:26:49 +01:00
de30b80219 Fix filesystem metric parsing bounds error in dashboard
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Prevent string slicing panic in extract_filesystem_metric when
parsing individual filesystem metrics. This resolves the issue
where filesystem entries show —% —GB/—GB instead of actual usage.
2025-11-23 15:23:15 +01:00
7d96ca9fad Fix disk collector filesystem discovery with debug logging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Add debug logging to filesystem usage collection to identify why
some mount points are being dropped during discovery. This should
resolve the issue where total capacity shows incorrect values.
2025-11-23 15:15:56 +01:00
9b940ebd19 Fix string slicing bounds error in metric parsing
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Fixed critical bug where dashboard crashed with 'begin <= end' slice error
when parsing disk metrics with new naming format. Added bounds checking
to prevent invalid string slicing operations.

- Fixed extract_pool_name string slicing bounds check
- Removed ineffective panic handling that caused infinite loop
- Dashboard now handles new disk collector metrics correctly
2025-11-23 14:52:09 +01:00
6d4da1b7da Add robust error handling to prevent dashboard crashes
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Added comprehensive error handling to storage metrics parsing to prevent
dashboard crashes when encountering unexpected metric formats or parsing
errors. Dashboard now continues gracefully with empty storage display
instead of crashing, improving reliability during metric format changes.

- Wrapped storage metric parsing in panic recovery
- Added logging for metric parsing failures
- Dashboard shows empty storage on errors instead of crashing
- Ensures dashboard remains functional during agent updates
2025-11-23 14:45:00 +01:00
1e7f1616aa Complete disk collector rewrite with clean architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
Replaced complex disk collector with simple lsblk → df → group workflow.
Supports both physical drives and mergerfs pools with unified metrics.
Eliminates configuration complexity through pure auto-discovery.

- Clean discovery pipeline using lsblk and df commands
- Physical drive grouping with filesystem children
- MergerFS pool detection with parity heuristics
- Unified metric generation for consistent dashboard display
- SMART data collection for temperature, wear, and health
2025-11-23 14:22:19 +01:00
7a3ee3d5ba Fix physical drive grouping logic for unified pool visualization
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
Updated filesystem grouping to use extract_base_device method for proper
partition-to-drive mapping. This ensures nvme0n1p1 and nvme0n1p2 are
correctly grouped under nvme0n1 drive pool instead of separate pools.
2025-11-23 13:54:33 +01:00
0e8b149718 Add partial filesystem data display for debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
- Make filesystem display more forgiving - show partial data if available
- Will display usage% even if GB values are missing, or vice versa
- This should help identify which specific metrics aren't being populated
- Debug version to identify filesystem data population issues
2025-11-23 13:33:36 +01:00
2c27d0e1db Prepare v0.1.107 for filesystem data debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
Current status: Filesystem children appear with correct mount points but show —% —GB/—GB
Need to debug why usage_percent, used_gb, total_gb metrics aren't populating filesystem entries
2025-11-23 13:24:13 +01:00
9f18488752 Fix filesystem metric parsing for correct mount point names
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
- Fix extract_filesystem_metric() to handle multi-underscore metric names correctly
- Parse known metric suffixes (usage_percent, mount_point, available_gb, etc.)
- Prevent incorrect parsing like boot_mount_point -> fs_name='boot_mount', metric_type='point'
- Should now correctly show /boot and / instead of /boot/mount and /root/mount
2025-11-23 13:11:05 +01:00
fab6404cca Fix filesystem children creation logic
All checks were successful
Build and Release / build-and-release (push) Successful in 1m17s
- Allow filesystem entries to be created with any metric, not just mount_point
- Ensure filesystem children appear under physical drive pools
- Improve mount point fallback logic for better compatibility
2025-11-23 13:04:01 +01:00
c3626cc362 Fix unified pool visualization filesystem children display issues
All checks were successful
Build and Release / build-and-release (push) Successful in 2m14s
- Fix extract_pool_name() to handle filesystem metrics (_fs_) correctly
- Prevent individual filesystem pools (nvme0n1_fs_boot, nvme0n1_fs_root) from being created
- Fix incorrect mount point names (was showing /root/mount instead of /)
- Only create filesystem entries when receiving mount_point metrics
- Add available_gb field to FileSystem struct for proper available space handling
- Ensure filesystem children show correct usage data instead of —% —GB/—GB
2025-11-23 12:58:16 +01:00
d68ecfbc64 Complete unified pool visualization with filesystem children
All checks were successful
Build and Release / build-and-release (push) Successful in 2m17s
- Implement filesystem children display under physical drive pools
- Agent generates individual filesystem metrics for each mount point
- Dashboard parses filesystem metrics and displays as tree children
- Add filesystem usage, total, and available space metrics
- Support target format: drive info + filesystem children hierarchy
- Fix compilation warnings by properly using available_bytes calculation
2025-11-23 12:48:24 +01:00
d1272a6c13 Implement unified pool visualization for single drives
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Group single disk filesystems by physical drive during auto-discovery
- Create physical drive pools with filesystem children
- Display temperature, wear, and health at drive level
- Provide consistent hierarchical storage visualization
- Fix borrow checker issues in create_physical_drive_pool method
- Add PhysicalDrive case to all StoragePoolType match statements
2025-11-23 12:10:42 +01:00
33b3beb342 Implement storage auto-discovery system
All checks were successful
Build and Release / build-and-release (push) Successful in 1m49s
- Add automatic detection of mergerfs pools by parsing /proc/mounts
- Implement smart heuristics for parity disk identification
- Store discovered topology at agent startup for efficient monitoring
- Eliminate need for manual storage pool configuration
- Support zero-config storage visualization with backward compatibility
- Clean up mount parsing and remove unused fields
2025-11-23 11:44:57 +01:00
f9384d9df6 Implement enhanced storage pool visualization
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
- Add support for mergerfs pool grouping with data and parity disk separation
- Implement pool health monitoring (healthy/degraded/critical status)
- Create hierarchical tree view for multi-disk storage arrays
- Add automatic pool type detection and member disk association
- Maintain backward compatibility for single disk configurations
- Support future extension for RAID and ZFS pool types
2025-11-23 11:18:21 +01:00
156d707377 Add version display and fix status aggregation priorities
All checks were successful
Build and Release / build-and-release (push) Successful in 2m37s
- Add dynamic version display in top bar using CARGO_PKG_VERSION
- Rewrite status aggregation to only show Critical/Warning/OK in top bar
- Fix Status enum ordering to prioritize OK over transitional states
- Remove blue/gray colors from top bar background
2025-11-21 16:19:45 +01:00
dc1a2e3a0f Add disk wear monitoring and fix storage overflow display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m15s
- Add disk wear percentage collection from SMART data in backup script
- Add backup_disk_wear_percent metric to backup collector with thresholds
- Display wear percentage in backup widget disk section
- Fix storage section overflow handling to use consistent "X more below" logic
- Update maintenance mode to return pending status instead of unknown
2025-11-20 20:36:45 +01:00
5d6b8e6253 Treat pending status as OK for title bar color aggregation
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
Apply same logic used for inactive status to pending status.
Pending services now contribute to OK count instead of being
ignored, preventing blue title bar during service transitions.
2025-11-20 18:09:59 +01:00
0cba083305 Remove pending status from title bar color aggregation
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Title bar now only shows Critical (red), Warning (yellow), and OK (green)
colors. Pending status is ignored in color calculation to prevent blue
title bar during service transitions.
2025-11-20 14:19:29 +01:00
a6be7a4788 Consolidate log viewing to use service-manage logs action
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
Replace separate service_logs_cmd with service-manage logs action
to unify service management through single script interface.
Dashboard now calls 'service-manage logs <service>' which provides
intelligent log viewing based on service state and configuration.
2025-11-20 11:30:55 +01:00
2384f7f9b9 Unify log viewing with configurable script command
All checks were successful
Build and Release / build-and-release (push) Successful in 2m37s
Replace separate J/L keys with single L key that calls configurable
service_logs_cmd from dashboard config. Script handles both journalctl
and custom log files automatically based on service configuration.

Update status bar to show all available keybindings including
previously missing backup and terminal commands.
2025-11-20 11:00:38 +01:00
cd5ef65d3d Fix service selection for services with sub-services
All checks were successful
Build and Release / build-and-release (push) Successful in 2m35s
- Fix get_selected_service to always return parent service names
- Prevent selection of container sub-items when managing docker services
- Ensure service commands operate on correct systemd service names
- Simplify service selection logic to only consider parent services
- Update version to 0.1.92
2025-11-19 18:01:10 +01:00
7bf9ca6201 Fix SSH command quoting and remove duplicate user prompts
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
- Fix rebuild and backup commands with proper inner command quoting
- Remove duplicate "Press any key to close..." from SSH commands since scripts handle it
- Clean up SSH terminal command to avoid redundant prompts
- Ensure consistent command execution patterns across all SSH operations
- Update version to 0.1.91
2025-11-19 16:08:03 +01:00
f587b42797 Implement unified SSH command management with dedicated scripts
All checks were successful
Build and Release / build-and-release (push) Successful in 1m11s
- Replace complex SSH command patterns with simple script calls
- Create service-manage script for start/stop operations with proper logging
- Create rebuild script equivalent to rebuild_git alias with user feedback
- Update dashboard to use unified command pattern: sudo service-manage, sudo rebuild
- Simplify backup to use service management: service-manage start borgbackup
- Configure sudoers with wildcards for Nix store path compatibility
- Remove cmtec references from script names for better genericity
- Update version to 0.1.90
2025-11-19 15:37:33 +01:00
7ae464e172 Wrap service commands in bash -c to ensure session persistence
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
- Use bash -c to properly execute service start/stop command sequences
- Ensure SSH session stays alive for user input prompt
- Fix escaping issues with nested quotes in commands
- Update version to 0.1.89
2025-11-19 13:32:04 +01:00
980c9a20a2 Fix service start/stop popup auto-close issue
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
- Move 'Press any key to close...' prompt inside SSH session
- Ensure tmux popup stays open until user manually closes
- Maintain consistent behavior with other SSH commands
- Update version to 0.1.88
2025-11-19 13:21:48 +01:00
448a38dede Fix service management command issues
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
- Add sudo to pkill commands to resolve permission errors when killing journalctl processes
- Fix service stop command timing to show logs during shutdown process
- Add sleep delays to ensure log visibility before cleanup
- Update version to 0.1.87
2025-11-19 13:13:15 +01:00
f12e20b0f3 Standardize SSH command patterns with consistent user feedback
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
- Apply uniform pattern to all SSH commands: informational text + command + exit prompt
- Remove exit prompt from logging commands (J/L keys) that run continuously with -f flag
- Simplify rebuild and backup commands to match service command pattern
- Update version to 0.1.86
2025-11-19 12:57:18 +01:00
564d1f37e7 Streamline service commands with auto-close functionality
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
- Remove header text from start/stop commands for cleaner output
- Add automatic log termination when service reaches target state
- Start command auto-closes when service becomes active
- Stop command auto-closes when service becomes inactive
- Simplify SSH command structure by removing bash -c wrapper
- Version bump to 0.1.85
2025-11-19 12:30:36 +01:00
65bfb9f617 Add real-time logging to service stop command
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
- Update stop command to use background systemctl with immediate log following
- Use same approach as start command for consistent real-time log viewing
- Version bump to 0.1.84
2025-11-19 11:59:18 +01:00
23 changed files with 3856 additions and 910 deletions

237
CLAUDE.md

@@ -59,11 +59,85 @@ hostname2 = [
## Core Architecture Principles
-### Individual Metrics Philosophy
-- Agent collects individual metrics, dashboard composes widgets
-- Each metric collected, transmitted, and stored individually
-- Agent calculates status for each metric using thresholds
-- Dashboard aggregates individual metric statuses for widget status
+### Structured Data Architecture (Planned Migration)
+Current system uses string-based metrics with complex parsing. Planning migration to structured JSON data to eliminate fragile string manipulation.
+**Current (String Metrics):**
+- Agent sends individual metrics with string names like `disk_nvme0n1_temperature`
+- Dashboard parses metric names with underscore counting and string splitting
+- Complex and error-prone metric filtering and extraction logic
+**Target (Structured Data):**
```json
{
  "hostname": "cmbox",
  "agent_version": "v0.1.130",
  "timestamp": 1763926877,
  "system": {
    "cpu": {
      "load_1min": 3.50,
      "load_5min": 3.57,
      "load_15min": 3.58,
      "frequency_mhz": 1500,
      "temperature_celsius": 45.2
    },
    "memory": {
      "usage_percent": 25.0,
      "total_gb": 23.3,
      "used_gb": 5.9,
      "swap_total_gb": 10.7,
      "swap_used_gb": 0.99,
      "tmpfs": [
        {"mount": "/tmp", "usage_percent": 15.0, "used_gb": 0.3, "total_gb": 2.0}
      ]
    },
    "storage": {
      "drives": [
        {
          "name": "nvme0n1",
          "health": "PASSED",
          "temperature_celsius": 29.0,
          "wear_percent": 1.0,
          "filesystems": [
            {"mount": "/", "usage_percent": 24.0, "used_gb": 224.9, "total_gb": 928.2}
          ]
        }
      ],
      "pools": [
        {
          "name": "srv_media",
          "mount": "/srv/media",
          "type": "mergerfs",
          "health": "healthy",
          "usage_percent": 63.0,
          "used_gb": 2355.2,
          "total_gb": 3686.4,
          "data_drives": [
            {"name": "sdb", "temperature_celsius": 24.0}
          ],
          "parity_drives": [
            {"name": "sdc", "temperature_celsius": 24.0}
          ]
        }
      ]
    }
  },
  "services": [
    {"name": "sshd", "status": "active", "memory_mb": 4.5, "disk_gb": 0.0}
  ],
  "backup": {
    "status": "completed",
    "last_run": 1763920000,
    "next_scheduled": 1764006400,
    "total_size_gb": 150.5,
    "repository_health": "ok"
  }
}
```
- Agent sends structured JSON over ZMQ
- Dashboard accesses data directly: `data.system.storage.drives[0].temperature_celsius`
- Type safety eliminates all parsing bugs
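The contrast between the two approaches can be sketched in Rust. This is a minimal illustration, not the project's actual code: the struct layout below mirrors only a small slice of the target JSON, the field types (including the `Option` wrappers) are assumptions, and the helper names `drive_from_metric_name` and `first_drive_temperature` are hypothetical.

```rust
// Hypothetical types mirroring a small slice of the target JSON.
// The real shared-crate definitions may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Drive {
    pub name: String,
    pub health: String,
    // Assumption: sensors can be missing, so readings are optional.
    pub temperature_celsius: Option<f32>,
    pub wear_percent: Option<f32>,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Storage {
    pub drives: Vec<Drive>,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct System {
    pub storage: Storage,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct AgentData {
    pub hostname: String,
    pub system: System,
}

/// String-metric style: recover the drive name from a metric such as
/// `disk_nvme0n1_temperature` by stripping a known prefix and suffix.
/// Every new metric kind needs another rule like this one.
pub fn drive_from_metric_name(metric: &str) -> Option<&str> {
    metric.strip_prefix("disk_")?.strip_suffix("_temperature")
}

/// Structured style: the value is reached by field access, with no
/// name parsing. Returns None if there is no drive or no reading.
pub fn first_drive_temperature(data: &AgentData) -> Option<f32> {
    data.system.storage.drives.first()?.temperature_celsius
}
```

With typed fields, a renamed or missing field becomes a compile-time or deserialization error rather than a silently mis-parsed metric name.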
### Maintenance Mode
- Agent checks for `/tmp/cm-maintenance` file before sending notifications
@@ -144,6 +218,130 @@ nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
- **Clean compilation**: Remove `target/` between major changes
## Enhanced Storage Pool Visualization
### Auto-Discovery Architecture
The dashboard uses automatic storage discovery to eliminate manual configuration complexity while providing intelligent storage pool grouping.
### Discovery Process
**At Agent Startup:**
1. Parse `/proc/mounts` to identify all mounted filesystems
2. Detect MergerFS pools by analyzing `fuse.mergerfs` mount sources
3. Identify member disks and potential parity relationships via heuristics
4. Store discovered storage topology for continuous monitoring
5. Generate pool-aware metrics with hierarchical relationships
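Steps 1 and 2 above can be sketched as follows. This is an illustrative sketch that takes the contents of `/proc/mounts` as a string rather than reading the file; the type `MergerfsMount` and the function `find_mergerfs_mounts` are hypothetical names, not the agent's actual API. It assumes the standard whitespace-separated field order (source, mount point, filesystem type, ...) and does not decode octal escapes such as `\040` in paths.

```rust
/// One mergerfs pool as seen in a single `/proc/mounts` line.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct MergerfsMount {
    pub mount_point: String,
    /// Colon-separated source entries, e.g. "/mnt/disk1:/mnt/disk2".
    pub sources: Vec<String>,
}

/// Scan `/proc/mounts`-formatted text and return every `fuse.mergerfs` mount.
pub fn find_mergerfs_mounts(proc_mounts: &str) -> Vec<MergerfsMount> {
    proc_mounts
        .lines()
        .filter_map(|line| {
            // Field order: source, mount point, fs type, options, dump, pass.
            let mut fields = line.split_whitespace();
            let source = fields.next()?;
            let mount_point = fields.next()?;
            let fs_type = fields.next()?;
            if fs_type != "fuse.mergerfs" {
                return None;
            }
            Some(MergerfsMount {
                mount_point: mount_point.to_string(),
                sources: source
                    .split(':')
                    .filter(|s| !s.is_empty())
                    .map(str::to_string)
                    .collect(),
            })
        })
        .collect()
}
```

Lines with fewer than three fields are skipped rather than treated as errors, so a malformed entry does not abort discovery of the remaining mounts.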
**Continuous Monitoring:**
- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, wear
- Calculate pool-level health based on member drive status
- Generate enhanced metrics for dashboard visualization
### Supported Storage Types
**Single Disks:**
- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.
**MergerFS Pools:**
- Auto-detect from `/proc/mounts` fuse.mergerfs entries
- Parse source paths to identify member disks (e.g., "/mnt/disk1:/mnt/disk2")
- Heuristic parity disk detection (sequential device names, "parity" in path)
- Pool health calculation (healthy/degraded/critical)
- Hierarchical tree display with data/parity disk grouping
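One of the parity heuristics listed above, "parity" in the path, might look like the sketch below. It is deliberately limited to that single rule; the sequential-device-name heuristic is not shown because the text does not specify it. `PoolMembers` and `classify_members` are hypothetical names.

```rust
/// Member mount paths of a pool, split by role.
/// Hypothetical type for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct PoolMembers {
    pub data: Vec<String>,
    pub parity: Vec<String>,
}

/// Classify candidate mount paths: any path containing "parity"
/// (case-insensitive) is treated as a parity disk, the rest as data disks.
pub fn classify_members(paths: &[&str]) -> PoolMembers {
    let mut members = PoolMembers::default();
    for path in paths {
        if path.to_ascii_lowercase().contains("parity") {
            members.parity.push(path.to_string());
        } else {
            members.data.push(path.to_string());
        }
    }
    members
}
```

A name-based rule like this is only a guess about the layout, which is why the text calls it a heuristic: a data disk mounted under a path that happens to contain "parity" would be misclassified.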
**Future Extensions Ready:**
- RAID arrays via `/proc/mdstat` parsing
- ZFS pools via `zpool status` integration
- LVM logical volumes via `lvs` discovery
### Configuration
```toml
[collectors.disk]
enabled = true
auto_discover = true # Default: true
# Optional exclusions for special filesystems
exclude_mount_points = ["/tmp", "/proc", "/sys", "/dev"]
exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
```
### Display Format
```
Storage:
● /srv/media (mergerfs (2+1)):
├─ Pool Status: ● Healthy (3 drives)
├─ Total: ● 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ Parity: ● sdc T: 24°C
● /:
├─ ● nvme0n1 W: 13%
└─ ● 7% 14.5GB/218.5GB
```
### Implementation Benefits
- **Zero Configuration**: No manual pool definitions required
- **Always Accurate**: Reflects actual system state automatically
- **Scales Automatically**: Handles any number of pools without config changes
- **Backwards Compatible**: Single disks continue working unchanged
- **Future Ready**: Easy extension for additional storage technologies
### Current Status (v0.1.100)
**✅ Completed:**
- Auto-discovery system implemented and deployed
- `/proc/mounts` parsing with smart heuristics for parity detection
- Storage topology stored at agent startup for efficient monitoring
- Universal zero-configuration for all hosts (cmbox, steambox, simonbox, srv01, srv02, srv03)
- Enhanced pool health calculation (healthy/degraded/critical)
- Hierarchical tree visualization with data/parity disk separation
**🔄 In Progress - Complete Disk Collector Rewrite:**
The current disk collector has grown complex with mixed legacy/auto-discovery approaches. Planning complete rewrite with clean, simple workflow supporting both physical drives and mergerfs pools.
**New Clean Architecture:**
**Discovery Workflow:**
1. **`lsblk`** to detect all mount points and backing devices
2. **`df`** to get filesystem usage for each mount point
3. **Group by physical drive** (nvme0n1, sda, etc.)
4. **Parse `/proc/mounts`** for mergerfs pools
5. **Generate unified metrics** for both storage types
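Step 3 of the workflow above, grouping by physical drive, could be sketched like this. The functions `base_device` and `group_by_drive` are hypothetical and only handle two common Linux naming schemes (an assumption): `sdX`-style disks, where the partition number follows the letters directly, and `nvme0n1p1` or `mmcblk0p1` style, where a `p` separates the disk name from the partition number. Running `lsblk` and `df` is left out; the input is a list of (device, mount point) pairs.

```rust
use std::collections::BTreeMap;

/// Map a partition name to its parent disk, e.g. "nvme0n1p2" -> "nvme0n1"
/// and "sda1" -> "sda". Names that are already whole disks are returned as-is.
pub fn base_device(name: &str) -> &str {
    let stem = name.trim_end_matches(|c: char| c.is_ascii_digit());
    if stem.len() == name.len() {
        return name; // no trailing digits, e.g. "sda"
    }
    // "<disk ending in a digit>p<N>", e.g. nvme0n1p2 or mmcblk0p1.
    if let Some(disk) = stem.strip_suffix('p') {
        if disk.ends_with(|c: char| c.is_ascii_digit()) {
            return disk;
        }
    }
    // "<prefix><letters><N>", e.g. sda1, vdb2, xvda1.
    let is_letter_disk = ["sd", "hd", "vd", "xvd"].iter().any(|prefix| {
        stem.strip_prefix(*prefix).map_or(false, |rest| {
            !rest.is_empty() && rest.chars().all(|c| c.is_ascii_lowercase())
        })
    });
    if is_letter_disk {
        stem
    } else {
        name // e.g. "nvme0n1" or "mmcblk0": trailing digits belong to the disk name
    }
}

/// Group (device, mount point) pairs under their physical drive.
pub fn group_by_drive<'a>(
    partitions: &[(&'a str, &'a str)],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(device, mount_point) in partitions {
        groups.entry(base_device(device)).or_default().push(mount_point);
    }
    groups
}
```

Using a `BTreeMap` keeps the drives in a stable, sorted order, which helps keep the rendered tree from reshuffling between refreshes.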
**Physical Drive Display:**
```
● nvme0n1:
├─ ● Drive: T: 35°C W: 1%
├─ ● Total: 23% 218.0GB/928.2GB
├─ ● /boot: 11% 0.1GB/1.0GB
└─ ● /: 23% 214.9GB/928.2GB
```
**MergerFS Pool Display:**
```
● /srv/media (mergerfs):
├─ ● Pool: 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ ● sdc T: 24°C (parity)
```
**Implementation Benefits:**
- **Pure auto-discovery**: No configuration needed
- **Clean code paths**: Single workflow for all storage types
- **Consistent display**: Status icons on every line, no redundant text
- **Simple pipeline**: lsblk → df → group → metrics
- **Support for both**: Physical drives and mergerfs pools
## Important Communication Guidelines
Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
@@ -169,12 +367,33 @@ Keep responses concise and focused. Avoid extensive implementation summaries unl
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
## Planned Architecture Migration
### Phase 1: Structured Data Types (Shared Crate)
- Create Rust structs matching target JSON structure
- Replace `Metric` enum with typed data structures
- Add serde serialization/deserialization
### Phase 2: Agent Refactor
- Update collectors to return typed structs instead of `Vec<Metric>`
- Remove string metric name generation
- Send structured JSON over ZMQ
### Phase 3: Dashboard Refactor
- Replace metric parsing logic with direct field access
- Remove `extract_pool_name()`, `extract_drive_name()`, underscore counting
- Widgets access `data.system.storage.drives[0].temperature_celsius`
### Phase 4: Migration & Cleanup
- Support both formats during transition
- Gradual rollout with backward compatibility
- Remove legacy string metric system
## Implementation Rules
-1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
-2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
-3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
-4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
+1. **Agent Status Authority**: Agent calculates status for each metric using thresholds
+2. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
+3. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
**NEVER:**
- Copy/paste ANY code from legacy implementations

230
Cargo.lock generated

@@ -17,9 +17,9 @@ dependencies = [
[[package]]
name = "aho-corasick"
-version = "1.1.3"
+version = "1.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "8e60d3430d3a69478ad0993f19238d2df97c507009a52b3c10addcd7f6bcb916"
+checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301"
dependencies = [
"memchr",
]
@@ -71,22 +71,22 @@ dependencies = [
[[package]]
name = "anstyle-query"
-version = "1.1.4"
+version = "1.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "9e231f6134f61b71076a3eab506c379d4f36122f2af15a9ff04415ea4c3339e2"
+checksum = "40c48f72fd53cd289104fc64099abca73db4166ad86ea0b4341abe65af83dadc"
dependencies = [
-"windows-sys 0.60.2",
+"windows-sys 0.61.2",
]
[[package]]
name = "anstyle-wincon"
-version = "3.0.10"
+version = "3.0.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3e0633414522a32ffaac8ac6cc8f748e090c5717661fddeea04219e2344f5f2a"
+checksum = "291e6a250ff86cd4a820112fb8898808a366d8f9f58ce16d1f538353ad55747d"
dependencies = [
"anstyle",
"once_cell_polyfill",
-"windows-sys 0.60.2",
+"windows-sys 0.61.2",
]
[[package]]
@@ -95,6 +95,15 @@ version = "1.0.100"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a23eb6b1614318a8071c9b2521f36b424b2c83db5eb3a0fead4a6c0809af6e61" checksum = "a23eb6b1614318a8071c9b2521f36b424b2c83db5eb3a0fead4a6c0809af6e61"
[[package]]
name = "ar_archive_writer"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0c269894b6fe5e9d7ada0cf69b5bf847ff35bc25fc271f08e1d080fce80339a"
dependencies = [
"object",
]
[[package]] [[package]]
name = "async-trait" name = "async-trait"
version = "0.1.89" version = "0.1.89"
@@ -144,9 +153,9 @@ checksum = "46c5e41b57b8bba42a04676d81cb89e9ee8e859a1a66f80a5a72e1cb76b34d43"
[[package]] [[package]]
name = "bytes" name = "bytes"
version = "1.10.1" version = "1.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d71b6127be86fdcfddb610f7182ac57211d4b18a3e9c82eb2d17662f2227ad6a" checksum = "b35204fbdc0b3f4446b89fc1ac2cf84a8a68971995d0bf2e925ec7cd960f9cb3"
[[package]] [[package]]
name = "cassowary" name = "cassowary"
@@ -156,9 +165,9 @@ checksum = "df8670b8c7b9dae1793364eafadf7239c40d669904660c5960d74cfd80b46a53"
[[package]] [[package]]
name = "cc" name = "cc"
version = "1.2.41" version = "1.2.46"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac9fe6cdbb24b6ade63616c0a0688e45bb56732262c158df3c0c4bea4ca47cb7" checksum = "b97463e1064cb1b1c1384ad0a0b9c8abd0988e2a91f52606c80ef14aadb63e36"
dependencies = [ dependencies = [
"find-msvc-tools", "find-msvc-tools",
"jobserver", "jobserver",
@@ -230,9 +239,9 @@ dependencies = [
[[package]] [[package]]
name = "clap" name = "clap"
version = "4.5.49" version = "4.5.52"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4512b90fa68d3a9932cea5184017c5d200f5921df706d45e853537dea51508f" checksum = "aa8120877db0e5c011242f96806ce3c94e0737ab8108532a76a3300a01db2ab8"
dependencies = [ dependencies = [
"clap_builder", "clap_builder",
"clap_derive", "clap_derive",
@@ -240,9 +249,9 @@ dependencies = [
[[package]] [[package]]
name = "clap_builder" name = "clap_builder"
version = "4.5.49" version = "4.5.52"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0025e98baa12e766c67ba13ff4695a887a1eba19569aad00a472546795bd6730" checksum = "02576b399397b659c26064fbc92a75fede9d18ffd5f80ca1cd74ddab167016e1"
dependencies = [ dependencies = [
"anstream", "anstream",
"anstyle", "anstyle",
@@ -270,7 +279,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
[[package]] [[package]]
name = "cm-dashboard" name = "cm-dashboard"
version = "0.1.82" version = "0.1.130"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"chrono", "chrono",
@@ -292,7 +301,7 @@ dependencies = [
[[package]] [[package]]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.82" version = "0.1.130"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"async-trait", "async-trait",
@@ -315,7 +324,7 @@ dependencies = [
[[package]] [[package]]
name = "cm-dashboard-shared" name = "cm-dashboard-shared"
version = "0.1.82" version = "0.1.130"
dependencies = [ dependencies = [
"chrono", "chrono",
"serde", "serde",
@@ -503,9 +512,9 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
[[package]] [[package]]
name = "find-msvc-tools" name = "find-msvc-tools"
version = "0.1.4" version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52051878f80a721bb68ebfbc930e07b65ba72f2da88968ea5c06fd6ca3d3a127" checksum = "3a3076410a55c90011c298b04d0cfa770b00fa04e1e3c97d3f6c9de105a03844"
[[package]] [[package]]
name = "fnv" name = "fnv"
@@ -768,9 +777,9 @@ dependencies = [
[[package]] [[package]]
name = "icu_collections" name = "icu_collections"
version = "2.0.0" version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "200072f5d0e3614556f94a9930d5dc3e0662a652823904c3a75dc3b0af7fee47" checksum = "4c6b649701667bbe825c3b7e6388cb521c23d88644678e83c0c4d0a621a34b43"
dependencies = [ dependencies = [
"displaydoc", "displaydoc",
"potential_utf", "potential_utf",
@@ -781,9 +790,9 @@ dependencies = [
[[package]] [[package]]
name = "icu_locale_core" name = "icu_locale_core"
version = "2.0.0" version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0cde2700ccaed3872079a65fb1a78f6c0a36c91570f28755dda67bc8f7d9f00a" checksum = "edba7861004dd3714265b4db54a3c390e880ab658fec5f7db895fae2046b5bb6"
dependencies = [ dependencies = [
"displaydoc", "displaydoc",
"litemap", "litemap",
@@ -794,11 +803,10 @@ dependencies = [
[[package]] [[package]]
name = "icu_normalizer" name = "icu_normalizer"
version = "2.0.0" version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "436880e8e18df4d7bbc06d58432329d6458cc84531f7ac5f024e93deadb37979" checksum = "5f6c8828b67bf8908d82127b2054ea1b4427ff0230ee9141c54251934ab1b599"
dependencies = [ dependencies = [
"displaydoc",
"icu_collections", "icu_collections",
"icu_normalizer_data", "icu_normalizer_data",
"icu_properties", "icu_properties",
@@ -809,42 +817,38 @@ dependencies = [
[[package]] [[package]]
name = "icu_normalizer_data" name = "icu_normalizer_data"
version = "2.0.0" version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "00210d6893afc98edb752b664b8890f0ef174c8adbb8d0be9710fa66fbbf72d3" checksum = "7aedcccd01fc5fe81e6b489c15b247b8b0690feb23304303a9e560f37efc560a"
[[package]] [[package]]
name = "icu_properties" name = "icu_properties"
version = "2.0.1" version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "016c619c1eeb94efb86809b015c58f479963de65bdb6253345c1a1276f22e32b" checksum = "e93fcd3157766c0c8da2f8cff6ce651a31f0810eaa1c51ec363ef790bbb5fb99"
dependencies = [ dependencies = [
"displaydoc",
"icu_collections", "icu_collections",
"icu_locale_core", "icu_locale_core",
"icu_properties_data", "icu_properties_data",
"icu_provider", "icu_provider",
"potential_utf",
"zerotrie", "zerotrie",
"zerovec", "zerovec",
] ]
[[package]] [[package]]
name = "icu_properties_data" name = "icu_properties_data"
version = "2.0.1" version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "298459143998310acd25ffe6810ed544932242d3f07083eee1084d83a71bd632" checksum = "02845b3647bb045f1100ecd6480ff52f34c35f82d9880e029d329c21d1054899"
[[package]] [[package]]
name = "icu_provider" name = "icu_provider"
version = "2.0.0" version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "03c80da27b5f4187909049ee2d72f276f0d9f99a42c306bd0131ecfe04d8e5af" checksum = "85962cf0ce02e1e0a629cc34e7ca3e373ce20dda4c4d7294bbd0bf1fdb59e614"
dependencies = [ dependencies = [
"displaydoc", "displaydoc",
"icu_locale_core", "icu_locale_core",
"stable_deref_trait",
"tinystr",
"writeable", "writeable",
"yoke", "yoke",
"zerofrom", "zerofrom",
@@ -885,9 +889,12 @@ dependencies = [
[[package]] [[package]]
name = "indoc" name = "indoc"
version = "2.0.6" version = "2.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4c7245a08504955605670dbf141fceab975f15ca21570696aebe9d2e71576bd" checksum = "79cf5c93f93228cf8efb3ba362535fb11199ac548a09ce117c9b1adc3030d706"
dependencies = [
"rustversion",
]
[[package]] [[package]]
name = "ipnet" name = "ipnet"
@@ -897,9 +904,9 @@ checksum = "469fb0b9cefa57e3ef31275ee7cacb78f2fdca44e4765491884a2b119d4eb130"
[[package]] [[package]]
name = "is_terminal_polyfill" name = "is_terminal_polyfill"
version = "1.70.1" version = "1.70.2"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7943c866cc5cd64cbc25b2e01621d07fa8eb2a1a23160ee81ce38704e97b8ecf" checksum = "a6cb138bb79a146c1bd460005623e142ef0181e3d0219cb493e02f7d08a35695"
[[package]] [[package]]
name = "itertools" name = "itertools"
@@ -928,9 +935,9 @@ dependencies = [
[[package]] [[package]]
name = "js-sys" name = "js-sys"
version = "0.3.81" version = "0.3.82"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ec48937a97411dcb524a265206ccd4c90bb711fca92b2792c407f268825b9305" checksum = "b011eec8cc36da2aab2d5cff675ec18454fad408585853910a202391cf9f8e65"
dependencies = [ dependencies = [
"once_cell", "once_cell",
"wasm-bindgen", "wasm-bindgen",
@@ -988,9 +995,9 @@ checksum = "df1d3c3b53da64cf5760482273a98e575c651a67eec7f77df96b5b642de8f039"
[[package]] [[package]]
name = "litemap" name = "litemap"
version = "0.8.0" version = "0.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "241eaef5fd12c88705a01fc1066c48c4b36e0dd4377dcdc7ec3942cea7a69956" checksum = "6373607a59f0be73a39b6fe456b8192fcc3585f602af20751600e974dd455e77"
[[package]] [[package]]
name = "lock_api" name = "lock_api"
@@ -1104,6 +1111,15 @@ dependencies = [
"autocfg", "autocfg",
] ]
[[package]]
name = "object"
version = "0.32.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a6a622008b6e321afc04970976f62ee297fdbaa6f95318ca343e3eebb9648441"
dependencies = [
"memchr",
]
[[package]] [[package]]
name = "once_cell" name = "once_cell"
version = "1.21.3" version = "1.21.3"
@@ -1112,15 +1128,15 @@ checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d"
[[package]] [[package]]
name = "once_cell_polyfill" name = "once_cell_polyfill"
version = "1.70.1" version = "1.70.2"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a4895175b425cb1f87721b59f0f286c2092bd4af812243672510e1ac53e2e0ad" checksum = "384b8ab6d37215f3c5301a95a4accb5d64aa607f1fcb26a11b5303878451b4fe"
[[package]] [[package]]
name = "openssl" name = "openssl"
version = "0.10.74" version = "0.10.75"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "24ad14dd45412269e1a30f52ad8f0664f0f4f4a89ee8fe28c3b3527021ebb654" checksum = "08838db121398ad17ab8531ce9de97b244589089e290a384c900cb9ff7434328"
dependencies = [ dependencies = [
"bitflags 2.10.0", "bitflags 2.10.0",
"cfg-if", "cfg-if",
@@ -1150,9 +1166,9 @@ checksum = "d05e27ee213611ffe7d6348b942e8f942b37114c00cc03cec254295a4a17852e"
[[package]] [[package]]
name = "openssl-sys" name = "openssl-sys"
version = "0.9.110" version = "0.9.111"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0a9f0075ba3c21b09f8e8b2026584b1d18d49388648f2fbbf3c97ea8deced8e2" checksum = "82cab2d520aa75e3c58898289429321eb788c3106963d0dc886ec7a5f4adc321"
dependencies = [ dependencies = [
"cc", "cc",
"libc", "libc",
@@ -1262,36 +1278,37 @@ checksum = "7edddbd0b52d732b21ad9a5fab5c704c14cd949e5e9a1ec5929a24fded1b904c"
[[package]] [[package]]
name = "potential_utf" name = "potential_utf"
version = "0.1.3" version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "84df19adbe5b5a0782edcab45899906947ab039ccf4573713735ee7de1e6b08a" checksum = "b73949432f5e2a09657003c25bca5e19a0e9c84f8058ca374f49e0ebe605af77"
dependencies = [ dependencies = [
"zerovec", "zerovec",
] ]
[[package]] [[package]]
name = "proc-macro2" name = "proc-macro2"
version = "1.0.101" version = "1.0.103"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "89ae43fd86e4158d6db51ad8e2b80f313af9cc74f5c0e03ccb87de09998732de" checksum = "5ee95bc4ef87b8d5ba32e8b7714ccc834865276eab0aed5c9958d00ec45f49e8"
dependencies = [ dependencies = [
"unicode-ident", "unicode-ident",
] ]
[[package]] [[package]]
name = "psm" name = "psm"
version = "0.1.27" version = "0.1.28"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e66fcd288453b748497d8fb18bccc83a16b0518e3906d4b8df0a8d42d93dbb1c" checksum = "d11f2fedc3b7dafdc2851bc52f277377c5473d378859be234bc7ebb593144d01"
dependencies = [ dependencies = [
"ar_archive_writer",
"cc", "cc",
] ]
[[package]] [[package]]
name = "quote" name = "quote"
version = "1.0.41" version = "1.0.42"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ce25767e7b499d1b604768e7cde645d14cc8584231ea6b295e9c9eb22c02e1d1" checksum = "a338cc41d27e6cc6dce6cefc13a0729dfbb81c262b1f519331575dd80ef3067f"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
] ]
@@ -1611,9 +1628,9 @@ dependencies = [
[[package]] [[package]]
name = "signal-hook-mio" name = "signal-hook-mio"
version = "0.2.4" version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "34db1a06d485c9142248b7a054f034b349b212551f3dfd19c94d45a754a217cd" checksum = "b75a19a7a740b25bc7944bdee6172368f988763b744e3d4dfe753f6b4ece40cc"
dependencies = [ dependencies = [
"libc", "libc",
"mio 0.8.11", "mio 0.8.11",
@@ -1716,9 +1733,9 @@ dependencies = [
[[package]] [[package]]
name = "syn" name = "syn"
version = "2.0.107" version = "2.0.110"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2a26dbd934e5451d21ef060c018dae56fc073894c5a7896f882928a76e6d081b" checksum = "a99801b5bd34ede4cf3fc688c5919368fea4e4814a4664359503e6015b280aea"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
"quote", "quote",
@@ -1826,9 +1843,9 @@ dependencies = [
[[package]] [[package]]
name = "tinystr" name = "tinystr"
version = "0.8.1" version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5d4f6d1145dcb577acf783d4e601bc1d76a13337bb54e6233add580b07344c8b" checksum = "42d3e9c45c09de15d06dd8acf5f4e0e399e85927b7f00711024eb7ae10fa4869"
dependencies = [ dependencies = [
"displaydoc", "displaydoc",
"zerovec", "zerovec",
@@ -1874,9 +1891,9 @@ dependencies = [
[[package]] [[package]]
name = "tokio-util" name = "tokio-util"
version = "0.7.16" version = "0.7.17"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "14307c986784f72ef81c89db7d9e28d6ac26d16213b109ea501696195e6e3ce5" checksum = "2efa149fe76073d6e8fd97ef4f4eca7b67f599660115591483572e406e165594"
dependencies = [ dependencies = [
"bytes", "bytes",
"futures-core", "futures-core",
@@ -2001,9 +2018,9 @@ checksum = "e421abadd41a4225275504ea4d6566923418b7f05506fbc9c0fe86ba7396114b"
[[package]] [[package]]
name = "unicode-ident" name = "unicode-ident"
version = "1.0.19" version = "1.0.22"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f63a545481291138910575129486daeaf8ac54aee4387fe7906919f7830c7d9d" checksum = "9312f7c4f6ff9069b165498234ce8be658059c6728633667c526e27dc2cf1df5"
[[package]] [[package]]
name = "unicode-segmentation" name = "unicode-segmentation"
@@ -2055,9 +2072,9 @@ checksum = "accd4ea62f7bb7a82fe23066fb0957d48ef677f6eeb8215f372f52e48bb32426"
[[package]] [[package]]
name = "version-compare" name = "version-compare"
version = "0.2.0" version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "852e951cb7832cb45cb1169900d19760cfa39b82bc0ea9c0e5a14ae88411c98b" checksum = "03c2856837ef78f57382f06b2b8563a2f512f7185d732608fd9176cb3b8edf0e"
[[package]] [[package]]
name = "version_check" name = "version_check"
@@ -2107,9 +2124,9 @@ dependencies = [
[[package]] [[package]]
name = "wasm-bindgen" name = "wasm-bindgen"
version = "0.2.104" version = "0.2.105"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c1da10c01ae9f1ae40cbfac0bac3b1e724b320abfcf52229f80b547c0d250e2d" checksum = "da95793dfc411fbbd93f5be7715b0578ec61fe87cb1a42b12eb625caa5c5ea60"
dependencies = [ dependencies = [
"cfg-if", "cfg-if",
"once_cell", "once_cell",
@@ -2118,25 +2135,11 @@ dependencies = [
"wasm-bindgen-shared", "wasm-bindgen-shared",
] ]
[[package]]
name = "wasm-bindgen-backend"
version = "0.2.104"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "671c9a5a66f49d8a47345ab942e2cb93c7d1d0339065d4f8139c486121b43b19"
dependencies = [
"bumpalo",
"log",
"proc-macro2",
"quote",
"syn",
"wasm-bindgen-shared",
]
[[package]] [[package]]
name = "wasm-bindgen-futures" name = "wasm-bindgen-futures"
version = "0.4.54" version = "0.4.55"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7e038d41e478cc73bae0ff9b36c60cff1c98b8f38f8d7e8061e79ee63608ac5c" checksum = "551f88106c6d5e7ccc7cd9a16f312dd3b5d36ea8b4954304657d5dfba115d4a0"
dependencies = [ dependencies = [
"cfg-if", "cfg-if",
"js-sys", "js-sys",
@@ -2147,9 +2150,9 @@ dependencies = [
[[package]] [[package]]
name = "wasm-bindgen-macro" name = "wasm-bindgen-macro"
version = "0.2.104" version = "0.2.105"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7ca60477e4c59f5f2986c50191cd972e3a50d8a95603bc9434501cf156a9a119" checksum = "04264334509e04a7bf8690f2384ef5265f05143a4bff3889ab7a3269adab59c2"
dependencies = [ dependencies = [
"quote", "quote",
"wasm-bindgen-macro-support", "wasm-bindgen-macro-support",
@@ -2157,31 +2160,31 @@ dependencies = [
[[package]] [[package]]
name = "wasm-bindgen-macro-support" name = "wasm-bindgen-macro-support"
version = "0.2.104" version = "0.2.105"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f07d2f20d4da7b26400c9f4a0511e6e0345b040694e8a75bd41d578fa4421d7" checksum = "420bc339d9f322e562942d52e115d57e950d12d88983a14c79b86859ee6c7ebc"
dependencies = [ dependencies = [
"bumpalo",
"proc-macro2", "proc-macro2",
"quote", "quote",
"syn", "syn",
"wasm-bindgen-backend",
"wasm-bindgen-shared", "wasm-bindgen-shared",
] ]
[[package]] [[package]]
name = "wasm-bindgen-shared" name = "wasm-bindgen-shared"
version = "0.2.104" version = "0.2.105"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bad67dc8b2a1a6e5448428adec4c3e84c43e561d8c9ee8a9e5aabeb193ec41d1" checksum = "76f218a38c84bcb33c25ec7059b07847d465ce0e0a76b995e134a45adcb6af76"
dependencies = [ dependencies = [
"unicode-ident", "unicode-ident",
] ]
[[package]] [[package]]
name = "web-sys" name = "web-sys"
version = "0.3.81" version = "0.3.82"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9367c417a924a74cae129e6a2ae3b47fabb1f8995595ab474029da749a8be120" checksum = "3a1f95c0d03a47f4ae1f7a64643a6bb97465d9b740f0fa8f90ea33915c99a9a1"
dependencies = [ dependencies = [
"js-sys", "js-sys",
"wasm-bindgen", "wasm-bindgen",
@@ -2535,17 +2538,16 @@ checksum = "f17a85883d4e6d00e8a97c586de764dabcc06133f7f1d55dce5cdc070ad7fe59"
[[package]] [[package]]
name = "writeable" name = "writeable"
version = "0.6.1" version = "0.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ea2f10b9bb0928dfb1b42b65e1f9e36f7f54dbdf08457afefb38afcdec4fa2bb" checksum = "9edde0db4769d2dc68579893f2306b26c6ecfbe0ef499b013d731b7b9247e0b9"
[[package]] [[package]]
name = "yoke" name = "yoke"
version = "0.8.0" version = "0.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5f41bb01b8226ef4bfd589436a297c53d118f65921786300e427be8d487695cc" checksum = "72d6e5c6afb84d73944e5cedb052c4680d5657337201555f9f2a16b7406d4954"
dependencies = [ dependencies = [
"serde",
"stable_deref_trait", "stable_deref_trait",
"yoke-derive", "yoke-derive",
"zerofrom", "zerofrom",
@@ -2553,9 +2555,9 @@ dependencies = [
[[package]] [[package]]
name = "yoke-derive" name = "yoke-derive"
version = "0.8.0" version = "0.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "38da3c9736e16c5d3c8c597a9aaa5d1fa565d0532ae05e27c24aa62fb32c0ab6" checksum = "b659052874eb698efe5b9e8cf382204678a0086ebf46982b79d6ca3182927e5d"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
"quote", "quote",
@@ -2616,9 +2618,9 @@ dependencies = [
[[package]] [[package]]
name = "zerotrie" name = "zerotrie"
version = "0.2.2" version = "0.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "36f0bbd478583f79edad978b407914f61b2972f5af6fa089686016be8f9af595" checksum = "2a59c17a5562d507e4b54960e8569ebee33bee890c70aa3fe7b97e85a9fd7851"
dependencies = [ dependencies = [
"displaydoc", "displaydoc",
"yoke", "yoke",
@@ -2627,9 +2629,9 @@ dependencies = [
[[package]] [[package]]
name = "zerovec" name = "zerovec"
version = "0.11.4" version = "0.11.5"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e7aa2bd55086f1ab526693ecbe444205da57e25f4489879da80635a46d90e73b" checksum = "6c28719294829477f525be0186d13efa9a3c602f7ec202ca9e353d310fb9a002"
dependencies = [ dependencies = [
"yoke", "yoke",
"zerofrom", "zerofrom",
@@ -2638,9 +2640,9 @@ dependencies = [
[[package]] [[package]]
name = "zerovec-derive" name = "zerovec-derive"
version = "0.11.1" version = "0.11.2"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b96237efa0c878c64bd89c436f661be4e46b2f3eff1ebb976f7ef2321d2f58f" checksum = "eadce39539ca5cb3985590102671f2567e659fca9666581ad3411d59207951f3"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
"quote", "quote",


@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.83" version = "0.1.131"
edition = "2021" edition = "2021"
[dependencies] [dependencies]


@@ -10,7 +10,7 @@ use crate::metrics::MetricCollectionManager;
use crate::notifications::NotificationManager; use crate::notifications::NotificationManager;
use crate::service_tracker::UserStoppedServiceTracker; use crate::service_tracker::UserStoppedServiceTracker;
use crate::status::HostStatusManager; use crate::status::HostStatusManager;
use cm_dashboard_shared::{Metric, MetricMessage, MetricValue, Status}; use cm_dashboard_shared::{AgentData, Metric, MetricValue, Status, TmpfsData, DriveData, FilesystemData, ServiceData};
pub struct Agent { pub struct Agent {
hostname: String, hostname: String,
@@ -199,16 +199,310 @@ impl Agent {
return Ok(()); return Ok(());
} }
debug!("Broadcasting {} cached metrics (including host status summary)", metrics.len()); debug!("Broadcasting {} cached metrics as structured data", metrics.len());
// Create and send message with all current data // Convert metrics to structured data and send
let message = MetricMessage::new(self.hostname.clone(), metrics); let agent_data = self.metrics_to_structured_data(&metrics)?;
self.zmq_handler.publish_metrics(&message).await?; self.zmq_handler.publish_agent_data(&agent_data).await?;
debug!("Metrics broadcasted successfully"); debug!("Structured data broadcasted successfully");
Ok(()) Ok(())
} }
/// Convert legacy metrics to structured data format
fn metrics_to_structured_data(&self, metrics: &[Metric]) -> Result<AgentData> {
let mut agent_data = AgentData::new(self.hostname.clone(), self.get_agent_version());
// Parse metrics into structured data
for metric in metrics {
self.parse_metric_into_agent_data(&mut agent_data, metric)?;
}
Ok(agent_data)
}
/// Parse a single metric into the appropriate structured data field
fn parse_metric_into_agent_data(&self, agent_data: &mut AgentData, metric: &Metric) -> Result<()> {
// CPU metrics
if metric.name == "cpu_load_1min" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.load_1min = value;
}
} else if metric.name == "cpu_load_5min" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.load_5min = value;
}
} else if metric.name == "cpu_load_15min" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.load_15min = value;
}
} else if metric.name == "cpu_frequency_mhz" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.frequency_mhz = value;
}
} else if metric.name == "cpu_temperature_celsius" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.temperature_celsius = Some(value);
}
}
// Memory metrics
else if metric.name == "memory_usage_percent" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.usage_percent = value;
}
} else if metric.name == "memory_total_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.total_gb = value;
}
} else if metric.name == "memory_used_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.used_gb = value;
}
} else if metric.name == "memory_available_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.available_gb = value;
}
} else if metric.name == "memory_swap_total_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.swap_total_gb = value;
}
} else if metric.name == "memory_swap_used_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.swap_used_gb = value;
}
}
// Tmpfs metrics
else if metric.name.starts_with("memory_tmp_") {
// For now, create a single /tmp tmpfs entry
if metric.name == "memory_tmp_usage_percent" {
if let Some(value) = metric.value.as_f32() {
if let Some(tmpfs) = agent_data.system.memory.tmpfs.get_mut(0) {
tmpfs.usage_percent = value;
} else {
agent_data.system.memory.tmpfs.push(TmpfsData {
mount: "/tmp".to_string(),
usage_percent: value,
used_gb: 0.0,
total_gb: 0.0,
});
}
}
} else if metric.name == "memory_tmp_used_gb" {
if let Some(value) = metric.value.as_f32() {
if let Some(tmpfs) = agent_data.system.memory.tmpfs.get_mut(0) {
tmpfs.used_gb = value;
} else {
agent_data.system.memory.tmpfs.push(TmpfsData {
mount: "/tmp".to_string(),
usage_percent: 0.0,
used_gb: value,
total_gb: 0.0,
});
}
}
} else if metric.name == "memory_tmp_total_gb" {
if let Some(value) = metric.value.as_f32() {
if let Some(tmpfs) = agent_data.system.memory.tmpfs.get_mut(0) {
tmpfs.total_gb = value;
} else {
agent_data.system.memory.tmpfs.push(TmpfsData {
mount: "/tmp".to_string(),
usage_percent: 0.0,
used_gb: 0.0,
total_gb: value,
});
}
}
}
}
// Storage metrics
else if metric.name.starts_with("disk_") {
if metric.name.contains("_temperature") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
if let Some(temp) = metric.value.as_f32() {
self.ensure_drive_exists(agent_data, &drive_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == drive_name) {
drive.temperature_celsius = Some(temp);
}
}
}
} else if metric.name.contains("_wear_percent") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
if let Some(wear) = metric.value.as_f32() {
self.ensure_drive_exists(agent_data, &drive_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == drive_name) {
drive.wear_percent = Some(wear);
}
}
}
} else if metric.name.contains("_health") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
let health = metric.value.as_string();
self.ensure_drive_exists(agent_data, &drive_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == drive_name) {
drive.health = health;
}
}
} else if metric.name.contains("_fs_") {
// Filesystem metrics: disk_{pool}_fs_{filesystem}_{metric}
if let Some((pool_name, fs_name)) = self.extract_pool_and_filesystem(&metric.name) {
if metric.name.contains("_usage_percent") {
if let Some(usage) = metric.value.as_f32() {
self.ensure_filesystem_exists(agent_data, &pool_name, &fs_name, usage, 0.0, 0.0);
}
} else if metric.name.contains("_used_gb") {
if let Some(used) = metric.value.as_f32() {
self.update_filesystem_field(agent_data, &pool_name, &fs_name, |fs| fs.used_gb = used);
}
} else if metric.name.contains("_total_gb") {
if let Some(total) = metric.value.as_f32() {
self.update_filesystem_field(agent_data, &pool_name, &fs_name, |fs| fs.total_gb = total);
}
}
}
}
}
// Service metrics
else if metric.name.starts_with("service_") {
if let Some(service_name) = self.extract_service_name(&metric.name) {
if metric.name.contains("_status") {
let status = metric.value.as_string();
self.ensure_service_exists(agent_data, &service_name, &status);
} else if metric.name.contains("_memory_mb") {
if let Some(memory) = metric.value.as_f32() {
self.update_service_field(agent_data, &service_name, |svc| svc.memory_mb = memory);
}
} else if metric.name.contains("_disk_gb") {
if let Some(disk) = metric.value.as_f32() {
self.update_service_field(agent_data, &service_name, |svc| svc.disk_gb = disk);
}
}
}
}
// Backup metrics
else if metric.name.starts_with("backup_") {
if metric.name == "backup_status" {
agent_data.backup.status = metric.value.as_string();
} else if metric.name == "backup_last_run_timestamp" {
if let Some(timestamp) = metric.value.as_i64() {
agent_data.backup.last_run = Some(timestamp as u64);
}
} else if metric.name == "backup_next_scheduled_timestamp" {
if let Some(timestamp) = metric.value.as_i64() {
agent_data.backup.next_scheduled = Some(timestamp as u64);
}
} else if metric.name == "backup_size_gb" {
if let Some(size) = metric.value.as_f32() {
agent_data.backup.total_size_gb = Some(size);
}
} else if metric.name == "backup_repository_health" {
agent_data.backup.repository_health = Some(metric.value.as_string());
}
}
Ok(())
}
/// Extract drive name from metric like "disk_nvme0n1_temperature"
fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
if metric_name.starts_with("disk_") {
let suffixes = ["_temperature", "_wear_percent", "_health"];
for suffix in suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
}
}
}
None
}
/// Extract pool and filesystem from "disk_{pool}_fs_{filesystem}_{metric}"
fn extract_pool_and_filesystem(&self, metric_name: &str) -> Option<(String, String)> {
if let Some(fs_pos) = metric_name.find("_fs_") {
let pool_name = metric_name[5..fs_pos].to_string(); // Skip "disk_"
let after_fs = &metric_name[fs_pos + 4..]; // Skip "_fs_"
if let Some(metric_pos) = after_fs.find('_') {
let fs_name = after_fs[..metric_pos].to_string();
return Some((pool_name, fs_name));
}
}
None
}
/// Extract service name from "service_{name}_{metric}"
fn extract_service_name(&self, metric_name: &str) -> Option<String> {
if metric_name.starts_with("service_") {
let suffixes = ["_status", "_memory_mb", "_disk_gb"];
for suffix in suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
return Some(metric_name[8..suffix_pos].to_string()); // Skip "service_"
}
}
}
None
}
/// Ensure drive exists in agent_data
fn ensure_drive_exists(&self, agent_data: &mut AgentData, drive_name: &str) {
if !agent_data.system.storage.drives.iter().any(|d| d.name == drive_name) {
agent_data.system.storage.drives.push(DriveData {
name: drive_name.to_string(),
health: "UNKNOWN".to_string(),
temperature_celsius: None,
wear_percent: None,
filesystems: Vec::new(),
});
}
}
/// Ensure filesystem exists in the correct drive
fn ensure_filesystem_exists(&self, agent_data: &mut AgentData, pool_name: &str, fs_name: &str, usage_percent: f32, used_gb: f32, total_gb: f32) {
self.ensure_drive_exists(agent_data, pool_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == pool_name) {
if !drive.filesystems.iter().any(|fs| fs.mount == fs_name) {
drive.filesystems.push(FilesystemData {
mount: fs_name.to_string(),
usage_percent,
used_gb,
total_gb,
});
}
}
}
/// Update filesystem field
fn update_filesystem_field<F>(&self, agent_data: &mut AgentData, pool_name: &str, fs_name: &str, update_fn: F)
where F: FnOnce(&mut FilesystemData) {
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == pool_name) {
if let Some(fs) = drive.filesystems.iter_mut().find(|fs| fs.mount == fs_name) {
update_fn(fs);
}
}
}
/// Ensure service exists
fn ensure_service_exists(&self, agent_data: &mut AgentData, service_name: &str, status: &str) {
if !agent_data.services.iter().any(|s| s.name == service_name) {
agent_data.services.push(ServiceData {
name: service_name.to_string(),
status: status.to_string(),
memory_mb: 0.0,
disk_gb: 0.0,
user_stopped: false, // TODO: Get from service tracker
});
} else if let Some(service) = agent_data.services.iter_mut().find(|s| s.name == service_name) {
service.status = status.to_string();
}
}
/// Update service field
fn update_service_field<F>(&self, agent_data: &mut AgentData, service_name: &str, update_fn: F)
where F: FnOnce(&mut ServiceData) {
if let Some(service) = agent_data.services.iter_mut().find(|s| s.name == service_name) {
update_fn(service);
}
}
async fn process_metrics(&mut self, metrics: &[Metric]) -> bool {
let mut status_changed = false;
for metric in metrics {
@@ -261,13 +555,11 @@ impl Agent {
 /// Send standalone heartbeat for connectivity detection
 async fn send_heartbeat(&mut self) -> Result<()> {
-let heartbeat_metric = self.get_heartbeat_metric();
-let message = MetricMessage::new(
-self.hostname.clone(),
-vec![heartbeat_metric],
-);
-self.zmq_handler.publish_metrics(&message).await?;
+// Create minimal agent data with just heartbeat
+let agent_data = AgentData::new(self.hostname.clone(), self.get_agent_version());
+// Heartbeat timestamp is already set in AgentData::new()
+self.zmq_handler.publish_agent_data(&agent_data).await?;
 debug!("Sent standalone heartbeat for connectivity detection");
 Ok(())
 }
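The bridge code in this file recovers pool and filesystem names by slicing metric names of the form `disk_{pool}_fs_{filesystem}_{metric}`. A minimal standalone sketch of that parsing follows, assuming only the three suffixes handled in the diff (`_usage_percent`, `_used_gb`, `_total_gb`). `parse_fs_metric` is a hypothetical helper for illustration, not a function from this commit. Matching on the known suffix, rather than splitting at the first underscore after `_fs_`, would let filesystem names contain underscores.

```rust
/// Hypothetical helper (not part of the commit): split
/// "disk_{pool}_fs_{filesystem}_{metric}" into (pool, filesystem, metric).
fn parse_fs_metric(name: &str) -> Option<(&str, &str, &str)> {
    // Suffixes taken from the branches handled in the diff.
    const SUFFIXES: [&str; 3] = ["_usage_percent", "_used_gb", "_total_gb"];

    let rest = name.strip_prefix("disk_")?;
    let (pool, after_fs) = rest.split_once("_fs_")?;

    for suffix in SUFFIXES {
        if let Some(fs) = after_fs.strip_suffix(suffix) {
            if pool.is_empty() || fs.is_empty() {
                return None;
            }
            // Drop the leading underscore to get the bare metric name.
            return Some((pool, fs, &suffix[1..]));
        }
    }
    None
}

fn main() {
    println!("{:?}", parse_fs_metric("disk_nvme0n1_fs_root_used_gb"));
}
```

Returning borrowed slices avoids allocating a `String` for every metric; callers that need owned values can convert at the point of use.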


@@ -25,6 +25,25 @@ impl BackupCollector {
}
async fn read_backup_status(&self) -> Result<Option<BackupStatusToml>, CollectorError> {
// Check if we're in maintenance mode
if std::fs::metadata("/tmp/cm-maintenance").is_ok() {
// Return special maintenance mode status
let maintenance_status = BackupStatusToml {
backup_name: "maintenance".to_string(),
start_time: chrono::Utc::now().format("%Y-%m-%d %H:%M:%S UTC").to_string(),
current_time: chrono::Utc::now().format("%Y-%m-%d %H:%M:%S UTC").to_string(),
duration_seconds: 0,
status: "pending".to_string(),
last_updated: chrono::Utc::now().format("%Y-%m-%d %H:%M:%S UTC").to_string(),
disk_space: None,
disk_product_name: None,
disk_serial_number: None,
disk_wear_percent: None,
services: HashMap::new(),
};
return Ok(Some(maintenance_status));
}
// Check if backup status file exists
if !std::path::Path::new(&self.backup_status_file).exists() {
return Ok(None); // File doesn't exist, but this is not an error
@@ -79,7 +98,9 @@ impl BackupCollector {
}
}
"failed" => Status::Critical,
"warning" => Status::Warning, // Backup completed with warnings
"running" => Status::Ok, // Currently running is OK
"pending" => Status::Pending, // Maintenance mode or backup starting
_ => Status::Unknown,
}
}
@@ -379,6 +400,25 @@ impl Collector for BackupCollector {
});
}
if let Some(wear_percent) = backup_status.disk_wear_percent {
let wear_status = if wear_percent >= 90.0 {
Status::Critical
} else if wear_percent >= 75.0 {
Status::Warning
} else {
Status::Ok
};
metrics.push(Metric {
name: "backup_disk_wear_percent".to_string(),
value: MetricValue::Float(wear_percent),
status: wear_status,
timestamp,
description: Some("Backup disk wear percentage from SMART data".to_string()),
unit: Some("percent".to_string()),
});
}
// Count services by status
let mut status_counts = HashMap::new();
for service in backup_status.services.values() {
@@ -412,6 +452,7 @@ pub struct BackupStatusToml {
pub disk_space: Option<DiskSpace>,
pub disk_product_name: Option<String>,
pub disk_serial_number: Option<String>,
pub disk_wear_percent: Option<f32>,
pub services: HashMap<String, ServiceStatus>,
}
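The backup collector changes above classify disk wear with two fixed thresholds: 90% and above is critical, 75% and above is a warning. A minimal sketch of that mapping as a standalone function follows. The `Status` enum here is a stand-in with only the variants needed, and `wear_status` is a hypothetical name; the commit computes this inline.

```rust
/// Stand-in for the shared Status enum; only the variants used here.
#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Warning,
    Critical,
}

/// Map a SMART wear percentage to a status, using the thresholds from the diff.
fn wear_status(wear_percent: f32) -> Status {
    if wear_percent >= 90.0 {
        Status::Critical
    } else if wear_percent >= 75.0 {
        Status::Warning
    } else {
        Status::Ok
    }
}

fn main() {
    for wear in [10.0_f32, 75.0, 90.0] {
        println!("{}% -> {:?}", wear, wear_status(wear));
    }
}
```

Both comparisons are inclusive, so a drive at exactly 75% already reports a warning.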


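The disk collector diff that follows groups filesystems by physical drive, which requires reducing a partition name reported by `lsblk` to its base device. A minimal sketch of that reduction follows, assuming Linux naming where `nvme*` and `mmcblk*` partitions end in `p<N>` and `sd*`-style partitions simply append digits. `base_device` is a hypothetical free function for illustration, not the method from the commit. It leaves an unpartitioned NVMe namespace such as `nvme0n1` unchanged, a case that trailing-digit stripping alone would shorten to `nvme0n`.

```rust
/// Hypothetical helper: reduce a partition name to its base device.
fn base_device(name: &str) -> String {
    if name.starts_with("nvme") || name.starts_with("mmcblk") {
        // Partitions look like "<base>p<digits>", where <base> ends in a digit.
        if let Some(p_pos) = name.rfind('p') {
            let digits = &name[p_pos + 1..];
            let is_partition = p_pos > 0
                && !digits.is_empty()
                && digits.bytes().all(|b| b.is_ascii_digit())
                && name.as_bytes()[p_pos - 1].is_ascii_digit();
            if is_partition {
                return name[..p_pos].to_string();
            }
        }
        // No partition suffix: the name is already the base device.
        return name.to_string();
    }

    // Traditional names (sda1 -> sda): strip trailing partition digits.
    let trimmed = name.trim_end_matches(|c: char| c.is_ascii_digit());
    if trimmed.is_empty() {
        name.to_string()
    } else {
        trimmed.to_string()
    }
}

fn main() {
    for dev in ["nvme0n1p2", "nvme0n1", "sda1", "sda"] {
        println!("{} -> {}", dev, base_device(dev));
    }
}
```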
@@ -5,353 +5,159 @@ use cm_dashboard_shared::{Metric, MetricValue, Status, StatusTracker, Hysteresis
use crate::config::DiskConfig;
use std::process::Command;
use std::time::Instant;
use std::collections::HashMap;
use tracing::debug;
use super::{Collector, CollectorError};
/// Information about a storage pool (mount point with underlying drives) /// Storage collector with clean architecture
#[derive(Debug, Clone)]
struct StoragePool {
name: String, // e.g., "steampool", "root"
mount_point: String, // e.g., "/mnt/steampool", "/"
filesystem: String, // e.g., "mergerfs", "ext4", "zfs", "btrfs"
storage_type: String, // e.g., "mergerfs", "single", "raid", "zfs"
size: String, // e.g., "2.5TB"
used: String, // e.g., "2.1TB"
available: String, // e.g., "400GB"
usage_percent: f32, // e.g., 85.0
underlying_drives: Vec<DriveInfo>, // Individual physical drives
}
/// Information about an individual physical drive
#[derive(Debug, Clone)]
struct DriveInfo {
device: String, // e.g., "sda", "nvme0n1"
health_status: String, // e.g., "PASSED", "FAILED"
temperature: Option<f32>, // e.g., 45.0°C
wear_level: Option<f32>, // e.g., 12.0% (for SSDs)
}
/// Disk usage collector for monitoring filesystem sizes
pub struct DiskCollector {
config: DiskConfig,
temperature_thresholds: HysteresisThresholds,
detected_devices: std::collections::HashMap<String, Vec<String>>, // mount_point -> devices }
/// A physical drive with its filesystems
#[derive(Debug, Clone)]
struct PhysicalDrive {
device: String, // e.g., "nvme0n1", "sda"
filesystems: Vec<Filesystem>, // mounted filesystems on this drive
temperature: Option<f32>, // drive temperature
wear_level: Option<f32>, // SSD wear level
health_status: String, // SMART health
}
/// A mergerfs pool
#[derive(Debug, Clone)]
struct MergerfsPool {
mount_point: String, // e.g., "/srv/media"
total_bytes: u64, // pool total capacity
used_bytes: u64, // pool used space
data_drives: Vec<DriveInfo>, // data member drives
parity_drives: Vec<DriveInfo>, // parity drives
}
/// Individual filesystem on a drive
#[derive(Debug, Clone)]
struct Filesystem {
mount_point: String, // e.g., "/", "/boot"
total_bytes: u64, // filesystem capacity
used_bytes: u64, // filesystem used space
}
/// Drive information for pools
#[derive(Debug, Clone)]
struct DriveInfo {
device: String, // e.g., "sdb", "sdc"
mount_point: String, // e.g., "/mnt/disk1"
temperature: Option<f32>, // drive temperature
wear_level: Option<f32>, // SSD wear level
health_status: String, // SMART health
}
/// Discovered storage topology
#[derive(Debug)]
struct StorageTopology {
physical_drives: Vec<PhysicalDrive>,
mergerfs_pools: Vec<MergerfsPool>,
} }
impl DiskCollector {
pub fn new(config: DiskConfig) -> Self {
// Create hysteresis thresholds for disk temperature from config
let temperature_thresholds = HysteresisThresholds::with_custom_gaps( let temperature_thresholds = HysteresisThresholds::with_custom_gaps(
config.temperature_warning_celsius, config.temperature_warning_celsius,
5.0, // 5°C gap for recovery 5.0,
config.temperature_critical_celsius, config.temperature_critical_celsius,
5.0, // 5°C gap for recovery 5.0,
); );
// Detect devices for all configured filesystems at startup
let mut detected_devices = std::collections::HashMap::new();
for fs_config in &config.filesystems {
if fs_config.monitor {
if let Ok(devices) = Self::detect_device_for_mount_point_static(&fs_config.mount_point) {
detected_devices.insert(fs_config.mount_point.clone(), devices);
}
}
}
Self {
config,
temperature_thresholds,
detected_devices,
}
}
/// Calculate disk temperature status using hysteresis thresholds /// Discover all storage using clean workflow: lsblk → df → group
fn calculate_temperature_status(&self, metric_name: &str, temperature: f32, status_tracker: &mut StatusTracker) -> Status { fn discover_storage(&self) -> Result<StorageTopology> {
status_tracker.calculate_with_hysteresis(metric_name, temperature, &self.temperature_thresholds) debug!("Starting storage discovery");
}
/// Get configured storage pools with individual drive information
fn get_configured_storage_pools(&self) -> Result<Vec<StoragePool>> {
let mut storage_pools = Vec::new();
for fs_config in &self.config.filesystems {
if !fs_config.monitor {
continue;
}
// Get filesystem stats for the mount point
match self.get_filesystem_info(&fs_config.mount_point) {
Ok((total_bytes, used_bytes)) => {
let available_bytes = total_bytes - used_bytes;
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else {
0.0
};
// Convert bytes to human-readable format
let size = self.bytes_to_human_readable(total_bytes);
let used = self.bytes_to_human_readable(used_bytes);
let available = self.bytes_to_human_readable(available_bytes);
// Get individual drive information using pre-detected devices
let device_names = self.detected_devices.get(&fs_config.mount_point).cloned().unwrap_or_default();
let underlying_drives = self.get_drive_info_for_devices(&device_names)?;
storage_pools.push(StoragePool {
name: fs_config.name.clone(),
mount_point: fs_config.mount_point.clone(),
filesystem: fs_config.fs_type.clone(),
storage_type: fs_config.storage_type.clone(),
size,
used,
available,
usage_percent: usage_percent as f32,
underlying_drives,
});
debug!(
"Storage pool '{}' ({}) at {} with {} detected drives",
fs_config.name, fs_config.storage_type, fs_config.mount_point, device_names.len()
);
}
Err(e) => {
debug!(
"Failed to get filesystem info for storage pool '{}': {}",
fs_config.name, e
);
}
}
}
Ok(storage_pools)
}
/// Get drive information for a list of device names
fn get_drive_info_for_devices(&self, device_names: &[String]) -> Result<Vec<DriveInfo>> {
let mut drives = Vec::new();
for device_name in device_names { // Step 1: Get all mount points and their backing devices using lsblk
let device_path = format!("/dev/{}", device_name); let mount_devices = self.get_mount_devices()?;
debug!("Found {} mount points", mount_devices.len());
// Get SMART data for this drive
let (health_status, temperature, wear_level) = self.get_smart_data(&device_path);
drives.push(DriveInfo {
device: device_name.clone(),
health_status: health_status.clone(),
temperature,
wear_level,
});
debug!(
"Drive info for {}: health={}, temp={:?}°C, wear={:?}%",
device_name, health_status, temperature, wear_level
);
}
Ok(drives) // Step 2: Get filesystem usage for each mount point using df
let filesystem_usage = self.get_filesystem_usage(&mount_devices)?;
debug!("Got usage data for {} filesystems", filesystem_usage.len());
// Step 3: Detect mergerfs pools from /proc/mounts
let mergerfs_pools = self.discover_mergerfs_pools()?;
debug!("Found {} mergerfs pools", mergerfs_pools.len());
// Step 4: Group regular filesystems by physical drive
let physical_drives = self.group_by_physical_drive(&mount_devices, &filesystem_usage, &mergerfs_pools)?;
debug!("Grouped into {} physical drives", physical_drives.len());
Ok(StorageTopology {
physical_drives,
mergerfs_pools,
})
} }
/// Get SMART data for a drive (health, temperature, wear level) /// Use lsblk to get mount points and their backing devices
fn get_smart_data(&self, device_path: &str) -> (String, Option<f32>, Option<f32>) { fn get_mount_devices(&self) -> Result<HashMap<String, String>> {
// Try to get SMART data using smartctl
let output = Command::new("sudo")
.arg("smartctl")
.arg("-a")
.arg(device_path)
.output();
match output {
Ok(result) if result.status.success() => {
let stdout = String::from_utf8_lossy(&result.stdout);
// Parse health status
let health = if stdout.contains("PASSED") {
"PASSED".to_string()
} else if stdout.contains("FAILED") {
"FAILED".to_string()
} else {
"UNKNOWN".to_string()
};
// Parse temperature (look for various temperature indicators)
let temperature = self.parse_temperature_from_smart(&stdout);
// Parse wear level (for SSDs)
let wear_level = self.parse_wear_level_from_smart(&stdout);
(health, temperature, wear_level)
}
_ => {
debug!("Failed to get SMART data for {}", device_path);
("UNKNOWN".to_string(), None, None)
}
}
}
/// Parse temperature from SMART output
fn parse_temperature_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
// Look for temperature in various formats
if line.contains("Temperature_Celsius") || line.contains("Temperature") {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if let Ok(temp) = parts[9].parse::<f32>() {
return Some(temp);
}
}
}
// NVMe drives might show temperature differently
if line.contains("temperature:") {
if let Some(temp_part) = line.split("temperature:").nth(1) {
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
}
None
}
/// Parse wear level from SMART output (SSD wear leveling)
/// Supports both NVMe and SATA SSD wear indicators
fn parse_wear_level_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
let line = line.trim();
// NVMe drives - direct percentage used
if line.contains("Percentage Used:") {
if let Some(wear_part) = line.split("Percentage Used:").nth(1) {
if let Some(wear_str) = wear_part.split('%').next() {
if let Ok(wear) = wear_str.trim().parse::<f32>() {
return Some(wear);
}
}
}
}
// SATA SSD attributes - parse SMART table format
// Format: ID ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
// SSD Life Left / Percent Lifetime Remaining (higher = less wear)
if line.contains("SSD_Life_Left") || line.contains("Percent_Lifetime_Remain") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Media Wearout Indicator (lower = more wear, normalize to 0-100)
if line.contains("Media_Wearout_Indicator") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Wear Leveling Count (higher = less wear, but varies by manufacturer)
if line.contains("Wear_Leveling_Count") {
if let Ok(wear_count) = parts[3].parse::<f32>() { // VALUE column
// Most SSDs: 100 = new, decreases with wear
if wear_count <= 100.0 {
return Some(100.0 - wear_count);
}
}
}
// Total LBAs Written - calculate against typical endurance if available
// This is more complex and manufacturer-specific, so we skip for now
}
}
None
}
/// Convert bytes to human-readable format
fn bytes_to_human_readable(&self, bytes: u64) -> String {
const UNITS: &[&str] = &["B", "K", "M", "G", "T"];
let mut size = bytes as f64;
let mut unit_index = 0;
while size >= 1024.0 && unit_index < UNITS.len() - 1 {
size /= 1024.0;
unit_index += 1;
}
if unit_index == 0 {
format!("{:.0}{}", size, UNITS[unit_index])
} else {
format!("{:.1}{}", size, UNITS[unit_index])
}
}
/// Detect device backing a mount point using lsblk (static version for startup)
fn detect_device_for_mount_point_static(mount_point: &str) -> Result<Vec<String>> {
let output = Command::new("lsblk")
.args(&["-n", "-o", "NAME,MOUNTPOINT"])
.output()?;
if !output.status.success() {
return Ok(Vec::new()); return Err(anyhow::anyhow!("lsblk command failed"));
} }
let mut mount_devices = HashMap::new();
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 2 && parts[1] == mount_point { if parts.len() >= 2 {
// Remove tree symbols and extract device name (e.g., "├─nvme0n1p2" -> "nvme0n1p2")
let device_name = parts[0] let device_name = parts[0]
.trim_start_matches('├') .trim_start_matches(&['├', '└', '─', ' '][..]);
.trim_start_matches('└') let mount_point = parts[1];
.trim_start_matches('─')
.trim();
// Extract base device name (e.g., "nvme0n1p2" -> "nvme0n1") // Skip unwanted mount points
if let Some(base_device) = Self::extract_base_device(device_name) { if self.should_skip_mount_point(mount_point) {
return Ok(vec![base_device]); continue;
}
mount_devices.insert(mount_point.to_string(), device_name.to_string());
}
}
Ok(mount_devices)
}
/// Check if we should skip this mount point
fn should_skip_mount_point(&self, mount_point: &str) -> bool {
let skip_prefixes = ["/proc", "/sys", "/dev", "/tmp", "/run"];
skip_prefixes.iter().any(|prefix| mount_point.starts_with(prefix))
}
/// Use df to get filesystem usage for mount points
fn get_filesystem_usage(&self, mount_devices: &HashMap<String, String>) -> Result<HashMap<String, (u64, u64)>> {
let mut filesystem_usage = HashMap::new();
for mount_point in mount_devices.keys() {
match self.get_filesystem_info(mount_point) {
Ok((total, used)) => {
filesystem_usage.insert(mount_point.clone(), (total, used));
}
Err(e) => {
debug!("Failed to get filesystem info for {}: {}", mount_point, e);
} }
} }
} }
Ok(Vec::new()) Ok(filesystem_usage)
} }
/// Extract base device name from partition (e.g., "nvme0n1p2" -> "nvme0n1", "sda1" -> "sda")
fn extract_base_device(device_name: &str) -> Option<String> {
// Handle NVMe devices (nvme0n1p1 -> nvme0n1)
if device_name.starts_with("nvme") {
if let Some(p_pos) = device_name.find('p') {
return Some(device_name[..p_pos].to_string());
}
}
// Handle traditional devices (sda1 -> sda)
if device_name.len() > 1 {
let chars: Vec<char> = device_name.chars().collect();
let mut end_idx = chars.len();
// Find where the device name ends and partition number begins
for (i, &c) in chars.iter().enumerate().rev() {
if !c.is_ascii_digit() {
end_idx = i + 1;
break;
}
}
if end_idx > 0 && end_idx < chars.len() {
return Some(chars[..end_idx].iter().collect());
}
}
// If no partition detected, return as-is
Some(device_name.to_string())
}
/// Get filesystem info using df command
fn get_filesystem_info(&self, path: &str) -> Result<(u64, u64)> {
let output = Command::new("df")
@@ -381,216 +187,815 @@ impl DiskCollector {
Ok((total_bytes, used_bytes))
}
/// Discover mergerfs pools from /proc/mounts
/// Parse size string (e.g., "120G", "45M") to GB value fn discover_mergerfs_pools(&self) -> Result<Vec<MergerfsPool>> {
fn parse_size_to_gb(&self, size_str: &str) -> f32 { let mounts_content = std::fs::read_to_string("/proc/mounts")?;
let size_str = size_str.trim(); let mut pools = Vec::new();
if size_str.is_empty() || size_str == "-" {
return 0.0; for line in mounts_content.lines() {
} let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 3 && parts[2] == "fuse.mergerfs" {
// Extract numeric part and unit let mount_point = parts[1].to_string();
let (num_str, unit) = if let Some(last_char) = size_str.chars().last() { let device_sources = parts[0]; // e.g., "/mnt/disk1:/mnt/disk2"
if last_char.is_alphabetic() {
let num_part = &size_str[..size_str.len() - 1]; // Get pool usage
let unit_part = &size_str[size_str.len() - 1..]; let (total_bytes, used_bytes) = self.get_filesystem_info(&mount_point)
(num_part, unit_part) .unwrap_or((0, 0));
} else {
(size_str, "") // Parse member paths - handle both full paths and numeric references
let raw_paths: Vec<String> = device_sources
.split(':')
.map(|s| s.trim().to_string())
.filter(|s| !s.is_empty())
.collect();
// Convert numeric references to actual mount points if needed
let mut member_paths = if raw_paths.iter().any(|path| !path.starts_with('/')) {
// Handle numeric format like "1:2" by finding corresponding /mnt/disk* paths
self.resolve_numeric_mergerfs_paths(&raw_paths)?
} else {
// Already full paths
raw_paths
};
// For SnapRAID setups, include parity drives that are related to this pool's data drives
let related_parity_paths = self.discover_related_parity_drives(&member_paths)?;
member_paths.extend(related_parity_paths);
// Categorize as data vs parity drives
let (data_drives, parity_drives) = match self.categorize_pool_drives(&member_paths) {
Ok(drives) => drives,
Err(e) => {
debug!("Failed to categorize drives for pool {}: {}. Skipping.", mount_point, e);
continue;
}
};
pools.push(MergerfsPool {
mount_point,
total_bytes,
used_bytes,
data_drives,
parity_drives,
});
} }
} else {
(size_str, "")
};
let number: f32 = num_str.parse().unwrap_or(0.0);
match unit.to_uppercase().as_str() {
"T" | "TB" => number * 1024.0,
"G" | "GB" => number,
"M" | "MB" => number / 1024.0,
"K" | "KB" => number / (1024.0 * 1024.0),
"B" | "" => number / (1024.0 * 1024.0 * 1024.0),
_ => number, // Assume GB if unknown unit
} }
Ok(pools)
}
/// Discover parity drives that are related to the given data drives
fn discover_related_parity_drives(&self, data_drives: &[String]) -> Result<Vec<String>> {
let mount_devices = self.get_mount_devices()?;
let mut related_parity = Vec::new();
// Find parity drives that share the same parent directory as the data drives
for data_path in data_drives {
if let Some(parent_dir) = self.get_parent_directory(data_path) {
// Look for parity drives in the same parent directory
for (mount_point, _device) in &mount_devices {
if mount_point.contains("parity") && mount_point.starts_with(&parent_dir) {
if !related_parity.contains(mount_point) {
related_parity.push(mount_point.clone());
}
}
}
}
}
Ok(related_parity)
}
/// Get parent directory of a mount path (e.g., "/mnt/disk1" -> "/mnt")
fn get_parent_directory(&self, path: &str) -> Option<String> {
if let Some(last_slash) = path.rfind('/') {
if last_slash > 0 {
return Some(path[..last_slash].to_string());
}
}
None
}
/// Categorize pool member drives as data vs parity
fn categorize_pool_drives(&self, member_paths: &[String]) -> Result<(Vec<DriveInfo>, Vec<DriveInfo>)> {
let mut data_drives = Vec::new();
let mut parity_drives = Vec::new();
for path in member_paths {
let drive_info = self.get_drive_info_for_path(path)?;
// Heuristic: if path contains "parity", it's parity
if path.to_lowercase().contains("parity") {
parity_drives.push(drive_info);
} else {
data_drives.push(drive_info);
}
}
Ok((data_drives, parity_drives))
}
/// Get drive information for a mount path
fn get_drive_info_for_path(&self, path: &str) -> Result<DriveInfo> {
// Use lsblk to find the backing device
let output = Command::new("lsblk")
.args(&["-n", "-o", "NAME,MOUNTPOINT"])
.output()?;
let output_str = String::from_utf8_lossy(&output.stdout);
let mut device = String::new();
for line in output_str.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 2 && parts[1] == path {
device = parts[0]
.trim_start_matches('├')
.trim_start_matches('└')
.trim_start_matches('─')
.trim()
.to_string();
break;
}
}
if device.is_empty() {
return Err(anyhow::anyhow!("Could not find device for path {}", path));
}
// Extract base device name (e.g., "sda1" -> "sda")
let base_device = self.extract_base_device(&device);
// Get SMART data
let (health, temperature, wear) = self.get_smart_data(&format!("/dev/{}", base_device));
Ok(DriveInfo {
device: base_device,
mount_point: path.to_string(),
temperature,
wear_level: wear,
health_status: health,
})
}
/// Resolve numeric mergerfs references like "1:2" to actual mount paths
fn resolve_numeric_mergerfs_paths(&self, numeric_refs: &[String]) -> Result<Vec<String>> {
let mut resolved_paths = Vec::new();
// Get all mount points that look like /mnt/disk* or /mnt/parity*
let mount_devices = self.get_mount_devices()?;
let mut disk_mounts: Vec<String> = mount_devices.keys()
.filter(|path| path.starts_with("/mnt/disk") || path.starts_with("/mnt/parity"))
.cloned()
.collect();
disk_mounts.sort(); // Ensure consistent ordering
for num_ref in numeric_refs {
if let Ok(index) = num_ref.parse::<usize>() {
// Convert 1-based index to 0-based
if index > 0 && index <= disk_mounts.len() {
resolved_paths.push(disk_mounts[index - 1].clone());
}
}
}
// Fallback: if we couldn't resolve, return the original paths
if resolved_paths.is_empty() {
resolved_paths = numeric_refs.to_vec();
}
Ok(resolved_paths)
}
/// Extract base device name from partition (e.g., "nvme0n1p2" -> "nvme0n1", "sda1" -> "sda")
fn extract_base_device(&self, device_name: &str) -> String {
// Handle NVMe devices (nvme0n1p1 -> nvme0n1)
if device_name.starts_with("nvme") {
if let Some(p_pos) = device_name.find('p') {
return device_name[..p_pos].to_string();
}
}
// Handle traditional devices (sda1 -> sda)
if device_name.len() > 1 {
let chars: Vec<char> = device_name.chars().collect();
let mut end_idx = chars.len();
// Find where the device name ends and partition number begins
for (i, &c) in chars.iter().enumerate().rev() {
if !c.is_ascii_digit() {
end_idx = i + 1;
break;
}
}
if end_idx > 0 && end_idx < chars.len() {
return chars[..end_idx].iter().collect();
}
}
// If no partition detected, return as-is
device_name.to_string()
}
/// Group filesystems by physical drive (excluding mergerfs members)
fn group_by_physical_drive(
&self,
mount_devices: &HashMap<String, String>,
filesystem_usage: &HashMap<String, (u64, u64)>,
mergerfs_pools: &[MergerfsPool]
) -> Result<Vec<PhysicalDrive>> {
let mut drive_groups: HashMap<String, Vec<Filesystem>> = HashMap::new();
// Get all mergerfs member paths to exclude them
let mut mergerfs_members = std::collections::HashSet::new();
for pool in mergerfs_pools {
for drive in &pool.data_drives {
mergerfs_members.insert(drive.mount_point.clone());
}
for drive in &pool.parity_drives {
mergerfs_members.insert(drive.mount_point.clone());
}
}
// Group filesystems by base device
for (mount_point, device) in mount_devices {
// Skip mergerfs member mounts
if mergerfs_members.contains(mount_point) {
continue;
}
let base_device = self.extract_base_device(device);
if let Some((total, used)) = filesystem_usage.get(mount_point) {
let filesystem = Filesystem {
mount_point: mount_point.clone(),
total_bytes: *total,
used_bytes: *used,
};
drive_groups.entry(base_device).or_insert_with(Vec::new).push(filesystem);
}
}
// Convert to PhysicalDrive structs with SMART data
let mut physical_drives = Vec::new();
for (device, filesystems) in drive_groups {
let (health, temperature, wear) = self.get_smart_data(&format!("/dev/{}", device));
physical_drives.push(PhysicalDrive {
device,
filesystems,
temperature,
wear_level: wear,
health_status: health,
});
}
Ok(physical_drives)
}
/// Get SMART data for a drive
fn get_smart_data(&self, device_path: &str) -> (String, Option<f32>, Option<f32>) {
let output = Command::new("sudo")
.arg("smartctl")
.arg("-a")
.arg(device_path)
.output();
match output {
Ok(result) if result.status.success() => {
let stdout = String::from_utf8_lossy(&result.stdout);
// Parse health status
let health = if stdout.contains("PASSED") {
"PASSED".to_string()
} else if stdout.contains("FAILED") {
"FAILED".to_string()
} else {
"UNKNOWN".to_string()
};
// Parse temperature and wear level
let temperature = self.parse_temperature_from_smart(&stdout);
let wear_level = self.parse_wear_level_from_smart(&stdout);
(health, temperature, wear_level)
}
_ => {
debug!("Failed to get SMART data for {}", device_path);
("UNKNOWN".to_string(), None, None)
}
}
}
/// Parse temperature from SMART output
fn parse_temperature_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
if line.contains("Temperature_Celsius") || line.contains("Temperature") {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if let Ok(temp) = parts[9].parse::<f32>() {
return Some(temp);
}
}
}
// NVMe format: "Temperature:" (capital T)
if line.contains("Temperature:") {
if let Some(temp_part) = line.split("Temperature:").nth(1) {
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
// Legacy format: "temperature:" (lowercase)
if line.contains("temperature:") {
if let Some(temp_part) = line.split("temperature:").nth(1) {
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
}
None
}
/// Parse wear level from SMART output
fn parse_wear_level_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
if line.contains("Percentage Used:") {
if let Some(wear_part) = line.split("Percentage Used:").nth(1) {
if let Some(wear_str) = wear_part.split('%').next() {
if let Ok(wear) = wear_str.trim().parse::<f32>() {
return Some(wear);
}
}
}
}
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if line.contains("SSD_Life_Left") || line.contains("Percent_Lifetime_Remain") {
if let Ok(remaining) = parts[3].parse::<f32>() {
return Some(100.0 - remaining);
}
}
if line.contains("Wear_Leveling_Count") {
if let Ok(wear_count) = parts[3].parse::<f32>() {
if wear_count <= 100.0 {
return Some(100.0 - wear_count);
}
}
}
}
}
None
}
/// Calculate temperature status with hysteresis
fn calculate_temperature_status(&self, metric_name: &str, temperature: f32, status_tracker: &mut StatusTracker) -> Status {
status_tracker.calculate_with_hysteresis(metric_name, temperature, &self.temperature_thresholds)
}
/// Convert bytes to human readable format
fn bytes_to_human_readable(&self, bytes: u64) -> String {
const UNITS: &[&str] = &["B", "K", "M", "G", "T"];
let mut size = bytes as f64;
let mut unit_index = 0;
while size >= 1024.0 && unit_index < UNITS.len() - 1 {
size /= 1024.0;
unit_index += 1;
}
if unit_index == 0 {
format!("{:.0}{}", size, UNITS[unit_index])
} else {
format!("{:.1}{}", size, UNITS[unit_index])
}
}
/// Convert bytes to gigabytes
fn bytes_to_gb(&self, bytes: u64) -> f32 {
bytes as f32 / (1024.0 * 1024.0 * 1024.0)
} }
} }
#[async_trait] #[async_trait]
impl Collector for DiskCollector { impl Collector for DiskCollector {
async fn collect(&self, status_tracker: &mut StatusTracker) -> Result<Vec<Metric>, CollectorError> { async fn collect(&self, status_tracker: &mut StatusTracker) -> Result<Vec<Metric>, CollectorError> {
let start_time = Instant::now(); let start_time = Instant::now();
debug!("Collecting storage pool and individual drive metrics"); debug!("Starting clean storage collection");
let mut metrics = Vec::new(); let mut metrics = Vec::new();
let timestamp = chrono::Utc::now().timestamp() as u64;
// Get configured storage pools with individual drive data // Discover storage topology
let storage_pools = match self.get_configured_storage_pools() { let topology = match self.discover_storage() {
Ok(pools) => { Ok(topology) => topology,
debug!("Found {} storage pools", pools.len());
pools
}
Err(e) => { Err(e) => {
debug!("Failed to get storage pools: {}", e); tracing::error!("Storage discovery failed: {}", e);
Vec::new() return Ok(metrics);
} }
}; };
// Generate metrics for each storage pool and its underlying drives // Generate metrics for physical drives
for storage_pool in &storage_pools { for drive in &topology.physical_drives {
let timestamp = chrono::Utc::now().timestamp() as u64; self.generate_physical_drive_metrics(&mut metrics, drive, timestamp, status_tracker);
}
// Storage pool overall metrics // Generate metrics for mergerfs pools
let pool_name = &storage_pool.name; for pool in &topology.mergerfs_pools {
self.generate_mergerfs_pool_metrics(&mut metrics, pool, timestamp, status_tracker);
// Parse size strings to get actual values for calculations }
let size_gb = self.parse_size_to_gb(&storage_pool.size);
let used_gb = self.parse_size_to_gb(&storage_pool.used);
let avail_gb = self.parse_size_to_gb(&storage_pool.available);
// Calculate status based on configured thresholds // Add total storage count
let pool_status = if storage_pool.usage_percent >= self.config.usage_critical_percent { let total_storage = topology.physical_drives.len() + topology.mergerfs_pools.len();
metrics.push(Metric {
name: "disk_count".to_string(),
value: MetricValue::Integer(total_storage as i64),
unit: None,
description: Some(format!("Total storage: {} drives, {} pools", topology.physical_drives.len(), topology.mergerfs_pools.len())),
status: Status::Ok,
timestamp,
});
let collection_time = start_time.elapsed();
debug!("Clean storage collection completed in {:?} with {} metrics", collection_time, metrics.len());
Ok(metrics)
}
}
impl DiskCollector {
/// Generate metrics for a physical drive and its filesystems
fn generate_physical_drive_metrics(
&self,
metrics: &mut Vec<Metric>,
drive: &PhysicalDrive,
timestamp: u64,
status_tracker: &mut StatusTracker
) {
let drive_name = &drive.device;
// Calculate drive totals
let total_capacity: u64 = drive.filesystems.iter().map(|fs| fs.total_bytes).sum();
let total_used: u64 = drive.filesystems.iter().map(|fs| fs.used_bytes).sum();
let total_available = total_capacity.saturating_sub(total_used);
let usage_percent = if total_capacity > 0 {
(total_used as f64 / total_capacity as f64) * 100.0
} else { 0.0 };
// Drive health status
let health_status = if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown };
// Usage status
let usage_status = if usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical
} else if usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning
} else {
Status::Ok
};
let drive_status = if health_status == Status::Critical { Status::Critical } else { usage_status };
// Drive info metrics
metrics.push(Metric {
name: format!("disk_{}_health", drive_name),
value: MetricValue::String(drive.health_status.clone()),
unit: None,
description: Some(format!("{}: {}", drive_name, drive.health_status)),
status: health_status,
timestamp,
});
// Drive temperature
if let Some(temp) = drive.temperature {
let temp_status = self.calculate_temperature_status(
&format!("disk_{}_temperature", drive_name), temp, status_tracker
);
metrics.push(Metric {
name: format!("disk_{}_temperature", drive_name),
value: MetricValue::Float(temp),
unit: Some("°C".to_string()),
description: Some(format!("{}: {:.0}°C", drive_name, temp)),
status: temp_status,
timestamp,
});
}
// Drive wear level
if let Some(wear) = drive.wear_level {
let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical }
else if wear >= self.config.wear_warning_percent { Status::Warning }
else { Status::Ok };
metrics.push(Metric {
name: format!("disk_{}_wear_percent", drive_name),
value: MetricValue::Float(wear),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}% wear", drive_name, wear)),
status: wear_status,
timestamp,
});
}
// Drive capacity metrics
metrics.push(Metric {
name: format!("disk_{}_total_gb", drive_name),
value: MetricValue::Float(self.bytes_to_gb(total_capacity)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", drive_name, self.bytes_to_human_readable(total_capacity))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_used_gb", drive_name),
value: MetricValue::Float(self.bytes_to_gb(total_used)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", drive_name, self.bytes_to_human_readable(total_used))),
status: drive_status.clone(),
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_available_gb", drive_name),
value: MetricValue::Float(self.bytes_to_gb(total_available)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", drive_name, self.bytes_to_human_readable(total_available))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", drive_name),
value: MetricValue::Float(usage_percent as f32),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.1}%", drive_name, usage_percent)),
status: drive_status,
timestamp,
});
// Pool type indicator
metrics.push(Metric {
name: format!("disk_{}_pool_type", drive_name),
value: MetricValue::String(format!("drive ({})", drive.filesystems.len())),
unit: None,
description: Some("Type: physical drive".to_string()),
status: Status::Ok,
timestamp,
});
// Individual filesystem metrics
for filesystem in &drive.filesystems {
let fs_name = if filesystem.mount_point == "/" {
"root".to_string()
} else {
filesystem.mount_point.trim_start_matches('/').replace('/', "_")
};
let fs_usage_percent = if filesystem.total_bytes > 0 {
(filesystem.used_bytes as f64 / filesystem.total_bytes as f64) * 100.0
} else { 0.0 };
let fs_status = if fs_usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical Status::Critical
} else if storage_pool.usage_percent >= self.config.usage_warning_percent { } else if fs_usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning Status::Warning
} else { } else {
Status::Ok Status::Ok
}; };
// Storage pool info metrics
metrics.push(Metric { metrics.push(Metric {
name: format!("disk_{}_mount_point", pool_name), name: format!("disk_{}_fs_{}_usage_percent", drive_name, fs_name),
value: MetricValue::String(storage_pool.mount_point.clone()), value: MetricValue::Float(fs_usage_percent as f32),
unit: None,
description: Some(format!("Mount: {}", storage_pool.mount_point)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_filesystem", pool_name),
value: MetricValue::String(storage_pool.filesystem.clone()),
unit: None,
description: Some(format!("FS: {}", storage_pool.filesystem)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_storage_type", pool_name),
value: MetricValue::String(storage_pool.storage_type.clone()),
unit: None,
description: Some(format!("Type: {}", storage_pool.storage_type)),
status: Status::Ok,
timestamp,
});
// Storage pool size metrics
metrics.push(Metric {
name: format!("disk_{}_total_gb", pool_name),
value: MetricValue::Float(size_gb),
unit: Some("GB".to_string()),
description: Some(format!("Total: {}", storage_pool.size)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_used_gb", pool_name),
value: MetricValue::Float(used_gb),
unit: Some("GB".to_string()),
description: Some(format!("Used: {}", storage_pool.used)),
status: pool_status,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_available_gb", pool_name),
value: MetricValue::Float(avail_gb),
unit: Some("GB".to_string()),
description: Some(format!("Available: {}", storage_pool.available)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", pool_name),
value: MetricValue::Float(storage_pool.usage_percent),
unit: Some("%".to_string()), unit: Some("%".to_string()),
description: Some(format!("Usage: {:.1}%", storage_pool.usage_percent)), description: Some(format!("{}: {:.0}%", filesystem.mount_point, fs_usage_percent)),
status: pool_status, status: fs_status.clone(),
timestamp, timestamp,
}); });
// Individual drive metrics for this storage pool metrics.push(Metric {
for drive in &storage_pool.underlying_drives { name: format!("disk_{}_fs_{}_used_gb", drive_name, fs_name),
// Drive health status value: MetricValue::Float(self.bytes_to_gb(filesystem.used_bytes)),
metrics.push(Metric { unit: Some("GB".to_string()),
name: format!("disk_{}_{}_health", pool_name, drive.device), description: Some(format!("{}: {}", filesystem.mount_point, self.bytes_to_human_readable(filesystem.used_bytes))),
value: MetricValue::String(drive.health_status.clone()), status: fs_status.clone(),
unit: None, timestamp,
description: Some(format!("{}: {}", drive.device, drive.health_status)), });
status: if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown },
timestamp,
});
// Drive temperature metrics.push(Metric {
if let Some(temp) = drive.temperature { name: format!("disk_{}_fs_{}_total_gb", drive_name, fs_name),
let temp_status = self.calculate_temperature_status( value: MetricValue::Float(self.bytes_to_gb(filesystem.total_bytes)),
&format!("disk_{}_{}_temperature", pool_name, drive.device), unit: Some("GB".to_string()),
temp, description: Some(format!("{}: {}", filesystem.mount_point, self.bytes_to_human_readable(filesystem.total_bytes))),
status_tracker status: fs_status.clone(),
); timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_{}_temperature", pool_name, drive.device),
value: MetricValue::Float(temp),
unit: Some("°C".to_string()),
description: Some(format!("{}: {:.0}°C", drive.device, temp)),
status: temp_status,
timestamp,
});
}
// Drive wear level (for SSDs) let fs_available = filesystem.total_bytes.saturating_sub(filesystem.used_bytes);
if let Some(wear) = drive.wear_level { metrics.push(Metric {
let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical } name: format!("disk_{}_fs_{}_available_gb", drive_name, fs_name),
else if wear >= self.config.wear_warning_percent { Status::Warning } value: MetricValue::Float(self.bytes_to_gb(fs_available)),
else { Status::Ok }; unit: Some("GB".to_string()),
description: Some(format!("{}: {}", filesystem.mount_point, self.bytes_to_human_readable(fs_available))),
metrics.push(Metric { status: Status::Ok,
name: format!("disk_{}_{}_wear_percent", pool_name, drive.device), timestamp,
value: MetricValue::Float(wear), });
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}% wear", drive.device, wear)), metrics.push(Metric {
status: wear_status, name: format!("disk_{}_fs_{}_mount_point", drive_name, fs_name),
timestamp, value: MetricValue::String(filesystem.mount_point.clone()),
}); unit: None,
} description: Some(format!("Mount: {}", filesystem.mount_point)),
} status: Status::Ok,
timestamp,
});
} }
// Add storage pool count metric
metrics.push(Metric {
name: "disk_count".to_string(),
value: MetricValue::Integer(storage_pools.len() as i64),
unit: None,
description: Some(format!("Total storage pools: {}", storage_pools.len())),
status: Status::Ok,
timestamp: chrono::Utc::now().timestamp() as u64,
});
let collection_time = start_time.elapsed();
debug!(
"Multi-disk collection completed in {:?} with {} metrics",
collection_time,
metrics.len()
);
Ok(metrics)
} }
} /// Generate metrics for a mergerfs pool
fn generate_mergerfs_pool_metrics(
&self,
metrics: &mut Vec<Metric>,
pool: &MergerfsPool,
timestamp: u64,
status_tracker: &mut StatusTracker
) {
// Use consistent pool naming: extract mount point without leading slash
let pool_name = if pool.mount_point == "/" {
"root".to_string()
} else {
pool.mount_point.trim_start_matches('/').replace('/', "_")
};
if pool_name.is_empty() {
return;
}
let usage_percent = if pool.total_bytes > 0 {
(pool.used_bytes as f64 / pool.total_bytes as f64) * 100.0
} else { 0.0 };
// Calculate pool health based on drive health
let failed_data = pool.data_drives.iter()
.filter(|d| d.health_status != "PASSED")
.count();
let failed_parity = pool.parity_drives.iter()
.filter(|d| d.health_status != "PASSED")
.count();
let pool_health = match (failed_data, failed_parity) {
(0, 0) => Status::Ok,
(1, 0) | (0, 1) => Status::Warning,
_ => Status::Critical,
};
let usage_status = if usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical
} else if usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning
} else {
Status::Ok
};
let pool_status = if pool_health == Status::Critical { Status::Critical } else { usage_status };
// Pool metrics
metrics.push(Metric {
name: format!("disk_{}_mount_point", pool_name),
value: MetricValue::String(pool.mount_point.clone()),
unit: None,
description: Some(format!("Mount: {}", pool.mount_point)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_pool_type", pool_name),
value: MetricValue::String(format!("mergerfs ({}+{})", pool.data_drives.len(), pool.parity_drives.len())),
unit: None,
description: Some("Type: mergerfs".to_string()),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_pool_health", pool_name),
value: MetricValue::String(match pool_health {
Status::Ok => "healthy".to_string(),
Status::Warning => "degraded".to_string(),
Status::Critical => "critical".to_string(),
_ => "unknown".to_string(),
}),
unit: None,
description: Some("Pool health".to_string()),
status: pool_health,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_total_gb", pool_name),
value: MetricValue::Float(self.bytes_to_gb(pool.total_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("Total: {}", self.bytes_to_human_readable(pool.total_bytes))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_used_gb", pool_name),
value: MetricValue::Float(self.bytes_to_gb(pool.used_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("Used: {}", self.bytes_to_human_readable(pool.used_bytes))),
status: pool_status.clone(),
timestamp,
});
let available_bytes = pool.total_bytes.saturating_sub(pool.used_bytes);
metrics.push(Metric {
name: format!("disk_{}_available_gb", pool_name),
value: MetricValue::Float(self.bytes_to_gb(available_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("Available: {}", self.bytes_to_human_readable(available_bytes))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", pool_name),
value: MetricValue::Float(usage_percent as f32),
unit: Some("%".to_string()),
description: Some(format!("Usage: {:.1}%", usage_percent)),
status: pool_status,
timestamp,
});
// Individual drive metrics
for drive in &pool.data_drives {
self.generate_pool_drive_metrics(metrics, &pool_name, &drive.device, drive, timestamp, status_tracker);
}
for drive in &pool.parity_drives {
self.generate_pool_drive_metrics(metrics, &pool_name, &drive.device, drive, timestamp, status_tracker);
}
}
/// Generate metrics for drives in mergerfs pools
fn generate_pool_drive_metrics(
&self,
metrics: &mut Vec<Metric>,
pool_name: &str,
drive_role: &str,
drive: &DriveInfo,
timestamp: u64,
status_tracker: &mut StatusTracker
) {
let drive_health = if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown };
metrics.push(Metric {
name: format!("disk_{}_{}_health", pool_name, drive_role),
value: MetricValue::String(drive.health_status.clone()),
unit: None,
description: Some(format!("{}: {}", drive.device, drive.health_status)),
status: drive_health,
timestamp,
});
if let Some(temp) = drive.temperature {
let temp_status = self.calculate_temperature_status(
&format!("disk_{}_{}_temperature", pool_name, drive_role), temp, status_tracker
);
metrics.push(Metric {
name: format!("disk_{}_{}_temperature", pool_name, drive_role),
value: MetricValue::Float(temp),
unit: Some("°C".to_string()),
description: Some(format!("{}: {:.0}°C", drive.device, temp)),
status: temp_status,
timestamp,
});
}
if let Some(wear) = drive.wear_level {
let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical }
else if wear >= self.config.wear_warning_percent { Status::Warning }
else { Status::Ok };
metrics.push(Metric {
name: format!("disk_{}_{}_wear_percent", pool_name, drive_role),
value: MetricValue::Float(wear),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}% wear", drive.device, wear)),
status: wear_status,
timestamp,
});
}
}
}
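
The SMART parsing in this diff handles two `smartctl -a` layouts: the ATA attribute table (raw value in the tenth whitespace-separated column, normalized value in the fourth) and the NVMe health log (`Temperature:` and `Percentage Used:` lines). The following is a minimal standalone sketch of that idea, not the collector's actual code. The free-function names `parse_temperature` and `parse_wear_percent` are hypothetical, and the sample lines in the usage note are assumed to be typical `smartctl` output.

```rust
/// Sketch: extract a temperature in °C from `smartctl -a` text.
/// Hypothetical helper that mirrors the approach of `parse_temperature_from_smart`.
fn parse_temperature(smart_output: &str) -> Option<f32> {
    for line in smart_output.lines() {
        let parts: Vec<&str> = line.split_whitespace().collect();
        // ATA attribute row: the raw value is the tenth column.
        if line.contains("Temperature_Celsius") && parts.len() >= 10 {
            if let Ok(temp) = parts[9].parse::<f32>() {
                return Some(temp);
            }
        }
        // NVMe health log: take the first token after "Temperature:".
        if let Some(rest) = line.split("Temperature:").nth(1) {
            let token = rest.split_whitespace().next();
            if let Some(temp) = token.and_then(|t| t.parse::<f32>().ok()) {
                return Some(temp);
            }
        }
    }
    None
}

/// Sketch: extract wear as "percent used" (0 = new drive).
/// Hypothetical helper that mirrors the approach of `parse_wear_level_from_smart`.
fn parse_wear_percent(smart_output: &str) -> Option<f32> {
    for line in smart_output.lines() {
        // NVMe reports wear directly as "Percentage Used: N%".
        if let Some(rest) = line.split("Percentage Used:").nth(1) {
            let value = rest.split('%').next().map(str::trim);
            if let Some(wear) = value.and_then(|w| w.parse::<f32>().ok()) {
                return Some(wear);
            }
        }
        // ATA SSDs report remaining life in the normalized VALUE column.
        let parts: Vec<&str> = line.split_whitespace().collect();
        let is_life_attr =
            line.contains("SSD_Life_Left") || line.contains("Percent_Lifetime_Remain");
        if is_life_attr && parts.len() >= 10 {
            if let Ok(remaining) = parts[3].parse::<f32>() {
                return Some(100.0 - remaining);
            }
        }
    }
    None
}
```

Under these assumptions, an NVMe line such as `Temperature:    38 Celsius` yields `Some(38.0)`, and an ATA row whose `SSD_Life_Left` normalized value is `097` yields a wear of `Some(3.0)`. Converting "life remaining" into "percent used" keeps both drive families on the same scale for the wear thresholds.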


@@ -0,0 +1,1327 @@
use anyhow::Result;
use async_trait::async_trait;
use cm_dashboard_shared::{Metric, MetricValue, Status, StatusTracker, HysteresisThresholds};
use crate::config::DiskConfig;
use std::process::Command;
use std::time::Instant;
use std::fs;
use tracing::debug;
use super::{Collector, CollectorError};
/// Mount point information from /proc/mounts
#[derive(Debug, Clone)]
struct MountInfo {
device: String, // e.g., "/dev/sda1" or "/mnt/disk1:/mnt/disk2"
mount_point: String, // e.g., "/", "/srv/media"
fs_type: String, // e.g., "ext4", "xfs", "fuse.mergerfs"
}
/// Auto-discovered storage topology
#[derive(Debug, Clone)]
struct StorageTopology {
single_disks: Vec<MountInfo>,
mergerfs_pools: Vec<MergerfsPoolInfo>,
}
/// MergerFS pool information
#[derive(Debug, Clone)]
struct MergerfsPoolInfo {
mount_point: String, // e.g., "/srv/media"
data_members: Vec<String>, // e.g., ["/mnt/disk1", "/mnt/disk2"]
parity_disks: Vec<String>, // e.g., ["/mnt/parity"]
}
/// Information about a storage pool (mount point with underlying drives)
#[derive(Debug, Clone)]
struct StoragePool {
name: String, // e.g., "steampool", "root"
mount_point: String, // e.g., "/mnt/steampool", "/"
filesystem: String, // e.g., "mergerfs", "ext4", "zfs", "btrfs"
pool_type: StoragePoolType, // Enhanced pool type with configuration
size: String, // e.g., "2.5TB"
used: String, // e.g., "2.1TB"
available: String, // e.g., "400GB"
usage_percent: f32, // e.g., 85.0
underlying_drives: Vec<DriveInfo>, // Individual physical drives
pool_health: PoolHealth, // Overall pool health status
}
/// Enhanced storage pool types with specific configurations
#[derive(Debug, Clone)]
enum StoragePoolType {
Single, // Traditional single disk (legacy)
PhysicalDrive { // Physical drive with multiple filesystems
filesystems: Vec<String>, // Mount points on this drive
},
MergerfsPool { // MergerFS with optional parity
data_disks: Vec<String>, // Member disk names (sdb, sdd)
parity_disks: Vec<String>, // Parity disk names (sdc)
},
#[allow(dead_code)]
RaidArray { // Hardware RAID (future)
level: String, // "RAID1", "RAID5", etc.
member_disks: Vec<String>,
spare_disks: Vec<String>,
},
#[allow(dead_code)]
ZfsPool { // ZFS pool (future)
pool_name: String,
vdevs: Vec<String>,
}
}
/// Pool health status for redundant storage
#[derive(Debug, Clone, Copy, PartialEq)]
enum PoolHealth {
Healthy, // All drives OK, parity current
Degraded, // One drive failed or parity outdated, still functional
Critical, // Multiple failures, data at risk
#[allow(dead_code)]
Rebuilding, // Actively rebuilding/scrubbing (future: SnapRAID status integration)
Unknown, // Cannot determine status
}
/// Information about an individual physical drive
#[derive(Debug, Clone)]
struct DriveInfo {
device: String, // e.g., "sda", "nvme0n1"
health_status: String, // e.g., "PASSED", "FAILED"
temperature: Option<f32>, // e.g., 45.0°C
wear_level: Option<f32>, // e.g., 12.0% (for SSDs)
}
/// Disk usage collector for monitoring filesystem sizes
pub struct DiskCollector {
config: DiskConfig,
temperature_thresholds: HysteresisThresholds,
detected_devices: std::collections::HashMap<String, Vec<String>>, // mount_point -> devices
storage_topology: Option<StorageTopology>, // Auto-discovered storage layout
}
impl DiskCollector {
pub fn new(config: DiskConfig) -> Self {
// Create hysteresis thresholds for disk temperature from config
let temperature_thresholds = HysteresisThresholds::with_custom_gaps(
config.temperature_warning_celsius,
5.0, // 5°C gap for recovery
config.temperature_critical_celsius,
5.0, // 5°C gap for recovery
);
// Perform auto-discovery of storage topology
let storage_topology = match Self::auto_discover_storage() {
Ok(topology) => {
debug!("Auto-discovered storage topology: {} single disks, {} mergerfs pools",
topology.single_disks.len(), topology.mergerfs_pools.len());
Some(topology)
}
Err(e) => {
debug!("Failed to auto-discover storage topology: {}", e);
None
}
};
// Detect devices for discovered storage
let mut detected_devices = std::collections::HashMap::new();
if let Some(ref topology) = storage_topology {
// Add single disks
for disk in &topology.single_disks {
if let Ok(devices) = Self::detect_device_for_mount_point_static(&disk.mount_point) {
detected_devices.insert(disk.mount_point.clone(), devices);
}
}
// Add mergerfs pools and their members
for pool in &topology.mergerfs_pools {
// Detect devices for the pool itself
if let Ok(devices) = Self::detect_device_for_mount_point_static(&pool.mount_point) {
detected_devices.insert(pool.mount_point.clone(), devices);
}
// Detect devices for member disks
for member in &pool.data_members {
if let Ok(devices) = Self::detect_device_for_mount_point_static(member) {
detected_devices.insert(member.clone(), devices);
}
}
// Detect devices for parity disks
for parity in &pool.parity_disks {
if let Ok(devices) = Self::detect_device_for_mount_point_static(parity) {
detected_devices.insert(parity.clone(), devices);
}
}
}
} else {
// Fallback: use legacy filesystem config detection
for fs_config in &config.filesystems {
if fs_config.monitor {
if let Ok(devices) = Self::detect_device_for_mount_point_static(&fs_config.mount_point) {
detected_devices.insert(fs_config.mount_point.clone(), devices);
}
}
}
}
Self {
config,
temperature_thresholds,
detected_devices,
storage_topology,
}
}
/// Auto-discover storage topology by parsing system information
fn auto_discover_storage() -> Result<StorageTopology> {
let mounts = Self::parse_proc_mounts()?;
let mut single_disks = Vec::new();
let mut mergerfs_pools = Vec::new();
// Filter out unwanted filesystem types and mount points
let exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc", "cgroup", "cgroup2", "devpts"];
let exclude_mount_prefixes = ["/proc", "/sys", "/dev", "/tmp", "/run"];
for mount in mounts {
// Skip excluded filesystem types
if exclude_fs_types.contains(&mount.fs_type.as_str()) {
continue;
}
// Skip excluded mount point prefixes
if exclude_mount_prefixes.iter().any(|prefix| mount.mount_point.starts_with(prefix)) {
continue;
}
match mount.fs_type.as_str() {
"fuse.mergerfs" => {
// Parse mergerfs pool
let data_members = Self::parse_mergerfs_sources(&mount.device);
let parity_disks = Self::detect_parity_disks(&data_members);
mergerfs_pools.push(MergerfsPoolInfo {
mount_point: mount.mount_point.clone(),
data_members,
parity_disks,
});
debug!("Discovered mergerfs pool at {}", mount.mount_point);
}
"ext4" | "xfs" | "btrfs" | "ntfs" | "vfat" => {
// Check if this mount is part of a mergerfs pool
let is_mergerfs_member = mergerfs_pools.iter()
.any(|pool| pool.data_members.contains(&mount.mount_point) ||
pool.parity_disks.contains(&mount.mount_point));
if !is_mergerfs_member {
debug!("Discovered single disk at {}", mount.mount_point);
single_disks.push(mount);
}
}
_ => {
debug!("Skipping unsupported filesystem type: {}", mount.fs_type);
}
}
}
Ok(StorageTopology {
single_disks,
mergerfs_pools,
})
}
/// Parse /proc/mounts to get all mount information
fn parse_proc_mounts() -> Result<Vec<MountInfo>> {
let mounts_content = fs::read_to_string("/proc/mounts")?;
let mut mounts = Vec::new();
for line in mounts_content.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 3 {
mounts.push(MountInfo {
device: parts[0].to_string(),
mount_point: parts[1].to_string(),
fs_type: parts[2].to_string(),
});
}
}
Ok(mounts)
}
/// Parse mergerfs source string to extract member paths
fn parse_mergerfs_sources(source: &str) -> Vec<String> {
// MergerFS source format: "/mnt/disk1:/mnt/disk2:/mnt/disk3"
source.split(':')
.map(|s| s.trim().to_string())
.filter(|s| !s.is_empty())
.collect()
}
/// Detect potential parity disks based on data member heuristics
fn detect_parity_disks(data_members: &[String]) -> Vec<String> {
let mut parity_disks = Vec::new();
// Heuristic 1: Look for mount points with "parity" in the name
if let Ok(mounts) = Self::parse_proc_mounts() {
for mount in mounts {
if mount.mount_point.to_lowercase().contains("parity") &&
(mount.fs_type == "xfs" || mount.fs_type == "ext4") {
debug!("Detected parity disk by name: {}", mount.mount_point);
parity_disks.push(mount.mount_point);
}
}
}
// Heuristic 2: Look for sibling mounts under the same parent directory
// If data members are /mnt/disk1, /mnt/disk2, look for other xfs/ext4 mounts under /mnt/ that are not data members
if parity_disks.is_empty() {
if let Some(pattern) = Self::extract_mount_pattern(data_members) {
if let Ok(mounts) = Self::parse_proc_mounts() {
for mount in mounts {
if mount.mount_point.starts_with(&pattern) &&
!data_members.contains(&mount.mount_point) &&
(mount.fs_type == "xfs" || mount.fs_type == "ext4") {
debug!("Detected parity disk by pattern: {}", mount.mount_point);
parity_disks.push(mount.mount_point);
}
}
}
}
}
parity_disks
}
/// Extract common mount point pattern from data members
fn extract_mount_pattern(data_members: &[String]) -> Option<String> {
if data_members.is_empty() {
return None;
}
// Find the common parent directory (e.g., "/mnt/" from "/mnt/disk1", "/mnt/disk2")
let first = &data_members[0];
if let Some(last_slash) = first.rfind('/') {
let base = &first[..last_slash + 1]; // Include the slash
// Check if all members share this base
if data_members.iter().all(|member| member.starts_with(base)) {
return Some(base.to_string());
}
}
None
}
/// Calculate disk temperature status using hysteresis thresholds
fn calculate_temperature_status(&self, metric_name: &str, temperature: f32, status_tracker: &mut StatusTracker) -> Status {
status_tracker.calculate_with_hysteresis(metric_name, temperature, &self.temperature_thresholds)
}
/// Get storage pools using auto-discovered topology or fallback to configuration
fn get_configured_storage_pools(&self) -> Result<Vec<StoragePool>> {
if let Some(ref topology) = self.storage_topology {
self.get_auto_discovered_storage_pools(topology)
} else {
self.get_legacy_configured_storage_pools()
}
}
/// Get storage pools from auto-discovered topology
fn get_auto_discovered_storage_pools(&self, topology: &StorageTopology) -> Result<Vec<StoragePool>> {
let mut storage_pools = Vec::new();
// Group single disks by physical drive for unified pool display
let grouped_disks = self.group_filesystems_by_physical_drive(&topology.single_disks)?;
// Process grouped single disks (each physical drive becomes a pool)
for (drive_name, filesystems) in grouped_disks {
// Create a unified pool for this physical drive
let pool = self.create_physical_drive_pool(&drive_name, &filesystems)?;
storage_pools.push(pool);
}
// IMPORTANT: Do not create individual filesystem pools when using auto-discovery
// All single disk filesystems should be grouped into physical drive pools above
// Process mergerfs pools (these remain as logical pools)
for pool_info in &topology.mergerfs_pools {
if let Ok((total_bytes, used_bytes)) = self.get_filesystem_info(&pool_info.mount_point) {
let available_bytes = total_bytes.saturating_sub(used_bytes); // avoid u64 underflow if used > total
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else { 0.0 };
let size = self.bytes_to_human_readable(total_bytes);
let used = self.bytes_to_human_readable(used_bytes);
let available = self.bytes_to_human_readable(available_bytes);
// Collect all member and parity drives
let mut all_drives = Vec::new();
// Add data member drives
for member in &pool_info.data_members {
if let Some(devices) = self.detected_devices.get(member) {
all_drives.extend(devices.clone());
}
}
// Add parity drives
for parity in &pool_info.parity_disks {
if let Some(devices) = self.detected_devices.get(parity) {
all_drives.extend(devices.clone());
}
}
let underlying_drives = self.get_drive_info_for_devices(&all_drives)?;
// Calculate pool health
let pool_health = self.calculate_mergerfs_pool_health(&pool_info.data_members, &pool_info.parity_disks, &underlying_drives);
// Generate pool name from mount point
let name = pool_info.mount_point.trim_start_matches('/').replace('/', "_");
storage_pools.push(StoragePool {
name,
mount_point: pool_info.mount_point.clone(),
filesystem: "fuse.mergerfs".to_string(),
pool_type: StoragePoolType::MergerfsPool {
data_disks: pool_info.data_members.iter()
.filter_map(|member| self.detected_devices.get(member).and_then(|devices| devices.first().cloned()))
.collect(),
parity_disks: pool_info.parity_disks.iter()
.filter_map(|parity| self.detected_devices.get(parity).and_then(|devices| devices.first().cloned()))
.collect(),
},
size,
used,
available,
usage_percent: usage_percent as f32,
underlying_drives,
pool_health,
});
debug!("Auto-discovered mergerfs pool: {} with {} data + {} parity disks",
pool_info.mount_point, pool_info.data_members.len(), pool_info.parity_disks.len());
}
}
Ok(storage_pools)
}
/// Group filesystems by their backing physical drive
fn group_filesystems_by_physical_drive(&self, filesystems: &[MountInfo]) -> Result<std::collections::HashMap<String, Vec<MountInfo>>> {
let mut grouped = std::collections::HashMap::new();
for fs in filesystems {
// Get the physical drive name for this mount point
if let Some(devices) = self.detected_devices.get(&fs.mount_point) {
if let Some(device_name) = devices.first() {
// Extract base drive name from detected device
let drive_name = Self::extract_base_device(device_name)
.unwrap_or_else(|| device_name.clone());
debug!("Grouping filesystem {} (device: {}) under drive: {}",
fs.mount_point, device_name, drive_name);
grouped.entry(drive_name).or_insert_with(Vec::new).push(fs.clone());
}
}
}
debug!("Filesystem grouping result: {} drives with filesystems: {:?}",
grouped.len(),
grouped.keys().collect::<Vec<_>>());
Ok(grouped)
}
/// Create a physical drive pool containing multiple filesystems
fn create_physical_drive_pool(&self, drive_name: &str, filesystems: &[MountInfo]) -> Result<StoragePool> {
if filesystems.is_empty() {
return Err(anyhow::anyhow!("No filesystems for drive {}", drive_name));
}
// Calculate total usage across all filesystems on this drive
let mut total_capacity = 0u64;
let mut total_used = 0u64;
for fs in filesystems {
if let Ok((capacity, used)) = self.get_filesystem_info(&fs.mount_point) {
total_capacity += capacity;
total_used += used;
}
}
let total_available = total_capacity.saturating_sub(total_used);
let usage_percent = if total_capacity > 0 {
(total_used as f64 / total_capacity as f64) * 100.0
} else { 0.0 };
// Get drive information for SMART data
let device_names = vec![drive_name.to_string()];
let underlying_drives = self.get_drive_info_for_devices(&device_names)?;
// Collect filesystem mount points for this drive
let filesystem_mount_points: Vec<String> = filesystems.iter()
.map(|fs| fs.mount_point.clone())
.collect();
Ok(StoragePool {
name: drive_name.to_string(),
mount_point: "(physical drive)".to_string(), // Special marker for physical drives
filesystem: "physical".to_string(),
pool_type: StoragePoolType::PhysicalDrive {
filesystems: filesystem_mount_points,
},
size: self.bytes_to_human_readable(total_capacity),
used: self.bytes_to_human_readable(total_used),
available: self.bytes_to_human_readable(total_available),
usage_percent: usage_percent as f32,
pool_health: if underlying_drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Critical
},
underlying_drives,
})
}
/// Calculate pool health specifically for mergerfs pools
fn calculate_mergerfs_pool_health(&self, data_members: &[String], parity_disks: &[String], drives: &[DriveInfo]) -> PoolHealth {
// Get device names for data and parity drives
let mut data_device_names = Vec::new();
let mut parity_device_names = Vec::new();
for member in data_members {
if let Some(devices) = self.detected_devices.get(member) {
data_device_names.extend(devices.clone());
}
}
for parity in parity_disks {
if let Some(devices) = self.detected_devices.get(parity) {
parity_device_names.extend(devices.clone());
}
}
let failed_data = drives.iter()
.filter(|d| data_device_names.contains(&d.device) && d.health_status != "PASSED")
.count();
let failed_parity = drives.iter()
.filter(|d| parity_device_names.contains(&d.device) && d.health_status != "PASSED")
.count();
match (failed_data, failed_parity) {
(0, 0) => PoolHealth::Healthy,
(1, 0) => PoolHealth::Degraded, // Can recover with parity
(0, 1) => PoolHealth::Degraded, // Lost parity protection
_ => PoolHealth::Critical, // Multiple failures
}
}
/// Fallback to legacy configuration-based storage pools
fn get_legacy_configured_storage_pools(&self) -> Result<Vec<StoragePool>> {
let mut storage_pools = Vec::new();
let mut processed_pools = std::collections::HashSet::new();
// Legacy implementation: use filesystem configuration
for fs_config in &self.config.filesystems {
if !fs_config.monitor {
continue;
}
let (pool_type, skip_in_single_mode) = self.determine_pool_type(&fs_config.storage_type);
// Skip member disks if they're part of a pool
if skip_in_single_mode {
continue;
}
// Check if this pool was already processed (in case of multiple member disks)
let pool_key = match &pool_type {
StoragePoolType::MergerfsPool { .. } => {
// For mergerfs pools, use the main mount point
if fs_config.fs_type == "fuse.mergerfs" {
fs_config.mount_point.clone()
} else {
continue; // Skip member disks
}
}
_ => fs_config.mount_point.clone()
};
if processed_pools.contains(&pool_key) {
continue;
}
processed_pools.insert(pool_key.clone());
// Get filesystem stats for the mount point
match self.get_filesystem_info(&fs_config.mount_point) {
Ok((total_bytes, used_bytes)) => {
// saturating_sub avoids a u64 underflow panic if df ever reports used > total
let available_bytes = total_bytes.saturating_sub(used_bytes);
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else { 0.0 };
// Convert bytes to human-readable format
let size = self.bytes_to_human_readable(total_bytes);
let used = self.bytes_to_human_readable(used_bytes);
let available = self.bytes_to_human_readable(available_bytes);
// Get underlying drives based on pool type
let underlying_drives = self.get_pool_drives(&pool_type, &fs_config.mount_point)?;
// Calculate pool health
let pool_health = self.calculate_pool_health(&pool_type, &underlying_drives);
let drive_count = underlying_drives.len();
storage_pools.push(StoragePool {
name: fs_config.name.clone(),
mount_point: fs_config.mount_point.clone(),
filesystem: fs_config.fs_type.clone(),
pool_type: pool_type.clone(),
size,
used,
available,
usage_percent: usage_percent as f32,
underlying_drives,
pool_health,
});
debug!(
"Legacy configured storage pool '{}' ({:?}) at {} with {} drives, health: {:?}",
fs_config.name, pool_type, fs_config.mount_point, drive_count, pool_health
);
}
Err(e) => {
debug!(
"Failed to get filesystem info for storage pool '{}': {}",
fs_config.name, e
);
}
}
}
Ok(storage_pools)
}
/// Determine the storage pool type from configuration
fn determine_pool_type(&self, storage_type: &str) -> (StoragePoolType, bool) {
match storage_type {
"single" => (StoragePoolType::Single, false),
"mergerfs_pool" | "mergerfs" => {
// Find associated member disks
let data_disks = self.find_pool_member_disks("mergerfs_member");
let parity_disks = self.find_pool_member_disks("parity");
(StoragePoolType::MergerfsPool { data_disks, parity_disks }, false)
}
"mergerfs_member" => (StoragePoolType::Single, true), // Skip, part of pool
"parity" => (StoragePoolType::Single, true), // Skip, part of pool
"raid1" | "raid5" | "raid6" => {
let member_disks = self.find_pool_member_disks(&format!("{}_member", storage_type));
(StoragePoolType::RaidArray {
level: storage_type.to_uppercase(),
member_disks,
spare_disks: Vec::new()
}, false)
}
_ => (StoragePoolType::Single, false) // Default to single
}
}
/// Find member disks for a specific storage type
fn find_pool_member_disks(&self, member_type: &str) -> Vec<String> {
let mut member_disks = Vec::new();
for fs_config in &self.config.filesystems {
if fs_config.storage_type == member_type && fs_config.monitor {
// Get device names for this mount point
if let Some(devices) = self.detected_devices.get(&fs_config.mount_point) {
member_disks.extend(devices.clone());
}
}
}
member_disks
}
/// Get drive information for a specific pool type
fn get_pool_drives(&self, pool_type: &StoragePoolType, mount_point: &str) -> Result<Vec<DriveInfo>> {
match pool_type {
StoragePoolType::Single => {
// Single disk - use detected devices for this mount point
let device_names = self.detected_devices.get(mount_point).cloned().unwrap_or_default();
self.get_drive_info_for_devices(&device_names)
}
StoragePoolType::PhysicalDrive { .. } => {
// Physical drive - get drive info for the drive directly (mount_point not used)
let device_names = vec![mount_point.to_string()];
self.get_drive_info_for_devices(&device_names)
}
StoragePoolType::MergerfsPool { data_disks, parity_disks } => {
// Mergerfs pool - collect all member drives
let mut all_disks = data_disks.clone();
all_disks.extend(parity_disks.clone());
self.get_drive_info_for_devices(&all_disks)
}
StoragePoolType::RaidArray { member_disks, spare_disks, .. } => {
// RAID array - collect member and spare drives
let mut all_disks = member_disks.clone();
all_disks.extend(spare_disks.clone());
self.get_drive_info_for_devices(&all_disks)
}
StoragePoolType::ZfsPool { .. } => {
// ZFS pool - use detected devices (future implementation)
let device_names = self.detected_devices.get(mount_point).cloned().unwrap_or_default();
self.get_drive_info_for_devices(&device_names)
}
}
}
/// Calculate pool health based on drive status and pool type
fn calculate_pool_health(&self, pool_type: &StoragePoolType, drives: &[DriveInfo]) -> PoolHealth {
match pool_type {
StoragePoolType::Single => {
// Single disk - health is just the drive health
if drives.is_empty() {
PoolHealth::Unknown
} else if drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Critical
}
}
StoragePoolType::PhysicalDrive { .. } => {
// Physical drive - health is just the drive health (similar to Single)
if drives.is_empty() {
PoolHealth::Unknown
} else if drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Critical
}
}
StoragePoolType::MergerfsPool { data_disks, parity_disks } => {
let failed_data = drives.iter()
.filter(|d| data_disks.contains(&d.device) && d.health_status != "PASSED")
.count();
let failed_parity = drives.iter()
.filter(|d| parity_disks.contains(&d.device) && d.health_status != "PASSED")
.count();
match (failed_data, failed_parity) {
(0, 0) => PoolHealth::Healthy,
(1, 0) => PoolHealth::Degraded, // Can recover with parity
(0, 1) => PoolHealth::Degraded, // Lost parity protection
_ => PoolHealth::Critical, // Multiple failures
}
}
StoragePoolType::RaidArray { level, .. } => {
let failed_drives = drives.iter().filter(|d| d.health_status != "PASSED").count();
// Basic RAID health logic (can be enhanced per RAID level)
match failed_drives {
0 => PoolHealth::Healthy,
1 if level.contains('1') || level.contains('5') || level.contains('6') => PoolHealth::Degraded,
_ => PoolHealth::Critical,
}
}
StoragePoolType::ZfsPool { .. } => {
// ZFS health would require zpool status parsing (future)
if drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Degraded
}
}
}
}
/// Get drive information for a list of device names
fn get_drive_info_for_devices(&self, device_names: &[String]) -> Result<Vec<DriveInfo>> {
let mut drives = Vec::new();
for device_name in device_names {
let device_path = format!("/dev/{}", device_name);
// Get SMART data for this drive
let (health_status, temperature, wear_level) = self.get_smart_data(&device_path);
drives.push(DriveInfo {
device: device_name.clone(),
health_status: health_status.clone(),
temperature,
wear_level,
});
debug!(
"Drive info for {}: health={}, temp={:?}°C, wear={:?}%",
device_name, health_status, temperature, wear_level
);
}
Ok(drives)
}
/// Get SMART data for a drive (health, temperature, wear level)
fn get_smart_data(&self, device_path: &str) -> (String, Option<f32>, Option<f32>) {
// Try to get SMART data using smartctl
let output = Command::new("sudo")
.arg("smartctl")
.arg("-a")
.arg(device_path)
.output();
match output {
Ok(result) if result.status.success() => {
let stdout = String::from_utf8_lossy(&result.stdout);
// Parse health status
let health = if stdout.contains("PASSED") {
"PASSED".to_string()
} else if stdout.contains("FAILED") {
"FAILED".to_string()
} else {
"UNKNOWN".to_string()
};
// Parse temperature (look for various temperature indicators)
let temperature = self.parse_temperature_from_smart(&stdout);
// Parse wear level (for SSDs)
let wear_level = self.parse_wear_level_from_smart(&stdout);
(health, temperature, wear_level)
}
_ => {
debug!("Failed to get SMART data for {}", device_path);
("UNKNOWN".to_string(), None, None)
}
}
}
/// Parse temperature from SMART output
fn parse_temperature_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
// SATA attribute table row, e.g. "194 Temperature_Celsius ... 33":
// the raw value is the tenth whitespace-separated column
if line.contains("Temperature") {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if let Ok(temp) = parts[9].parse::<f32>() {
return Some(temp);
}
}
}
// NVMe drives print a plain "Temperature: 35 Celsius" line instead;
// the lowercase spelling is also accepted
let trimmed = line.trim();
if let Some(temp_part) = trimmed
.strip_prefix("Temperature:")
.or_else(|| trimmed.strip_prefix("temperature:"))
{
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
None
}
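For reference, a minimal standalone sketch of the two line shapes a temperature parser has to handle. It assumes `smartctl -a` prints SATA temperatures as an attribute-table row with the raw value in the tenth column, and NVMe temperatures as a `Temperature: NN Celsius` line (the format named in the commit message above). The function name `parse_temperature` and the sample lines in the usage note are illustrative, not taken from the repository.

```rust
/// Hypothetical standalone helper: extract a temperature in °C from smartctl text.
fn parse_temperature(smart_output: &str) -> Option<f32> {
    for line in smart_output.lines() {
        let trimmed = line.trim();
        // NVMe style: "Temperature:                        35 Celsius"
        if let Some(rest) = trimmed.strip_prefix("Temperature:") {
            if let Some(value) = rest.split_whitespace().next() {
                if let Ok(temp) = value.parse::<f32>() {
                    return Some(temp);
                }
            }
        }
        // SATA style: attribute-table row, RAW_VALUE is the tenth column
        if trimmed.contains("Temperature_Celsius") {
            let parts: Vec<&str> = trimmed.split_whitespace().collect();
            if let Some(raw) = parts.get(9) {
                if let Ok(temp) = raw.parse::<f32>() {
                    return Some(temp);
                }
            }
        }
    }
    None
}
```

Called on a row such as `194 Temperature_Celsius 0x0022 067 055 000 Old_age Always - 33 (Min/Max 20/45)` the sketch picks the raw value `33`; on an NVMe line it picks the number that follows the colon. Lines such as `Temperature Sensor 1:` are deliberately ignored because they do not start with `Temperature:`.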
/// Parse wear level from SMART output (SSD wear leveling)
/// Supports both NVMe and SATA SSD wear indicators
fn parse_wear_level_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
let line = line.trim();
// NVMe drives - direct percentage used
if line.contains("Percentage Used:") {
if let Some(wear_part) = line.split("Percentage Used:").nth(1) {
if let Some(wear_str) = wear_part.split('%').next() {
if let Ok(wear) = wear_str.trim().parse::<f32>() {
return Some(wear);
}
}
}
}
// SATA SSD attributes - parse SMART table format
// Format: ID ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
// SSD Life Left / Percent Lifetime Remaining (higher = less wear)
if line.contains("SSD_Life_Left") || line.contains("Percent_Lifetime_Remain") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Media Wearout Indicator (lower = more wear, normalize to 0-100)
if line.contains("Media_Wearout_Indicator") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Wear Leveling Count (higher = less wear, but varies by manufacturer)
if line.contains("Wear_Leveling_Count") {
if let Ok(wear_count) = parts[3].parse::<f32>() { // VALUE column
// Most SSDs: 100 = new, decreases with wear
if wear_count <= 100.0 {
return Some(100.0 - wear_count);
}
}
}
// Total LBAs Written - calculate against typical endurance if available
// This is more complex and manufacturer-specific, so we skip for now
}
}
None
}
/// Convert bytes to human-readable format
fn bytes_to_human_readable(&self, bytes: u64) -> String {
const UNITS: &[&str] = &["B", "K", "M", "G", "T"];
let mut size = bytes as f64;
let mut unit_index = 0;
while size >= 1024.0 && unit_index < UNITS.len() - 1 {
size /= 1024.0;
unit_index += 1;
}
if unit_index == 0 {
format!("{:.0}{}", size, UNITS[unit_index])
} else {
format!("{:.1}{}", size, UNITS[unit_index])
}
}
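The formatting rule above can be exercised in isolation. The sketch below is a free-function copy of it (`human_readable` is a hypothetical name) that makes two properties easy to check: values past 1024 T stay in `T` because it is the largest unit, and everything above bytes is rounded to one decimal. Since the collector later parses these strings back into GB values, that rounding carries into the numeric metrics.

```rust
/// Standalone copy of the formatting rule, written as a free function.
fn human_readable(bytes: u64) -> String {
    const UNITS: &[&str] = &["B", "K", "M", "G", "T"];
    let mut size = bytes as f64;
    let mut unit_index = 0;
    // Divide by 1024 until the value fits the unit or the largest unit is reached
    while size >= 1024.0 && unit_index < UNITS.len() - 1 {
        size /= 1024.0;
        unit_index += 1;
    }
    if unit_index == 0 {
        // Plain bytes are printed without a decimal
        format!("{:.0}{}", size, UNITS[unit_index])
    } else {
        format!("{:.1}{}", size, UNITS[unit_index])
    }
}
```

For example, `human_readable(1536)` yields `1.5K`, while a 2048 TiB value is printed as `2048.0T` rather than being promoted to a larger unit.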
/// Convert bytes to gigabytes
fn bytes_to_gb(&self, bytes: u64) -> f32 {
bytes as f32 / (1024.0 * 1024.0 * 1024.0)
}
/// Detect device backing a mount point using lsblk (static version for startup)
fn detect_device_for_mount_point_static(mount_point: &str) -> Result<Vec<String>> {
let output = Command::new("lsblk")
.args(&["-n", "-o", "NAME,MOUNTPOINT"])
.output()?;
if !output.status.success() {
return Ok(Vec::new());
}
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 2 && parts[1] == mount_point {
// Remove tree symbols and extract device name (e.g., "├─nvme0n1p2" -> "nvme0n1p2")
let device_name = parts[0]
.trim_start_matches('├')
.trim_start_matches('└')
.trim_start_matches('─')
.trim();
// Extract base device name (e.g., "nvme0n1p2" -> "nvme0n1")
if let Some(base_device) = Self::extract_base_device(device_name) {
return Ok(vec![base_device]);
}
}
}
Ok(Vec::new())
}
/// Extract base device name from partition (e.g., "nvme0n1p2" -> "nvme0n1", "sda1" -> "sda")
fn extract_base_device(device_name: &str) -> Option<String> {
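// Handle NVMe devices (nvme0n1p1 -> nvme0n1). A bare namespace such as
// "nvme0n1" has no partition suffix and must be returned unchanged; letting
// it fall through to the digit-stripping logic below would yield "nvme0n".
if device_name.starts_with("nvme") {
let base = match device_name.rfind('p') {
Some(p_pos) => &device_name[..p_pos],
None => device_name,
};
return Some(base.to_string());
}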
// Handle traditional devices (sda1 -> sda)
if device_name.len() > 1 {
let chars: Vec<char> = device_name.chars().collect();
let mut end_idx = chars.len();
// Find where the device name ends and partition number begins
for (i, &c) in chars.iter().enumerate().rev() {
if !c.is_ascii_digit() {
end_idx = i + 1;
break;
}
}
if end_idx > 0 && end_idx < chars.len() {
return Some(chars[..end_idx].iter().collect());
}
}
// If no partition detected, return as-is
Some(device_name.to_string())
}
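A minimal sketch of partition-to-drive mapping, written as a free function so the edge cases can be checked directly. It assumes the Linux naming conventions where devices whose own name ends in a digit (`nvme0n1`, `mmcblk0`) mark partitions with a `p<N>` suffix, while `sdX`-style devices append the partition number directly. The name `base_device` and the `mmcblk` handling are additions for illustration, not part of the repository.

```rust
/// Hypothetical helper: map a partition name to its parent block device.
fn base_device(name: &str) -> String {
    // NVMe and MMC devices end in a digit themselves, so partitions carry a
    // "p<N>" suffix: nvme0n1p2 -> nvme0n1, mmcblk0p1 -> mmcblk0
    if name.starts_with("nvme") || name.starts_with("mmcblk") {
        if let Some(pos) = name.rfind('p') {
            let suffix = &name[pos + 1..];
            let suffix_is_number =
                !suffix.is_empty() && suffix.chars().all(|c| c.is_ascii_digit());
            let parent_ends_in_digit =
                name[..pos].ends_with(|c: char| c.is_ascii_digit());
            if suffix_is_number && parent_ends_in_digit {
                return name[..pos].to_string();
            }
        }
        // No partition suffix: the name already is the base device
        return name.to_string();
    }
    // Traditional names: strip trailing partition digits (sda1 -> sda)
    let trimmed = name.trim_end_matches(|c: char| c.is_ascii_digit());
    if trimmed.is_empty() {
        name.to_string()
    } else {
        trimmed.to_string()
    }
}
```

The important case is a name that is already a base device: `base_device("nvme0n1")` returns `nvme0n1` unchanged. The sketch does not attempt to handle `md0`, `loop0`, or device-mapper names, which would need their own rules.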
/// Get filesystem info using df command
fn get_filesystem_info(&self, path: &str) -> Result<(u64, u64)> {
let output = Command::new("df")
.arg("--block-size=1")
.arg(path)
.output()?;
if !output.status.success() {
return Err(anyhow::anyhow!("df command failed for {}", path));
}
let output_str = String::from_utf8(output.stdout)?;
let lines: Vec<&str> = output_str.lines().collect();
if lines.len() < 2 {
return Err(anyhow::anyhow!("Unexpected df output format"));
}
let fields: Vec<&str> = lines[1].split_whitespace().collect();
if fields.len() < 4 {
return Err(anyhow::anyhow!("Unexpected df fields count"));
}
let total_bytes = fields[1].parse::<u64>()?;
let used_bytes = fields[2].parse::<u64>()?;
Ok((total_bytes, used_bytes))
}
/// Parse size string (e.g., "120G", "45M") to GB value
fn parse_size_to_gb(&self, size_str: &str) -> f32 {
let size_str = size_str.trim();
if size_str.is_empty() || size_str == "-" {
return 0.0;
}
// Extract numeric part and unit
let (num_str, unit) = if let Some(last_char) = size_str.chars().last() {
if last_char.is_alphabetic() {
let num_part = &size_str[..size_str.len() - 1];
let unit_part = &size_str[size_str.len() - 1..];
(num_part, unit_part)
} else {
(size_str, "")
}
} else {
(size_str, "")
};
let number: f32 = num_str.parse().unwrap_or(0.0);
match unit.to_uppercase().as_str() {
"T" | "TB" => number * 1024.0,
"G" | "GB" => number,
"M" | "MB" => number / 1024.0,
"K" | "KB" => number / (1024.0 * 1024.0),
"B" | "" => number / (1024.0 * 1024.0 * 1024.0),
_ => number, // Assume GB if unknown unit
}
}
}
#[async_trait]
impl Collector for DiskCollector {
async fn collect(&self, status_tracker: &mut StatusTracker) -> Result<Vec<Metric>, CollectorError> {
let start_time = Instant::now();
debug!("Collecting storage pool and individual drive metrics");
let mut metrics = Vec::new();
// Get configured storage pools with individual drive data
let storage_pools = match self.get_configured_storage_pools() {
Ok(pools) => {
debug!("Found {} storage pools", pools.len());
pools
}
Err(e) => {
debug!("Failed to get storage pools: {}", e);
Vec::new()
}
};
// Generate metrics for each storage pool and its underlying drives
for storage_pool in &storage_pools {
let timestamp = chrono::Utc::now().timestamp() as u64;
// Storage pool overall metrics
let pool_name = &storage_pool.name;
// Parse size strings to get actual values for calculations
let size_gb = self.parse_size_to_gb(&storage_pool.size);
let used_gb = self.parse_size_to_gb(&storage_pool.used);
let avail_gb = self.parse_size_to_gb(&storage_pool.available);
// Calculate status based on configured thresholds and pool health
let usage_status = if storage_pool.usage_percent >= self.config.usage_critical_percent {
Status::Critical
} else if storage_pool.usage_percent >= self.config.usage_warning_percent {
Status::Warning
} else {
Status::Ok
};
let pool_status = match storage_pool.pool_health {
PoolHealth::Critical => Status::Critical,
PoolHealth::Degraded => Status::Warning,
PoolHealth::Rebuilding => Status::Warning,
PoolHealth::Healthy => usage_status,
PoolHealth::Unknown => Status::Unknown,
};
// Storage pool info metrics
metrics.push(Metric {
name: format!("disk_{}_mount_point", pool_name),
value: MetricValue::String(storage_pool.mount_point.clone()),
unit: None,
description: Some(format!("Mount: {}", storage_pool.mount_point)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_filesystem", pool_name),
value: MetricValue::String(storage_pool.filesystem.clone()),
unit: None,
description: Some(format!("FS: {}", storage_pool.filesystem)),
status: Status::Ok,
timestamp,
});
// Enhanced pool type information
let pool_type_str = match &storage_pool.pool_type {
StoragePoolType::Single => "single".to_string(),
StoragePoolType::PhysicalDrive { filesystems } => {
format!("drive ({})", filesystems.len())
}
StoragePoolType::MergerfsPool { data_disks, parity_disks } => {
format!("mergerfs ({}+{})", data_disks.len(), parity_disks.len())
}
StoragePoolType::RaidArray { level, member_disks, spare_disks } => {
format!("{} ({}+{})", level, member_disks.len(), spare_disks.len())
}
StoragePoolType::ZfsPool { pool_name, .. } => {
format!("zfs ({})", pool_name)
}
};
metrics.push(Metric {
name: format!("disk_{}_pool_type", pool_name),
value: MetricValue::String(pool_type_str.clone()),
unit: None,
description: Some(format!("Type: {}", pool_type_str)),
status: Status::Ok,
timestamp,
});
// Pool health status
let health_str = match storage_pool.pool_health {
PoolHealth::Healthy => "healthy",
PoolHealth::Degraded => "degraded",
PoolHealth::Critical => "critical",
PoolHealth::Rebuilding => "rebuilding",
PoolHealth::Unknown => "unknown",
};
metrics.push(Metric {
name: format!("disk_{}_pool_health", pool_name),
value: MetricValue::String(health_str.to_string()),
unit: None,
description: Some(format!("Health: {}", health_str)),
status: pool_status,
timestamp,
});
// Storage pool size metrics
metrics.push(Metric {
name: format!("disk_{}_total_gb", pool_name),
value: MetricValue::Float(size_gb),
unit: Some("GB".to_string()),
description: Some(format!("Total: {}", storage_pool.size)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_used_gb", pool_name),
value: MetricValue::Float(used_gb),
unit: Some("GB".to_string()),
description: Some(format!("Used: {}", storage_pool.used)),
status: pool_status,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_available_gb", pool_name),
value: MetricValue::Float(avail_gb),
unit: Some("GB".to_string()),
description: Some(format!("Available: {}", storage_pool.available)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", pool_name),
value: MetricValue::Float(storage_pool.usage_percent),
unit: Some("%".to_string()),
description: Some(format!("Usage: {:.1}%", storage_pool.usage_percent)),
status: pool_status,
timestamp,
});
// Individual drive metrics for this storage pool
for drive in &storage_pool.underlying_drives {
// Drive health status
metrics.push(Metric {
name: format!("disk_{}_{}_health", pool_name, drive.device),
value: MetricValue::String(drive.health_status.clone()),
unit: None,
description: Some(format!("{}: {}", drive.device, drive.health_status)),
status: if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown },
timestamp,
});
// Drive temperature
if let Some(temp) = drive.temperature {
let temp_status = self.calculate_temperature_status(
&format!("disk_{}_{}_temperature", pool_name, drive.device),
temp,
status_tracker
);
metrics.push(Metric {
name: format!("disk_{}_{}_temperature", pool_name, drive.device),
value: MetricValue::Float(temp),
unit: Some("°C".to_string()),
description: Some(format!("{}: {:.0}°C", drive.device, temp)),
status: temp_status,
timestamp,
});
}
// Drive wear level (for SSDs)
if let Some(wear) = drive.wear_level {
let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical }
else if wear >= self.config.wear_warning_percent { Status::Warning }
else { Status::Ok };
metrics.push(Metric {
name: format!("disk_{}_{}_wear_percent", pool_name, drive.device),
value: MetricValue::Float(wear),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}% wear", drive.device, wear)),
status: wear_status,
timestamp,
});
}
}
// Individual filesystem metrics for PhysicalDrive pools
if let StoragePoolType::PhysicalDrive { filesystems } = &storage_pool.pool_type {
for filesystem_mount in filesystems {
if let Ok((total_bytes, used_bytes)) = self.get_filesystem_info(filesystem_mount) {
// saturating_sub avoids a u64 underflow panic if df ever reports used > total
let available_bytes = total_bytes.saturating_sub(used_bytes);
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else { 0.0 };
let filesystem_name = if filesystem_mount == "/" {
"root".to_string()
} else {
filesystem_mount.trim_start_matches('/').replace('/', "_")
};
// Calculate filesystem status based on usage
let fs_status = if usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical
} else if usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning
} else {
Status::Ok
};
// Filesystem usage metrics
metrics.push(Metric {
name: format!("disk_{}_fs_{}_usage_percent", pool_name, filesystem_name),
value: MetricValue::Float(usage_percent as f32),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}%", filesystem_mount, usage_percent)),
status: fs_status.clone(),
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_used_gb", pool_name, filesystem_name),
value: MetricValue::Float(self.bytes_to_gb(used_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {} used", filesystem_mount, self.bytes_to_human_readable(used_bytes))), // helper already appends the unit
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_total_gb", pool_name, filesystem_name),
value: MetricValue::Float(self.bytes_to_gb(total_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {} total", filesystem_mount, self.bytes_to_human_readable(total_bytes))), // helper already appends the unit
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_available_gb", pool_name, filesystem_name),
value: MetricValue::Float(self.bytes_to_gb(available_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {} available", filesystem_mount, self.bytes_to_human_readable(available_bytes))), // helper already appends the unit
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_mount_point", pool_name, filesystem_name),
value: MetricValue::String(filesystem_mount.clone()),
unit: None,
description: Some(format!("Mount: {}", filesystem_mount)),
status: Status::Ok,
timestamp,
});
}
}
}
}
// Add storage pool count metric
metrics.push(Metric {
name: "disk_count".to_string(),
value: MetricValue::Integer(storage_pools.len() as i64),
unit: None,
description: Some(format!("Total storage pools: {}", storage_pools.len())),
status: Status::Ok,
timestamp: chrono::Utc::now().timestamp() as u64,
});
let collection_time = start_time.elapsed();
debug!(
"Multi-disk collection completed in {:?} with {} metrics",
collection_time,
metrics.len()
);
Ok(metrics)
}
}
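The way `collect` combines usage thresholds with pool health can be summarised in a small standalone sketch. The enums below are local stand-ins that mirror the variant names used in the listing, and the threshold values in the usage note are hypothetical; the real ones come from `self.config`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status { Ok, Warning, Critical, Unknown }

#[derive(Debug, Clone, Copy, PartialEq)]
enum PoolHealth { Healthy, Degraded, Critical, Rebuilding, Unknown }

/// Pool health takes precedence; usage thresholds only apply to healthy pools.
fn pool_status(health: PoolHealth, usage_percent: f32, warning: f32, critical: f32) -> Status {
    let usage_status = if usage_percent >= critical {
        Status::Critical
    } else if usage_percent >= warning {
        Status::Warning
    } else {
        Status::Ok
    };
    match health {
        PoolHealth::Critical => Status::Critical,
        PoolHealth::Degraded | PoolHealth::Rebuilding => Status::Warning,
        PoolHealth::Healthy => usage_status,
        PoolHealth::Unknown => Status::Unknown,
    }
}
```

With assumed thresholds of 80 and 90, a healthy pool at 95% usage maps to `Critical`, but a degraded pool at the same usage maps to `Warning`, because the health arm never consults `usage_status`. Whether that precedence is intended is worth confirming in review.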


@@ -1,5 +1,5 @@
use anyhow::Result; use anyhow::Result;
use cm_dashboard_shared::{MessageEnvelope, MetricMessage}; use cm_dashboard_shared::{AgentData, MessageEnvelope};
use tracing::{debug, info}; use tracing::{debug, info};
use zmq::{Context, Socket, SocketType}; use zmq::{Context, Socket, SocketType};
@@ -43,17 +43,17 @@ impl ZmqHandler {
}) })
} }
/// Publish metrics message via ZMQ
pub async fn publish_metrics(&self, message: &MetricMessage) -> Result<()> { /// Publish agent data via ZMQ
pub async fn publish_agent_data(&self, data: &AgentData) -> Result<()> {
debug!( debug!(
"Publishing {} metrics for host {}", "Publishing agent data for host {}",
message.metrics.len(), data.hostname
message.hostname
); );
// Create message envelope // Create message envelope for agent data
let envelope = MessageEnvelope::metrics(message.clone()) let envelope = MessageEnvelope::agent_data(data.clone())
.map_err(|e| anyhow::anyhow!("Failed to create message envelope: {}", e))?; .map_err(|e| anyhow::anyhow!("Failed to create agent data envelope: {}", e))?;
// Serialize envelope // Serialize envelope
let serialized = serde_json::to_vec(&envelope)?; let serialized = serde_json::to_vec(&envelope)?;
@@ -61,11 +61,10 @@ impl ZmqHandler {
// Send via ZMQ // Send via ZMQ
self.publisher.send(&serialized, 0)?; self.publisher.send(&serialized, 0)?;
debug!("Published metrics message ({} bytes)", serialized.len()); debug!("Published agent data message ({} bytes)", serialized.len());
Ok(()) Ok(())
} }
/// Try to receive a command (non-blocking) /// Try to receive a command (non-blocking)
pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> { pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> {
match self.command_receiver.recv_bytes(zmq::DONTWAIT) { match self.command_receiver.recv_bytes(zmq::DONTWAIT) {


@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard" name = "cm-dashboard"
version = "0.1.83" version = "0.1.131"
edition = "2021" edition = "2021"
[dependencies] [dependencies]


@@ -20,12 +20,13 @@ pub struct Dashboard {
tui_app: Option<TuiApp>, tui_app: Option<TuiApp>,
terminal: Option<Terminal<CrosstermBackend<io::Stdout>>>, terminal: Option<Terminal<CrosstermBackend<io::Stdout>>>,
headless: bool, headless: bool,
raw_data: bool,
initial_commands_sent: std::collections::HashSet<String>, initial_commands_sent: std::collections::HashSet<String>,
config: DashboardConfig, config: DashboardConfig,
} }
impl Dashboard { impl Dashboard {
pub async fn new(config_path: Option<String>, headless: bool) -> Result<Self> { pub async fn new(config_path: Option<String>, headless: bool, raw_data: bool) -> Result<Self> {
info!("Initializing dashboard"); info!("Initializing dashboard");
// Load configuration - try default path if not specified // Load configuration - try default path if not specified
@@ -119,6 +120,7 @@ impl Dashboard {
tui_app, tui_app,
terminal, terminal,
headless, headless,
raw_data,
initial_commands_sent: std::collections::HashSet::new(), initial_commands_sent: std::collections::HashSet::new(),
config, config,
}) })
@@ -183,30 +185,35 @@ impl Dashboard {
// Check for new metrics // Check for new metrics
if last_metrics_check.elapsed() >= metrics_check_interval { if last_metrics_check.elapsed() >= metrics_check_interval {
if let Ok(Some(metric_message)) = self.zmq_consumer.receive_metrics().await { if let Ok(Some(agent_data)) = self.zmq_consumer.receive_agent_data().await {
debug!( debug!(
"Received metrics from {}: {} metrics", "Received agent data from {}",
metric_message.hostname, agent_data.hostname
metric_message.metrics.len()
); );
// Track first contact with host (no command needed - agent sends data every 2s) // Track first contact with host (no command needed - agent sends data every 2s)
let is_new_host = !self let is_new_host = !self
.initial_commands_sent .initial_commands_sent
.contains(&metric_message.hostname); .contains(&agent_data.hostname);
if is_new_host { if is_new_host {
info!( info!(
"First contact with host {} - data will update automatically", "First contact with host {} - data will update automatically",
metric_message.hostname agent_data.hostname
); );
self.initial_commands_sent self.initial_commands_sent
.insert(metric_message.hostname.clone()); .insert(agent_data.hostname.clone());
} }
// Update metric store // Show raw data if requested (before processing)
self.metric_store if self.raw_data {
.update_metrics(&metric_message.hostname, metric_message.metrics); println!("RAW AGENT DATA FROM {}:", agent_data.hostname);
println!("{}", serde_json::to_string_pretty(&agent_data).unwrap_or_else(|e| format!("Serialization error: {}", e)));
println!("{}", "-".repeat(80));
}
// Update data store
self.metric_store.process_agent_data(agent_data);
// Check for agent version mismatches across hosts // Check for agent version mismatches across hosts
if let Some((current_version, outdated_hosts)) = self.metric_store.get_version_mismatches() { if let Some((current_version, outdated_hosts)) = self.metric_store.get_version_mismatches() {


@@ -1,5 +1,5 @@
use anyhow::Result; use anyhow::Result;
use cm_dashboard_shared::{CommandOutputMessage, MessageEnvelope, MessageType, MetricMessage}; use cm_dashboard_shared::{AgentData, CommandOutputMessage, MessageEnvelope, MessageType};
use tracing::{debug, error, info, warn}; use tracing::{debug, error, info, warn};
use zmq::{Context, Socket, SocketType}; use zmq::{Context, Socket, SocketType};
@@ -117,8 +117,8 @@ impl ZmqConsumer {
} }
} }
/// Receive metrics from any connected agent (non-blocking) /// Receive agent data (non-blocking)
pub async fn receive_metrics(&mut self) -> Result<Option<MetricMessage>> { pub async fn receive_agent_data(&mut self) -> Result<Option<AgentData>> {
match self.subscriber.recv_bytes(zmq::DONTWAIT) { match self.subscriber.recv_bytes(zmq::DONTWAIT) {
Ok(data) => { Ok(data) => {
debug!("Received {} bytes from ZMQ", data.len()); debug!("Received {} bytes from ZMQ", data.len());
@@ -129,29 +129,27 @@ impl ZmqConsumer {
// Check message type // Check message type
match envelope.message_type { match envelope.message_type {
MessageType::Metrics => { MessageType::AgentData => {
let metrics = envelope let agent_data = envelope
.decode_metrics() .decode_agent_data()
.map_err(|e| anyhow::anyhow!("Failed to decode metrics: {}", e))?; .map_err(|e| anyhow::anyhow!("Failed to decode agent data: {}", e))?;
debug!( debug!(
"Received {} metrics from {}", "Received agent data from host {}",
metrics.metrics.len(), agent_data.hostname
metrics.hostname
); );
Ok(Some(agent_data))
Ok(Some(metrics))
} }
MessageType::Heartbeat => { MessageType::Heartbeat => {
debug!("Received heartbeat"); debug!("Received heartbeat");
Ok(None) // Don't return heartbeats as metrics Ok(None) // Don't return heartbeats
} }
MessageType::CommandOutput => { MessageType::CommandOutput => {
debug!("Received command output (will be handled by receive_command_output)"); debug!("Received command output (will be handled by receive_command_output)");
Ok(None) // Command output handled by separate method Ok(None) // Command output handled by separate method
} }
_ => { _ => {
debug!("Received non-metrics message: {:?}", envelope.message_type); debug!("Received unsupported message: {:?}", envelope.message_type);
Ok(None) Ok(None)
} }
} }
@@ -166,5 +164,6 @@ impl ZmqConsumer {
} }
} }
} }
} }
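
Note on the consumer change above: `receive_agent_data` yields a payload only for the agent-data message type and returns `Ok(None)` for everything else. A minimal sketch of that dispatch shape, using hypothetical stand-in types rather than the real `MessageEnvelope` and `AgentData` from `cm_dashboard_shared`:

```rust
/// Hypothetical stand-in; the real `AgentData` carries many more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct AgentData {
    pub hostname: String,
}

/// Hypothetical message enum combining type tag and payload.
#[derive(Debug)]
pub enum Message {
    AgentData(AgentData),
    Heartbeat,
    CommandOutput(String),
}

/// Return a payload only for agent data. Heartbeats and command output
/// are acknowledged and dropped, mirroring the `Ok(None)` arms in the diff.
pub fn dispatch(msg: Message) -> Option<AgentData> {
    match msg {
        Message::AgentData(data) => Some(data),
        Message::Heartbeat => None,
        Message::CommandOutput(_) => None,
    }
}
```

Listing each variant explicitly, instead of a `_` catch-all, lets the compiler flag any message type added later.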


@@ -55,8 +55,8 @@ pub struct SystemConfig {
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SshConfig { pub struct SshConfig {
pub rebuild_user: String, pub rebuild_user: String,
pub rebuild_alias: String, pub rebuild_cmd: String,
pub backup_alias: String, pub service_manage_cmd: String,
} }
/// Service log file configuration per host /// Service log file configuration per host


@@ -51,6 +51,10 @@ struct Cli {
/// Run in headless mode (no TUI, just logging) /// Run in headless mode (no TUI, just logging)
#[arg(long)] #[arg(long)]
headless: bool, headless: bool,
/// Show raw agent data in headless mode
#[arg(long)]
raw_data: bool,
} }
#[tokio::main] #[tokio::main]
@@ -86,7 +90,7 @@ async fn main() -> Result<()> {
} }
// Create and run dashboard // Create and run dashboard
let mut dashboard = Dashboard::new(cli.config, cli.headless).await?; let mut dashboard = Dashboard::new(cli.config, cli.headless, cli.raw_data).await?;
// Setup graceful shutdown // Setup graceful shutdown
let ctrl_c = async { let ctrl_c = async {


@@ -1,4 +1,4 @@
use cm_dashboard_shared::Metric; use cm_dashboard_shared::{AgentData, Metric};
use std::collections::HashMap; use std::collections::HashMap;
use std::time::{Duration, Instant}; use std::time::{Duration, Instant};
use tracing::{debug, info, warn}; use tracing::{debug, info, warn};
@@ -76,6 +76,286 @@ impl MetricStore {
); );
} }
/// Process structured agent data (temporary bridge - converts back to metrics)
/// TODO: Replace entire metric system with direct structured data processing
pub fn process_agent_data(&mut self, agent_data: AgentData) {
let metrics = self.convert_agent_data_to_metrics(&agent_data);
self.update_metrics(&agent_data.hostname, metrics);
}
/// Convert structured agent data to legacy metrics (temporary bridge)
fn convert_agent_data_to_metrics(&self, agent_data: &AgentData) -> Vec<Metric> {
use cm_dashboard_shared::{Metric, MetricValue, Status};
let mut metrics = Vec::new();
// Convert CPU data
metrics.push(Metric::new(
"cpu_load_1min".to_string(),
MetricValue::Float(agent_data.system.cpu.load_1min),
Status::Ok,
));
metrics.push(Metric::new(
"cpu_load_5min".to_string(),
MetricValue::Float(agent_data.system.cpu.load_5min),
Status::Ok,
));
metrics.push(Metric::new(
"cpu_load_15min".to_string(),
MetricValue::Float(agent_data.system.cpu.load_15min),
Status::Ok,
));
metrics.push(Metric::new(
"cpu_frequency_mhz".to_string(),
MetricValue::Float(agent_data.system.cpu.frequency_mhz),
Status::Ok,
));
if let Some(temp) = agent_data.system.cpu.temperature_celsius {
metrics.push(Metric::new(
"cpu_temperature_celsius".to_string(),
MetricValue::Float(temp),
Status::Ok,
));
}
// Convert Memory data
metrics.push(Metric::new(
"memory_usage_percent".to_string(),
MetricValue::Float(agent_data.system.memory.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
"memory_total_gb".to_string(),
MetricValue::Float(agent_data.system.memory.total_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_used_gb".to_string(),
MetricValue::Float(agent_data.system.memory.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_available_gb".to_string(),
MetricValue::Float(agent_data.system.memory.available_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_swap_total_gb".to_string(),
MetricValue::Float(agent_data.system.memory.swap_total_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_swap_used_gb".to_string(),
MetricValue::Float(agent_data.system.memory.swap_used_gb),
Status::Ok,
));
// Convert tmpfs data
for tmpfs in &agent_data.system.memory.tmpfs {
if tmpfs.mount == "/tmp" {
metrics.push(Metric::new(
"memory_tmp_usage_percent".to_string(),
MetricValue::Float(tmpfs.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
"memory_tmp_used_gb".to_string(),
MetricValue::Float(tmpfs.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_tmp_total_gb".to_string(),
MetricValue::Float(tmpfs.total_gb),
Status::Ok,
));
}
}
// Add agent metadata
metrics.push(Metric::new(
"agent_version".to_string(),
MetricValue::String(agent_data.agent_version.clone()),
Status::Ok,
));
metrics.push(Metric::new(
"agent_heartbeat".to_string(),
MetricValue::Integer(agent_data.timestamp as i64),
Status::Ok,
));
// Convert storage data
for drive in &agent_data.system.storage.drives {
// Drive-level metrics
if let Some(temp) = drive.temperature_celsius {
metrics.push(Metric::new(
format!("disk_{}_temperature", drive.name),
MetricValue::Float(temp),
Status::Ok,
));
}
if let Some(wear) = drive.wear_percent {
metrics.push(Metric::new(
format!("disk_{}_wear_percent", drive.name),
MetricValue::Float(wear),
Status::Ok,
));
}
metrics.push(Metric::new(
format!("disk_{}_health", drive.name),
MetricValue::String(drive.health.clone()),
Status::Ok,
));
// Filesystem metrics
for fs in &drive.filesystems {
let fs_base = format!("disk_{}_fs_{}", drive.name, fs.mount.replace('/', "root"));
metrics.push(Metric::new(
format!("{}_usage_percent", fs_base),
MetricValue::Float(fs.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_used_gb", fs_base),
MetricValue::Float(fs.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_total_gb", fs_base),
MetricValue::Float(fs.total_gb),
Status::Ok,
));
}
}
// Convert storage pools
for pool in &agent_data.system.storage.pools {
let pool_base = format!("disk_{}", pool.name);
metrics.push(Metric::new(
format!("{}_usage_percent", pool_base),
MetricValue::Float(pool.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_used_gb", pool_base),
MetricValue::Float(pool.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_total_gb", pool_base),
MetricValue::Float(pool.total_gb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_pool_type", pool_base),
MetricValue::String(pool.pool_type.clone()),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_mount_point", pool_base),
MetricValue::String(pool.mount.clone()),
Status::Ok,
));
// Pool drive data
for drive in &pool.data_drives {
if let Some(temp) = drive.temperature_celsius {
metrics.push(Metric::new(
format!("disk_{}_{}_temperature", pool.name, drive.name),
MetricValue::Float(temp),
Status::Ok,
));
}
if let Some(wear) = drive.wear_percent {
metrics.push(Metric::new(
format!("disk_{}_{}_wear_percent", pool.name, drive.name),
MetricValue::Float(wear),
Status::Ok,
));
}
}
for drive in &pool.parity_drives {
if let Some(temp) = drive.temperature_celsius {
metrics.push(Metric::new(
format!("disk_{}_{}_temperature", pool.name, drive.name),
MetricValue::Float(temp),
Status::Ok,
));
}
if let Some(wear) = drive.wear_percent {
metrics.push(Metric::new(
format!("disk_{}_{}_wear_percent", pool.name, drive.name),
MetricValue::Float(wear),
Status::Ok,
));
}
}
}
// Convert service data
for service in &agent_data.services {
let service_base = format!("service_{}", service.name);
metrics.push(Metric::new(
format!("{}_status", service_base),
MetricValue::String(service.status.clone()),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_memory_mb", service_base),
MetricValue::Float(service.memory_mb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_disk_gb", service_base),
MetricValue::Float(service.disk_gb),
Status::Ok,
));
if service.user_stopped {
metrics.push(Metric::new(
format!("{}_user_stopped", service_base),
MetricValue::Boolean(true),
Status::Ok,
));
}
}
// Convert backup data
metrics.push(Metric::new(
"backup_status".to_string(),
MetricValue::String(agent_data.backup.status.clone()),
Status::Ok,
));
if let Some(last_run) = agent_data.backup.last_run {
metrics.push(Metric::new(
"backup_last_run_timestamp".to_string(),
MetricValue::Integer(last_run as i64),
Status::Ok,
));
}
if let Some(next_scheduled) = agent_data.backup.next_scheduled {
metrics.push(Metric::new(
"backup_next_scheduled_timestamp".to_string(),
MetricValue::Integer(next_scheduled as i64),
Status::Ok,
));
}
if let Some(size) = agent_data.backup.total_size_gb {
metrics.push(Metric::new(
"backup_size_gb".to_string(),
MetricValue::Float(size),
Status::Ok,
));
}
if let Some(health) = &agent_data.backup.repository_health {
metrics.push(Metric::new(
"backup_repository_health".to_string(),
MetricValue::String(health.clone()),
Status::Ok,
));
}
metrics
}
/// Get current metric for a specific host /// Get current metric for a specific host
pub fn get_metric(&self, hostname: &str, metric_name: &str) -> Option<&Metric> { pub fn get_metric(&self, hostname: &str, metric_name: &str) -> Option<&Metric> {
self.current_metrics.get(hostname)?.get(metric_name) self.current_metrics.get(hostname)?.get(metric_name)
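
Note on the filesystem metric names built by the bridge: `str::replace` substitutes every occurrence of the pattern, so `fs.mount.replace('/', "root")` turns each slash in a mount path into `root`, not only the leading one. The sketch below reproduces the naming pattern with hypothetical helper and struct names (`fs_metric_base`, `Drive`, `Filesystem`, `flatten`) to show what the flattened names look like. Whether the dashboard-side parser expects this form is not visible in this diff.

```rust
/// Hypothetical helper reproducing the bridge's name pattern:
/// `disk_{drive}_fs_{mount}` with every '/' replaced by "root".
pub fn fs_metric_base(drive: &str, mount: &str) -> String {
    format!("disk_{}_fs_{}", drive, mount.replace('/', "root"))
}

/// Simplified stand-ins for the structured storage data.
pub struct Filesystem {
    pub mount: String,
    pub usage_percent: f32,
}

pub struct Drive {
    pub name: String,
    pub filesystems: Vec<Filesystem>,
}

/// Flatten one drive's filesystems into (metric name, value) pairs,
/// in the same spirit as `convert_agent_data_to_metrics`.
pub fn flatten(drive: &Drive) -> Vec<(String, f32)> {
    drive
        .filesystems
        .iter()
        .map(|fs| {
            let base = fs_metric_base(&drive.name, &fs.mount);
            (format!("{}_usage_percent", base), fs.usage_percent)
        })
        .collect()
}
```

With these helpers, a root filesystem on `nvme0n1` maps to `disk_nvme0n1_fs_root_usage_percent`, `/boot` maps to `disk_nvme0n1_fs_rootboot_usage_percent`, and `/mnt/data` on `sda` maps to `disk_sda_fs_rootmntrootdata_usage_percent`.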


@@ -220,12 +220,12 @@ impl TuiApp {
let connection_ip = self.get_connection_ip(&hostname); let connection_ip = self.get_connection_ip(&hostname);
// Create command that shows logo, rebuilds, and waits for user input // Create command that shows logo, rebuilds, and waits for user input
let logo_and_rebuild = format!( let logo_and_rebuild = format!(
"bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {} ({})\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'", "echo 'Rebuilding system: {} ({})' && ssh -tt {}@{} \"bash -ic '{}'\"",
hostname, hostname,
connection_ip, connection_ip,
self.config.ssh.rebuild_user, self.config.ssh.rebuild_user,
connection_ip, connection_ip,
self.config.ssh.rebuild_alias self.config.ssh.rebuild_cmd
); );
std::process::Command::new("tmux") std::process::Command::new("tmux")
@@ -244,12 +244,12 @@ impl TuiApp {
let connection_ip = self.get_connection_ip(&hostname); let connection_ip = self.get_connection_ip(&hostname);
// Create command that shows logo, runs backup, and waits for user input // Create command that shows logo, runs backup, and waits for user input
let logo_and_backup = format!( let logo_and_backup = format!(
"bash -c 'cat << \"EOF\"\nBackup Operation\nTarget: {} ({})\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Backup completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'", "echo 'Running backup: {} ({})' && ssh -tt {}@{} \"bash -ic '{}'\"",
hostname, hostname,
connection_ip, connection_ip,
self.config.ssh.rebuild_user, self.config.ssh.rebuild_user,
connection_ip, connection_ip,
self.config.ssh.backup_alias format!("{} start borgbackup", self.config.ssh.service_manage_cmd)
); );
std::process::Command::new("tmux") std::process::Command::new("tmux")
@@ -267,13 +267,12 @@ impl TuiApp {
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) { if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
let connection_ip = self.get_connection_ip(&hostname); let connection_ip = self.get_connection_ip(&hostname);
let service_start_command = format!( let service_start_command = format!(
"bash -c 'cat << \"EOF\"\nService Start: {}\nTarget: {} ({})\n\nEOF\nssh -tt {}@{} \"echo \\\"Starting service and following logs (Ctrl+C to stop):\\\" && echo \\\"========================================\\\" && sudo systemctl start {} & sudo journalctl -fu {} --since='1 second ago' --no-pager\"\necho\necho \"========================================\"\necho \"Operation completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'", "echo 'Starting service: {} on {}' && ssh -tt {}@{} \"bash -ic '{} start {}'\"",
service_name, service_name,
hostname, hostname,
connection_ip,
self.config.ssh.rebuild_user, self.config.ssh.rebuild_user,
connection_ip, connection_ip,
service_name, self.config.ssh.service_manage_cmd,
service_name service_name
); );
@@ -292,14 +291,12 @@ impl TuiApp {
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) { if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
let connection_ip = self.get_connection_ip(&hostname); let connection_ip = self.get_connection_ip(&hostname);
let service_stop_command = format!( let service_stop_command = format!(
"bash -c 'cat << \"EOF\"\nService Stop: {}.service\nTarget: {} ({})\n\nEOF\nssh -tt {}@{} \"echo \\\"Stopping service...\\\" && sudo systemctl stop {}.service && echo \\\"Service stopped! Final logs:\\\" && echo \\\"========================================\\\" && sudo journalctl -u {}.service --no-pager -n 10 && echo \\\"========================================\\\" && sudo systemctl status {}.service --no-pager -l\"\necho\necho \"========================================\"\necho \"Operation completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'", "echo 'Stopping service: {} on {}' && ssh -tt {}@{} \"bash -ic '{} stop {}'\"",
service_name, service_name,
hostname, hostname,
connection_ip,
self.config.ssh.rebuild_user, self.config.ssh.rebuild_user,
connection_ip, connection_ip,
service_name, self.config.ssh.service_manage_cmd,
service_name,
service_name service_name
); );
@@ -313,14 +310,15 @@ impl TuiApp {
.ok(); // Ignore errors, tmux will handle them .ok(); // Ignore errors, tmux will handle them
} }
} }
KeyCode::Char('J') => { KeyCode::Char('L') => {
// Show service logs via journalctl in tmux split window // Show service logs via service-manage script in tmux split window
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) { if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
let connection_ip = self.get_connection_ip(&hostname); let connection_ip = self.get_connection_ip(&hostname);
let journalctl_command = format!( let logs_command = format!(
"bash -c \"ssh -tt {}@{} 'sudo journalctl -u {}.service -f --no-pager -n 50'; exit\"", "ssh -tt {}@{} '{} logs {}'",
self.config.ssh.rebuild_user, self.config.ssh.rebuild_user,
connection_ip, connection_ip,
self.config.ssh.service_manage_cmd,
service_name service_name
); );
@@ -329,37 +327,11 @@ impl TuiApp {
.arg("-v") .arg("-v")
.arg("-p") .arg("-p")
.arg("30") .arg("30")
.arg(&journalctl_command) .arg(&logs_command)
.spawn() .spawn()
.ok(); // Ignore errors, tmux will handle them .ok(); // Ignore errors, tmux will handle them
} }
} }
KeyCode::Char('L') => {
// Show custom service log file in tmux split window
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
// Check if this service has a custom log file configured
if let Some(host_logs) = self.config.service_logs.get(&hostname) {
if let Some(log_config) = host_logs.iter().find(|config| config.service_name == service_name) {
let connection_ip = self.get_connection_ip(&hostname);
let tail_command = format!(
"bash -c \"ssh -tt {}@{} 'sudo tail -n 50 -f {}'; exit\"",
self.config.ssh.rebuild_user,
connection_ip,
log_config.log_file_path
);
std::process::Command::new("tmux")
.arg("split-window")
.arg("-v")
.arg("-p")
.arg("30")
.arg(&tail_command)
.spawn()
.ok(); // Ignore errors, tmux will handle them
}
}
}
}
KeyCode::Char('w') => { KeyCode::Char('w') => {
// Wake on LAN for offline hosts // Wake on LAN for offline hosts
if let Some(hostname) = self.current_host.clone() { if let Some(hostname) = self.current_host.clone() {
@@ -392,7 +364,8 @@ impl TuiApp {
if let Some(hostname) = self.current_host.clone() { if let Some(hostname) = self.current_host.clone() {
let connection_ip = self.get_connection_ip(&hostname); let connection_ip = self.get_connection_ip(&hostname);
let ssh_command = format!( let ssh_command = format!(
"ssh -tt {}@{}", "echo 'Opening SSH terminal to: {}' && ssh -tt {}@{}",
hostname,
self.config.ssh.rebuild_user, self.config.ssh.rebuild_user,
connection_ip connection_ip
); );
@@ -616,12 +589,13 @@ impl TuiApp {
// Split the title bar into left and right sections // Split the title bar into left and right sections
let chunks = Layout::default() let chunks = Layout::default()
.direction(Direction::Horizontal) .direction(Direction::Horizontal)
.constraints([Constraint::Length(15), Constraint::Min(0)]) .constraints([Constraint::Length(22), Constraint::Min(0)])
.split(area); .split(area);
// Left side: "cm-dashboard" text // Left side: "cm-dashboard" text with version
let title_text = format!(" cm-dashboard v{}", env!("CARGO_PKG_VERSION"));
let left_span = Span::styled( let left_span = Span::styled(
" cm-dashboard", &title_text,
Style::default().fg(Theme::background()).bg(background_color).add_modifier(Modifier::BOLD) Style::default().fg(Theme::background()).bg(background_color).add_modifier(Modifier::BOLD)
); );
let left_title = Paragraph::new(Line::from(vec![left_span])) let left_title = Paragraph::new(Line::from(vec![left_span]))
@@ -693,35 +667,27 @@ impl TuiApp {
return host_summary_metric.status; return host_summary_metric.status;
} }
// Fallback to old aggregation logic with proper Pending handling // Rewritten status aggregation - only Critical, Warning, or OK for top bar
let mut has_critical = false; let mut has_critical = false;
let mut has_warning = false; let mut has_warning = false;
let mut has_pending = false;
let mut ok_count = 0;
for metric in &metrics { for metric in &metrics {
match metric.status { match metric.status {
Status::Critical => has_critical = true, Status::Critical => has_critical = true,
Status::Warning => has_warning = true, Status::Warning => has_warning = true,
Status::Pending => has_pending = true, // Treat all other statuses as OK for top bar aggregation
Status::Ok => ok_count += 1, Status::Ok | Status::Pending | Status::Inactive | Status::Unknown => {},
Status::Inactive => ok_count += 1, // Treat inactive as OK for aggregation Status::Offline => {}, // Ignore offline
Status::Unknown => {}, // Ignore unknown for aggregation
Status::Offline => {}, // Ignore offline for aggregation
} }
} }
// Priority order: Critical > Warning > Pending > Ok > Unknown // Only return Critical, Warning, or OK - no other statuses
if has_critical { if has_critical {
Status::Critical Status::Critical
} else if has_warning { } else if has_warning {
Status::Warning Status::Warning
} else if has_pending {
Status::Pending
} else if ok_count > 0 {
Status::Ok
} else { } else {
Status::Unknown Status::Ok
} }
} }
@@ -745,9 +711,10 @@ impl TuiApp {
shortcuts.push("Tab: Host".to_string()); shortcuts.push("Tab: Host".to_string());
shortcuts.push("↑↓/jk: Select".to_string()); shortcuts.push("↑↓/jk: Select".to_string());
shortcuts.push("r: Rebuild".to_string()); shortcuts.push("r: Rebuild".to_string());
shortcuts.push("B: Backup".to_string());
shortcuts.push("s/S: Start/Stop".to_string()); shortcuts.push("s/S: Start/Stop".to_string());
shortcuts.push("J: Logs".to_string()); shortcuts.push("L: Logs".to_string());
shortcuts.push("L: Custom".to_string()); shortcuts.push("t: Terminal".to_string());
shortcuts.push("w: Wake".to_string()); shortcuts.push("w: Wake".to_string());
// Always show quit // Always show quit
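
Note on the rewritten status aggregation in this file: the top bar now reports only Critical, Warning, or Ok. A minimal sketch of that priority rule, with a local `Status` enum standing in for the shared one (the variant list is copied from the match arms in the diff, while `aggregate` is a hypothetical name):

```rust
/// Local stand-in for the shared `Status` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Pending,
    Inactive,
    Unknown,
    Offline,
    Warning,
    Critical,
}

/// Critical wins over Warning; anything else collapses to Ok.
pub fn aggregate(statuses: &[Status]) -> Status {
    if statuses.iter().any(|s| *s == Status::Critical) {
        Status::Critical
    } else if statuses.iter().any(|s| *s == Status::Warning) {
        Status::Warning
    } else {
        Status::Ok
    }
}
```

In this sketch an empty slice, or one containing only Pending, Unknown, or Offline entries, yields Ok. The removed fallback returned Pending or Unknown in those cases, so the rewrite is a visible behavior change for hosts without any Ok metrics.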


@@ -30,6 +30,8 @@ pub struct BackupWidget {
backup_disk_product_name: Option<String>, backup_disk_product_name: Option<String>,
/// Backup disk serial number from SMART data /// Backup disk serial number from SMART data
backup_disk_serial_number: Option<String>, backup_disk_serial_number: Option<String>,
/// Backup disk wear percentage from SMART data
backup_disk_wear_percent: Option<f32>,
/// Backup disk filesystem label /// Backup disk filesystem label
backup_disk_filesystem_label: Option<String>, backup_disk_filesystem_label: Option<String>,
/// Number of completed services /// Number of completed services
@@ -65,6 +67,7 @@ impl BackupWidget {
backup_disk_used_gb: None, backup_disk_used_gb: None,
backup_disk_product_name: None, backup_disk_product_name: None,
backup_disk_serial_number: None, backup_disk_serial_number: None,
backup_disk_wear_percent: None,
backup_disk_filesystem_label: None, backup_disk_filesystem_label: None,
services_completed_count: None, services_completed_count: None,
services_failed_count: None, services_failed_count: None,
@@ -197,6 +200,9 @@ impl Widget for BackupWidget {
"backup_disk_serial_number" => { "backup_disk_serial_number" => {
self.backup_disk_serial_number = Some(metric.value.as_string()); self.backup_disk_serial_number = Some(metric.value.as_string());
} }
"backup_disk_wear_percent" => {
self.backup_disk_wear_percent = metric.value.as_f32();
}
"backup_disk_filesystem_label" => { "backup_disk_filesystem_label" => {
self.backup_disk_filesystem_label = Some(metric.value.as_string()); self.backup_disk_filesystem_label = Some(metric.value.as_string());
} }
@@ -328,21 +334,31 @@ impl BackupWidget {
); );
lines.push(ratatui::text::Line::from(disk_spans)); lines.push(ratatui::text::Line::from(disk_spans));
// Serial number as sub-item // Collect sub-items to determine tree structure
let mut sub_items = Vec::new();
if let Some(serial) = &self.backup_disk_serial_number { if let Some(serial) = &self.backup_disk_serial_number {
lines.push(ratatui::text::Line::from(vec![ sub_items.push(format!("S/N: {}", serial));
ratatui::text::Span::styled(" ├─ ", Typography::tree()),
ratatui::text::Span::styled(format!("S/N: {}", serial), Typography::secondary())
]));
} }
// Usage as sub-item if let Some(wear) = self.backup_disk_wear_percent {
sub_items.push(format!("Wear: {:.0}%", wear));
}
if let (Some(used), Some(total)) = (self.backup_disk_used_gb, self.backup_disk_total_gb) { if let (Some(used), Some(total)) = (self.backup_disk_used_gb, self.backup_disk_total_gb) {
let used_str = Self::format_size_with_proper_units(used); let used_str = Self::format_size_with_proper_units(used);
let total_str = Self::format_size_with_proper_units(total); let total_str = Self::format_size_with_proper_units(total);
sub_items.push(format!("Usage: {}/{}", used_str, total_str));
}
// Render sub-items with proper tree structure
let num_items = sub_items.len();
for (i, item) in sub_items.into_iter().enumerate() {
let is_last = i == num_items - 1;
let tree_char = if is_last { " └─ " } else { " ├─ " };
lines.push(ratatui::text::Line::from(vec![ lines.push(ratatui::text::Line::from(vec![
ratatui::text::Span::styled(" └─ ", Typography::tree()), ratatui::text::Span::styled(tree_char, Typography::tree()),
ratatui::text::Span::styled(format!("Usage: {}/{}", used_str, total_str), Typography::secondary()) ratatui::text::Span::styled(item, Typography::secondary())
])); ]));
} }
} }
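
Note on the sub-item rendering above: collecting the optional entries first and then choosing the branch character by position keeps the last visible item on `└─`, whichever fields happen to be present. A minimal sketch of the same idea on plain strings (`render_sub_items` is a hypothetical helper, and the two-space indent is an assumption):

```rust
/// Prefix each item with a tree branch; only the final item gets the
/// closing corner.
pub fn render_sub_items(items: &[String]) -> Vec<String> {
    // saturating_sub avoids underflow when `items` is empty
    let last = items.len().saturating_sub(1);
    items
        .iter()
        .enumerate()
        .map(|(i, item)| {
            let branch = if i == last { "└─ " } else { "├─ " };
            format!("  {}{}", branch, item)
        })
        .collect()
}
```

The widget code computes `num_items - 1` only inside the loop body, so it never runs on an empty list. The sketch uses `saturating_sub` because it evaluates the index up front.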


@@ -209,36 +209,13 @@ impl ServicesWidget {
} }
/// Get currently selected service name (for actions) /// Get currently selected service name (for actions)
/// Only returns parent service names since only parent services can be selected
pub fn get_selected_service(&self) -> Option<String> { pub fn get_selected_service(&self) -> Option<String> {
// Build the same display list to find the selected service // Only parent services can be selected, so just get the parent service at selected_index
let mut display_lines: Vec<(String, Status, bool, Option<(ServiceInfo, bool)>, String)> = Vec::new();
let mut parent_services: Vec<_> = self.parent_services.iter().collect(); let mut parent_services: Vec<_> = self.parent_services.iter().collect();
parent_services.sort_by(|(a, _), (b, _)| a.cmp(b)); parent_services.sort_by(|(a, _), (b, _)| a.cmp(b));
for (parent_name, parent_info) in parent_services {
let parent_line = self.format_parent_service_line(parent_name, parent_info);
display_lines.push((parent_line, parent_info.widget_status, false, None, parent_name.clone()));
if let Some(sub_list) = self.sub_services.get(parent_name) {
let mut sorted_subs = sub_list.clone();
sorted_subs.sort_by(|(a, _), (b, _)| a.cmp(b));
for (i, (sub_name, sub_info)) in sorted_subs.iter().enumerate() {
let is_last_sub = i == sorted_subs.len() - 1;
let full_sub_name = format!("{}_{}", parent_name, sub_name);
display_lines.push((
sub_name.clone(),
sub_info.widget_status,
true,
Some((sub_info.clone(), is_last_sub)),
full_sub_name,
));
}
}
}
display_lines.get(self.selected_index).map(|(_, _, _, _, raw_name)| raw_name.clone()) parent_services.get(self.selected_index).map(|(name, _)| name.to_string())
} }
/// Get total count of selectable services (parent services only, not sub-services) /// Get total count of selectable services (parent services only, not sub-services)
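
Note on the simplified `get_selected_service`: since only parent services are selectable, the selection index can be applied directly to the sorted parent names. A minimal sketch, assuming the parents live in a `HashMap` keyed by name (`selected_parent` and the `u32` value type are placeholders for the real method and `ServiceInfo`):

```rust
use std::collections::HashMap;

/// Look up the parent service at `selected_index` in name order.
/// Returns None when the index is out of range.
pub fn selected_parent(parents: &HashMap<String, u32>, selected_index: usize) -> Option<String> {
    // HashMap iteration order is unspecified, so sort before indexing
    let mut names: Vec<&String> = parents.keys().collect();
    names.sort();
    names.get(selected_index).map(|name| name.to_string())
}
```

The sort matters here: the index is only meaningful if it uses the same ordering as the rendered list.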


@@ -45,12 +45,15 @@ pub struct SystemWidget {
struct StoragePool { struct StoragePool {
name: String, name: String,
mount_point: String, mount_point: String,
pool_type: String, // "Single", "Raid0", etc. pool_type: String, // "single", "mergerfs (2+1)", "RAID5 (3+1)", etc.
pool_health: Option<String>, // "healthy", "degraded", "critical", "rebuilding"
drives: Vec<StorageDrive>, drives: Vec<StorageDrive>,
filesystems: Vec<FileSystem>, // For physical drive pools: individual filesystem children
usage_percent: Option<f32>, usage_percent: Option<f32>,
used_gb: Option<f32>, used_gb: Option<f32>,
total_gb: Option<f32>, total_gb: Option<f32>,
status: Status, status: Status,
health_status: Status, // Separate status for pool health vs usage
} }
#[derive(Clone)] #[derive(Clone)]
@@ -61,6 +64,16 @@ struct StorageDrive {
status: Status, status: Status,
} }
#[derive(Clone)]
struct FileSystem {
mount_point: String,
usage_percent: Option<f32>,
used_gb: Option<f32>,
total_gb: Option<f32>,
available_gb: Option<f32>,
status: Status,
}
impl SystemWidget { impl SystemWidget {
pub fn new() -> Self { pub fn new() -> Self {
Self { Self {
@@ -133,50 +146,75 @@ impl SystemWidget {
self.agent_hash.as_ref() self.agent_hash.as_ref()
} }
/// Get mount point for a pool name /// Get default mount point for a pool name (fallback only - should use actual mount_point metrics)
fn get_mount_point_for_pool(&self, pool_name: &str) -> String { fn get_mount_point_for_pool(&self, pool_name: &str) -> String {
match pool_name { // For device names, use the device name directly as display name
"root" => "/".to_string(), if pool_name.starts_with("nvme") || pool_name.starts_with("sd") || pool_name.starts_with("hd") {
"steampool" => "/mnt/steampool".to_string(), pool_name.to_string()
"steampool_1" => "/steampool_1".to_string(), } else {
"steampool_2" => "/steampool_2".to_string(), // For other pools, use the pool name as-is (will be overridden by mount_point metric)
_ => format!("/{}", pool_name), // Default fallback pool_name.to_string()
} }
} }
/// Parse storage metrics into pools and drives /// Parse storage metrics into pools and drives
fn update_storage_from_metrics(&mut self, metrics: &[&Metric]) { fn update_storage_from_metrics(&mut self, metrics: &[&Metric]) {
let mut pools: std::collections::HashMap<String, StoragePool> = std::collections::HashMap::new(); let mut pools: std::collections::HashMap<String, StoragePool> = std::collections::HashMap::new();
for metric in metrics { for metric in metrics {
if metric.name.starts_with("disk_") { if metric.name.starts_with("disk_") {
if let Some(pool_name) = self.extract_pool_name(&metric.name) { if let Some(pool_name) = self.extract_pool_name(&metric.name) {
let mount_point = self.get_mount_point_for_pool(&pool_name);
let pool = pools.entry(pool_name.clone()).or_insert_with(|| StoragePool { let pool = pools.entry(pool_name.clone()).or_insert_with(|| StoragePool {
name: pool_name.clone(), name: pool_name.clone(),
mount_point: mount_point.clone(), mount_point: self.get_mount_point_for_pool(&pool_name), // Default fallback
pool_type: "Single".to_string(), // Default, could be enhanced pool_type: "single".to_string(), // Default, will be updated
pool_health: None,
drives: Vec::new(), drives: Vec::new(),
filesystems: Vec::new(),
usage_percent: None, usage_percent: None,
used_gb: None, used_gb: None,
total_gb: None, total_gb: None,
status: Status::Unknown, status: Status::Unknown,
health_status: Status::Unknown,
}); });
// Parse different metric types // Parse different metric types
if metric.name.contains("_usage_percent") { if metric.name.contains("_usage_percent") && !metric.name.contains("_fs_") {
// Only use drive-level metrics for pool totals, not filesystem metrics
if let MetricValue::Float(usage) = metric.value { if let MetricValue::Float(usage) = metric.value {
pool.usage_percent = Some(usage); pool.usage_percent = Some(usage);
pool.status = metric.status.clone(); pool.status = metric.status.clone();
} }
} else if metric.name.contains("_used_gb") { } else if metric.name.contains("_used_gb") && !metric.name.contains("_fs_") {
// Only use drive-level metrics for pool totals, not filesystem metrics
if let MetricValue::Float(used) = metric.value { if let MetricValue::Float(used) = metric.value {
pool.used_gb = Some(used); pool.used_gb = Some(used);
} }
} else if metric.name.contains("_total_gb") { } else if metric.name.contains("_total_gb") && !metric.name.contains("_fs_") {
// Only use drive-level metrics for pool totals, not filesystem metrics
if let MetricValue::Float(total) = metric.value { if let MetricValue::Float(total) = metric.value {
pool.total_gb = Some(total); pool.total_gb = Some(total);
} }
} else if metric.name.contains("_mount_point") {
if let MetricValue::String(mount_point) = &metric.value {
pool.mount_point = mount_point.clone();
}
} else if metric.name.contains("_pool_type") {
if let MetricValue::String(pool_type) = &metric.value {
pool.pool_type = pool_type.clone();
}
} else if metric.name.contains("_pool_health") {
if let MetricValue::String(health) = &metric.value {
pool.pool_health = Some(health.clone());
pool.health_status = metric.status.clone();
}
} else if metric.name.contains("_health") && !metric.name.contains("_pool_health") {
// Handle physical drive health metrics (disk_{drive}_health)
if let MetricValue::String(health) = &metric.value {
// For physical drives, use the drive health as pool health
pool.pool_health = Some(health.clone());
pool.health_status = metric.status.clone();
}
} else if metric.name.contains("_temperature") { } else if metric.name.contains("_temperature") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) { if let Some(drive_name) = self.extract_drive_name(&metric.name) {
// Find existing drive or create new one // Find existing drive or create new one
@@ -194,12 +232,16 @@ impl SystemWidget {
if let MetricValue::Float(temp) = metric.value { if let MetricValue::Float(temp) = metric.value {
drive.temperature = Some(temp); drive.temperature = Some(temp);
drive.status = metric.status.clone(); drive.status = metric.status.clone();
// For physical drives, if this is the main drive, also update pool health
if drive.name == pool.name && pool.health_status == Status::Unknown {
pool.health_status = metric.status.clone();
}
} }
} }
} }
} else if metric.name.contains("_wear_percent") { } else if metric.name.contains("_wear_percent") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) { if let Some(drive_name) = self.extract_drive_name(&metric.name) {
// Find existing drive or create new one // For physical drives, ensure we create the drive object
let drive_exists = pool.drives.iter().any(|d| d.name == drive_name); let drive_exists = pool.drives.iter().any(|d| d.name == drive_name);
if !drive_exists { if !drive_exists {
pool.drives.push(StorageDrive { pool.drives.push(StorageDrive {
@@ -214,6 +256,95 @@ impl SystemWidget {
if let MetricValue::Float(wear) = metric.value { if let MetricValue::Float(wear) = metric.value {
drive.wear_percent = Some(wear); drive.wear_percent = Some(wear);
drive.status = metric.status.clone(); drive.status = metric.status.clone();
// For physical drives, if this is the main drive, also update pool health
if drive.name == pool.name && pool.health_status == Status::Unknown {
pool.health_status = metric.status.clone();
}
}
}
}
} else if metric.name.contains("_fs_") {
// Handle filesystem metrics for physical drive pools (disk_{pool}_fs_{fs_name}_{metric})
if let (Some(fs_name), Some(metric_type)) = self.extract_filesystem_metric(&metric.name) {
// Find or create filesystem entry
let fs_exists = pool.filesystems.iter().any(|fs| {
let fs_id = if fs.mount_point == "/" {
"root".to_string()
} else {
fs.mount_point.trim_start_matches('/').replace('/', "_")
};
fs_id == fs_name
});
if !fs_exists {
// Create filesystem entry with correct mount point
let mount_point = if metric_type == "mount_point" {
if let MetricValue::String(mount) = &metric.value {
mount.clone()
} else {
// Fallback: handle special cases
if fs_name == "root" {
"/".to_string()
} else {
format!("/{}", fs_name.replace('_', "/"))
}
}
} else {
// Fallback for non-mount_point metrics: generate mount point from fs_name
if fs_name == "root" {
"/".to_string()
} else {
format!("/{}", fs_name.replace('_', "/"))
}
};
pool.filesystems.push(FileSystem {
mount_point,
usage_percent: None,
used_gb: None,
total_gb: None,
available_gb: None,
status: Status::Unknown,
});
}
// Update the filesystem with the metric value
if let Some(filesystem) = pool.filesystems.iter_mut().find(|fs| {
let fs_id = if fs.mount_point == "/" {
"root".to_string()
} else {
fs.mount_point.trim_start_matches('/').replace('/', "_")
};
fs_id == fs_name
}) {
match metric_type.as_str() {
"usage_percent" => {
if let MetricValue::Float(usage) = metric.value {
filesystem.usage_percent = Some(usage);
filesystem.status = metric.status.clone();
}
}
"used_gb" => {
if let MetricValue::Float(used) = metric.value {
filesystem.used_gb = Some(used);
}
}
"total_gb" => {
if let MetricValue::Float(total) = metric.value {
filesystem.total_gb = Some(total);
}
}
"available_gb" => {
if let MetricValue::Float(available) = metric.value {
filesystem.available_gb = Some(available);
}
}
"mount_point" => {
if let MetricValue::String(mount) = &metric.value {
filesystem.mount_point = mount.clone();
}
}
_ => {}
} }
} }
} }
@@ -230,108 +361,161 @@ impl SystemWidget {
/// Extract pool name from disk metric name /// Extract pool name from disk metric name
fn extract_pool_name(&self, metric_name: &str) -> Option<String> { fn extract_pool_name(&self, metric_name: &str) -> Option<String> {
// Pattern: disk_{pool_name}_{drive_name}_{metric_type} // Pattern: disk_{pool_name}_{various suffixes}
// Since pool_name can contain underscores, work backwards from known metric suffixes // Since pool_name can contain underscores, work backwards from known metric suffixes
if metric_name.starts_with("disk_") { if metric_name.starts_with("disk_") {
// First try drive-specific metrics that have device names // Handle filesystem metrics: disk_{pool}_fs_{filesystem}_{metric}
if let Some(suffix_pos) = metric_name.rfind("_temperature") if metric_name.contains("_fs_") {
.or_else(|| metric_name.rfind("_wear_percent")) if let Some(fs_pos) = metric_name.find("_fs_") {
.or_else(|| metric_name.rfind("_health")) { return Some(metric_name[5..fs_pos].to_string()); // Skip "disk_", extract pool name before "_fs_"
// Find the second-to-last underscore to get pool name
let before_suffix = &metric_name[..suffix_pos];
if let Some(drive_start) = before_suffix.rfind('_') {
return Some(metric_name[5..drive_start].to_string()); // Skip "disk_"
} }
} }
// For pool-level metrics (usage_percent, used_gb, total_gb), take everything before the metric suffix
else if let Some(suffix_pos) = metric_name.rfind("_usage_percent") // Handle pool-level metrics (usage_percent, used_gb, total_gb, mount_point, pool_type, pool_health)
.or_else(|| metric_name.rfind("_used_gb")) // Use rfind to get the last occurrence of these suffixes
.or_else(|| metric_name.rfind("_total_gb")) { let pool_suffixes = ["_usage_percent", "_used_gb", "_total_gb", "_available_gb", "_mount_point", "_pool_type", "_pool_health"];
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_" for suffix in pool_suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
}
} }
// Fallback to old behavior for unknown patterns
else if let Some(captures) = metric_name.strip_prefix("disk_") { // Handle physical drive metrics: disk_{drive}_health, disk_{drive}_wear_percent, and disk_{drive}_temperature
if let Some(pos) = captures.find('_') { if (metric_name.ends_with("_health") && !metric_name.contains("_pool_health"))
return Some(captures[..pos].to_string()); || metric_name.ends_with("_wear_percent")
|| metric_name.ends_with("_temperature") {
// Count underscores to distinguish physical drive metrics (disk_{drive}_{metric})
// from pool drive metrics (disk_{pool}_{drive}_{metric})
let underscore_count = metric_name.matches('_').count();
// disk_nvme0n1_wear_percent has 3 underscores, one of them inside "wear_percent".
// Note: disk_nvme0n1_health and disk_nvme0n1_temperature have only 2, so they do not match this check.
if underscore_count == 3 { // disk_{drive}_wear_percent (drive name itself has no underscores)
if let Some(suffix_pos) = metric_name.rfind("_health")
.or_else(|| metric_name.rfind("_wear_percent"))
.or_else(|| metric_name.rfind("_temperature")) {
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
}
}
}
// Handle drive-specific metrics: disk_{pool}_{drive}_{metric}
let drive_suffixes = ["_temperature", "_health"];
for suffix in drive_suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
// Extract pool name by finding the second-to-last underscore
let before_suffix = &metric_name[..suffix_pos];
if let Some(drive_start) = before_suffix.rfind('_') {
if drive_start > 5 {
return Some(metric_name[5..drive_start].to_string()); // Skip "disk_"
}
}
} }
} }
} }
None None
} }
/// Extract filesystem name and metric type from filesystem metric names
/// Pattern: disk_{pool}_fs_{filesystem_name}_{metric_type}
fn extract_filesystem_metric(&self, metric_name: &str) -> (Option<String>, Option<String>) {
if metric_name.starts_with("disk_") && metric_name.contains("_fs_") {
// Find the _fs_ part
if let Some(fs_start) = metric_name.find("_fs_") {
let after_fs = &metric_name[fs_start + 4..]; // Skip "_fs_"
// Look for known metric suffixes (these can contain underscores)
let known_suffixes = ["usage_percent", "used_gb", "total_gb", "available_gb", "mount_point"];
for suffix in known_suffixes {
if after_fs.ends_with(suffix) {
// Extract filesystem name by removing suffix and underscore
if let Some(underscore_pos) = after_fs.rfind(&format!("_{}", suffix)) {
let fs_name = after_fs[..underscore_pos].to_string();
return (Some(fs_name), Some(suffix.to_string()));
}
}
}
}
}
(None, None)
}
/// Extract drive name from disk metric name /// Extract drive name from disk metric name
fn extract_drive_name(&self, metric_name: &str) -> Option<String> { fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
// Pattern: disk_{pool_name}_{drive_name}_{metric_type} // Pattern: disk_{pool_name}_{drive_name}_{metric_type} OR disk_{drive_name}_{metric_type}
// Since pool_name can contain underscores, work backwards from known metric suffixes // Pool drives: disk_srv_media_sdb_temperature
// Physical drives: disk_nvme0n1_temperature
if metric_name.starts_with("disk_") { if metric_name.starts_with("disk_") {
if let Some(suffix_pos) = metric_name.rfind("_temperature") if let Some(suffix_pos) = metric_name.rfind("_temperature")
.or_else(|| metric_name.rfind("_wear_percent")) .or_else(|| metric_name.rfind("_wear_percent"))
.or_else(|| metric_name.rfind("_health")) { .or_else(|| metric_name.rfind("_health")) {
// Find the second-to-last underscore to get the drive name
let before_suffix = &metric_name[..suffix_pos]; let before_suffix = &metric_name[..suffix_pos];
// Extract the last component as drive name (e.g., "sdb", "sdc", "nvme0n1")
if let Some(drive_start) = before_suffix.rfind('_') { if let Some(drive_start) = before_suffix.rfind('_') {
return Some(before_suffix[drive_start + 1..].to_string()); return Some(before_suffix[drive_start + 1..].to_string());
} else {
// Handle physical drive metrics: disk_{drive}_metric (no pool)
// Extract everything after "disk_" as the drive name
return Some(before_suffix[5..].to_string()); // Skip "disk_"
} }
} }
} }
None None
} }
/// Render storage section with tree structure /// Render storage section with enhanced tree structure
fn render_storage(&self) -> Vec<Line<'_>> { fn render_storage(&self) -> Vec<Line<'_>> {
let mut lines = Vec::new(); let mut lines = Vec::new();
for pool in &self.storage_pools { for pool in &self.storage_pools {
// Pool header line // Pool header line with type and health
let usage_text = match (pool.usage_percent, pool.used_gb, pool.total_gb) { let pool_label = if pool.pool_type.starts_with("drive (") {
(Some(pct), Some(used), Some(total)) => { // For physical drives, show the drive name with temperature and wear percentage if available
format!("{:.0}% {:.1}GB/{:.1}GB", pct, used, total) // Look for any drive with temp/wear data (physical drives may have drives named after the pool)
let temp_opt = pool.drives.iter()
.find_map(|d| d.temperature);
let wear_opt = pool.drives.iter()
.find_map(|d| d.wear_percent);
let mut drive_info = Vec::new();
if let Some(temp) = temp_opt {
drive_info.push(format!("T: {:.0}°C", temp));
} }
_ => "—% —GB/—GB".to_string(), if let Some(wear) = wear_opt {
}; drive_info.push(format!("W: {:.0}%", wear));
}
let pool_label = if pool.pool_type.to_lowercase() == "single" {
if drive_info.is_empty() {
format!("{}:", pool.name)
} else {
format!("{} {}:", pool.name, drive_info.join(" "))
}
} else if pool.pool_type == "single" {
format!("{}:", pool.mount_point) format!("{}:", pool.mount_point)
} else { } else {
format!("{} ({}):", pool.mount_point, pool.pool_type) format!("{} ({}):", pool.mount_point, pool.pool_type)
}; };
let pool_spans = StatusIcons::create_status_spans( let pool_spans = StatusIcons::create_status_spans(
pool.status.clone(), pool.health_status.clone(),
&pool_label &pool_label
); );
lines.push(Line::from(pool_spans)); lines.push(Line::from(pool_spans));
// Drive lines with tree structure // Skip pool health line as discussed - removed
let has_usage_line = pool.usage_percent.is_some();
for (i, drive) in pool.drives.iter().enumerate() { // Total usage line (only show for multi-drive pools, skip for single physical drives)
let is_last_drive = i == pool.drives.len() - 1; if !pool.pool_type.starts_with("drive (") {
let tree_symbol = if is_last_drive && !has_usage_line { "└─" } else { "├─" }; let usage_text = match (pool.usage_percent, pool.used_gb, pool.total_gb) {
(Some(pct), Some(used), Some(total)) => {
let mut drive_info = Vec::new(); format!("Total: {:.0}% {:.1}GB/{:.1}GB", pct, used, total)
if let Some(temp) = drive.temperature { }
drive_info.push(format!("T: {:.0}C", temp)); _ => "Total: —% —GB/—GB".to_string(),
}
if let Some(wear) = drive.wear_percent {
drive_info.push(format!("W: {:.0}%", wear));
}
let drive_text = if drive_info.is_empty() {
drive.name.clone()
} else {
format!("{} {}", drive.name, drive_info.join(""))
}; };
let mut drive_spans = vec![ let has_drives = !pool.drives.is_empty();
Span::raw(" "), let has_filesystems = !pool.filesystems.is_empty();
Span::styled(tree_symbol, Typography::tree()), let has_children = has_drives || has_filesystems;
Span::raw(" "), let tree_symbol = if has_children { "├─" } else { "└─" };
];
drive_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text));
lines.push(Line::from(drive_spans));
}
// Usage line
if pool.usage_percent.is_some() {
let tree_symbol = "└─";
let mut usage_spans = vec![ let mut usage_spans = vec![
Span::raw(" "), Span::raw(" "),
Span::styled(tree_symbol, Typography::tree()), Span::styled(tree_symbol, Typography::tree()),
@@ -340,10 +524,126 @@ impl SystemWidget {
usage_spans.extend(StatusIcons::create_status_spans(pool.status.clone(), &usage_text)); usage_spans.extend(StatusIcons::create_status_spans(pool.status.clone(), &usage_text));
lines.push(Line::from(usage_spans)); lines.push(Line::from(usage_spans));
} }
// Drive lines with enhanced grouping
if pool.pool_type.contains("mergerfs") && pool.drives.len() > 1 {
// Group drives by type for mergerfs pools
let (data_drives, parity_drives): (Vec<_>, Vec<_>) = pool.drives.iter().enumerate()
.partition(|(_, drive)| {
// Simple heuristic: drives with 'parity' in name or sdc (common parity drive)
!drive.name.to_lowercase().contains("parity") && drive.name != "sdc"
});
// Show data drives
if !data_drives.is_empty() {
lines.push(Line::from(vec![
Span::raw(" "),
Span::styled("├─ ", Typography::tree()),
Span::styled("Data Disks:", Typography::secondary()),
]));
for (i, (_, drive)) in data_drives.iter().enumerate() {
let is_last = i == data_drives.len() - 1;
if is_last && parity_drives.is_empty() {
self.render_drive_line(&mut lines, drive, "│ └─");
} else {
self.render_drive_line(&mut lines, drive, "│ ├─");
}
}
}
// Show parity drives
if !parity_drives.is_empty() {
lines.push(Line::from(vec![
Span::raw(" "),
Span::styled("└─ ", Typography::tree()),
Span::styled("Parity:", Typography::secondary()),
]));
for (i, (_, drive)) in parity_drives.iter().enumerate() {
let is_last = i == parity_drives.len() - 1;
if is_last {
self.render_drive_line(&mut lines, drive, " └─");
} else {
self.render_drive_line(&mut lines, drive, " ├─");
}
}
}
} else if pool.pool_type != "single" && pool.drives.len() > 1 {
// Regular drive listing for non-mergerfs multi-drive pools
for (i, drive) in pool.drives.iter().enumerate() {
let is_last = i == pool.drives.len() - 1;
let tree_symbol = if is_last { "└─" } else { "├─" };
self.render_drive_line(&mut lines, drive, tree_symbol);
}
} else if pool.pool_type.starts_with("drive (") {
// Physical drive pools: wear data shown in header, skip drive lines, show filesystems directly
for (i, filesystem) in pool.filesystems.iter().enumerate() {
let is_last = i == pool.filesystems.len() - 1;
let tree_symbol = if is_last { "└─" } else { "├─" };
let fs_text = match (filesystem.usage_percent, filesystem.used_gb, filesystem.total_gb) {
(Some(pct), Some(used), Some(total)) => {
format!("{}: {:.0}% {:.1}GB/{:.1}GB", filesystem.mount_point, pct, used, total)
}
(Some(pct), _, Some(total)) => {
format!("{}: {:.0}% —GB/{:.1}GB", filesystem.mount_point, pct, total)
}
(Some(pct), _, _) => {
format!("{}: {:.0}% —GB/—GB", filesystem.mount_point, pct)
}
(_, Some(used), Some(total)) => {
format!("{}: —% {:.1}GB/{:.1}GB", filesystem.mount_point, used, total)
}
_ => format!("{}: —% —GB/—GB", filesystem.mount_point),
};
let mut fs_spans = vec![
Span::raw(" "),
Span::styled(tree_symbol, Typography::tree()),
Span::raw(" "),
];
fs_spans.extend(StatusIcons::create_status_spans(filesystem.status.clone(), &fs_text));
lines.push(Line::from(fs_spans));
}
} else {
// Single drive or simple pools
for (i, drive) in pool.drives.iter().enumerate() {
let is_last = i == pool.drives.len() - 1;
let tree_symbol = if is_last { "└─" } else { "├─" };
self.render_drive_line(&mut lines, drive, tree_symbol);
}
}
} }
lines lines
} }
/// Helper to render a single drive line
fn render_drive_line<'a>(&self, lines: &mut Vec<Line<'a>>, drive: &StorageDrive, tree_symbol: &'a str) {
let mut drive_info = Vec::new();
if let Some(temp) = drive.temperature {
drive_info.push(format!("T: {:.0}°C", temp));
}
if let Some(wear) = drive.wear_percent {
drive_info.push(format!("W: {:.0}%", wear));
}
// Always show drive name with info, or just name if no info available
let drive_text = if drive_info.is_empty() {
drive.name.clone()
} else {
format!("{} {}", drive.name, drive_info.join(" "))
};
let mut drive_spans = vec![
Span::raw(" "),
Span::styled(tree_symbol, Typography::tree()),
Span::raw(" "),
];
drive_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text));
lines.push(Line::from(drive_spans));
}
} }
impl Widget for SystemWidget { impl Widget for SystemWidget {
@@ -513,48 +813,9 @@ impl SystemWidget {
Span::styled("Storage:", Typography::widget_title()) Span::styled("Storage:", Typography::widget_title())
])); ]));
// Storage items with overflow handling // Storage items - let main overflow logic handle truncation
let storage_lines = self.render_storage(); let storage_lines = self.render_storage();
let remaining_space = area.height.saturating_sub(lines.len() as u16); lines.extend(storage_lines);
if storage_lines.len() <= remaining_space as usize {
// All storage lines fit
lines.extend(storage_lines);
} else if remaining_space >= 2 {
// Show what we can and add overflow indicator
let lines_to_show = (remaining_space - 1) as usize; // Reserve 1 line for overflow
lines.extend(storage_lines.iter().take(lines_to_show).cloned());
// Count hidden pools
let mut hidden_pools = 0;
let mut current_pool = String::new();
for (i, line) in storage_lines.iter().enumerate() {
if i >= lines_to_show {
// Check if this line represents a new pool (no indentation)
if let Some(first_span) = line.spans.first() {
let text = first_span.content.as_ref();
if !text.starts_with(" ") && text.contains(':') {
let pool_name = text.split(':').next().unwrap_or("").trim();
if pool_name != current_pool {
hidden_pools += 1;
current_pool = pool_name.to_string();
}
}
}
}
}
if hidden_pools > 0 {
let overflow_text = format!(
"... and {} more pool{}",
hidden_pools,
if hidden_pools == 1 { "" } else { "s" }
);
lines.push(Line::from(vec![
Span::styled(overflow_text, Typography::muted())
]));
}
}
// Apply scroll offset // Apply scroll offset
let total_lines = lines.len(); let total_lines = lines.len();
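The filesystem metric handling above relies on the naming pattern `disk_{pool}_fs_{fs_name}_{metric}` and on rebuilding a mount point from the filesystem id. The following is a minimal standalone sketch of that idea, not the code from this commit: `split_fs_metric` and `mount_point_from_id` are hypothetical helper names, and the sketch uses `strip_prefix`, `split_once` and `strip_suffix` from the standard library instead of manual index arithmetic.

```rust
/// Known filesystem metric suffixes; note that they contain underscores themselves.
const FS_SUFFIXES: [&str; 5] = [
    "usage_percent",
    "used_gb",
    "total_gb",
    "available_gb",
    "mount_point",
];

/// Hypothetical helper (not in the diff): split `disk_{pool}_fs_{fs}_{metric}`
/// into (pool, fs, metric) by stripping the fixed prefix and a known suffix.
fn split_fs_metric(name: &str) -> Option<(&str, &str, &str)> {
    let rest = name.strip_prefix("disk_")?;
    // Like the diff, split at the first "_fs_" occurrence.
    let (pool, after_fs) = rest.split_once("_fs_")?;
    for suffix in FS_SUFFIXES {
        // Remove the metric suffix and the underscore that precedes it.
        let fs = after_fs
            .strip_suffix(suffix)
            .and_then(|s| s.strip_suffix('_'));
        if let Some(fs) = fs {
            if !pool.is_empty() && !fs.is_empty() {
                return Some((pool, fs, suffix));
            }
        }
    }
    None
}

/// Same fallback rule as the diff: "root" maps to "/", otherwise every
/// underscore in the id is turned back into a path separator.
fn mount_point_from_id(fs_id: &str) -> String {
    if fs_id == "root" {
        "/".to_string()
    } else {
        format!("/{}", fs_id.replace('_', "/"))
    }
}
```

For example, `split_fs_metric("disk_nvme0n1_fs_mnt_data_used_gb")` yields the pool `nvme0n1`, the filesystem id `mnt_data` and the metric `used_gb`. The fallback is lossy: a mount point such as `/mnt/my_data` becomes the id `mnt_my_data`, which maps back to `/mnt/my/data`. Only the explicit `mount_point` metric restores the original path.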


@@ -1,6 +1,6 @@
 [package]
 name = "cm-dashboard-shared"
-version = "0.1.83"
+version = "0.1.131"
 edition = "2021"
 [dependencies]

161
shared/src/agent_data.rs Normal file

@@ -0,0 +1,161 @@
use serde::{Deserialize, Serialize};
/// Complete structured data from an agent
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AgentData {
pub hostname: String,
pub agent_version: String,
pub timestamp: u64,
pub system: SystemData,
pub services: Vec<ServiceData>,
pub backup: BackupData,
}
/// System-level monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SystemData {
pub cpu: CpuData,
pub memory: MemoryData,
pub storage: StorageData,
}
/// CPU monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CpuData {
pub load_1min: f32,
pub load_5min: f32,
pub load_15min: f32,
pub frequency_mhz: f32,
pub temperature_celsius: Option<f32>,
}
/// Memory monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryData {
pub usage_percent: f32,
pub total_gb: f32,
pub used_gb: f32,
pub available_gb: f32,
pub swap_total_gb: f32,
pub swap_used_gb: f32,
pub tmpfs: Vec<TmpfsData>,
}
/// Tmpfs filesystem data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TmpfsData {
pub mount: String,
pub usage_percent: f32,
pub used_gb: f32,
pub total_gb: f32,
}
/// Storage monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct StorageData {
pub drives: Vec<DriveData>,
pub pools: Vec<PoolData>,
}
/// Individual drive data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DriveData {
pub name: String,
pub health: String,
pub temperature_celsius: Option<f32>,
pub wear_percent: Option<f32>,
pub filesystems: Vec<FilesystemData>,
}
/// Filesystem on a drive
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FilesystemData {
pub mount: String,
pub usage_percent: f32,
pub used_gb: f32,
pub total_gb: f32,
}
/// Storage pool (MergerFS, RAID, etc.)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PoolData {
pub name: String,
pub mount: String,
pub pool_type: String, // "mergerfs", "raid", etc.
pub health: String,
pub usage_percent: f32,
pub used_gb: f32,
pub total_gb: f32,
pub data_drives: Vec<PoolDriveData>,
pub parity_drives: Vec<PoolDriveData>,
}
/// Drive in a storage pool
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PoolDriveData {
pub name: String,
pub temperature_celsius: Option<f32>,
pub wear_percent: Option<f32>,
pub health: String,
}
/// Service monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceData {
pub name: String,
pub status: String, // "active", "inactive", "failed"
pub memory_mb: f32,
pub disk_gb: f32,
pub user_stopped: bool,
}
/// Backup system data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BackupData {
pub status: String,
pub last_run: Option<u64>,
pub next_scheduled: Option<u64>,
pub total_size_gb: Option<f32>,
pub repository_health: Option<String>,
}
impl AgentData {
/// Create new agent data with current timestamp
pub fn new(hostname: String, agent_version: String) -> Self {
Self {
hostname,
agent_version,
timestamp: chrono::Utc::now().timestamp() as u64,
system: SystemData {
cpu: CpuData {
load_1min: 0.0,
load_5min: 0.0,
load_15min: 0.0,
frequency_mhz: 0.0,
temperature_celsius: None,
},
memory: MemoryData {
usage_percent: 0.0,
total_gb: 0.0,
used_gb: 0.0,
available_gb: 0.0,
swap_total_gb: 0.0,
swap_used_gb: 0.0,
tmpfs: Vec::new(),
},
storage: StorageData {
drives: Vec::new(),
pools: Vec::new(),
},
},
services: Vec::new(),
backup: BackupData {
status: "unknown".to_string(),
last_run: None,
next_scheduled: None,
total_size_gb: None,
repository_health: None,
},
}
}
}
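With typed fields, the dashboard can read drive values directly instead of recovering them from metric names. The sketch below illustrates that under stated assumptions: the two structs are trimmed mirrors of `DriveData` and `FilesystemData` from this file with the serde derives left out so it compiles with the standard library only, and `drive_header` and `filesystem_lines` are hypothetical helpers modeled on the label formats used in `render_storage`, not functions from the repository.

```rust
// Trimmed mirrors of the structs in the diff (serde derives omitted on purpose).
#[derive(Debug, Clone)]
pub struct FilesystemData {
    pub mount: String,
    pub usage_percent: f32,
    pub used_gb: f32,
    pub total_gb: f32,
}

#[derive(Debug, Clone)]
pub struct DriveData {
    pub name: String,
    pub health: String,
    pub temperature_celsius: Option<f32>,
    pub wear_percent: Option<f32>,
    pub filesystems: Vec<FilesystemData>,
}

/// Hypothetical helper: build a header such as "nvme0n1 T: 45°C W: 3%:".
/// Optional readings are skipped when the agent did not report them.
pub fn drive_header(drive: &DriveData) -> String {
    let mut info = Vec::new();
    if let Some(temp) = drive.temperature_celsius {
        info.push(format!("T: {:.0}°C", temp));
    }
    if let Some(wear) = drive.wear_percent {
        info.push(format!("W: {:.0}%", wear));
    }
    if info.is_empty() {
        format!("{}:", drive.name)
    } else {
        format!("{} {}:", drive.name, info.join(" "))
    }
}

/// Hypothetical helper: one text line per filesystem, read from typed fields.
pub fn filesystem_lines(drive: &DriveData) -> Vec<String> {
    drive
        .filesystems
        .iter()
        .map(|fs| {
            format!(
                "{}: {:.0}% {:.1}GB/{:.1}GB",
                fs.mount, fs.usage_percent, fs.used_gb, fs.total_gb
            )
        })
        .collect()
}
```

No name parsing is involved here: a missing temperature is simply `None`, and the mount point is carried as data rather than reconstructed from an identifier.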


@@ -1,8 +1,10 @@
+pub mod agent_data;
 pub mod cache;
 pub mod error;
 pub mod metrics;
 pub mod protocol;
+pub use agent_data::*;
 pub use cache::*;
 pub use error::*;
 pub use metrics::*;


@@ -82,13 +82,13 @@ impl MetricValue {
 /// Health status for metrics
 #[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)]
 pub enum Status {
-    Inactive, // Lowest priority - treated as good
-    Ok,       // Second lowest - also good
-    Unknown,
-    Offline,
-    Pending,
-    Warning,
-    Critical,
+    Inactive, // Lowest priority
+    Unknown,  //
+    Offline,  //
+    Pending,  //
+    Ok,       // 5th place - good status has higher priority than unknown states
+    Warning,  //
+    Critical, // Highest priority
 }
 impl Status {
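Because `Status` derives `PartialOrd` and `Ord`, variants compare in declaration order, so reordering the variants changes which status wins whenever statuses are compared. The sketch below shows the effect of the new order. It is an illustration only: the enum is a mirror without the serde derives, and `aggregate` is a hypothetical helper that picks the maximum, which is an assumption about how the ordering is used rather than code from this repository.

```rust
/// Mirror of the reordered enum from the diff (serde derives omitted).
/// Derived `Ord` compares variants by their position in this list.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Inactive,
    Unknown,
    Offline,
    Pending,
    Ok,
    Warning,
    Critical,
}

/// Hypothetical helper: the highest-priority status in a slice.
/// Falls back to `Unknown` when there is nothing to aggregate.
pub fn aggregate(statuses: &[Status]) -> Status {
    statuses.iter().copied().max().unwrap_or(Status::Unknown)
}
```

With the new order, `aggregate(&[Status::Unknown, Status::Ok])` returns `Status::Ok`. Under the old order, where `Ok` was declared before `Unknown`, the same call would have returned `Status::Unknown`.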


@@ -1,13 +1,9 @@
-use crate::metrics::Metric;
+use crate::agent_data::AgentData;
 use serde::{Deserialize, Serialize};
 /// Message sent from agent to dashboard via ZMQ
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct MetricMessage {
-    pub hostname: String,
-    pub timestamp: u64,
-    pub metrics: Vec<Metric>,
-}
+/// Always structured data - no legacy metrics support
+pub type AgentMessage = AgentData;
 /// Command output streaming message
 #[derive(Debug, Clone, Serialize, Deserialize)]
@@ -20,15 +16,6 @@ pub struct CommandOutputMessage {
     pub timestamp: u64,
 }
-impl MetricMessage {
-    pub fn new(hostname: String, metrics: Vec<Metric>) -> Self {
-        Self {
-            hostname,
-            timestamp: chrono::Utc::now().timestamp() as u64,
-            metrics,
-        }
-    }
-}
 impl CommandOutputMessage {
     pub fn new(hostname: String, command_id: String, command_type: String, output_line: String, is_complete: bool) -> Self {
@@ -59,8 +46,8 @@ pub enum Command {
 pub enum CommandResponse {
     /// Acknowledgment of command
     Ack,
-    /// Metrics response
-    Metrics(Vec<Metric>),
+    /// Agent data response
+    AgentData(AgentData),
     /// Pong response to ping
     Pong,
     /// Error response
@@ -76,7 +63,7 @@ pub struct MessageEnvelope {
 #[derive(Debug, Serialize, Deserialize)]
 pub enum MessageType {
-    Metrics,
+    AgentData,
     Command,
     CommandResponse,
     CommandOutput,
@@ -84,10 +71,10 @@ pub enum MessageType {
 }
 impl MessageEnvelope {
-    pub fn metrics(message: MetricMessage) -> Result<Self, crate::SharedError> {
+    pub fn agent_data(data: AgentData) -> Result<Self, crate::SharedError> {
         Ok(Self {
-            message_type: MessageType::Metrics,
-            payload: serde_json::to_vec(&message)?,
+            message_type: MessageType::AgentData,
+            payload: serde_json::to_vec(&data)?,
         })
     }
@@ -119,11 +106,11 @@ impl MessageEnvelope {
         })
     }
-    pub fn decode_metrics(&self) -> Result<MetricMessage, crate::SharedError> {
+    pub fn decode_agent_data(&self) -> Result<AgentData, crate::SharedError> {
         match self.message_type {
-            MessageType::Metrics => Ok(serde_json::from_slice(&self.payload)?),
+            MessageType::AgentData => Ok(serde_json::from_slice(&self.payload)?),
             _ => Err(crate::SharedError::Protocol {
-                message: "Expected agent data message".to_string(),
+                message: "Expected agent data message".to_string(),
             }),
         }
     }
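The envelope pattern in this file pairs a type tag with an opaque payload and refuses to decode when the tag does not match. The sketch below shows only that control flow with the standard library. It is a simplification under loud assumptions: `Envelope` is a hypothetical stand-in for `MessageEnvelope`, the payload is a plain UTF-8 hostname instead of JSON produced by `serde_json`, and errors are plain strings instead of `SharedError`.

```rust
/// Reduced stand-in for the message type tag (only two variants shown).
#[derive(Debug, PartialEq)]
pub enum MessageType {
    AgentData,
    Command,
}

/// Hypothetical stand-in for `MessageEnvelope`: a tag plus raw payload bytes.
#[derive(Debug)]
pub struct Envelope {
    pub message_type: MessageType,
    pub payload: Vec<u8>,
}

impl Envelope {
    /// Wrap agent data. ASSUMPTION: the "data" here is just a hostname encoded
    /// as UTF-8 bytes; the real code serializes a full struct to JSON.
    pub fn agent_data(hostname: &str) -> Self {
        Self {
            message_type: MessageType::AgentData,
            payload: hostname.as_bytes().to_vec(),
        }
    }

    /// Decode only when the tag says the payload is agent data.
    pub fn decode_agent_data(&self) -> Result<String, String> {
        match self.message_type {
            MessageType::AgentData => {
                String::from_utf8(self.payload.clone()).map_err(|e| e.to_string())
            }
            _ => Err("Expected agent data message".to_string()),
        }
    }
}
```

Checking the tag before deserializing keeps a command payload from being misread as agent data, since both travel as raw bytes.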