Compare commits

...

33 Commits

Author SHA1 Message Date
adf3b0f51c Implement complete structured data architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
Replace fragile string-based metrics with type-safe JSON data structures.
Agent converts all metrics to structured data, dashboard processes typed fields.

Changes:
- Add AgentData struct with CPU, memory, storage, services, backup fields
- Replace string parsing with direct field access throughout system
- Maintain UI compatibility via temporary metric bridge conversion
- Fix NVMe temperature display and eliminate string parsing bugs
- Update protocol to support structured data transmission over ZMQ
- Comprehensive metric type coverage: CPU, memory, storage, services, backup

Version bump to 0.1.131
2025-11-23 21:32:00 +01:00
41ded0170c Add wear percentage display and NVMe temperature collection
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
- Display wear percentage in storage headers for single physical drives
- Remove redundant drive type indicators, show wear data instead
- Fix wear metric parsing for physical drives (underscore count issue)
- Add NVMe temperature parsing support (Temperature: format)
- Add raw metrics debugging functionality for troubleshooting
- Clean up physical drive display to remove redundant information
2025-11-23 20:29:24 +01:00
9b4191b2c3 Fix physical drive name and health status display
All checks were successful
Build and Release / build-and-release (push) Successful in 2m13s
- Display actual drive name (e.g., nvme0n1) instead of mount point for physical drives
- Fix health status parsing for physical drives to show proper status icons
- Update pool name extraction to handle disk_{drive}_health metrics correctly
- Improve storage widget rendering for physical drive identification
2025-11-23 19:25:45 +01:00
53dbb43352 Fix SnapRAID parity association using directory-based discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
- Replace blanket parity drive inclusion with smart relationship detection
- Only associate parity drives from same parent directory as data drives
- Prevent incorrect exclusion of nvme0n1 physical drives from grouping
- Maintain zero-configuration auto-discovery without hardcoded paths
2025-11-23 18:42:48 +01:00
ba03623110 Remove hardcoded pool mount point mappings for true auto-discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Eliminate hardcoded mappings like 'root' -> '/' and 'steampool' -> '/mnt/steampool'
- Use device names directly for physical drives
- Rely on mount_point metrics from agent for actual mount paths
- Implement zero-configuration architecture as specified in CLAUDE.md
2025-11-23 18:34:45 +01:00
f24c4ed650 Fix pool name extraction to prevent wrong physical drive naming
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
- Remove fallback logic that could extract incorrect pool names
- Simplify pool suffix matching to use explicit arrays
- Ensure only valid metric patterns create pools
2025-11-23 18:24:39 +01:00
86501fd486 Fix display format to match CLAUDE.md specification
All checks were successful
Build and Release / build-and-release (push) Successful in 1m17s
- Use actual device names (sdb, sdc) instead of data_0, parity_0
- Fix physical drive naming to show device names instead of mount points
- Update pool name extraction to handle new device-based naming
- Ensure Drive: line shows temperature and wear data for physical drives
2025-11-23 18:13:35 +01:00
192eea6e0c Integrate SnapRAID parity drives into mergerfs pools
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Add SnapRAID parity drive detection to mergerfs discovery
- Remove Pool Status health line as discussed
- Update drive display to always show wear data when available
- Include /mnt/parity drives as part of mergerfs pool structure
2025-11-23 18:05:19 +01:00
43fb838c9b Fix duplicate drive display in mergerfs pools
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
- Restructure storage rendering logic to prevent drive duplication
- Use specific mergerfs check instead of generic multi-drive condition
- Ensure drives only appear once under organized data/parity sections
2025-11-23 17:46:09 +01:00
54483653f9 Fix mergerfs drive metric parsing for proper pool consolidation
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
- Update extract_pool_name to handle data_/parity_ drive metrics correctly
- Fix extract_drive_name to parse mergerfs drive roles properly
- Prevent srv_media_data from being parsed as separate pool
2025-11-23 17:40:12 +01:00
e47803b705 Fix mergerfs pool consolidation and naming
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
- Improve pool name extraction in dashboard parsing
- Use consistent mergerfs pool naming in agent
- Add mount_point metric parsing to use actual mount paths
- Fix pool consolidation to prevent duplicate entries
2025-11-23 17:35:23 +01:00
439d0d9af6 Fix mergerfs numeric reference parsing for proper pool detection
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
Add support for numeric mergerfs references like "1:2" by mapping them
to actual mount points (/mnt/disk1, /mnt/disk2). This enables proper
mergerfs pool detection and hides individual member drives as intended.
2025-11-23 17:27:45 +01:00
2242b5ddfe Make mergerfs detection more robust to prevent discovery failures
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Skip mergerfs pools with numeric device references (e.g., "1:2")
instead of crashing. This allows regular drive detection to work
even when mergerfs uses non-standard mount formats.

Preserves existing functionality for standard mergerfs setups.
2025-11-23 17:19:15 +01:00
9d0f42d55c Fix filesystem usage_percent parsing and remove hardcoded status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
1. Add missing _fs_ filter to usage_percent parsing in dashboard
2. Fix agent to use calculated fs_status instead of hardcoded Status::Ok

This completes the disk collector auto-discovery by ensuring filesystem
usage percentages and status indicators display correctly.
2025-11-23 16:47:20 +01:00
1da7b5f6e7 Fix both pool-level and filesystem metric parsing bugs
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
1. Prevent filesystem _fs_ metrics from overwriting pool totals
2. Fix filesystem name extraction to properly parse boot/root names

This resolves both the pool total display (showing 0.1GB instead of 220GB)
and individual filesystem display (showing —% —GB/—GB).
2025-11-23 16:29:00 +01:00
006f27f7d9 Fix lsblk parsing for filesystem discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Remove unused debug code and fix device name parsing to properly
handle lsblk tree characters. This resolves the issue where only
/boot filesystem was discovered instead of both /boot and /.
2025-11-23 16:09:48 +01:00
07422cd0a7 Add debug logging for filesystem discovery
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
2025-11-23 15:26:49 +01:00
de30b80219 Fix filesystem metric parsing bounds error in dashboard
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Prevent string slicing panic in extract_filesystem_metric when
parsing individual filesystem metrics. This resolves the issue
where filesystem entries show —% —GB/—GB instead of actual usage.
2025-11-23 15:23:15 +01:00
7d96ca9fad Fix disk collector filesystem discovery with debug logging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Add debug logging to filesystem usage collection to identify why
some mount points are being dropped during discovery. This should
resolve the issue where total capacity shows incorrect values.
2025-11-23 15:15:56 +01:00
9b940ebd19 Fix string slicing bounds error in metric parsing
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Fixed critical bug where dashboard crashed with 'begin <= end' slice error
when parsing disk metrics with new naming format. Added bounds checking
to prevent invalid string slicing operations.

- Fixed extract_pool_name string slicing bounds check
- Removed ineffective panic handling that caused infinite loop
- Dashboard now handles new disk collector metrics correctly
2025-11-23 14:52:09 +01:00
6d4da1b7da Add robust error handling to prevent dashboard crashes
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Added comprehensive error handling to storage metrics parsing to prevent
dashboard crashes when encountering unexpected metric formats or parsing
errors. Dashboard now continues gracefully with empty storage display
instead of crashing, improving reliability during metric format changes.

- Wrapped storage metric parsing in panic recovery
- Added logging for metric parsing failures
- Dashboard shows empty storage on errors instead of crashing
- Ensures dashboard remains functional during agent updates
2025-11-23 14:45:00 +01:00
1e7f1616aa Complete disk collector rewrite with clean architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
Replaced complex disk collector with simple lsblk → df → group workflow.
Supports both physical drives and mergerfs pools with unified metrics.
Eliminates configuration complexity through pure auto-discovery.

- Clean discovery pipeline using lsblk and df commands
- Physical drive grouping with filesystem children
- MergerFS pool detection with parity heuristics
- Unified metric generation for consistent dashboard display
- SMART data collection for temperature, wear, and health
2025-11-23 14:22:19 +01:00
7a3ee3d5ba Fix physical drive grouping logic for unified pool visualization
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
Updated filesystem grouping to use extract_base_device method for proper
partition-to-drive mapping. This ensures nvme0n1p1 and nvme0n1p2 are
correctly grouped under nvme0n1 drive pool instead of separate pools.
2025-11-23 13:54:33 +01:00
0e8b149718 Add partial filesystem data display for debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 2m11s
- Make filesystem display more forgiving - show partial data if available
- Will display usage% even if GB values are missing, or vice versa
- This should help identify which specific metrics aren't being populated
- Debug version to identify filesystem data population issues
2025-11-23 13:33:36 +01:00
2c27d0e1db Prepare v0.1.107 for filesystem data debugging
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
Current status: Filesystem children appear with correct mount points but show —% —GB/—GB
Need to debug why usage_percent, used_gb, total_gb metrics aren't populating filesystem entries
2025-11-23 13:24:13 +01:00
9f18488752 Fix filesystem metric parsing for correct mount point names
All checks were successful
Build and Release / build-and-release (push) Successful in 2m10s
- Fix extract_filesystem_metric() to handle multi-underscore metric names correctly
- Parse known metric suffixes (usage_percent, mount_point, available_gb, etc.)
- Prevent incorrect parsing like boot_mount_point -> fs_name='boot_mount', metric_type='point'
- Should now correctly show /boot and / instead of /boot/mount and /root/mount
2025-11-23 13:11:05 +01:00
fab6404cca Fix filesystem children creation logic
All checks were successful
Build and Release / build-and-release (push) Successful in 1m17s
- Allow filesystem entries to be created with any metric, not just mount_point
- Ensure filesystem children appear under physical drive pools
- Improve mount point fallback logic for better compatibility
2025-11-23 13:04:01 +01:00
c3626cc362 Fix unified pool visualization filesystem children display issues
All checks were successful
Build and Release / build-and-release (push) Successful in 2m14s
- Fix extract_pool_name() to handle filesystem metrics (_fs_) correctly
- Prevent individual filesystem pools (nvme0n1_fs_boot, nvme0n1_fs_root) from being created
- Fix incorrect mount point names (was showing /root/mount instead of /)
- Only create filesystem entries when receiving mount_point metrics
- Add available_gb field to FileSystem struct for proper available space handling
- Ensure filesystem children show correct usage data instead of —% —GB/—GB
2025-11-23 12:58:16 +01:00
d68ecfbc64 Complete unified pool visualization with filesystem children
All checks were successful
Build and Release / build-and-release (push) Successful in 2m17s
- Implement filesystem children display under physical drive pools
- Agent generates individual filesystem metrics for each mount point
- Dashboard parses filesystem metrics and displays as tree children
- Add filesystem usage, total, and available space metrics
- Support target format: drive info + filesystem children hierarchy
- Fix compilation warnings by properly using available_bytes calculation
2025-11-23 12:48:24 +01:00
d1272a6c13 Implement unified pool visualization for single drives
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Group single disk filesystems by physical drive during auto-discovery
- Create physical drive pools with filesystem children
- Display temperature, wear, and health at drive level
- Provide consistent hierarchical storage visualization
- Fix borrow checker issues in create_physical_drive_pool method
- Add PhysicalDrive case to all StoragePoolType match statements
2025-11-23 12:10:42 +01:00
33b3beb342 Implement storage auto-discovery system
All checks were successful
Build and Release / build-and-release (push) Successful in 1m49s
- Add automatic detection of mergerfs pools by parsing /proc/mounts
- Implement smart heuristics for parity disk identification
- Store discovered topology at agent startup for efficient monitoring
- Eliminate need for manual storage pool configuration
- Support zero-config storage visualization with backward compatibility
- Clean up mount parsing and remove unused fields
2025-11-23 11:44:57 +01:00
f9384d9df6 Implement enhanced storage pool visualization
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
- Add support for mergerfs pool grouping with data and parity disk separation
- Implement pool health monitoring (healthy/degraded/critical status)
- Create hierarchical tree view for multi-disk storage arrays
- Add automatic pool type detection and member disk association
- Maintain backward compatibility for single disk configurations
- Support future extension for RAID and ZFS pool types
2025-11-23 11:18:21 +01:00
156d707377 Add version display and fix status aggregation priorities
All checks were successful
Build and Release / build-and-release (push) Successful in 2m37s
- Add dynamic version display in top bar using CARGO_PKG_VERSION
- Rewrite status aggregation to only show Critical/Warning/OK in top bar
- Fix Status enum ordering to prioritize OK over transitional states
- Remove blue/gray colors from top bar background
2025-11-21 16:19:45 +01:00
19 changed files with 3651 additions and 673 deletions

237
CLAUDE.md
View File

@@ -59,11 +59,85 @@ hostname2 = [
## Core Architecture Principles
### Individual Metrics Philosophy
- Agent collects individual metrics, dashboard composes widgets
- Each metric collected, transmitted, and stored individually
- Agent calculates status for each metric using thresholds
- Dashboard aggregates individual metric statuses for widget status
### Structured Data Architecture (Planned Migration)
Current system uses string-based metrics with complex parsing. Planning migration to structured JSON data to eliminate fragile string manipulation.
**Current (String Metrics):**
- Agent sends individual metrics with string names like `disk_nvme0n1_temperature`
- Dashboard parses metric names with underscore counting and string splitting
- Complex and error-prone metric filtering and extraction logic
**Target (Structured Data):**
```json
{
"hostname": "cmbox",
"agent_version": "v0.1.130",
"timestamp": 1763926877,
"system": {
"cpu": {
"load_1min": 3.50,
"load_5min": 3.57,
"load_15min": 3.58,
"frequency_mhz": 1500,
"temperature_celsius": 45.2
},
"memory": {
"usage_percent": 25.0,
"total_gb": 23.3,
"used_gb": 5.9,
"swap_total_gb": 10.7,
"swap_used_gb": 0.99,
"tmpfs": [
{"mount": "/tmp", "usage_percent": 15.0, "used_gb": 0.3, "total_gb": 2.0}
]
},
"storage": {
"drives": [
{
"name": "nvme0n1",
"health": "PASSED",
"temperature_celsius": 29.0,
"wear_percent": 1.0,
"filesystems": [
{"mount": "/", "usage_percent": 24.0, "used_gb": 224.9, "total_gb": 928.2}
]
}
],
"pools": [
{
"name": "srv_media",
"mount": "/srv/media",
"type": "mergerfs",
"health": "healthy",
"usage_percent": 63.0,
"used_gb": 2355.2,
"total_gb": 3686.4,
"data_drives": [
{"name": "sdb", "temperature_celsius": 24.0}
],
"parity_drives": [
{"name": "sdc", "temperature_celsius": 24.0}
]
}
]
}
},
"services": [
{"name": "sshd", "status": "active", "memory_mb": 4.5, "disk_gb": 0.0}
],
"backup": {
"status": "completed",
"last_run": 1763920000,
"next_scheduled": 1764006400,
"total_size_gb": 150.5,
"repository_health": "ok"
}
}
```
- Agent sends structured JSON over ZMQ
- Dashboard accesses data directly: `data.system.storage.drives[0].temperature_celsius`
- Type safety eliminates all parsing bugs
### Maintenance Mode
- Agent checks for `/tmp/cm-maintenance` file before sending notifications
@@ -144,6 +218,130 @@ nix-build --no-out-link -E 'with import <nixpkgs> {}; fetchurl {
- **Workspace builds**: `nix-shell -p openssl pkg-config --run "cargo build --workspace"`
- **Clean compilation**: Remove `target/` between major changes
## Enhanced Storage Pool Visualization
### Auto-Discovery Architecture
The dashboard uses automatic storage discovery to eliminate manual configuration complexity while providing intelligent storage pool grouping.
### Discovery Process
**At Agent Startup:**
1. Parse `/proc/mounts` to identify all mounted filesystems
2. Detect MergerFS pools by analyzing `fuse.mergerfs` mount sources
3. Identify member disks and potential parity relationships via heuristics
4. Store discovered storage topology for continuous monitoring
5. Generate pool-aware metrics with hierarchical relationships
**Continuous Monitoring:**
- Use stored discovery data for efficient metric collection
- Monitor individual drives for SMART data, temperature, wear
- Calculate pool-level health based on member drive status
- Generate enhanced metrics for dashboard visualization
### Supported Storage Types
**Single Disks:**
- ext4, xfs, btrfs mounted directly
- Individual drive monitoring with SMART data
- Traditional single-disk display for root, boot, etc.
**MergerFS Pools:**
- Auto-detect from `/proc/mounts` fuse.mergerfs entries
- Parse source paths to identify member disks (e.g., "/mnt/disk1:/mnt/disk2")
- Heuristic parity disk detection (sequential device names, "parity" in path)
- Pool health calculation (healthy/degraded/critical)
- Hierarchical tree display with data/parity disk grouping
**Future Extensions Ready:**
- RAID arrays via `/proc/mdstat` parsing
- ZFS pools via `zpool status` integration
- LVM logical volumes via `lvs` discovery
### Configuration
```toml
[collectors.disk]
enabled = true
auto_discover = true # Default: true
# Optional exclusions for special filesystems
exclude_mount_points = ["/tmp", "/proc", "/sys", "/dev"]
exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc"]
```
### Display Format
```
Storage:
● /srv/media (mergerfs (2+1)):
├─ Pool Status: ● Healthy (3 drives)
├─ Total: ● 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ Parity: ● sdc T: 24°C
● /:
├─ ● nvme0n1 W: 13%
└─ ● 7% 14.5GB/218.5GB
```
### Implementation Benefits
- **Zero Configuration**: No manual pool definitions required
- **Always Accurate**: Reflects actual system state automatically
- **Scales Automatically**: Handles any number of pools without config changes
- **Backwards Compatible**: Single disks continue working unchanged
- **Future Ready**: Easy extension for additional storage technologies
### Current Status (v0.1.100)
**✅ Completed:**
- Auto-discovery system implemented and deployed
- `/proc/mounts` parsing with smart heuristics for parity detection
- Storage topology stored at agent startup for efficient monitoring
- Universal zero-configuration for all hosts (cmbox, steambox, simonbox, srv01, srv02, srv03)
- Enhanced pool health calculation (healthy/degraded/critical)
- Hierarchical tree visualization with data/parity disk separation
**🔄 In Progress - Complete Disk Collector Rewrite:**
The current disk collector has grown complex with mixed legacy/auto-discovery approaches. Planning complete rewrite with clean, simple workflow supporting both physical drives and mergerfs pools.
**New Clean Architecture:**
**Discovery Workflow:**
1. **`lsblk`** to detect all mount points and backing devices
2. **`df`** to get filesystem usage for each mount point
3. **Group by physical drive** (nvme0n1, sda, etc.)
4. **Parse `/proc/mounts`** for mergerfs pools
5. **Generate unified metrics** for both storage types
**Physical Drive Display:**
```
● nvme0n1:
├─ ● Drive: T: 35°C W: 1%
├─ ● Total: 23% 218.0GB/928.2GB
├─ ● /boot: 11% 0.1GB/1.0GB
└─ ● /: 23% 214.9GB/928.2GB
```
**MergerFS Pool Display:**
```
● /srv/media (mergerfs):
├─ ● Pool: 63% 2355.2GB/3686.4GB
├─ Data Disks:
│ ├─ ● sdb T: 24°C
│ └─ ● sdd T: 27°C
└─ ● sdc T: 24°C (parity)
```
**Implementation Benefits:**
- **Pure auto-discovery**: No configuration needed
- **Clean code paths**: Single workflow for all storage types
- **Consistent display**: Status icons on every line, no redundant text
- **Simple pipeline**: lsblk → df → group → metrics
- **Support for both**: Physical drives and mergerfs pools
## Important Communication Guidelines
Keep responses concise and focused. Avoid extensive implementation summaries unless requested.
@@ -169,12 +367,33 @@ Keep responses concise and focused. Avoid extensive implementation summaries unl
- ✅ "Restructure storage widget with improved layout"
- ✅ "Update CPU thresholds to production values"
## Planned Architecture Migration
### Phase 1: Structured Data Types (Shared Crate)
- Create Rust structs matching target JSON structure
- Replace `Metric` enum with typed data structures
- Add serde serialization/deserialization
### Phase 2: Agent Refactor
- Update collectors to return typed structs instead of `Vec<Metric>`
- Remove string metric name generation
- Send structured JSON over ZMQ
### Phase 3: Dashboard Refactor
- Replace metric parsing logic with direct field access
- Remove `extract_pool_name()`, `extract_drive_name()`, underscore counting
- Widgets access `data.system.storage.drives[0].temperature_celsius`
### Phase 4: Migration & Cleanup
- Support both formats during transition
- Gradual rollout with backward compatibility
- Remove legacy string metric system
## Implementation Rules
1. **Individual Metrics**: Each metric is collected, transmitted, and stored individually
2. **Agent Status Authority**: Agent calculates status for each metric using thresholds
3. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
4. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
1. **Agent Status Authority**: Agent calculates status for each metric using thresholds
2. **Dashboard Composition**: Dashboard widgets subscribe to specific metrics by name
3. **Status Aggregation**: Dashboard aggregates individual metric statuses for widget status
**NEVER:**
- Copy/paste ANY code from legacy implementations

6
Cargo.lock generated
View File

@@ -279,7 +279,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
[[package]]
name = "cm-dashboard"
version = "0.1.96"
version = "0.1.130"
dependencies = [
"anyhow",
"chrono",
@@ -301,7 +301,7 @@ dependencies = [
[[package]]
name = "cm-dashboard-agent"
version = "0.1.96"
version = "0.1.130"
dependencies = [
"anyhow",
"async-trait",
@@ -324,7 +324,7 @@ dependencies = [
[[package]]
name = "cm-dashboard-shared"
version = "0.1.96"
version = "0.1.130"
dependencies = [
"chrono",
"serde",

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-agent"
version = "0.1.97"
version = "0.1.131"
edition = "2021"
[dependencies]

View File

@@ -10,7 +10,7 @@ use crate::metrics::MetricCollectionManager;
use crate::notifications::NotificationManager;
use crate::service_tracker::UserStoppedServiceTracker;
use crate::status::HostStatusManager;
use cm_dashboard_shared::{Metric, MetricMessage, MetricValue, Status};
use cm_dashboard_shared::{AgentData, Metric, MetricValue, Status, TmpfsData, DriveData, FilesystemData, ServiceData};
pub struct Agent {
hostname: String,
@@ -199,16 +199,310 @@ impl Agent {
return Ok(());
}
debug!("Broadcasting {} cached metrics (including host status summary)", metrics.len());
debug!("Broadcasting {} cached metrics as structured data", metrics.len());
// Create and send message with all current data
let message = MetricMessage::new(self.hostname.clone(), metrics);
self.zmq_handler.publish_metrics(&message).await?;
// Convert metrics to structured data and send
let agent_data = self.metrics_to_structured_data(&metrics)?;
self.zmq_handler.publish_agent_data(&agent_data).await?;
debug!("Metrics broadcasted successfully");
debug!("Structured data broadcasted successfully");
Ok(())
}
/// Convert legacy metrics to structured data format
fn metrics_to_structured_data(&self, metrics: &[Metric]) -> Result<AgentData> {
let mut agent_data = AgentData::new(self.hostname.clone(), self.get_agent_version());
// Parse metrics into structured data
for metric in metrics {
self.parse_metric_into_agent_data(&mut agent_data, metric)?;
}
Ok(agent_data)
}
/// Parse a single metric into the appropriate structured data field
fn parse_metric_into_agent_data(&self, agent_data: &mut AgentData, metric: &Metric) -> Result<()> {
// CPU metrics
if metric.name == "cpu_load_1min" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.load_1min = value;
}
} else if metric.name == "cpu_load_5min" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.load_5min = value;
}
} else if metric.name == "cpu_load_15min" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.load_15min = value;
}
} else if metric.name == "cpu_frequency_mhz" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.frequency_mhz = value;
}
} else if metric.name == "cpu_temperature_celsius" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.cpu.temperature_celsius = Some(value);
}
}
// Memory metrics
else if metric.name == "memory_usage_percent" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.usage_percent = value;
}
} else if metric.name == "memory_total_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.total_gb = value;
}
} else if metric.name == "memory_used_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.used_gb = value;
}
} else if metric.name == "memory_available_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.available_gb = value;
}
} else if metric.name == "memory_swap_total_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.swap_total_gb = value;
}
} else if metric.name == "memory_swap_used_gb" {
if let Some(value) = metric.value.as_f32() {
agent_data.system.memory.swap_used_gb = value;
}
}
// Tmpfs metrics
else if metric.name.starts_with("memory_tmp_") {
// For now, create a single /tmp tmpfs entry
if metric.name == "memory_tmp_usage_percent" {
if let Some(value) = metric.value.as_f32() {
if let Some(tmpfs) = agent_data.system.memory.tmpfs.get_mut(0) {
tmpfs.usage_percent = value;
} else {
agent_data.system.memory.tmpfs.push(TmpfsData {
mount: "/tmp".to_string(),
usage_percent: value,
used_gb: 0.0,
total_gb: 0.0,
});
}
}
} else if metric.name == "memory_tmp_used_gb" {
if let Some(value) = metric.value.as_f32() {
if let Some(tmpfs) = agent_data.system.memory.tmpfs.get_mut(0) {
tmpfs.used_gb = value;
} else {
agent_data.system.memory.tmpfs.push(TmpfsData {
mount: "/tmp".to_string(),
usage_percent: 0.0,
used_gb: value,
total_gb: 0.0,
});
}
}
} else if metric.name == "memory_tmp_total_gb" {
if let Some(value) = metric.value.as_f32() {
if let Some(tmpfs) = agent_data.system.memory.tmpfs.get_mut(0) {
tmpfs.total_gb = value;
} else {
agent_data.system.memory.tmpfs.push(TmpfsData {
mount: "/tmp".to_string(),
usage_percent: 0.0,
used_gb: 0.0,
total_gb: value,
});
}
}
}
}
// Storage metrics
else if metric.name.starts_with("disk_") {
if metric.name.contains("_temperature") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
if let Some(temp) = metric.value.as_f32() {
self.ensure_drive_exists(agent_data, &drive_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == drive_name) {
drive.temperature_celsius = Some(temp);
}
}
}
} else if metric.name.contains("_wear_percent") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
if let Some(wear) = metric.value.as_f32() {
self.ensure_drive_exists(agent_data, &drive_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == drive_name) {
drive.wear_percent = Some(wear);
}
}
}
} else if metric.name.contains("_health") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
let health = metric.value.as_string();
self.ensure_drive_exists(agent_data, &drive_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == drive_name) {
drive.health = health;
}
}
} else if metric.name.contains("_fs_") {
// Filesystem metrics: disk_{pool}_fs_{filesystem}_{metric}
if let Some((pool_name, fs_name)) = self.extract_pool_and_filesystem(&metric.name) {
if metric.name.contains("_usage_percent") {
if let Some(usage) = metric.value.as_f32() {
self.ensure_filesystem_exists(agent_data, &pool_name, &fs_name, usage, 0.0, 0.0);
}
} else if metric.name.contains("_used_gb") {
if let Some(used) = metric.value.as_f32() {
self.update_filesystem_field(agent_data, &pool_name, &fs_name, |fs| fs.used_gb = used);
}
} else if metric.name.contains("_total_gb") {
if let Some(total) = metric.value.as_f32() {
self.update_filesystem_field(agent_data, &pool_name, &fs_name, |fs| fs.total_gb = total);
}
}
}
}
}
// Service metrics
else if metric.name.starts_with("service_") {
if let Some(service_name) = self.extract_service_name(&metric.name) {
if metric.name.contains("_status") {
let status = metric.value.as_string();
self.ensure_service_exists(agent_data, &service_name, &status);
} else if metric.name.contains("_memory_mb") {
if let Some(memory) = metric.value.as_f32() {
self.update_service_field(agent_data, &service_name, |svc| svc.memory_mb = memory);
}
} else if metric.name.contains("_disk_gb") {
if let Some(disk) = metric.value.as_f32() {
self.update_service_field(agent_data, &service_name, |svc| svc.disk_gb = disk);
}
}
}
}
// Backup metrics
else if metric.name.starts_with("backup_") {
if metric.name == "backup_status" {
agent_data.backup.status = metric.value.as_string();
} else if metric.name == "backup_last_run_timestamp" {
if let Some(timestamp) = metric.value.as_i64() {
agent_data.backup.last_run = Some(timestamp as u64);
}
} else if metric.name == "backup_next_scheduled_timestamp" {
if let Some(timestamp) = metric.value.as_i64() {
agent_data.backup.next_scheduled = Some(timestamp as u64);
}
} else if metric.name == "backup_size_gb" {
if let Some(size) = metric.value.as_f32() {
agent_data.backup.total_size_gb = Some(size);
}
} else if metric.name == "backup_repository_health" {
agent_data.backup.repository_health = Some(metric.value.as_string());
}
}
Ok(())
}
/// Extract drive name from metric like "disk_nvme0n1_temperature"
fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
if metric_name.starts_with("disk_") {
let suffixes = ["_temperature", "_wear_percent", "_health"];
for suffix in suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
}
}
}
None
}
/// Extract pool and filesystem from "disk_{pool}_fs_{filesystem}_{metric}"
fn extract_pool_and_filesystem(&self, metric_name: &str) -> Option<(String, String)> {
if let Some(fs_pos) = metric_name.find("_fs_") {
let pool_name = metric_name[5..fs_pos].to_string(); // Skip "disk_"
let after_fs = &metric_name[fs_pos + 4..]; // Skip "_fs_"
if let Some(metric_pos) = after_fs.find('_') {
let fs_name = after_fs[..metric_pos].to_string();
return Some((pool_name, fs_name));
}
}
None
}
/// Extract service name from "service_{name}_{metric}"
fn extract_service_name(&self, metric_name: &str) -> Option<String> {
if metric_name.starts_with("service_") {
let suffixes = ["_status", "_memory_mb", "_disk_gb"];
for suffix in suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
return Some(metric_name[8..suffix_pos].to_string()); // Skip "service_"
}
}
}
None
}
/// Ensure drive exists in agent_data
fn ensure_drive_exists(&self, agent_data: &mut AgentData, drive_name: &str) {
if !agent_data.system.storage.drives.iter().any(|d| d.name == drive_name) {
agent_data.system.storage.drives.push(DriveData {
name: drive_name.to_string(),
health: "UNKNOWN".to_string(),
temperature_celsius: None,
wear_percent: None,
filesystems: Vec::new(),
});
}
}
/// Ensure filesystem exists in the correct drive
fn ensure_filesystem_exists(&self, agent_data: &mut AgentData, pool_name: &str, fs_name: &str, usage_percent: f32, used_gb: f32, total_gb: f32) {
self.ensure_drive_exists(agent_data, pool_name);
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == pool_name) {
if !drive.filesystems.iter().any(|fs| fs.mount == fs_name) {
drive.filesystems.push(FilesystemData {
mount: fs_name.to_string(),
usage_percent,
used_gb,
total_gb,
});
}
}
}
/// Update filesystem field
fn update_filesystem_field<F>(&self, agent_data: &mut AgentData, pool_name: &str, fs_name: &str, update_fn: F)
where F: FnOnce(&mut FilesystemData) {
if let Some(drive) = agent_data.system.storage.drives.iter_mut().find(|d| d.name == pool_name) {
if let Some(fs) = drive.filesystems.iter_mut().find(|fs| fs.mount == fs_name) {
update_fn(fs);
}
}
}
/// Ensure service exists
fn ensure_service_exists(&self, agent_data: &mut AgentData, service_name: &str, status: &str) {
if !agent_data.services.iter().any(|s| s.name == service_name) {
agent_data.services.push(ServiceData {
name: service_name.to_string(),
status: status.to_string(),
memory_mb: 0.0,
disk_gb: 0.0,
user_stopped: false, // TODO: Get from service tracker
});
} else if let Some(service) = agent_data.services.iter_mut().find(|s| s.name == service_name) {
service.status = status.to_string();
}
}
/// Update service field
fn update_service_field<F>(&self, agent_data: &mut AgentData, service_name: &str, update_fn: F)
where F: FnOnce(&mut ServiceData) {
if let Some(service) = agent_data.services.iter_mut().find(|s| s.name == service_name) {
update_fn(service);
}
}
async fn process_metrics(&mut self, metrics: &[Metric]) -> bool {
let mut status_changed = false;
for metric in metrics {
@@ -261,13 +555,11 @@ impl Agent {
/// Send standalone heartbeat for connectivity detection
async fn send_heartbeat(&mut self) -> Result<()> {
let heartbeat_metric = self.get_heartbeat_metric();
let message = MetricMessage::new(
self.hostname.clone(),
vec![heartbeat_metric],
);
// Create minimal agent data with just heartbeat
let agent_data = AgentData::new(self.hostname.clone(), self.get_agent_version());
// Heartbeat timestamp is already set in AgentData::new()
self.zmq_handler.publish_metrics(&message).await?;
self.zmq_handler.publish_agent_data(&agent_data).await?;
debug!("Sent standalone heartbeat for connectivity detection");
Ok(())
}

View File

@@ -5,353 +5,159 @@ use cm_dashboard_shared::{Metric, MetricValue, Status, StatusTracker, Hysteresis
use crate::config::DiskConfig;
use std::process::Command;
use std::time::Instant;
use std::collections::HashMap;
use tracing::debug;
use super::{Collector, CollectorError};
/// Information about a storage pool (mount point with underlying drives)
#[derive(Debug, Clone)]
struct StoragePool {
name: String, // e.g., "steampool", "root"
mount_point: String, // e.g., "/mnt/steampool", "/"
filesystem: String, // e.g., "mergerfs", "ext4", "zfs", "btrfs"
storage_type: String, // e.g., "mergerfs", "single", "raid", "zfs"
size: String, // e.g., "2.5TB"
used: String, // e.g., "2.1TB"
available: String, // e.g., "400GB"
usage_percent: f32, // e.g., 85.0
underlying_drives: Vec<DriveInfo>, // Individual physical drives
}
/// Information about an individual physical drive
#[derive(Debug, Clone)]
struct DriveInfo {
device: String, // e.g., "sda", "nvme0n1"
health_status: String, // e.g., "PASSED", "FAILED"
temperature: Option<f32>, // e.g., 45.0°C
wear_level: Option<f32>, // e.g., 12.0% (for SSDs)
}
/// Disk usage collector for monitoring filesystem sizes
/// Storage collector with clean architecture
pub struct DiskCollector {
config: DiskConfig,
temperature_thresholds: HysteresisThresholds,
detected_devices: std::collections::HashMap<String, Vec<String>>, // mount_point -> devices
}
/// A physical drive with its filesystems
#[derive(Debug, Clone)]
struct PhysicalDrive {
device: String, // e.g., "nvme0n1", "sda"
filesystems: Vec<Filesystem>, // mounted filesystems on this drive
temperature: Option<f32>, // drive temperature
wear_level: Option<f32>, // SSD wear level
health_status: String, // SMART health
}
/// A mergerfs pool
#[derive(Debug, Clone)]
struct MergerfsPool {
mount_point: String, // e.g., "/srv/media"
total_bytes: u64, // pool total capacity
used_bytes: u64, // pool used space
data_drives: Vec<DriveInfo>, // data member drives
parity_drives: Vec<DriveInfo>, // parity drives
}
/// Individual filesystem on a drive
#[derive(Debug, Clone)]
struct Filesystem {
mount_point: String, // e.g., "/", "/boot"
total_bytes: u64, // filesystem capacity
used_bytes: u64, // filesystem used space
}
/// Drive information for pools
#[derive(Debug, Clone)]
struct DriveInfo {
device: String, // e.g., "sdb", "sdc"
mount_point: String, // e.g., "/mnt/disk1"
temperature: Option<f32>, // drive temperature
wear_level: Option<f32>, // SSD wear level
health_status: String, // SMART health
}
/// Discovered storage topology
#[derive(Debug)]
struct StorageTopology {
physical_drives: Vec<PhysicalDrive>,
mergerfs_pools: Vec<MergerfsPool>,
}
impl DiskCollector {
pub fn new(config: DiskConfig) -> Self {
// Create hysteresis thresholds for disk temperature from config
let temperature_thresholds = HysteresisThresholds::with_custom_gaps(
config.temperature_warning_celsius,
5.0, // 5°C gap for recovery
5.0,
config.temperature_critical_celsius,
5.0, // 5°C gap for recovery
5.0,
);
// Detect devices for all configured filesystems at startup
let mut detected_devices = std::collections::HashMap::new();
for fs_config in &config.filesystems {
if fs_config.monitor {
if let Ok(devices) = Self::detect_device_for_mount_point_static(&fs_config.mount_point) {
detected_devices.insert(fs_config.mount_point.clone(), devices);
}
}
}
Self {
config,
temperature_thresholds,
detected_devices,
}
}
/// Calculate disk temperature status using hysteresis thresholds
fn calculate_temperature_status(&self, metric_name: &str, temperature: f32, status_tracker: &mut StatusTracker) -> Status {
status_tracker.calculate_with_hysteresis(metric_name, temperature, &self.temperature_thresholds)
/// Discover all storage using clean workflow: lsblk → df → group
fn discover_storage(&self) -> Result<StorageTopology> {
debug!("Starting storage discovery");
// Step 1: Get all mount points and their backing devices using lsblk
let mount_devices = self.get_mount_devices()?;
debug!("Found {} mount points", mount_devices.len());
// Step 2: Get filesystem usage for each mount point using df
let filesystem_usage = self.get_filesystem_usage(&mount_devices)?;
debug!("Got usage data for {} filesystems", filesystem_usage.len());
// Step 3: Detect mergerfs pools from /proc/mounts
let mergerfs_pools = self.discover_mergerfs_pools()?;
debug!("Found {} mergerfs pools", mergerfs_pools.len());
// Step 4: Group regular filesystems by physical drive
let physical_drives = self.group_by_physical_drive(&mount_devices, &filesystem_usage, &mergerfs_pools)?;
debug!("Grouped into {} physical drives", physical_drives.len());
Ok(StorageTopology {
physical_drives,
mergerfs_pools,
})
}
/// Get configured storage pools with individual drive information
fn get_configured_storage_pools(&self) -> Result<Vec<StoragePool>> {
let mut storage_pools = Vec::new();
for fs_config in &self.config.filesystems {
if !fs_config.monitor {
continue;
}
// Get filesystem stats for the mount point
match self.get_filesystem_info(&fs_config.mount_point) {
Ok((total_bytes, used_bytes)) => {
let available_bytes = total_bytes - used_bytes;
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else {
0.0
};
// Convert bytes to human-readable format
let size = self.bytes_to_human_readable(total_bytes);
let used = self.bytes_to_human_readable(used_bytes);
let available = self.bytes_to_human_readable(available_bytes);
// Get individual drive information using pre-detected devices
let device_names = self.detected_devices.get(&fs_config.mount_point).cloned().unwrap_or_default();
let underlying_drives = self.get_drive_info_for_devices(&device_names)?;
storage_pools.push(StoragePool {
name: fs_config.name.clone(),
mount_point: fs_config.mount_point.clone(),
filesystem: fs_config.fs_type.clone(),
storage_type: fs_config.storage_type.clone(),
size,
used,
available,
usage_percent: usage_percent as f32,
underlying_drives,
});
debug!(
"Storage pool '{}' ({}) at {} with {} detected drives",
fs_config.name, fs_config.storage_type, fs_config.mount_point, device_names.len()
);
}
Err(e) => {
debug!(
"Failed to get filesystem info for storage pool '{}': {}",
fs_config.name, e
);
}
}
}
Ok(storage_pools)
}
/// Get drive information for a list of device names
fn get_drive_info_for_devices(&self, device_names: &[String]) -> Result<Vec<DriveInfo>> {
let mut drives = Vec::new();
for device_name in device_names {
let device_path = format!("/dev/{}", device_name);
// Get SMART data for this drive
let (health_status, temperature, wear_level) = self.get_smart_data(&device_path);
drives.push(DriveInfo {
device: device_name.clone(),
health_status: health_status.clone(),
temperature,
wear_level,
});
debug!(
"Drive info for {}: health={}, temp={:?}°C, wear={:?}%",
device_name, health_status, temperature, wear_level
);
}
Ok(drives)
}
/// Get SMART data for a drive (health, temperature, wear level)
fn get_smart_data(&self, device_path: &str) -> (String, Option<f32>, Option<f32>) {
// Try to get SMART data using smartctl
let output = Command::new("sudo")
.arg("smartctl")
.arg("-a")
.arg(device_path)
.output();
match output {
Ok(result) if result.status.success() => {
let stdout = String::from_utf8_lossy(&result.stdout);
// Parse health status
let health = if stdout.contains("PASSED") {
"PASSED".to_string()
} else if stdout.contains("FAILED") {
"FAILED".to_string()
} else {
"UNKNOWN".to_string()
};
// Parse temperature (look for various temperature indicators)
let temperature = self.parse_temperature_from_smart(&stdout);
// Parse wear level (for SSDs)
let wear_level = self.parse_wear_level_from_smart(&stdout);
(health, temperature, wear_level)
}
_ => {
debug!("Failed to get SMART data for {}", device_path);
("UNKNOWN".to_string(), None, None)
}
}
}
/// Parse temperature from SMART output
fn parse_temperature_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
// Look for temperature in various formats
if line.contains("Temperature_Celsius") || line.contains("Temperature") {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if let Ok(temp) = parts[9].parse::<f32>() {
return Some(temp);
}
}
}
// NVMe drives might show temperature differently
if line.contains("temperature:") {
if let Some(temp_part) = line.split("temperature:").nth(1) {
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
}
None
}
/// Parse wear level from SMART output (SSD wear leveling)
/// Supports both NVMe and SATA SSD wear indicators
fn parse_wear_level_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
let line = line.trim();
// NVMe drives - direct percentage used
if line.contains("Percentage Used:") {
if let Some(wear_part) = line.split("Percentage Used:").nth(1) {
if let Some(wear_str) = wear_part.split('%').next() {
if let Ok(wear) = wear_str.trim().parse::<f32>() {
return Some(wear);
}
}
}
}
// SATA SSD attributes - parse SMART table format
// Format: ID ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
// SSD Life Left / Percent Lifetime Remaining (higher = less wear)
if line.contains("SSD_Life_Left") || line.contains("Percent_Lifetime_Remain") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Media Wearout Indicator (lower = more wear, normalize to 0-100)
if line.contains("Media_Wearout_Indicator") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Wear Leveling Count (higher = less wear, but varies by manufacturer)
if line.contains("Wear_Leveling_Count") {
if let Ok(wear_count) = parts[3].parse::<f32>() { // VALUE column
// Most SSDs: 100 = new, decreases with wear
if wear_count <= 100.0 {
return Some(100.0 - wear_count);
}
}
}
// Total LBAs Written - calculate against typical endurance if available
// This is more complex and manufacturer-specific, so we skip for now
}
}
None
}
/// Convert bytes to human-readable format
fn bytes_to_human_readable(&self, bytes: u64) -> String {
const UNITS: &[&str] = &["B", "K", "M", "G", "T"];
let mut size = bytes as f64;
let mut unit_index = 0;
while size >= 1024.0 && unit_index < UNITS.len() - 1 {
size /= 1024.0;
unit_index += 1;
}
if unit_index == 0 {
format!("{:.0}{}", size, UNITS[unit_index])
} else {
format!("{:.1}{}", size, UNITS[unit_index])
}
}
/// Detect device backing a mount point using lsblk (static version for startup)
fn detect_device_for_mount_point_static(mount_point: &str) -> Result<Vec<String>> {
/// Use lsblk to get mount points and their backing devices
fn get_mount_devices(&self) -> Result<HashMap<String, String>> {
let output = Command::new("lsblk")
.args(&["-n", "-o", "NAME,MOUNTPOINT"])
.output()?;
if !output.status.success() {
return Ok(Vec::new());
return Err(anyhow::anyhow!("lsblk command failed"));
}
let mut mount_devices = HashMap::new();
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 2 && parts[1] == mount_point {
// Remove tree symbols and extract device name (e.g., "├─nvme0n1p2" -> "nvme0n1p2")
if parts.len() >= 2 {
let device_name = parts[0]
.trim_start_matches('├')
.trim_start_matches('└')
.trim_start_matches('─')
.trim();
.trim_start_matches(&['├', '└', '─', ' '][..]);
let mount_point = parts[1];
// Extract base device name (e.g., "nvme0n1p2" -> "nvme0n1")
if let Some(base_device) = Self::extract_base_device(device_name) {
return Ok(vec![base_device]);
// Skip unwanted mount points
if self.should_skip_mount_point(mount_point) {
continue;
}
mount_devices.insert(mount_point.to_string(), device_name.to_string());
}
}
Ok(mount_devices)
}
/// Check if we should skip this mount point
fn should_skip_mount_point(&self, mount_point: &str) -> bool {
let skip_prefixes = ["/proc", "/sys", "/dev", "/tmp", "/run"];
skip_prefixes.iter().any(|prefix| mount_point.starts_with(prefix))
}
/// Use df to get filesystem usage for mount points
fn get_filesystem_usage(&self, mount_devices: &HashMap<String, String>) -> Result<HashMap<String, (u64, u64)>> {
let mut filesystem_usage = HashMap::new();
for mount_point in mount_devices.keys() {
match self.get_filesystem_info(mount_point) {
Ok((total, used)) => {
filesystem_usage.insert(mount_point.clone(), (total, used));
}
Err(e) => {
debug!("Failed to get filesystem info for {}: {}", mount_point, e);
}
}
}
Ok(Vec::new())
Ok(filesystem_usage)
}
/// Extract base device name from partition (e.g., "nvme0n1p2" -> "nvme0n1", "sda1" -> "sda")
fn extract_base_device(device_name: &str) -> Option<String> {
// Handle NVMe devices (nvme0n1p1 -> nvme0n1)
if device_name.starts_with("nvme") {
if let Some(p_pos) = device_name.find('p') {
return Some(device_name[..p_pos].to_string());
}
}
// Handle traditional devices (sda1 -> sda)
if device_name.len() > 1 {
let chars: Vec<char> = device_name.chars().collect();
let mut end_idx = chars.len();
// Find where the device name ends and partition number begins
for (i, &c) in chars.iter().enumerate().rev() {
if !c.is_ascii_digit() {
end_idx = i + 1;
break;
}
}
if end_idx > 0 && end_idx < chars.len() {
return Some(chars[..end_idx].iter().collect());
}
}
// If no partition detected, return as-is
Some(device_name.to_string())
}
/// Get filesystem info using df command
fn get_filesystem_info(&self, path: &str) -> Result<(u64, u64)> {
let output = Command::new("df")
@@ -381,171 +187,795 @@ impl DiskCollector {
Ok((total_bytes, used_bytes))
}
/// Discover mergerfs pools from /proc/mounts
fn discover_mergerfs_pools(&self) -> Result<Vec<MergerfsPool>> {
let mounts_content = std::fs::read_to_string("/proc/mounts")?;
let mut pools = Vec::new();
/// Parse size string (e.g., "120G", "45M") to GB value
fn parse_size_to_gb(&self, size_str: &str) -> f32 {
let size_str = size_str.trim();
if size_str.is_empty() || size_str == "-" {
return 0.0;
}
for line in mounts_content.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 3 && parts[2] == "fuse.mergerfs" {
let mount_point = parts[1].to_string();
let device_sources = parts[0]; // e.g., "/mnt/disk1:/mnt/disk2"
// Extract numeric part and unit
let (num_str, unit) = if let Some(last_char) = size_str.chars().last() {
if last_char.is_alphabetic() {
let num_part = &size_str[..size_str.len() - 1];
let unit_part = &size_str[size_str.len() - 1..];
(num_part, unit_part)
// Get pool usage
let (total_bytes, used_bytes) = self.get_filesystem_info(&mount_point)
.unwrap_or((0, 0));
// Parse member paths - handle both full paths and numeric references
let raw_paths: Vec<String> = device_sources
.split(':')
.map(|s| s.trim().to_string())
.filter(|s| !s.is_empty())
.collect();
// Convert numeric references to actual mount points if needed
let mut member_paths = if raw_paths.iter().any(|path| !path.starts_with('/')) {
// Handle numeric format like "1:2" by finding corresponding /mnt/disk* paths
self.resolve_numeric_mergerfs_paths(&raw_paths)?
} else {
(size_str, "")
}
} else {
(size_str, "")
// Already full paths
raw_paths
};
let number: f32 = num_str.parse().unwrap_or(0.0);
// For SnapRAID setups, include parity drives that are related to this pool's data drives
let related_parity_paths = self.discover_related_parity_drives(&member_paths)?;
member_paths.extend(related_parity_paths);
match unit.to_uppercase().as_str() {
"T" | "TB" => number * 1024.0,
"G" | "GB" => number,
"M" | "MB" => number / 1024.0,
"K" | "KB" => number / (1024.0 * 1024.0),
"B" | "" => number / (1024.0 * 1024.0 * 1024.0),
_ => number, // Assume GB if unknown unit
// Categorize as data vs parity drives
let (data_drives, parity_drives) = match self.categorize_pool_drives(&member_paths) {
Ok(drives) => drives,
Err(e) => {
debug!("Failed to categorize drives for pool {}: {}. Skipping.", mount_point, e);
continue;
}
};
pools.push(MergerfsPool {
mount_point,
total_bytes,
used_bytes,
data_drives,
parity_drives,
});
}
}
Ok(pools)
}
/// Discover parity drives that are related to the given data drives
fn discover_related_parity_drives(&self, data_drives: &[String]) -> Result<Vec<String>> {
let mount_devices = self.get_mount_devices()?;
let mut related_parity = Vec::new();
// Find parity drives that share the same parent directory as the data drives
for data_path in data_drives {
if let Some(parent_dir) = self.get_parent_directory(data_path) {
// Look for parity drives in the same parent directory
for (mount_point, _device) in &mount_devices {
if mount_point.contains("parity") && mount_point.starts_with(&parent_dir) {
if !related_parity.contains(mount_point) {
related_parity.push(mount_point.clone());
}
}
}
}
}
Ok(related_parity)
}
/// Get parent directory of a mount path (e.g., "/mnt/disk1" -> "/mnt")
fn get_parent_directory(&self, path: &str) -> Option<String> {
if let Some(last_slash) = path.rfind('/') {
if last_slash > 0 {
return Some(path[..last_slash].to_string());
}
}
None
}
/// Categorize pool member drives as data vs parity
fn categorize_pool_drives(&self, member_paths: &[String]) -> Result<(Vec<DriveInfo>, Vec<DriveInfo>)> {
let mut data_drives = Vec::new();
let mut parity_drives = Vec::new();
for path in member_paths {
let drive_info = self.get_drive_info_for_path(path)?;
// Heuristic: if path contains "parity", it's parity
if path.to_lowercase().contains("parity") {
parity_drives.push(drive_info);
} else {
data_drives.push(drive_info);
}
}
Ok((data_drives, parity_drives))
}
/// Get drive information for a mount path
fn get_drive_info_for_path(&self, path: &str) -> Result<DriveInfo> {
// Use lsblk to find the backing device
let output = Command::new("lsblk")
.args(&["-n", "-o", "NAME,MOUNTPOINT"])
.output()?;
let output_str = String::from_utf8_lossy(&output.stdout);
let mut device = String::new();
for line in output_str.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 2 && parts[1] == path {
device = parts[0]
.trim_start_matches('├')
.trim_start_matches('└')
.trim_start_matches('─')
.trim()
.to_string();
break;
}
}
if device.is_empty() {
return Err(anyhow::anyhow!("Could not find device for path {}", path));
}
// Extract base device name (e.g., "sda1" -> "sda")
let base_device = self.extract_base_device(&device);
// Get SMART data
let (health, temperature, wear) = self.get_smart_data(&format!("/dev/{}", base_device));
Ok(DriveInfo {
device: base_device,
mount_point: path.to_string(),
temperature,
wear_level: wear,
health_status: health,
})
}
/// Resolve numeric mergerfs references like "1:2" to actual mount paths
fn resolve_numeric_mergerfs_paths(&self, numeric_refs: &[String]) -> Result<Vec<String>> {
let mut resolved_paths = Vec::new();
// Get all mount points that look like /mnt/disk* or /mnt/parity*
let mount_devices = self.get_mount_devices()?;
let mut disk_mounts: Vec<String> = mount_devices.keys()
.filter(|path| path.starts_with("/mnt/disk") || path.starts_with("/mnt/parity"))
.cloned()
.collect();
disk_mounts.sort(); // Ensure consistent ordering
for num_ref in numeric_refs {
if let Ok(index) = num_ref.parse::<usize>() {
// Convert 1-based index to 0-based
if index > 0 && index <= disk_mounts.len() {
resolved_paths.push(disk_mounts[index - 1].clone());
}
}
}
// Fallback: if we couldn't resolve, return the original paths
if resolved_paths.is_empty() {
resolved_paths = numeric_refs.to_vec();
}
Ok(resolved_paths)
}
/// Extract base device name from partition (e.g., "nvme0n1p2" -> "nvme0n1", "sda1" -> "sda")
fn extract_base_device(&self, device_name: &str) -> String {
// Handle NVMe devices (nvme0n1p1 -> nvme0n1)
if device_name.starts_with("nvme") {
if let Some(p_pos) = device_name.find('p') {
return device_name[..p_pos].to_string();
}
}
// Handle traditional devices (sda1 -> sda)
if device_name.len() > 1 {
let chars: Vec<char> = device_name.chars().collect();
let mut end_idx = chars.len();
// Find where the device name ends and partition number begins
for (i, &c) in chars.iter().enumerate().rev() {
if !c.is_ascii_digit() {
end_idx = i + 1;
break;
}
}
if end_idx > 0 && end_idx < chars.len() {
return chars[..end_idx].iter().collect();
}
}
// If no partition detected, return as-is
device_name.to_string()
}
/// Group filesystems by physical drive (excluding mergerfs members)
fn group_by_physical_drive(
&self,
mount_devices: &HashMap<String, String>,
filesystem_usage: &HashMap<String, (u64, u64)>,
mergerfs_pools: &[MergerfsPool]
) -> Result<Vec<PhysicalDrive>> {
let mut drive_groups: HashMap<String, Vec<Filesystem>> = HashMap::new();
// Get all mergerfs member paths to exclude them
let mut mergerfs_members = std::collections::HashSet::new();
for pool in mergerfs_pools {
for drive in &pool.data_drives {
mergerfs_members.insert(drive.mount_point.clone());
}
for drive in &pool.parity_drives {
mergerfs_members.insert(drive.mount_point.clone());
}
}
// Group filesystems by base device
for (mount_point, device) in mount_devices {
// Skip mergerfs member mounts
if mergerfs_members.contains(mount_point) {
continue;
}
let base_device = self.extract_base_device(device);
if let Some((total, used)) = filesystem_usage.get(mount_point) {
let filesystem = Filesystem {
mount_point: mount_point.clone(),
total_bytes: *total,
used_bytes: *used,
};
drive_groups.entry(base_device).or_insert_with(Vec::new).push(filesystem);
}
}
// Convert to PhysicalDrive structs with SMART data
let mut physical_drives = Vec::new();
for (device, filesystems) in drive_groups {
let (health, temperature, wear) = self.get_smart_data(&format!("/dev/{}", device));
physical_drives.push(PhysicalDrive {
device,
filesystems,
temperature,
wear_level: wear,
health_status: health,
});
}
Ok(physical_drives)
}
/// Get SMART data for a drive
fn get_smart_data(&self, device_path: &str) -> (String, Option<f32>, Option<f32>) {
let output = Command::new("sudo")
.arg("smartctl")
.arg("-a")
.arg(device_path)
.output();
match output {
Ok(result) if result.status.success() => {
let stdout = String::from_utf8_lossy(&result.stdout);
// Parse health status
let health = if stdout.contains("PASSED") {
"PASSED".to_string()
} else if stdout.contains("FAILED") {
"FAILED".to_string()
} else {
"UNKNOWN".to_string()
};
// Parse temperature and wear level
let temperature = self.parse_temperature_from_smart(&stdout);
let wear_level = self.parse_wear_level_from_smart(&stdout);
(health, temperature, wear_level)
}
_ => {
debug!("Failed to get SMART data for {}", device_path);
("UNKNOWN".to_string(), None, None)
}
}
}
/// Parse temperature from SMART output
fn parse_temperature_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
if line.contains("Temperature_Celsius") || line.contains("Temperature") {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if let Ok(temp) = parts[9].parse::<f32>() {
return Some(temp);
}
}
}
// NVMe format: "Temperature:" (capital T)
if line.contains("Temperature:") {
if let Some(temp_part) = line.split("Temperature:").nth(1) {
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
// Legacy format: "temperature:" (lowercase)
if line.contains("temperature:") {
if let Some(temp_part) = line.split("temperature:").nth(1) {
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
}
None
}
/// Parse wear level from SMART output
fn parse_wear_level_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
if line.contains("Percentage Used:") {
if let Some(wear_part) = line.split("Percentage Used:").nth(1) {
if let Some(wear_str) = wear_part.split('%').next() {
if let Ok(wear) = wear_str.trim().parse::<f32>() {
return Some(wear);
}
}
}
}
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if line.contains("SSD_Life_Left") || line.contains("Percent_Lifetime_Remain") {
if let Ok(remaining) = parts[3].parse::<f32>() {
return Some(100.0 - remaining);
}
}
if line.contains("Wear_Leveling_Count") {
if let Ok(wear_count) = parts[3].parse::<f32>() {
if wear_count <= 100.0 {
return Some(100.0 - wear_count);
}
}
}
}
}
None
}
/// Calculate temperature status with hysteresis
fn calculate_temperature_status(&self, metric_name: &str, temperature: f32, status_tracker: &mut StatusTracker) -> Status {
status_tracker.calculate_with_hysteresis(metric_name, temperature, &self.temperature_thresholds)
}
/// Convert bytes to human readable format
fn bytes_to_human_readable(&self, bytes: u64) -> String {
const UNITS: &[&str] = &["B", "K", "M", "G", "T"];
let mut size = bytes as f64;
let mut unit_index = 0;
while size >= 1024.0 && unit_index < UNITS.len() - 1 {
size /= 1024.0;
unit_index += 1;
}
if unit_index == 0 {
format!("{:.0}{}", size, UNITS[unit_index])
} else {
format!("{:.1}{}", size, UNITS[unit_index])
}
}
/// Convert bytes to gigabytes
fn bytes_to_gb(&self, bytes: u64) -> f32 {
bytes as f32 / (1024.0 * 1024.0 * 1024.0)
}
}
#[async_trait]
impl Collector for DiskCollector {
async fn collect(&self, status_tracker: &mut StatusTracker) -> Result<Vec<Metric>, CollectorError> {
let start_time = Instant::now();
debug!("Collecting storage pool and individual drive metrics");
debug!("Starting clean storage collection");
let mut metrics = Vec::new();
let timestamp = chrono::Utc::now().timestamp() as u64;
// Get configured storage pools with individual drive data
let storage_pools = match self.get_configured_storage_pools() {
Ok(pools) => {
debug!("Found {} storage pools", pools.len());
pools
}
// Discover storage topology
let topology = match self.discover_storage() {
Ok(topology) => topology,
Err(e) => {
debug!("Failed to get storage pools: {}", e);
Vec::new()
tracing::error!("Storage discovery failed: {}", e);
return Ok(metrics);
}
};
// Generate metrics for each storage pool and its underlying drives
for storage_pool in &storage_pools {
let timestamp = chrono::Utc::now().timestamp() as u64;
// Generate metrics for physical drives
for drive in &topology.physical_drives {
self.generate_physical_drive_metrics(&mut metrics, drive, timestamp, status_tracker);
}
// Storage pool overall metrics
let pool_name = &storage_pool.name;
// Generate metrics for mergerfs pools
for pool in &topology.mergerfs_pools {
self.generate_mergerfs_pool_metrics(&mut metrics, pool, timestamp, status_tracker);
}
// Parse size strings to get actual values for calculations
let size_gb = self.parse_size_to_gb(&storage_pool.size);
let used_gb = self.parse_size_to_gb(&storage_pool.used);
let avail_gb = self.parse_size_to_gb(&storage_pool.available);
// Add total storage count
let total_storage = topology.physical_drives.len() + topology.mergerfs_pools.len();
metrics.push(Metric {
name: "disk_count".to_string(),
value: MetricValue::Integer(total_storage as i64),
unit: None,
description: Some(format!("Total storage: {} drives, {} pools", topology.physical_drives.len(), topology.mergerfs_pools.len())),
status: Status::Ok,
timestamp,
});
// Calculate status based on configured thresholds
let pool_status = if storage_pool.usage_percent >= self.config.usage_critical_percent {
let collection_time = start_time.elapsed();
debug!("Clean storage collection completed in {:?} with {} metrics", collection_time, metrics.len());
Ok(metrics)
}
}
impl DiskCollector {
/// Generate metrics for a physical drive and its filesystems
fn generate_physical_drive_metrics(
&self,
metrics: &mut Vec<Metric>,
drive: &PhysicalDrive,
timestamp: u64,
status_tracker: &mut StatusTracker
) {
let drive_name = &drive.device;
// Calculate drive totals
let total_capacity: u64 = drive.filesystems.iter().map(|fs| fs.total_bytes).sum();
let total_used: u64 = drive.filesystems.iter().map(|fs| fs.used_bytes).sum();
let total_available = total_capacity.saturating_sub(total_used);
let usage_percent = if total_capacity > 0 {
(total_used as f64 / total_capacity as f64) * 100.0
} else { 0.0 };
// Drive health status
let health_status = if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown };
// Usage status
let usage_status = if usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical
} else if storage_pool.usage_percent >= self.config.usage_warning_percent {
} else if usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning
} else {
Status::Ok
};
// Storage pool info metrics
metrics.push(Metric {
name: format!("disk_{}_mount_point", pool_name),
value: MetricValue::String(storage_pool.mount_point.clone()),
unit: None,
description: Some(format!("Mount: {}", storage_pool.mount_point)),
status: Status::Ok,
timestamp,
});
let drive_status = if health_status == Status::Critical { Status::Critical } else { usage_status };
// Drive info metrics
metrics.push(Metric {
name: format!("disk_{}_filesystem", pool_name),
value: MetricValue::String(storage_pool.filesystem.clone()),
unit: None,
description: Some(format!("FS: {}", storage_pool.filesystem)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_storage_type", pool_name),
value: MetricValue::String(storage_pool.storage_type.clone()),
unit: None,
description: Some(format!("Type: {}", storage_pool.storage_type)),
status: Status::Ok,
timestamp,
});
// Storage pool size metrics
metrics.push(Metric {
name: format!("disk_{}_total_gb", pool_name),
value: MetricValue::Float(size_gb),
unit: Some("GB".to_string()),
description: Some(format!("Total: {}", storage_pool.size)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_used_gb", pool_name),
value: MetricValue::Float(used_gb),
unit: Some("GB".to_string()),
description: Some(format!("Used: {}", storage_pool.used)),
status: pool_status,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_available_gb", pool_name),
value: MetricValue::Float(avail_gb),
unit: Some("GB".to_string()),
description: Some(format!("Available: {}", storage_pool.available)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", pool_name),
value: MetricValue::Float(storage_pool.usage_percent),
unit: Some("%".to_string()),
description: Some(format!("Usage: {:.1}%", storage_pool.usage_percent)),
status: pool_status,
timestamp,
});
// Individual drive metrics for this storage pool
for drive in &storage_pool.underlying_drives {
// Drive health status
metrics.push(Metric {
name: format!("disk_{}_{}_health", pool_name, drive.device),
name: format!("disk_{}_health", drive_name),
value: MetricValue::String(drive.health_status.clone()),
unit: None,
description: Some(format!("{}: {}", drive.device, drive.health_status)),
status: if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown },
description: Some(format!("{}: {}", drive_name, drive.health_status)),
status: health_status,
timestamp,
});
// Drive temperature
if let Some(temp) = drive.temperature {
let temp_status = self.calculate_temperature_status(
&format!("disk_{}_{}_temperature", pool_name, drive.device),
temp,
status_tracker
&format!("disk_{}_temperature", drive_name), temp, status_tracker
);
metrics.push(Metric {
name: format!("disk_{}_temperature", drive_name),
value: MetricValue::Float(temp),
unit: Some("°C".to_string()),
description: Some(format!("{}: {:.0}°C", drive_name, temp)),
status: temp_status,
timestamp,
});
}
// Drive wear level
if let Some(wear) = drive.wear_level {
let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical }
else if wear >= self.config.wear_warning_percent { Status::Warning }
else { Status::Ok };
metrics.push(Metric {
name: format!("disk_{}_wear_percent", drive_name),
value: MetricValue::Float(wear),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}% wear", drive_name, wear)),
status: wear_status,
timestamp,
});
}
// Drive capacity metrics
metrics.push(Metric {
name: format!("disk_{}_total_gb", drive_name),
value: MetricValue::Float(self.bytes_to_gb(total_capacity)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", drive_name, self.bytes_to_human_readable(total_capacity))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_{}_temperature", pool_name, drive.device),
name: format!("disk_{}_used_gb", drive_name),
value: MetricValue::Float(self.bytes_to_gb(total_used)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", drive_name, self.bytes_to_human_readable(total_used))),
status: drive_status.clone(),
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_available_gb", drive_name),
value: MetricValue::Float(self.bytes_to_gb(total_available)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", drive_name, self.bytes_to_human_readable(total_available))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", drive_name),
value: MetricValue::Float(usage_percent as f32),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.1}%", drive_name, usage_percent)),
status: drive_status,
timestamp,
});
// Pool type indicator
metrics.push(Metric {
name: format!("disk_{}_pool_type", drive_name),
value: MetricValue::String(format!("drive ({})", drive.filesystems.len())),
unit: None,
description: Some(format!("Type: physical drive")),
status: Status::Ok,
timestamp,
});
// Individual filesystem metrics
for filesystem in &drive.filesystems {
let fs_name = if filesystem.mount_point == "/" {
"root".to_string()
} else {
filesystem.mount_point.trim_start_matches('/').replace('/', "_")
};
let fs_usage_percent = if filesystem.total_bytes > 0 {
(filesystem.used_bytes as f64 / filesystem.total_bytes as f64) * 100.0
} else { 0.0 };
let fs_status = if fs_usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical
} else if fs_usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning
} else {
Status::Ok
};
metrics.push(Metric {
name: format!("disk_{}_fs_{}_usage_percent", drive_name, fs_name),
value: MetricValue::Float(fs_usage_percent as f32),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}%", filesystem.mount_point, fs_usage_percent)),
status: fs_status.clone(),
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_used_gb", drive_name, fs_name),
value: MetricValue::Float(self.bytes_to_gb(filesystem.used_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", filesystem.mount_point, self.bytes_to_human_readable(filesystem.used_bytes))),
status: fs_status.clone(),
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_total_gb", drive_name, fs_name),
value: MetricValue::Float(self.bytes_to_gb(filesystem.total_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", filesystem.mount_point, self.bytes_to_human_readable(filesystem.total_bytes))),
status: fs_status.clone(),
timestamp,
});
let fs_available = filesystem.total_bytes.saturating_sub(filesystem.used_bytes);
metrics.push(Metric {
name: format!("disk_{}_fs_{}_available_gb", drive_name, fs_name),
value: MetricValue::Float(self.bytes_to_gb(fs_available)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}", filesystem.mount_point, self.bytes_to_human_readable(fs_available))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_mount_point", drive_name, fs_name),
value: MetricValue::String(filesystem.mount_point.clone()),
unit: None,
description: Some(format!("Mount: {}", filesystem.mount_point)),
status: Status::Ok,
timestamp,
});
}
}
/// Generate metrics for a mergerfs pool
fn generate_mergerfs_pool_metrics(
&self,
metrics: &mut Vec<Metric>,
pool: &MergerfsPool,
timestamp: u64,
status_tracker: &mut StatusTracker
) {
// Use consistent pool naming: extract mount point without leading slash
let pool_name = if pool.mount_point == "/" {
"root".to_string()
} else {
pool.mount_point.trim_start_matches('/').replace('/', "_")
};
if pool_name.is_empty() {
return;
}
let usage_percent = if pool.total_bytes > 0 {
(pool.used_bytes as f64 / pool.total_bytes as f64) * 100.0
} else { 0.0 };
// Calculate pool health based on drive health
let failed_data = pool.data_drives.iter()
.filter(|d| d.health_status != "PASSED")
.count();
let failed_parity = pool.parity_drives.iter()
.filter(|d| d.health_status != "PASSED")
.count();
let pool_health = match (failed_data, failed_parity) {
(0, 0) => Status::Ok,
(1, 0) | (0, 1) => Status::Warning,
_ => Status::Critical,
};
let usage_status = if usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical
} else if usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning
} else {
Status::Ok
};
let pool_status = if pool_health == Status::Critical { Status::Critical } else { usage_status };
// Pool metrics
metrics.push(Metric {
name: format!("disk_{}_mount_point", pool_name),
value: MetricValue::String(pool.mount_point.clone()),
unit: None,
description: Some(format!("Mount: {}", pool.mount_point)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_pool_type", pool_name),
value: MetricValue::String(format!("mergerfs ({}+{})", pool.data_drives.len(), pool.parity_drives.len())),
unit: None,
description: Some("Type: mergerfs".to_string()),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_pool_health", pool_name),
value: MetricValue::String(match pool_health {
Status::Ok => "healthy".to_string(),
Status::Warning => "degraded".to_string(),
Status::Critical => "critical".to_string(),
_ => "unknown".to_string(),
}),
unit: None,
description: Some("Pool health".to_string()),
status: pool_health,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_total_gb", pool_name),
value: MetricValue::Float(self.bytes_to_gb(pool.total_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("Total: {}", self.bytes_to_human_readable(pool.total_bytes))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_used_gb", pool_name),
value: MetricValue::Float(self.bytes_to_gb(pool.used_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("Used: {}", self.bytes_to_human_readable(pool.used_bytes))),
status: pool_status.clone(),
timestamp,
});
let available_bytes = pool.total_bytes.saturating_sub(pool.used_bytes);
metrics.push(Metric {
name: format!("disk_{}_available_gb", pool_name),
value: MetricValue::Float(self.bytes_to_gb(available_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("Available: {}", self.bytes_to_human_readable(available_bytes))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", pool_name),
value: MetricValue::Float(usage_percent as f32),
unit: Some("%".to_string()),
description: Some(format!("Usage: {:.1}%", usage_percent)),
status: pool_status,
timestamp,
});
// Individual drive metrics
for drive in &pool.data_drives {
self.generate_pool_drive_metrics(metrics, &pool_name, &drive.device, drive, timestamp, status_tracker);
}
for drive in &pool.parity_drives {
self.generate_pool_drive_metrics(metrics, &pool_name, &drive.device, drive, timestamp, status_tracker);
}
}
/// Generate metrics for drives in mergerfs pools
fn generate_pool_drive_metrics(
&self,
metrics: &mut Vec<Metric>,
pool_name: &str,
drive_role: &str,
drive: &DriveInfo,
timestamp: u64,
status_tracker: &mut StatusTracker
) {
let drive_health = if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown };
metrics.push(Metric {
name: format!("disk_{}_{}_health", pool_name, drive_role),
value: MetricValue::String(drive.health_status.clone()),
unit: None,
description: Some(format!("{}: {}", drive.device, drive.health_status)),
status: drive_health,
timestamp,
});
if let Some(temp) = drive.temperature {
let temp_status = self.calculate_temperature_status(
&format!("disk_{}_{}_temperature", pool_name, drive_role), temp, status_tracker
);
metrics.push(Metric {
name: format!("disk_{}_{}_temperature", pool_name, drive_role),
value: MetricValue::Float(temp),
unit: Some("°C".to_string()),
description: Some(format!("{}: {:.0}°C", drive.device, temp)),
@@ -554,14 +984,12 @@ impl Collector for DiskCollector {
});
}
// Drive wear level (for SSDs)
if let Some(wear) = drive.wear_level {
let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical }
else if wear >= self.config.wear_warning_percent { Status::Warning }
else { Status::Ok };
metrics.push(Metric {
name: format!("disk_{}_{}_wear_percent", pool_name, drive.device),
name: format!("disk_{}_{}_wear_percent", pool_name, drive_role),
value: MetricValue::Float(wear),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}% wear", drive.device, wear)),
@@ -571,26 +999,3 @@ impl Collector for DiskCollector {
}
}
}
// Add storage pool count metric
metrics.push(Metric {
name: "disk_count".to_string(),
value: MetricValue::Integer(storage_pools.len() as i64),
unit: None,
description: Some(format!("Total storage pools: {}", storage_pools.len())),
status: Status::Ok,
timestamp: chrono::Utc::now().timestamp() as u64,
});
let collection_time = start_time.elapsed();
debug!(
"Multi-disk collection completed in {:?} with {} metrics",
collection_time,
metrics.len()
);
Ok(metrics)
}
}

View File

@@ -0,0 +1,1327 @@
use anyhow::Result;
use async_trait::async_trait;
use cm_dashboard_shared::{Metric, MetricValue, Status, StatusTracker, HysteresisThresholds};
use crate::config::DiskConfig;
use std::process::Command;
use std::time::Instant;
use std::fs;
use tracing::debug;
use super::{Collector, CollectorError};
/// Mount point information from /proc/mounts
#[derive(Debug, Clone)]
struct MountInfo {
device: String, // e.g., "/dev/sda1" or "/mnt/disk1:/mnt/disk2"
mount_point: String, // e.g., "/", "/srv/media"
fs_type: String, // e.g., "ext4", "xfs", "fuse.mergerfs"
}
/// Auto-discovered storage topology
#[derive(Debug, Clone)]
struct StorageTopology {
single_disks: Vec<MountInfo>,
mergerfs_pools: Vec<MergerfsPoolInfo>,
}
/// MergerFS pool information
#[derive(Debug, Clone)]
struct MergerfsPoolInfo {
mount_point: String, // e.g., "/srv/media"
data_members: Vec<String>, // e.g., ["/mnt/disk1", "/mnt/disk2"]
parity_disks: Vec<String>, // e.g., ["/mnt/parity"]
}
/// Information about a storage pool (mount point with underlying drives)
#[derive(Debug, Clone)]
struct StoragePool {
name: String, // e.g., "steampool", "root"
mount_point: String, // e.g., "/mnt/steampool", "/"
filesystem: String, // e.g., "mergerfs", "ext4", "zfs", "btrfs"
pool_type: StoragePoolType, // Enhanced pool type with configuration
size: String, // e.g., "2.5TB"
used: String, // e.g., "2.1TB"
available: String, // e.g., "400GB"
usage_percent: f32, // e.g., 85.0
underlying_drives: Vec<DriveInfo>, // Individual physical drives
pool_health: PoolHealth, // Overall pool health status
}
/// Enhanced storage pool types with specific configurations
#[derive(Debug, Clone)]
enum StoragePoolType {
Single, // Traditional single disk (legacy)
PhysicalDrive { // Physical drive with multiple filesystems
filesystems: Vec<String>, // Mount points on this drive
},
MergerfsPool { // MergerFS with optional parity
data_disks: Vec<String>, // Member disk names (sdb, sdd)
parity_disks: Vec<String>, // Parity disk names (sdc)
},
#[allow(dead_code)]
RaidArray { // Hardware RAID (future)
level: String, // "RAID1", "RAID5", etc.
member_disks: Vec<String>,
spare_disks: Vec<String>,
},
#[allow(dead_code)]
ZfsPool { // ZFS pool (future)
pool_name: String,
vdevs: Vec<String>,
}
}
/// Pool health status for redundant storage
#[derive(Debug, Clone, Copy, PartialEq)]
enum PoolHealth {
Healthy, // All drives OK, parity current
Degraded, // One drive failed or parity outdated, still functional
Critical, // Multiple failures, data at risk
#[allow(dead_code)]
Rebuilding, // Actively rebuilding/scrubbing (future: SnapRAID status integration)
Unknown, // Cannot determine status
}
/// Information about an individual physical drive
#[derive(Debug, Clone)]
struct DriveInfo {
device: String, // e.g., "sda", "nvme0n1"
health_status: String, // e.g., "PASSED", "FAILED"
temperature: Option<f32>, // e.g., 45.0°C
wear_level: Option<f32>, // e.g., 12.0% (for SSDs)
}
/// Disk usage collector for monitoring filesystem sizes
pub struct DiskCollector {
config: DiskConfig,
temperature_thresholds: HysteresisThresholds,
detected_devices: std::collections::HashMap<String, Vec<String>>, // mount_point -> devices
storage_topology: Option<StorageTopology>, // Auto-discovered storage layout
}
impl DiskCollector {
pub fn new(config: DiskConfig) -> Self {
// Create hysteresis thresholds for disk temperature from config
let temperature_thresholds = HysteresisThresholds::with_custom_gaps(
config.temperature_warning_celsius,
5.0, // 5°C gap for recovery
config.temperature_critical_celsius,
5.0, // 5°C gap for recovery
);
// Perform auto-discovery of storage topology
let storage_topology = match Self::auto_discover_storage() {
Ok(topology) => {
debug!("Auto-discovered storage topology: {} single disks, {} mergerfs pools",
topology.single_disks.len(), topology.mergerfs_pools.len());
Some(topology)
}
Err(e) => {
debug!("Failed to auto-discover storage topology: {}", e);
None
}
};
// Detect devices for discovered storage
let mut detected_devices = std::collections::HashMap::new();
if let Some(ref topology) = storage_topology {
// Add single disks
for disk in &topology.single_disks {
if let Ok(devices) = Self::detect_device_for_mount_point_static(&disk.mount_point) {
detected_devices.insert(disk.mount_point.clone(), devices);
}
}
// Add mergerfs pools and their members
for pool in &topology.mergerfs_pools {
// Detect devices for the pool itself
if let Ok(devices) = Self::detect_device_for_mount_point_static(&pool.mount_point) {
detected_devices.insert(pool.mount_point.clone(), devices);
}
// Detect devices for member disks
for member in &pool.data_members {
if let Ok(devices) = Self::detect_device_for_mount_point_static(member) {
detected_devices.insert(member.clone(), devices);
}
}
// Detect devices for parity disks
for parity in &pool.parity_disks {
if let Ok(devices) = Self::detect_device_for_mount_point_static(parity) {
detected_devices.insert(parity.clone(), devices);
}
}
}
} else {
// Fallback: use legacy filesystem config detection
for fs_config in &config.filesystems {
if fs_config.monitor {
if let Ok(devices) = Self::detect_device_for_mount_point_static(&fs_config.mount_point) {
detected_devices.insert(fs_config.mount_point.clone(), devices);
}
}
}
}
Self {
config,
temperature_thresholds,
detected_devices,
storage_topology,
}
}
/// Auto-discover storage topology by parsing system information
fn auto_discover_storage() -> Result<StorageTopology> {
let mounts = Self::parse_proc_mounts()?;
let mut single_disks = Vec::new();
let mut mergerfs_pools = Vec::new();
// Filter out unwanted filesystem types and mount points
let exclude_fs_types = ["tmpfs", "devtmpfs", "sysfs", "proc", "cgroup", "cgroup2", "devpts"];
let exclude_mount_prefixes = ["/proc", "/sys", "/dev", "/tmp", "/run"];
for mount in mounts {
// Skip excluded filesystem types
if exclude_fs_types.contains(&mount.fs_type.as_str()) {
continue;
}
// Skip excluded mount point prefixes
if exclude_mount_prefixes.iter().any(|prefix| mount.mount_point.starts_with(prefix)) {
continue;
}
match mount.fs_type.as_str() {
"fuse.mergerfs" => {
// Parse mergerfs pool
let data_members = Self::parse_mergerfs_sources(&mount.device);
let parity_disks = Self::detect_parity_disks(&data_members);
mergerfs_pools.push(MergerfsPoolInfo {
mount_point: mount.mount_point.clone(),
data_members,
parity_disks,
});
debug!("Discovered mergerfs pool at {}", mount.mount_point);
}
"ext4" | "xfs" | "btrfs" | "ntfs" | "vfat" => {
// Check if this mount is part of a mergerfs pool
let is_mergerfs_member = mergerfs_pools.iter()
.any(|pool| pool.data_members.contains(&mount.mount_point) ||
pool.parity_disks.contains(&mount.mount_point));
if !is_mergerfs_member {
debug!("Discovered single disk at {}", mount.mount_point);
single_disks.push(mount);
}
}
_ => {
debug!("Skipping unsupported filesystem type: {}", mount.fs_type);
}
}
}
Ok(StorageTopology {
single_disks,
mergerfs_pools,
})
}
/// Parse /proc/mounts to get all mount information
fn parse_proc_mounts() -> Result<Vec<MountInfo>> {
let mounts_content = fs::read_to_string("/proc/mounts")?;
let mut mounts = Vec::new();
for line in mounts_content.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 3 {
mounts.push(MountInfo {
device: parts[0].to_string(),
mount_point: parts[1].to_string(),
fs_type: parts[2].to_string(),
});
}
}
Ok(mounts)
}
/// Parse mergerfs source string to extract member paths
fn parse_mergerfs_sources(source: &str) -> Vec<String> {
// MergerFS source format: "/mnt/disk1:/mnt/disk2:/mnt/disk3"
source.split(':')
.map(|s| s.trim().to_string())
.filter(|s| !s.is_empty())
.collect()
}
/// Detect potential parity disks based on data member heuristics
fn detect_parity_disks(data_members: &[String]) -> Vec<String> {
let mut parity_disks = Vec::new();
// Heuristic 1: Look for mount points with "parity" in the name
if let Ok(mounts) = Self::parse_proc_mounts() {
for mount in mounts {
if mount.mount_point.to_lowercase().contains("parity") &&
(mount.fs_type == "xfs" || mount.fs_type == "ext4") {
debug!("Detected parity disk by name: {}", mount.mount_point);
parity_disks.push(mount.mount_point);
}
}
}
// Heuristic 2: Look for sequential device pattern
// If data members are /mnt/disk1, /mnt/disk2, look for /mnt/disk* that's not in data
if parity_disks.is_empty() {
if let Some(pattern) = Self::extract_mount_pattern(data_members) {
if let Ok(mounts) = Self::parse_proc_mounts() {
for mount in mounts {
if mount.mount_point.starts_with(&pattern) &&
!data_members.contains(&mount.mount_point) &&
(mount.fs_type == "xfs" || mount.fs_type == "ext4") {
debug!("Detected parity disk by pattern: {}", mount.mount_point);
parity_disks.push(mount.mount_point);
}
}
}
}
}
parity_disks
}
/// Extract common mount point pattern from data members
fn extract_mount_pattern(data_members: &[String]) -> Option<String> {
if data_members.is_empty() {
return None;
}
// Find common prefix (e.g., "/mnt/disk" from "/mnt/disk1", "/mnt/disk2")
let first = &data_members[0];
if let Some(last_slash) = first.rfind('/') {
let base = &first[..last_slash + 1]; // Include the slash
// Check if all members share this base
if data_members.iter().all(|member| member.starts_with(base)) {
return Some(base.to_string());
}
}
None
}
/// Calculate disk temperature status using hysteresis thresholds
fn calculate_temperature_status(&self, metric_name: &str, temperature: f32, status_tracker: &mut StatusTracker) -> Status {
status_tracker.calculate_with_hysteresis(metric_name, temperature, &self.temperature_thresholds)
}
/// Get storage pools using auto-discovered topology or fallback to configuration
fn get_configured_storage_pools(&self) -> Result<Vec<StoragePool>> {
if let Some(ref topology) = self.storage_topology {
self.get_auto_discovered_storage_pools(topology)
} else {
self.get_legacy_configured_storage_pools()
}
}
/// Get storage pools from auto-discovered topology
fn get_auto_discovered_storage_pools(&self, topology: &StorageTopology) -> Result<Vec<StoragePool>> {
let mut storage_pools = Vec::new();
// Group single disks by physical drive for unified pool display
let grouped_disks = self.group_filesystems_by_physical_drive(&topology.single_disks)?;
// Process grouped single disks (each physical drive becomes a pool)
for (drive_name, filesystems) in grouped_disks {
// Create a unified pool for this physical drive
let pool = self.create_physical_drive_pool(&drive_name, &filesystems)?;
storage_pools.push(pool);
}
// IMPORTANT: Do not create individual filesystem pools when using auto-discovery
// All single disk filesystems should be grouped into physical drive pools above
// Process mergerfs pools (these remain as logical pools)
for pool_info in &topology.mergerfs_pools {
if let Ok((total_bytes, used_bytes)) = self.get_filesystem_info(&pool_info.mount_point) {
let available_bytes = total_bytes - used_bytes;
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else { 0.0 };
let size = self.bytes_to_human_readable(total_bytes);
let used = self.bytes_to_human_readable(used_bytes);
let available = self.bytes_to_human_readable(available_bytes);
// Collect all member and parity drives
let mut all_drives = Vec::new();
// Add data member drives
for member in &pool_info.data_members {
if let Some(devices) = self.detected_devices.get(member) {
all_drives.extend(devices.clone());
}
}
// Add parity drives
for parity in &pool_info.parity_disks {
if let Some(devices) = self.detected_devices.get(parity) {
all_drives.extend(devices.clone());
}
}
let underlying_drives = self.get_drive_info_for_devices(&all_drives)?;
// Calculate pool health
let pool_health = self.calculate_mergerfs_pool_health(&pool_info.data_members, &pool_info.parity_disks, &underlying_drives);
// Generate pool name from mount point
let name = pool_info.mount_point.trim_start_matches('/').replace('/', "_");
storage_pools.push(StoragePool {
name,
mount_point: pool_info.mount_point.clone(),
filesystem: "fuse.mergerfs".to_string(),
pool_type: StoragePoolType::MergerfsPool {
data_disks: pool_info.data_members.iter()
.filter_map(|member| self.detected_devices.get(member).and_then(|devices| devices.first().cloned()))
.collect(),
parity_disks: pool_info.parity_disks.iter()
.filter_map(|parity| self.detected_devices.get(parity).and_then(|devices| devices.first().cloned()))
.collect(),
},
size,
used,
available,
usage_percent: usage_percent as f32,
underlying_drives,
pool_health,
});
debug!("Auto-discovered mergerfs pool: {} with {} data + {} parity disks",
pool_info.mount_point, pool_info.data_members.len(), pool_info.parity_disks.len());
}
}
Ok(storage_pools)
}
/// Group filesystems by their backing physical drive
fn group_filesystems_by_physical_drive(&self, filesystems: &[MountInfo]) -> Result<std::collections::HashMap<String, Vec<MountInfo>>> {
let mut grouped = std::collections::HashMap::new();
for fs in filesystems {
// Get the physical drive name for this mount point
if let Some(devices) = self.detected_devices.get(&fs.mount_point) {
if let Some(device_name) = devices.first() {
// Extract base drive name from detected device
let drive_name = Self::extract_base_device(device_name)
.unwrap_or_else(|| device_name.clone());
debug!("Grouping filesystem {} (device: {}) under drive: {}",
fs.mount_point, device_name, drive_name);
grouped.entry(drive_name).or_insert_with(Vec::new).push(fs.clone());
}
}
}
debug!("Filesystem grouping result: {} drives with filesystems: {:?}",
grouped.len(),
grouped.keys().collect::<Vec<_>>());
Ok(grouped)
}
/// Create a physical drive pool containing multiple filesystems
fn create_physical_drive_pool(&self, drive_name: &str, filesystems: &[MountInfo]) -> Result<StoragePool> {
if filesystems.is_empty() {
return Err(anyhow::anyhow!("No filesystems for drive {}", drive_name));
}
// Calculate total usage across all filesystems on this drive
let mut total_capacity = 0u64;
let mut total_used = 0u64;
for fs in filesystems {
if let Ok((capacity, used)) = self.get_filesystem_info(&fs.mount_point) {
total_capacity += capacity;
total_used += used;
}
}
let total_available = total_capacity.saturating_sub(total_used);
let usage_percent = if total_capacity > 0 {
(total_used as f64 / total_capacity as f64) * 100.0
} else { 0.0 };
// Get drive information for SMART data
let device_names = vec![drive_name.to_string()];
let underlying_drives = self.get_drive_info_for_devices(&device_names)?;
// Collect filesystem mount points for this drive
let filesystem_mount_points: Vec<String> = filesystems.iter()
.map(|fs| fs.mount_point.clone())
.collect();
Ok(StoragePool {
name: drive_name.to_string(),
mount_point: format!("(physical drive)"), // Special marker for physical drives
filesystem: "physical".to_string(),
pool_type: StoragePoolType::PhysicalDrive {
filesystems: filesystem_mount_points,
},
size: self.bytes_to_human_readable(total_capacity),
used: self.bytes_to_human_readable(total_used),
available: self.bytes_to_human_readable(total_available),
usage_percent: usage_percent as f32,
pool_health: if underlying_drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Critical
},
underlying_drives,
})
}
/// Calculate pool health specifically for mergerfs pools
fn calculate_mergerfs_pool_health(&self, data_members: &[String], parity_disks: &[String], drives: &[DriveInfo]) -> PoolHealth {
// Get device names for data and parity drives
let mut data_device_names = Vec::new();
let mut parity_device_names = Vec::new();
for member in data_members {
if let Some(devices) = self.detected_devices.get(member) {
data_device_names.extend(devices.clone());
}
}
for parity in parity_disks {
if let Some(devices) = self.detected_devices.get(parity) {
parity_device_names.extend(devices.clone());
}
}
let failed_data = drives.iter()
.filter(|d| data_device_names.contains(&d.device) && d.health_status != "PASSED")
.count();
let failed_parity = drives.iter()
.filter(|d| parity_device_names.contains(&d.device) && d.health_status != "PASSED")
.count();
match (failed_data, failed_parity) {
(0, 0) => PoolHealth::Healthy,
(1, 0) => PoolHealth::Degraded, // Can recover with parity
(0, 1) => PoolHealth::Degraded, // Lost parity protection
_ => PoolHealth::Critical, // Multiple failures
}
}
/// Fallback to legacy configuration-based storage pools
fn get_legacy_configured_storage_pools(&self) -> Result<Vec<StoragePool>> {
let mut storage_pools = Vec::new();
let mut processed_pools = std::collections::HashSet::new();
// Legacy implementation: use filesystem configuration
for fs_config in &self.config.filesystems {
if !fs_config.monitor {
continue;
}
let (pool_type, skip_in_single_mode) = self.determine_pool_type(&fs_config.storage_type);
// Skip member disks if they're part of a pool
if skip_in_single_mode {
continue;
}
// Check if this pool was already processed (in case of multiple member disks)
let pool_key = match &pool_type {
StoragePoolType::MergerfsPool { .. } => {
// For mergerfs pools, use the main mount point
if fs_config.fs_type == "fuse.mergerfs" {
fs_config.mount_point.clone()
} else {
continue; // Skip member disks
}
}
_ => fs_config.mount_point.clone()
};
if processed_pools.contains(&pool_key) {
continue;
}
processed_pools.insert(pool_key.clone());
// Get filesystem stats for the mount point
match self.get_filesystem_info(&fs_config.mount_point) {
Ok((total_bytes, used_bytes)) => {
let available_bytes = total_bytes - used_bytes;
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else { 0.0 };
// Convert bytes to human-readable format
let size = self.bytes_to_human_readable(total_bytes);
let used = self.bytes_to_human_readable(used_bytes);
let available = self.bytes_to_human_readable(available_bytes);
// Get underlying drives based on pool type
let underlying_drives = self.get_pool_drives(&pool_type, &fs_config.mount_point)?;
// Calculate pool health
let pool_health = self.calculate_pool_health(&pool_type, &underlying_drives);
let drive_count = underlying_drives.len();
storage_pools.push(StoragePool {
name: fs_config.name.clone(),
mount_point: fs_config.mount_point.clone(),
filesystem: fs_config.fs_type.clone(),
pool_type: pool_type.clone(),
size,
used,
available,
usage_percent: usage_percent as f32,
underlying_drives,
pool_health,
});
debug!(
"Legacy configured storage pool '{}' ({:?}) at {} with {} drives, health: {:?}",
fs_config.name, pool_type, fs_config.mount_point, drive_count, pool_health
);
}
Err(e) => {
debug!(
"Failed to get filesystem info for storage pool '{}': {}",
fs_config.name, e
);
}
}
}
Ok(storage_pools)
}
/// Determine the storage pool type from configuration
fn determine_pool_type(&self, storage_type: &str) -> (StoragePoolType, bool) {
match storage_type {
"single" => (StoragePoolType::Single, false),
"mergerfs_pool" | "mergerfs" => {
// Find associated member disks
let data_disks = self.find_pool_member_disks("mergerfs_member");
let parity_disks = self.find_pool_member_disks("parity");
(StoragePoolType::MergerfsPool { data_disks, parity_disks }, false)
}
"mergerfs_member" => (StoragePoolType::Single, true), // Skip, part of pool
"parity" => (StoragePoolType::Single, true), // Skip, part of pool
"raid1" | "raid5" | "raid6" => {
let member_disks = self.find_pool_member_disks(&format!("{}_member", storage_type));
(StoragePoolType::RaidArray {
level: storage_type.to_uppercase(),
member_disks,
spare_disks: Vec::new()
}, false)
}
_ => (StoragePoolType::Single, false) // Default to single
}
}
/// Find member disks for a specific storage type
fn find_pool_member_disks(&self, member_type: &str) -> Vec<String> {
let mut member_disks = Vec::new();
for fs_config in &self.config.filesystems {
if fs_config.storage_type == member_type && fs_config.monitor {
// Get device names for this mount point
if let Some(devices) = self.detected_devices.get(&fs_config.mount_point) {
member_disks.extend(devices.clone());
}
}
}
member_disks
}
/// Get drive information for a specific pool type
fn get_pool_drives(&self, pool_type: &StoragePoolType, mount_point: &str) -> Result<Vec<DriveInfo>> {
match pool_type {
StoragePoolType::Single => {
// Single disk - use detected devices for this mount point
let device_names = self.detected_devices.get(mount_point).cloned().unwrap_or_default();
self.get_drive_info_for_devices(&device_names)
}
StoragePoolType::PhysicalDrive { .. } => {
// Physical drive - get drive info for the drive directly (mount_point not used)
let device_names = vec![mount_point.to_string()];
self.get_drive_info_for_devices(&device_names)
}
StoragePoolType::MergerfsPool { data_disks, parity_disks } => {
// Mergerfs pool - collect all member drives
let mut all_disks = data_disks.clone();
all_disks.extend(parity_disks.clone());
self.get_drive_info_for_devices(&all_disks)
}
StoragePoolType::RaidArray { member_disks, spare_disks, .. } => {
// RAID array - collect member and spare drives
let mut all_disks = member_disks.clone();
all_disks.extend(spare_disks.clone());
self.get_drive_info_for_devices(&all_disks)
}
StoragePoolType::ZfsPool { .. } => {
// ZFS pool - use detected devices (future implementation)
let device_names = self.detected_devices.get(mount_point).cloned().unwrap_or_default();
self.get_drive_info_for_devices(&device_names)
}
}
}
/// Calculate pool health based on drive status and pool type
fn calculate_pool_health(&self, pool_type: &StoragePoolType, drives: &[DriveInfo]) -> PoolHealth {
match pool_type {
StoragePoolType::Single => {
// Single disk - health is just the drive health
if drives.is_empty() {
PoolHealth::Unknown
} else if drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Critical
}
}
StoragePoolType::PhysicalDrive { .. } => {
// Physical drive - health is just the drive health (similar to Single)
if drives.is_empty() {
PoolHealth::Unknown
} else if drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Critical
}
}
StoragePoolType::MergerfsPool { data_disks, parity_disks } => {
let failed_data = drives.iter()
.filter(|d| data_disks.contains(&d.device) && d.health_status != "PASSED")
.count();
let failed_parity = drives.iter()
.filter(|d| parity_disks.contains(&d.device) && d.health_status != "PASSED")
.count();
match (failed_data, failed_parity) {
(0, 0) => PoolHealth::Healthy,
(1, 0) => PoolHealth::Degraded, // Can recover with parity
(0, 1) => PoolHealth::Degraded, // Lost parity protection
_ => PoolHealth::Critical, // Multiple failures
}
}
StoragePoolType::RaidArray { level, .. } => {
let failed_drives = drives.iter().filter(|d| d.health_status != "PASSED").count();
// Basic RAID health logic (can be enhanced per RAID level)
match failed_drives {
0 => PoolHealth::Healthy,
1 if level.contains('1') || level.contains('5') || level.contains('6') => PoolHealth::Degraded,
_ => PoolHealth::Critical,
}
}
StoragePoolType::ZfsPool { .. } => {
// ZFS health would require zpool status parsing (future)
if drives.iter().all(|d| d.health_status == "PASSED") {
PoolHealth::Healthy
} else {
PoolHealth::Degraded
}
}
}
}
/// Get drive information for a list of device names
fn get_drive_info_for_devices(&self, device_names: &[String]) -> Result<Vec<DriveInfo>> {
let mut drives = Vec::new();
for device_name in device_names {
let device_path = format!("/dev/{}", device_name);
// Get SMART data for this drive
let (health_status, temperature, wear_level) = self.get_smart_data(&device_path);
drives.push(DriveInfo {
device: device_name.clone(),
health_status: health_status.clone(),
temperature,
wear_level,
});
debug!(
"Drive info for {}: health={}, temp={:?}°C, wear={:?}%",
device_name, health_status, temperature, wear_level
);
}
Ok(drives)
}
/// Get SMART data for a drive (health, temperature, wear level)
fn get_smart_data(&self, device_path: &str) -> (String, Option<f32>, Option<f32>) {
// Try to get SMART data using smartctl
let output = Command::new("sudo")
.arg("smartctl")
.arg("-a")
.arg(device_path)
.output();
match output {
Ok(result) if result.status.success() => {
let stdout = String::from_utf8_lossy(&result.stdout);
// Parse health status
let health = if stdout.contains("PASSED") {
"PASSED".to_string()
} else if stdout.contains("FAILED") {
"FAILED".to_string()
} else {
"UNKNOWN".to_string()
};
// Parse temperature (look for various temperature indicators)
let temperature = self.parse_temperature_from_smart(&stdout);
// Parse wear level (for SSDs)
let wear_level = self.parse_wear_level_from_smart(&stdout);
(health, temperature, wear_level)
}
_ => {
debug!("Failed to get SMART data for {}", device_path);
("UNKNOWN".to_string(), None, None)
}
}
}
/// Parse temperature from SMART output
fn parse_temperature_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
// Look for temperature in various formats
if line.contains("Temperature_Celsius") || line.contains("Temperature") {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
if let Ok(temp) = parts[9].parse::<f32>() {
return Some(temp);
}
}
}
// NVMe drives might show temperature differently
if line.contains("temperature:") {
if let Some(temp_part) = line.split("temperature:").nth(1) {
if let Some(temp_str) = temp_part.split_whitespace().next() {
if let Ok(temp) = temp_str.parse::<f32>() {
return Some(temp);
}
}
}
}
}
None
}
/// Parse wear level from SMART output (SSD wear leveling)
/// Supports both NVMe and SATA SSD wear indicators
fn parse_wear_level_from_smart(&self, smart_output: &str) -> Option<f32> {
for line in smart_output.lines() {
let line = line.trim();
// NVMe drives - direct percentage used
if line.contains("Percentage Used:") {
if let Some(wear_part) = line.split("Percentage Used:").nth(1) {
if let Some(wear_str) = wear_part.split('%').next() {
if let Ok(wear) = wear_str.trim().parse::<f32>() {
return Some(wear);
}
}
}
}
// SATA SSD attributes - parse SMART table format
// Format: ID ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 10 {
// SSD Life Left / Percent Lifetime Remaining (higher = less wear)
if line.contains("SSD_Life_Left") || line.contains("Percent_Lifetime_Remain") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Media Wearout Indicator (lower = more wear, normalize to 0-100)
if line.contains("Media_Wearout_Indicator") {
if let Ok(remaining) = parts[3].parse::<f32>() { // VALUE column
return Some(100.0 - remaining); // Convert remaining to used
}
}
// Wear Leveling Count (higher = less wear, but varies by manufacturer)
if line.contains("Wear_Leveling_Count") {
if let Ok(wear_count) = parts[3].parse::<f32>() { // VALUE column
// Most SSDs: 100 = new, decreases with wear
if wear_count <= 100.0 {
return Some(100.0 - wear_count);
}
}
}
// Total LBAs Written - calculate against typical endurance if available
// This is more complex and manufacturer-specific, so we skip for now
}
}
None
}
/// Convert bytes to human-readable format
fn bytes_to_human_readable(&self, bytes: u64) -> String {
const UNITS: &[&str] = &["B", "K", "M", "G", "T"];
let mut size = bytes as f64;
let mut unit_index = 0;
while size >= 1024.0 && unit_index < UNITS.len() - 1 {
size /= 1024.0;
unit_index += 1;
}
if unit_index == 0 {
format!("{:.0}{}", size, UNITS[unit_index])
} else {
format!("{:.1}{}", size, UNITS[unit_index])
}
}
/// Convert bytes to gigabytes
fn bytes_to_gb(&self, bytes: u64) -> f32 {
bytes as f32 / (1024.0 * 1024.0 * 1024.0)
}
/// Detect device backing a mount point using lsblk (static version for startup)
fn detect_device_for_mount_point_static(mount_point: &str) -> Result<Vec<String>> {
let output = Command::new("lsblk")
.args(&["-n", "-o", "NAME,MOUNTPOINT"])
.output()?;
if !output.status.success() {
return Ok(Vec::new());
}
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
let parts: Vec<&str> = line.split_whitespace().collect();
if parts.len() >= 2 && parts[1] == mount_point {
// Remove tree symbols and extract device name (e.g., "├─nvme0n1p2" -> "nvme0n1p2")
let device_name = parts[0]
.trim_start_matches('├')
.trim_start_matches('└')
.trim_start_matches('─')
.trim();
// Extract base device name (e.g., "nvme0n1p2" -> "nvme0n1")
if let Some(base_device) = Self::extract_base_device(device_name) {
return Ok(vec![base_device]);
}
}
}
Ok(Vec::new())
}
/// Extract base device name from partition (e.g., "nvme0n1p2" -> "nvme0n1", "sda1" -> "sda")
fn extract_base_device(device_name: &str) -> Option<String> {
// Handle NVMe devices (nvme0n1p1 -> nvme0n1)
if device_name.starts_with("nvme") {
if let Some(p_pos) = device_name.find('p') {
return Some(device_name[..p_pos].to_string());
}
}
// Handle traditional devices (sda1 -> sda)
if device_name.len() > 1 {
let chars: Vec<char> = device_name.chars().collect();
let mut end_idx = chars.len();
// Find where the device name ends and partition number begins
for (i, &c) in chars.iter().enumerate().rev() {
if !c.is_ascii_digit() {
end_idx = i + 1;
break;
}
}
if end_idx > 0 && end_idx < chars.len() {
return Some(chars[..end_idx].iter().collect());
}
}
// If no partition detected, return as-is
Some(device_name.to_string())
}
/// Get filesystem info using df command
fn get_filesystem_info(&self, path: &str) -> Result<(u64, u64)> {
let output = Command::new("df")
.arg("--block-size=1")
.arg(path)
.output()?;
if !output.status.success() {
return Err(anyhow::anyhow!("df command failed for {}", path));
}
let output_str = String::from_utf8(output.stdout)?;
let lines: Vec<&str> = output_str.lines().collect();
if lines.len() < 2 {
return Err(anyhow::anyhow!("Unexpected df output format"));
}
let fields: Vec<&str> = lines[1].split_whitespace().collect();
if fields.len() < 4 {
return Err(anyhow::anyhow!("Unexpected df fields count"));
}
let total_bytes = fields[1].parse::<u64>()?;
let used_bytes = fields[2].parse::<u64>()?;
Ok((total_bytes, used_bytes))
}
/// Parse size string (e.g., "120G", "45M") to GB value
fn parse_size_to_gb(&self, size_str: &str) -> f32 {
let size_str = size_str.trim();
if size_str.is_empty() || size_str == "-" {
return 0.0;
}
// Extract numeric part and unit
let (num_str, unit) = if let Some(last_char) = size_str.chars().last() {
if last_char.is_alphabetic() {
let num_part = &size_str[..size_str.len() - 1];
let unit_part = &size_str[size_str.len() - 1..];
(num_part, unit_part)
} else {
(size_str, "")
}
} else {
(size_str, "")
};
let number: f32 = num_str.parse().unwrap_or(0.0);
match unit.to_uppercase().as_str() {
"T" | "TB" => number * 1024.0,
"G" | "GB" => number,
"M" | "MB" => number / 1024.0,
"K" | "KB" => number / (1024.0 * 1024.0),
"B" | "" => number / (1024.0 * 1024.0 * 1024.0),
_ => number, // Assume GB if unknown unit
}
}
}
#[async_trait]
impl Collector for DiskCollector {
async fn collect(&self, status_tracker: &mut StatusTracker) -> Result<Vec<Metric>, CollectorError> {
let start_time = Instant::now();
debug!("Collecting storage pool and individual drive metrics");
let mut metrics = Vec::new();
// Get configured storage pools with individual drive data
let storage_pools = match self.get_configured_storage_pools() {
Ok(pools) => {
debug!("Found {} storage pools", pools.len());
pools
}
Err(e) => {
debug!("Failed to get storage pools: {}", e);
Vec::new()
}
};
// Generate metrics for each storage pool and its underlying drives
for storage_pool in &storage_pools {
let timestamp = chrono::Utc::now().timestamp() as u64;
// Storage pool overall metrics
let pool_name = &storage_pool.name;
// Parse size strings to get actual values for calculations
let size_gb = self.parse_size_to_gb(&storage_pool.size);
let used_gb = self.parse_size_to_gb(&storage_pool.used);
let avail_gb = self.parse_size_to_gb(&storage_pool.available);
// Calculate status based on configured thresholds and pool health
let usage_status = if storage_pool.usage_percent >= self.config.usage_critical_percent {
Status::Critical
} else if storage_pool.usage_percent >= self.config.usage_warning_percent {
Status::Warning
} else {
Status::Ok
};
let pool_status = match storage_pool.pool_health {
PoolHealth::Critical => Status::Critical,
PoolHealth::Degraded => Status::Warning,
PoolHealth::Rebuilding => Status::Warning,
PoolHealth::Healthy => usage_status,
PoolHealth::Unknown => Status::Unknown,
};
// Storage pool info metrics
metrics.push(Metric {
name: format!("disk_{}_mount_point", pool_name),
value: MetricValue::String(storage_pool.mount_point.clone()),
unit: None,
description: Some(format!("Mount: {}", storage_pool.mount_point)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_filesystem", pool_name),
value: MetricValue::String(storage_pool.filesystem.clone()),
unit: None,
description: Some(format!("FS: {}", storage_pool.filesystem)),
status: Status::Ok,
timestamp,
});
// Enhanced pool type information
let pool_type_str = match &storage_pool.pool_type {
StoragePoolType::Single => "single".to_string(),
StoragePoolType::PhysicalDrive { filesystems } => {
format!("drive ({})", filesystems.len())
}
StoragePoolType::MergerfsPool { data_disks, parity_disks } => {
format!("mergerfs ({}+{})", data_disks.len(), parity_disks.len())
}
StoragePoolType::RaidArray { level, member_disks, spare_disks } => {
format!("{} ({}+{})", level, member_disks.len(), spare_disks.len())
}
StoragePoolType::ZfsPool { pool_name, .. } => {
format!("zfs ({})", pool_name)
}
};
metrics.push(Metric {
name: format!("disk_{}_pool_type", pool_name),
value: MetricValue::String(pool_type_str.clone()),
unit: None,
description: Some(format!("Type: {}", pool_type_str)),
status: Status::Ok,
timestamp,
});
// Pool health status
let health_str = match storage_pool.pool_health {
PoolHealth::Healthy => "healthy",
PoolHealth::Degraded => "degraded",
PoolHealth::Critical => "critical",
PoolHealth::Rebuilding => "rebuilding",
PoolHealth::Unknown => "unknown",
};
metrics.push(Metric {
name: format!("disk_{}_pool_health", pool_name),
value: MetricValue::String(health_str.to_string()),
unit: None,
description: Some(format!("Health: {}", health_str)),
status: pool_status,
timestamp,
});
// Storage pool size metrics
metrics.push(Metric {
name: format!("disk_{}_total_gb", pool_name),
value: MetricValue::Float(size_gb),
unit: Some("GB".to_string()),
description: Some(format!("Total: {}", storage_pool.size)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_used_gb", pool_name),
value: MetricValue::Float(used_gb),
unit: Some("GB".to_string()),
description: Some(format!("Used: {}", storage_pool.used)),
status: pool_status,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_available_gb", pool_name),
value: MetricValue::Float(avail_gb),
unit: Some("GB".to_string()),
description: Some(format!("Available: {}", storage_pool.available)),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_usage_percent", pool_name),
value: MetricValue::Float(storage_pool.usage_percent),
unit: Some("%".to_string()),
description: Some(format!("Usage: {:.1}%", storage_pool.usage_percent)),
status: pool_status,
timestamp,
});
// Individual drive metrics for this storage pool
for drive in &storage_pool.underlying_drives {
// Drive health status
metrics.push(Metric {
name: format!("disk_{}_{}_health", pool_name, drive.device),
value: MetricValue::String(drive.health_status.clone()),
unit: None,
description: Some(format!("{}: {}", drive.device, drive.health_status)),
status: if drive.health_status == "PASSED" { Status::Ok }
else if drive.health_status == "FAILED" { Status::Critical }
else { Status::Unknown },
timestamp,
});
// Drive temperature
if let Some(temp) = drive.temperature {
let temp_status = self.calculate_temperature_status(
&format!("disk_{}_{}_temperature", pool_name, drive.device),
temp,
status_tracker
);
metrics.push(Metric {
name: format!("disk_{}_{}_temperature", pool_name, drive.device),
value: MetricValue::Float(temp),
unit: Some("°C".to_string()),
description: Some(format!("{}: {:.0}°C", drive.device, temp)),
status: temp_status,
timestamp,
});
}
// Drive wear level (for SSDs)
if let Some(wear) = drive.wear_level {
let wear_status = if wear >= self.config.wear_critical_percent { Status::Critical }
else if wear >= self.config.wear_warning_percent { Status::Warning }
else { Status::Ok };
metrics.push(Metric {
name: format!("disk_{}_{}_wear_percent", pool_name, drive.device),
value: MetricValue::Float(wear),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}% wear", drive.device, wear)),
status: wear_status,
timestamp,
});
}
}
// Individual filesystem metrics for PhysicalDrive pools
if let StoragePoolType::PhysicalDrive { filesystems } = &storage_pool.pool_type {
for filesystem_mount in filesystems {
if let Ok((total_bytes, used_bytes)) = self.get_filesystem_info(filesystem_mount) {
let available_bytes = total_bytes - used_bytes;
let usage_percent = if total_bytes > 0 {
(used_bytes as f64 / total_bytes as f64) * 100.0
} else { 0.0 };
let filesystem_name = if filesystem_mount == "/" {
"root".to_string()
} else {
filesystem_mount.trim_start_matches('/').replace('/', "_")
};
// Calculate filesystem status based on usage
let fs_status = if usage_percent >= self.config.usage_critical_percent as f64 {
Status::Critical
} else if usage_percent >= self.config.usage_warning_percent as f64 {
Status::Warning
} else {
Status::Ok
};
// Filesystem usage metrics
metrics.push(Metric {
name: format!("disk_{}_fs_{}_usage_percent", pool_name, filesystem_name),
value: MetricValue::Float(usage_percent as f32),
unit: Some("%".to_string()),
description: Some(format!("{}: {:.0}%", filesystem_mount, usage_percent)),
status: fs_status.clone(),
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_used_gb", pool_name, filesystem_name),
value: MetricValue::Float(self.bytes_to_gb(used_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}GB used", filesystem_mount, self.bytes_to_human_readable(used_bytes))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_total_gb", pool_name, filesystem_name),
value: MetricValue::Float(self.bytes_to_gb(total_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}GB total", filesystem_mount, self.bytes_to_human_readable(total_bytes))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_available_gb", pool_name, filesystem_name),
value: MetricValue::Float(self.bytes_to_gb(available_bytes)),
unit: Some("GB".to_string()),
description: Some(format!("{}: {}GB available", filesystem_mount, self.bytes_to_human_readable(available_bytes))),
status: Status::Ok,
timestamp,
});
metrics.push(Metric {
name: format!("disk_{}_fs_{}_mount_point", pool_name, filesystem_name),
value: MetricValue::String(filesystem_mount.clone()),
unit: None,
description: Some(format!("Mount: {}", filesystem_mount)),
status: Status::Ok,
timestamp,
});
}
}
}
}
// Add storage pool count metric
metrics.push(Metric {
name: "disk_count".to_string(),
value: MetricValue::Integer(storage_pools.len() as i64),
unit: None,
description: Some(format!("Total storage pools: {}", storage_pools.len())),
status: Status::Ok,
timestamp: chrono::Utc::now().timestamp() as u64,
});
let collection_time = start_time.elapsed();
debug!(
"Multi-disk collection completed in {:?} with {} metrics",
collection_time,
metrics.len()
);
Ok(metrics)
}
}

View File

@@ -1,5 +1,5 @@
use anyhow::Result;
use cm_dashboard_shared::{MessageEnvelope, MetricMessage};
use cm_dashboard_shared::{AgentData, MessageEnvelope};
use tracing::{debug, info};
use zmq::{Context, Socket, SocketType};
@@ -43,17 +43,17 @@ impl ZmqHandler {
})
}
/// Publish metrics message via ZMQ
pub async fn publish_metrics(&self, message: &MetricMessage) -> Result<()> {
/// Publish agent data via ZMQ
pub async fn publish_agent_data(&self, data: &AgentData) -> Result<()> {
debug!(
"Publishing {} metrics for host {}",
message.metrics.len(),
message.hostname
"Publishing agent data for host {}",
data.hostname
);
// Create message envelope
let envelope = MessageEnvelope::metrics(message.clone())
.map_err(|e| anyhow::anyhow!("Failed to create message envelope: {}", e))?;
// Create message envelope for agent data
let envelope = MessageEnvelope::agent_data(data.clone())
.map_err(|e| anyhow::anyhow!("Failed to create agent data envelope: {}", e))?;
// Serialize envelope
let serialized = serde_json::to_vec(&envelope)?;
@@ -61,11 +61,10 @@ impl ZmqHandler {
// Send via ZMQ
self.publisher.send(&serialized, 0)?;
debug!("Published metrics message ({} bytes)", serialized.len());
debug!("Published agent data message ({} bytes)", serialized.len());
Ok(())
}
/// Try to receive a command (non-blocking)
pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> {
match self.command_receiver.recv_bytes(zmq::DONTWAIT) {

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard"
version = "0.1.97"
version = "0.1.131"
edition = "2021"
[dependencies]

View File

@@ -20,12 +20,13 @@ pub struct Dashboard {
tui_app: Option<TuiApp>,
terminal: Option<Terminal<CrosstermBackend<io::Stdout>>>,
headless: bool,
raw_data: bool,
initial_commands_sent: std::collections::HashSet<String>,
config: DashboardConfig,
}
impl Dashboard {
pub async fn new(config_path: Option<String>, headless: bool) -> Result<Self> {
pub async fn new(config_path: Option<String>, headless: bool, raw_data: bool) -> Result<Self> {
info!("Initializing dashboard");
// Load configuration - try default path if not specified
@@ -119,6 +120,7 @@ impl Dashboard {
tui_app,
terminal,
headless,
raw_data,
initial_commands_sent: std::collections::HashSet::new(),
config,
})
@@ -183,30 +185,35 @@ impl Dashboard {
// Check for new metrics
if last_metrics_check.elapsed() >= metrics_check_interval {
if let Ok(Some(metric_message)) = self.zmq_consumer.receive_metrics().await {
if let Ok(Some(agent_data)) = self.zmq_consumer.receive_agent_data().await {
debug!(
"Received metrics from {}: {} metrics",
metric_message.hostname,
metric_message.metrics.len()
"Received agent data from {}",
agent_data.hostname
);
// Track first contact with host (no command needed - agent sends data every 2s)
let is_new_host = !self
.initial_commands_sent
.contains(&metric_message.hostname);
.contains(&agent_data.hostname);
if is_new_host {
info!(
"First contact with host {} - data will update automatically",
metric_message.hostname
agent_data.hostname
);
self.initial_commands_sent
.insert(metric_message.hostname.clone());
.insert(agent_data.hostname.clone());
}
// Update metric store
self.metric_store
.update_metrics(&metric_message.hostname, metric_message.metrics);
// Show raw data if requested (before processing)
if self.raw_data {
println!("RAW AGENT DATA FROM {}:", agent_data.hostname);
println!("{}", serde_json::to_string_pretty(&agent_data).unwrap_or_else(|e| format!("Serialization error: {}", e)));
println!("{}", "".repeat(80));
}
// Update data store
self.metric_store.process_agent_data(agent_data);
// Check for agent version mismatches across hosts
if let Some((current_version, outdated_hosts)) = self.metric_store.get_version_mismatches() {

View File

@@ -1,5 +1,5 @@
use anyhow::Result;
use cm_dashboard_shared::{CommandOutputMessage, MessageEnvelope, MessageType, MetricMessage};
use cm_dashboard_shared::{AgentData, CommandOutputMessage, MessageEnvelope, MessageType};
use tracing::{debug, error, info, warn};
use zmq::{Context, Socket, SocketType};
@@ -117,8 +117,8 @@ impl ZmqConsumer {
}
}
/// Receive metrics from any connected agent (non-blocking)
pub async fn receive_metrics(&mut self) -> Result<Option<MetricMessage>> {
/// Receive agent data (non-blocking)
pub async fn receive_agent_data(&mut self) -> Result<Option<AgentData>> {
match self.subscriber.recv_bytes(zmq::DONTWAIT) {
Ok(data) => {
debug!("Received {} bytes from ZMQ", data.len());
@@ -129,29 +129,27 @@ impl ZmqConsumer {
// Check message type
match envelope.message_type {
MessageType::Metrics => {
let metrics = envelope
.decode_metrics()
.map_err(|e| anyhow::anyhow!("Failed to decode metrics: {}", e))?;
MessageType::AgentData => {
let agent_data = envelope
.decode_agent_data()
.map_err(|e| anyhow::anyhow!("Failed to decode agent data: {}", e))?;
debug!(
"Received {} metrics from {}",
metrics.metrics.len(),
metrics.hostname
"Received agent data from host {}",
agent_data.hostname
);
Ok(Some(metrics))
Ok(Some(agent_data))
}
MessageType::Heartbeat => {
debug!("Received heartbeat");
Ok(None) // Don't return heartbeats as metrics
Ok(None) // Don't return heartbeats
}
MessageType::CommandOutput => {
debug!("Received command output (will be handled by receive_command_output)");
Ok(None) // Command output handled by separate method
}
_ => {
debug!("Received non-metrics message: {:?}", envelope.message_type);
debug!("Received unsupported message: {:?}", envelope.message_type);
Ok(None)
}
}
@@ -166,5 +164,6 @@ impl ZmqConsumer {
}
}
}
}

View File

@@ -51,6 +51,10 @@ struct Cli {
/// Run in headless mode (no TUI, just logging)
#[arg(long)]
headless: bool,
/// Show raw agent data in headless mode
#[arg(long)]
raw_data: bool,
}
#[tokio::main]
@@ -86,7 +90,7 @@ async fn main() -> Result<()> {
}
// Create and run dashboard
let mut dashboard = Dashboard::new(cli.config, cli.headless).await?;
let mut dashboard = Dashboard::new(cli.config, cli.headless, cli.raw_data).await?;
// Setup graceful shutdown
let ctrl_c = async {

View File

@@ -1,4 +1,4 @@
use cm_dashboard_shared::Metric;
use cm_dashboard_shared::{AgentData, Metric};
use std::collections::HashMap;
use std::time::{Duration, Instant};
use tracing::{debug, info, warn};
@@ -76,6 +76,286 @@ impl MetricStore {
);
}
/// Process structured agent data (temporary bridge - converts back to metrics)
/// TODO: Replace entire metric system with direct structured data processing
pub fn process_agent_data(&mut self, agent_data: AgentData) {
let metrics = self.convert_agent_data_to_metrics(&agent_data);
self.update_metrics(&agent_data.hostname, metrics);
}
/// Convert structured agent data to legacy metrics (temporary bridge)
fn convert_agent_data_to_metrics(&self, agent_data: &AgentData) -> Vec<Metric> {
use cm_dashboard_shared::{Metric, MetricValue, Status};
let mut metrics = Vec::new();
// Convert CPU data
metrics.push(Metric::new(
"cpu_load_1min".to_string(),
MetricValue::Float(agent_data.system.cpu.load_1min),
Status::Ok,
));
metrics.push(Metric::new(
"cpu_load_5min".to_string(),
MetricValue::Float(agent_data.system.cpu.load_5min),
Status::Ok,
));
metrics.push(Metric::new(
"cpu_load_15min".to_string(),
MetricValue::Float(agent_data.system.cpu.load_15min),
Status::Ok,
));
metrics.push(Metric::new(
"cpu_frequency_mhz".to_string(),
MetricValue::Float(agent_data.system.cpu.frequency_mhz),
Status::Ok,
));
if let Some(temp) = agent_data.system.cpu.temperature_celsius {
metrics.push(Metric::new(
"cpu_temperature_celsius".to_string(),
MetricValue::Float(temp),
Status::Ok,
));
}
// Convert Memory data
metrics.push(Metric::new(
"memory_usage_percent".to_string(),
MetricValue::Float(agent_data.system.memory.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
"memory_total_gb".to_string(),
MetricValue::Float(agent_data.system.memory.total_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_used_gb".to_string(),
MetricValue::Float(agent_data.system.memory.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_available_gb".to_string(),
MetricValue::Float(agent_data.system.memory.available_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_swap_total_gb".to_string(),
MetricValue::Float(agent_data.system.memory.swap_total_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_swap_used_gb".to_string(),
MetricValue::Float(agent_data.system.memory.swap_used_gb),
Status::Ok,
));
// Convert tmpfs data
for tmpfs in &agent_data.system.memory.tmpfs {
if tmpfs.mount == "/tmp" {
metrics.push(Metric::new(
"memory_tmp_usage_percent".to_string(),
MetricValue::Float(tmpfs.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
"memory_tmp_used_gb".to_string(),
MetricValue::Float(tmpfs.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
"memory_tmp_total_gb".to_string(),
MetricValue::Float(tmpfs.total_gb),
Status::Ok,
));
}
}
// Add agent metadata
metrics.push(Metric::new(
"agent_version".to_string(),
MetricValue::String(agent_data.agent_version.clone()),
Status::Ok,
));
metrics.push(Metric::new(
"agent_heartbeat".to_string(),
MetricValue::Integer(agent_data.timestamp as i64),
Status::Ok,
));
// Convert storage data
for drive in &agent_data.system.storage.drives {
// Drive-level metrics
if let Some(temp) = drive.temperature_celsius {
metrics.push(Metric::new(
format!("disk_{}_temperature", drive.name),
MetricValue::Float(temp),
Status::Ok,
));
}
if let Some(wear) = drive.wear_percent {
metrics.push(Metric::new(
format!("disk_{}_wear_percent", drive.name),
MetricValue::Float(wear),
Status::Ok,
));
}
metrics.push(Metric::new(
format!("disk_{}_health", drive.name),
MetricValue::String(drive.health.clone()),
Status::Ok,
));
// Filesystem metrics
for fs in &drive.filesystems {
let fs_base = format!("disk_{}_fs_{}", drive.name, fs.mount.replace('/', "root"));
metrics.push(Metric::new(
format!("{}_usage_percent", fs_base),
MetricValue::Float(fs.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_used_gb", fs_base),
MetricValue::Float(fs.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_total_gb", fs_base),
MetricValue::Float(fs.total_gb),
Status::Ok,
));
}
}
// Convert storage pools
for pool in &agent_data.system.storage.pools {
let pool_base = format!("disk_{}", pool.name);
metrics.push(Metric::new(
format!("{}_usage_percent", pool_base),
MetricValue::Float(pool.usage_percent),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_used_gb", pool_base),
MetricValue::Float(pool.used_gb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_total_gb", pool_base),
MetricValue::Float(pool.total_gb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_pool_type", pool_base),
MetricValue::String(pool.pool_type.clone()),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_mount_point", pool_base),
MetricValue::String(pool.mount.clone()),
Status::Ok,
));
// Pool drive data
for drive in &pool.data_drives {
if let Some(temp) = drive.temperature_celsius {
metrics.push(Metric::new(
format!("disk_{}_{}_temperature", pool.name, drive.name),
MetricValue::Float(temp),
Status::Ok,
));
}
if let Some(wear) = drive.wear_percent {
metrics.push(Metric::new(
format!("disk_{}_{}_wear_percent", pool.name, drive.name),
MetricValue::Float(wear),
Status::Ok,
));
}
}
for drive in &pool.parity_drives {
if let Some(temp) = drive.temperature_celsius {
metrics.push(Metric::new(
format!("disk_{}_{}_temperature", pool.name, drive.name),
MetricValue::Float(temp),
Status::Ok,
));
}
if let Some(wear) = drive.wear_percent {
metrics.push(Metric::new(
format!("disk_{}_{}_wear_percent", pool.name, drive.name),
MetricValue::Float(wear),
Status::Ok,
));
}
}
}
// Convert service data
for service in &agent_data.services {
let service_base = format!("service_{}", service.name);
metrics.push(Metric::new(
format!("{}_status", service_base),
MetricValue::String(service.status.clone()),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_memory_mb", service_base),
MetricValue::Float(service.memory_mb),
Status::Ok,
));
metrics.push(Metric::new(
format!("{}_disk_gb", service_base),
MetricValue::Float(service.disk_gb),
Status::Ok,
));
if service.user_stopped {
metrics.push(Metric::new(
format!("{}_user_stopped", service_base),
MetricValue::Boolean(true),
Status::Ok,
));
}
}
// Convert backup data
metrics.push(Metric::new(
"backup_status".to_string(),
MetricValue::String(agent_data.backup.status.clone()),
Status::Ok,
));
if let Some(last_run) = agent_data.backup.last_run {
metrics.push(Metric::new(
"backup_last_run_timestamp".to_string(),
MetricValue::Integer(last_run as i64),
Status::Ok,
));
}
if let Some(next_scheduled) = agent_data.backup.next_scheduled {
metrics.push(Metric::new(
"backup_next_scheduled_timestamp".to_string(),
MetricValue::Integer(next_scheduled as i64),
Status::Ok,
));
}
if let Some(size) = agent_data.backup.total_size_gb {
metrics.push(Metric::new(
"backup_size_gb".to_string(),
MetricValue::Float(size),
Status::Ok,
));
}
if let Some(health) = &agent_data.backup.repository_health {
metrics.push(Metric::new(
"backup_repository_health".to_string(),
MetricValue::String(health.clone()),
Status::Ok,
));
}
metrics
}
/// Get current metric for a specific host
pub fn get_metric(&self, hostname: &str, metric_name: &str) -> Option<&Metric> {
self.current_metrics.get(hostname)?.get(metric_name)

View File

@@ -589,12 +589,13 @@ impl TuiApp {
// Split the title bar into left and right sections
let chunks = Layout::default()
.direction(Direction::Horizontal)
.constraints([Constraint::Length(15), Constraint::Min(0)])
.constraints([Constraint::Length(22), Constraint::Min(0)])
.split(area);
// Left side: "cm-dashboard" text
// Left side: "cm-dashboard" text with version
let title_text = format!(" cm-dashboard v{}", env!("CARGO_PKG_VERSION"));
let left_span = Span::styled(
" cm-dashboard",
&title_text,
Style::default().fg(Theme::background()).bg(background_color).add_modifier(Modifier::BOLD)
);
let left_title = Paragraph::new(Line::from(vec![left_span]))
@@ -666,32 +667,27 @@ impl TuiApp {
return host_summary_metric.status;
}
// Fallback to old aggregation logic with proper Pending handling
// Rewritten status aggregation - only Critical, Warning, or OK for top bar
let mut has_critical = false;
let mut has_warning = false;
let mut ok_count = 0;
for metric in &metrics {
match metric.status {
Status::Critical => has_critical = true,
Status::Warning => has_warning = true,
Status::Pending => ok_count += 1, // Treat pending as OK for aggregation
Status::Ok => ok_count += 1,
Status::Inactive => ok_count += 1, // Treat inactive as OK for aggregation
Status::Unknown => ok_count += 1, // Treat unknown as OK for aggregation
Status::Offline => {}, // Ignore offline for aggregation
// Treat all other statuses as OK for top bar aggregation
Status::Ok | Status::Pending | Status::Inactive | Status::Unknown => {},
Status::Offline => {}, // Ignore offline
}
}
// Priority order: Critical > Warning > Ok > Unknown (no Pending)
// Only return Critical, Warning, or OK - no other statuses
if has_critical {
Status::Critical
} else if has_warning {
Status::Warning
} else if ok_count > 0 {
Status::Ok
} else {
Status::Unknown
Status::Ok
}
}

View File

@@ -45,12 +45,15 @@ pub struct SystemWidget {
struct StoragePool {
name: String,
mount_point: String,
pool_type: String, // "Single", "Raid0", etc.
pool_type: String, // "single", "mergerfs (2+1)", "RAID5 (3+1)", etc.
pool_health: Option<String>, // "healthy", "degraded", "critical", "rebuilding"
drives: Vec<StorageDrive>,
filesystems: Vec<FileSystem>, // For physical drive pools: individual filesystem children
usage_percent: Option<f32>,
used_gb: Option<f32>,
total_gb: Option<f32>,
status: Status,
health_status: Status, // Separate status for pool health vs usage
}
#[derive(Clone)]
@@ -61,6 +64,16 @@ struct StorageDrive {
status: Status,
}
#[derive(Clone)]
struct FileSystem {
mount_point: String,
usage_percent: Option<f32>,
used_gb: Option<f32>,
total_gb: Option<f32>,
available_gb: Option<f32>,
status: Status,
}
impl SystemWidget {
pub fn new() -> Self {
Self {
@@ -133,14 +146,14 @@ impl SystemWidget {
self.agent_hash.as_ref()
}
/// Get mount point for a pool name
/// Get default mount point for a pool name (fallback only - should use actual mount_point metrics)
fn get_mount_point_for_pool(&self, pool_name: &str) -> String {
match pool_name {
"root" => "/".to_string(),
"steampool" => "/mnt/steampool".to_string(),
"steampool_1" => "/steampool_1".to_string(),
"steampool_2" => "/steampool_2".to_string(),
_ => format!("/{}", pool_name), // Default fallback
// For device names, use the device name directly as display name
if pool_name.starts_with("nvme") || pool_name.starts_with("sd") || pool_name.starts_with("hd") {
pool_name.to_string()
} else {
// For other pools, use the pool name as-is (will be overridden by mount_point metric)
pool_name.to_string()
}
}
@@ -151,32 +164,57 @@ impl SystemWidget {
for metric in metrics {
if metric.name.starts_with("disk_") {
if let Some(pool_name) = self.extract_pool_name(&metric.name) {
let mount_point = self.get_mount_point_for_pool(&pool_name);
let pool = pools.entry(pool_name.clone()).or_insert_with(|| StoragePool {
name: pool_name.clone(),
mount_point: mount_point.clone(),
pool_type: "Single".to_string(), // Default, could be enhanced
mount_point: self.get_mount_point_for_pool(&pool_name), // Default fallback
pool_type: "single".to_string(), // Default, will be updated
pool_health: None,
drives: Vec::new(),
filesystems: Vec::new(),
usage_percent: None,
used_gb: None,
total_gb: None,
status: Status::Unknown,
health_status: Status::Unknown,
});
// Parse different metric types
if metric.name.contains("_usage_percent") {
if metric.name.contains("_usage_percent") && !metric.name.contains("_fs_") {
// Only use drive-level metrics for pool totals, not filesystem metrics
if let MetricValue::Float(usage) = metric.value {
pool.usage_percent = Some(usage);
pool.status = metric.status.clone();
}
} else if metric.name.contains("_used_gb") {
} else if metric.name.contains("_used_gb") && !metric.name.contains("_fs_") {
// Only use drive-level metrics for pool totals, not filesystem metrics
if let MetricValue::Float(used) = metric.value {
pool.used_gb = Some(used);
}
} else if metric.name.contains("_total_gb") {
} else if metric.name.contains("_total_gb") && !metric.name.contains("_fs_") {
// Only use drive-level metrics for pool totals, not filesystem metrics
if let MetricValue::Float(total) = metric.value {
pool.total_gb = Some(total);
}
} else if metric.name.contains("_mount_point") {
if let MetricValue::String(mount_point) = &metric.value {
pool.mount_point = mount_point.clone();
}
} else if metric.name.contains("_pool_type") {
if let MetricValue::String(pool_type) = &metric.value {
pool.pool_type = pool_type.clone();
}
} else if metric.name.contains("_pool_health") {
if let MetricValue::String(health) = &metric.value {
pool.pool_health = Some(health.clone());
pool.health_status = metric.status.clone();
}
} else if metric.name.contains("_health") && !metric.name.contains("_pool_health") {
// Handle physical drive health metrics (disk_{drive}_health)
if let MetricValue::String(health) = &metric.value {
// For physical drives, use the drive health as pool health
pool.pool_health = Some(health.clone());
pool.health_status = metric.status.clone();
}
} else if metric.name.contains("_temperature") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
// Find existing drive or create new one
@@ -194,12 +232,16 @@ impl SystemWidget {
if let MetricValue::Float(temp) = metric.value {
drive.temperature = Some(temp);
drive.status = metric.status.clone();
// For physical drives, if this is the main drive, also update pool health
if drive.name == pool.name && pool.health_status == Status::Unknown {
pool.health_status = metric.status.clone();
}
}
}
}
} else if metric.name.contains("_wear_percent") {
if let Some(drive_name) = self.extract_drive_name(&metric.name) {
// Find existing drive or create new one
// For physical drives, ensure we create the drive object
let drive_exists = pool.drives.iter().any(|d| d.name == drive_name);
if !drive_exists {
pool.drives.push(StorageDrive {
@@ -214,6 +256,95 @@ impl SystemWidget {
if let MetricValue::Float(wear) = metric.value {
drive.wear_percent = Some(wear);
drive.status = metric.status.clone();
// For physical drives, if this is the main drive, also update pool health
if drive.name == pool.name && pool.health_status == Status::Unknown {
pool.health_status = metric.status.clone();
}
}
}
}
} else if metric.name.contains("_fs_") {
// Handle filesystem metrics for physical drive pools (disk_{pool}_fs_{fs_name}_{metric})
if let (Some(fs_name), Some(metric_type)) = self.extract_filesystem_metric(&metric.name) {
// Find or create filesystem entry
let fs_exists = pool.filesystems.iter().any(|fs| {
let fs_id = if fs.mount_point == "/" {
"root".to_string()
} else {
fs.mount_point.trim_start_matches('/').replace('/', "_")
};
fs_id == fs_name
});
if !fs_exists {
// Create filesystem entry with correct mount point
let mount_point = if metric_type == "mount_point" {
if let MetricValue::String(mount) = &metric.value {
mount.clone()
} else {
// Fallback: handle special cases
if fs_name == "root" {
"/".to_string()
} else {
format!("/{}", fs_name.replace('_', "/"))
}
}
} else {
// Fallback for non-mount_point metrics: generate mount point from fs_name
if fs_name == "root" {
"/".to_string()
} else {
format!("/{}", fs_name.replace('_', "/"))
}
};
pool.filesystems.push(FileSystem {
mount_point,
usage_percent: None,
used_gb: None,
total_gb: None,
available_gb: None,
status: Status::Unknown,
});
}
// Update the filesystem with the metric value
if let Some(filesystem) = pool.filesystems.iter_mut().find(|fs| {
let fs_id = if fs.mount_point == "/" {
"root".to_string()
} else {
fs.mount_point.trim_start_matches('/').replace('/', "_")
};
fs_id == fs_name
}) {
match metric_type.as_str() {
"usage_percent" => {
if let MetricValue::Float(usage) = metric.value {
filesystem.usage_percent = Some(usage);
filesystem.status = metric.status.clone();
}
}
"used_gb" => {
if let MetricValue::Float(used) = metric.value {
filesystem.used_gb = Some(used);
}
}
"total_gb" => {
if let MetricValue::Float(total) = metric.value {
filesystem.total_gb = Some(total);
}
}
"available_gb" => {
if let MetricValue::Float(available) = metric.value {
filesystem.available_gb = Some(available);
}
}
"mount_point" => {
if let MetricValue::String(mount) = &metric.value {
filesystem.mount_point = mount.clone();
}
}
_ => {}
}
}
}
@@ -230,94 +361,279 @@ impl SystemWidget {
/// Extract pool name from disk metric name
fn extract_pool_name(&self, metric_name: &str) -> Option<String> {
// Pattern: disk_{pool_name}_{drive_name}_{metric_type}
// Pattern: disk_{pool_name}_{various suffixes}
// Since pool_name can contain underscores, work backwards from known metric suffixes
if metric_name.starts_with("disk_") {
// First try drive-specific metrics that have device names
if let Some(suffix_pos) = metric_name.rfind("_temperature")
// Handle filesystem metrics: disk_{pool}_fs_{filesystem}_{metric}
if metric_name.contains("_fs_") {
if let Some(fs_pos) = metric_name.find("_fs_") {
return Some(metric_name[5..fs_pos].to_string()); // Skip "disk_", extract pool name before "_fs_"
}
}
// Handle pool-level metrics (usage_percent, used_gb, total_gb, mount_point, pool_type, pool_health)
// Use rfind to get the last occurrence of these suffixes
let pool_suffixes = ["_usage_percent", "_used_gb", "_total_gb", "_available_gb", "_mount_point", "_pool_type", "_pool_health"];
for suffix in pool_suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
}
}
// Handle physical drive metrics: disk_{drive}_health, disk_{drive}_wear_percent, and disk_{drive}_temperature
if (metric_name.ends_with("_health") && !metric_name.contains("_pool_health"))
|| metric_name.ends_with("_wear_percent")
|| metric_name.ends_with("_temperature") {
// Count underscores to distinguish physical drive metrics (disk_{drive}_metric)
// from pool drive metrics (disk_{pool}_{drive}_metric)
let underscore_count = metric_name.matches('_').count();
// disk_nvme0n1_wear_percent has 3 underscores: disk_nvme0n1_wear_percent
if underscore_count == 3 { // disk_{drive}_metric (where drive has underscores)
if let Some(suffix_pos) = metric_name.rfind("_health")
.or_else(|| metric_name.rfind("_wear_percent"))
.or_else(|| metric_name.rfind("_health")) {
// Find the second-to-last underscore to get pool name
.or_else(|| metric_name.rfind("_temperature")) {
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
}
}
}
// Handle drive-specific metrics: disk_{pool}_{drive}_{metric}
let drive_suffixes = ["_temperature", "_health"];
for suffix in drive_suffixes {
if let Some(suffix_pos) = metric_name.rfind(suffix) {
// Extract pool name by finding the second-to-last underscore
let before_suffix = &metric_name[..suffix_pos];
if let Some(drive_start) = before_suffix.rfind('_') {
if drive_start > 5 {
return Some(metric_name[5..drive_start].to_string()); // Skip "disk_"
}
}
// For pool-level metrics (usage_percent, used_gb, total_gb), take everything before the metric suffix
else if let Some(suffix_pos) = metric_name.rfind("_usage_percent")
.or_else(|| metric_name.rfind("_used_gb"))
.or_else(|| metric_name.rfind("_total_gb")) {
return Some(metric_name[5..suffix_pos].to_string()); // Skip "disk_"
}
// Fallback to old behavior for unknown patterns
else if let Some(captures) = metric_name.strip_prefix("disk_") {
if let Some(pos) = captures.find('_') {
return Some(captures[..pos].to_string());
}
}
}
None
}
/// Extract filesystem name and metric type from filesystem metric names
/// Pattern: disk_{pool}_fs_{filesystem_name}_{metric_type}
fn extract_filesystem_metric(&self, metric_name: &str) -> (Option<String>, Option<String>) {
if metric_name.starts_with("disk_") && metric_name.contains("_fs_") {
// Find the _fs_ part
if let Some(fs_start) = metric_name.find("_fs_") {
let after_fs = &metric_name[fs_start + 4..]; // Skip "_fs_"
// Look for known metric suffixes (these can contain underscores)
let known_suffixes = ["usage_percent", "used_gb", "total_gb", "available_gb", "mount_point"];
for suffix in known_suffixes {
if after_fs.ends_with(suffix) {
// Extract filesystem name by removing suffix and underscore
if let Some(underscore_pos) = after_fs.rfind(&format!("_{}", suffix)) {
let fs_name = after_fs[..underscore_pos].to_string();
return (Some(fs_name), Some(suffix.to_string()));
}
}
}
}
}
(None, None)
}
/// Extract drive name from disk metric name
fn extract_drive_name(&self, metric_name: &str) -> Option<String> {
// Pattern: disk_{pool_name}_{drive_name}_{metric_type}
// Since pool_name can contain underscores, work backwards from known metric suffixes
// Pattern: disk_{pool_name}_{drive_name}_{metric_type} OR disk_{drive_name}_{metric_type}
// Pool drives: disk_srv_media_sdb_temperature
// Physical drives: disk_nvme0n1_temperature
if metric_name.starts_with("disk_") {
if let Some(suffix_pos) = metric_name.rfind("_temperature")
.or_else(|| metric_name.rfind("_wear_percent"))
.or_else(|| metric_name.rfind("_health")) {
// Find the second-to-last underscore to get the drive name
let before_suffix = &metric_name[..suffix_pos];
// Extract the last component as drive name (e.g., "sdb", "sdc", "nvme0n1")
if let Some(drive_start) = before_suffix.rfind('_') {
return Some(before_suffix[drive_start + 1..].to_string());
} else {
// Handle physical drive metrics: disk_{drive}_metric (no pool)
// Extract everything after "disk_" as the drive name
return Some(before_suffix[5..].to_string()); // Skip "disk_"
}
}
}
None
}
/// Render storage section with tree structure
/// Render storage section with enhanced tree structure
fn render_storage(&self) -> Vec<Line<'_>> {
let mut lines = Vec::new();
for pool in &self.storage_pools {
// Pool header line
let usage_text = match (pool.usage_percent, pool.used_gb, pool.total_gb) {
(Some(pct), Some(used), Some(total)) => {
format!("{:.0}% {:.1}GB/{:.1}GB", pct, used, total)
}
_ => "—% —GB/—GB".to_string(),
};
// Pool header line with type and health
let pool_label = if pool.pool_type.starts_with("drive (") {
// For physical drives, show the drive name with temperature and wear percentage if available
// Look for any drive with temp/wear data (physical drives may have drives named after the pool)
let temp_opt = pool.drives.iter()
.find_map(|d| d.temperature);
let wear_opt = pool.drives.iter()
.find_map(|d| d.wear_percent);
let pool_label = if pool.pool_type.to_lowercase() == "single" {
let mut drive_info = Vec::new();
if let Some(temp) = temp_opt {
drive_info.push(format!("T: {:.0}°C", temp));
}
if let Some(wear) = wear_opt {
drive_info.push(format!("W: {:.0}%", wear));
}
if drive_info.is_empty() {
format!("{}:", pool.name)
} else {
format!("{} {}:", pool.name, drive_info.join(" "))
}
} else if pool.pool_type == "single" {
format!("{}:", pool.mount_point)
} else {
format!("{} ({}):", pool.mount_point, pool.pool_type)
};
let pool_spans = StatusIcons::create_status_spans(
pool.status.clone(),
pool.health_status.clone(),
&pool_label
);
lines.push(Line::from(pool_spans));
// Drive lines with tree structure
let has_usage_line = pool.usage_percent.is_some();
for (i, drive) in pool.drives.iter().enumerate() {
let is_last_drive = i == pool.drives.len() - 1;
let tree_symbol = if is_last_drive && !has_usage_line { "└─" } else { "├─" };
// Skip pool health line as discussed - removed
// Total usage line (only show for multi-drive pools, skip for single physical drives)
if !pool.pool_type.starts_with("drive (") {
let usage_text = match (pool.usage_percent, pool.used_gb, pool.total_gb) {
(Some(pct), Some(used), Some(total)) => {
format!("Total: {:.0}% {:.1}GB/{:.1}GB", pct, used, total)
}
_ => "Total: —% —GB/—GB".to_string(),
};
let has_drives = !pool.drives.is_empty();
let has_filesystems = !pool.filesystems.is_empty();
let has_children = has_drives || has_filesystems;
let tree_symbol = if has_children { "├─" } else { "└─" };
let mut usage_spans = vec![
Span::raw(" "),
Span::styled(tree_symbol, Typography::tree()),
Span::raw(" "),
];
usage_spans.extend(StatusIcons::create_status_spans(pool.status.clone(), &usage_text));
lines.push(Line::from(usage_spans));
}
// Drive lines with enhanced grouping
if pool.pool_type.contains("mergerfs") && pool.drives.len() > 1 {
// Group drives by type for mergerfs pools
let (data_drives, parity_drives): (Vec<_>, Vec<_>) = pool.drives.iter().enumerate()
.partition(|(_, drive)| {
// Simple heuristic: drives with 'parity' in name or sdc (common parity drive)
!drive.name.to_lowercase().contains("parity") && drive.name != "sdc"
});
// Show data drives
if !data_drives.is_empty() {
lines.push(Line::from(vec![
Span::raw(" "),
Span::styled("├─ ", Typography::tree()),
Span::styled("Data Disks:", Typography::secondary()),
]));
for (i, (_, drive)) in data_drives.iter().enumerate() {
let is_last = i == data_drives.len() - 1;
if is_last && parity_drives.is_empty() {
self.render_drive_line(&mut lines, drive, "│ └─");
} else {
self.render_drive_line(&mut lines, drive, "│ ├─");
}
}
}
// Show parity drives
if !parity_drives.is_empty() {
lines.push(Line::from(vec![
Span::raw(" "),
Span::styled("└─ ", Typography::tree()),
Span::styled("Parity:", Typography::secondary()),
]));
for (i, (_, drive)) in parity_drives.iter().enumerate() {
let is_last = i == parity_drives.len() - 1;
if is_last {
self.render_drive_line(&mut lines, drive, " └─");
} else {
self.render_drive_line(&mut lines, drive, " ├─");
}
}
}
} else if pool.pool_type != "single" && pool.drives.len() > 1 {
// Regular drive listing for non-mergerfs multi-drive pools
for (i, drive) in pool.drives.iter().enumerate() {
let is_last = i == pool.drives.len() - 1;
let tree_symbol = if is_last { "└─" } else { "├─" };
self.render_drive_line(&mut lines, drive, tree_symbol);
}
} else if pool.pool_type.starts_with("drive (") {
// Physical drive pools: wear data shown in header, skip drive lines, show filesystems directly
for (i, filesystem) in pool.filesystems.iter().enumerate() {
let is_last = i == pool.filesystems.len() - 1;
let tree_symbol = if is_last { "└─" } else { "├─" };
let fs_text = match (filesystem.usage_percent, filesystem.used_gb, filesystem.total_gb) {
(Some(pct), Some(used), Some(total)) => {
format!("{}: {:.0}% {:.1}GB/{:.1}GB", filesystem.mount_point, pct, used, total)
}
(Some(pct), _, Some(total)) => {
format!("{}: {:.0}% —GB/{:.1}GB", filesystem.mount_point, pct, total)
}
(Some(pct), _, _) => {
format!("{}: {:.0}% —GB/—GB", filesystem.mount_point, pct)
}
(_, Some(used), Some(total)) => {
format!("{}: —% {:.1}GB/{:.1}GB", filesystem.mount_point, used, total)
}
_ => format!("{}: —% —GB/—GB", filesystem.mount_point),
};
let mut fs_spans = vec![
Span::raw(" "),
Span::styled(tree_symbol, Typography::tree()),
Span::raw(" "),
];
fs_spans.extend(StatusIcons::create_status_spans(filesystem.status.clone(), &fs_text));
lines.push(Line::from(fs_spans));
}
} else {
// Single drive or simple pools
for (i, drive) in pool.drives.iter().enumerate() {
let is_last = i == pool.drives.len() - 1;
let tree_symbol = if is_last { "└─" } else { "├─" };
self.render_drive_line(&mut lines, drive, tree_symbol);
}
}
}
lines
}
/// Helper to render a single drive line
fn render_drive_line<'a>(&self, lines: &mut Vec<Line<'a>>, drive: &StorageDrive, tree_symbol: &'a str) {
let mut drive_info = Vec::new();
if let Some(temp) = drive.temperature {
drive_info.push(format!("T: {:.0}C", temp));
drive_info.push(format!("T: {:.0}°C", temp));
}
if let Some(wear) = drive.wear_percent {
drive_info.push(format!("W: {:.0}%", wear));
}
// Always show drive name with info, or just name if no info available
let drive_text = if drive_info.is_empty() {
drive.name.clone()
} else {
format!("{} {}", drive.name, drive_info.join(" "))
format!("{} {}", drive.name, drive_info.join(" "))
};
let mut drive_spans = vec![
@@ -328,22 +644,6 @@ impl SystemWidget {
drive_spans.extend(StatusIcons::create_status_spans(drive.status.clone(), &drive_text));
lines.push(Line::from(drive_spans));
}
// Usage line
if pool.usage_percent.is_some() {
let tree_symbol = "└─";
let mut usage_spans = vec![
Span::raw(" "),
Span::styled(tree_symbol, Typography::tree()),
Span::raw(" "),
];
usage_spans.extend(StatusIcons::create_status_spans(pool.status.clone(), &usage_text));
lines.push(Line::from(usage_spans));
}
}
lines
}
}
impl Widget for SystemWidget {

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-shared"
version = "0.1.97"
version = "0.1.131"
edition = "2021"
[dependencies]

161
shared/src/agent_data.rs Normal file
View File

@@ -0,0 +1,161 @@
use serde::{Deserialize, Serialize};
/// Complete structured data from an agent
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AgentData {
pub hostname: String,
pub agent_version: String,
pub timestamp: u64,
pub system: SystemData,
pub services: Vec<ServiceData>,
pub backup: BackupData,
}
/// System-level monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SystemData {
pub cpu: CpuData,
pub memory: MemoryData,
pub storage: StorageData,
}
/// CPU monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CpuData {
pub load_1min: f32,
pub load_5min: f32,
pub load_15min: f32,
pub frequency_mhz: f32,
pub temperature_celsius: Option<f32>,
}
/// Memory monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryData {
pub usage_percent: f32,
pub total_gb: f32,
pub used_gb: f32,
pub available_gb: f32,
pub swap_total_gb: f32,
pub swap_used_gb: f32,
pub tmpfs: Vec<TmpfsData>,
}
/// Tmpfs filesystem data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TmpfsData {
pub mount: String,
pub usage_percent: f32,
pub used_gb: f32,
pub total_gb: f32,
}
/// Storage monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct StorageData {
pub drives: Vec<DriveData>,
pub pools: Vec<PoolData>,
}
/// Individual drive data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct DriveData {
pub name: String,
pub health: String,
pub temperature_celsius: Option<f32>,
pub wear_percent: Option<f32>,
pub filesystems: Vec<FilesystemData>,
}
/// Filesystem on a drive
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FilesystemData {
pub mount: String,
pub usage_percent: f32,
pub used_gb: f32,
pub total_gb: f32,
}
/// Storage pool (MergerFS, RAID, etc.)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PoolData {
pub name: String,
pub mount: String,
pub pool_type: String, // "mergerfs", "raid", etc.
pub health: String,
pub usage_percent: f32,
pub used_gb: f32,
pub total_gb: f32,
pub data_drives: Vec<PoolDriveData>,
pub parity_drives: Vec<PoolDriveData>,
}
/// Drive in a storage pool
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PoolDriveData {
pub name: String,
pub temperature_celsius: Option<f32>,
pub wear_percent: Option<f32>,
pub health: String,
}
/// Service monitoring data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceData {
pub name: String,
pub status: String, // "active", "inactive", "failed"
pub memory_mb: f32,
pub disk_gb: f32,
pub user_stopped: bool,
}
/// Backup system data
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct BackupData {
pub status: String,
pub last_run: Option<u64>,
pub next_scheduled: Option<u64>,
pub total_size_gb: Option<f32>,
pub repository_health: Option<String>,
}
impl AgentData {
/// Create new agent data with current timestamp
pub fn new(hostname: String, agent_version: String) -> Self {
Self {
hostname,
agent_version,
timestamp: chrono::Utc::now().timestamp() as u64,
system: SystemData {
cpu: CpuData {
load_1min: 0.0,
load_5min: 0.0,
load_15min: 0.0,
frequency_mhz: 0.0,
temperature_celsius: None,
},
memory: MemoryData {
usage_percent: 0.0,
total_gb: 0.0,
used_gb: 0.0,
available_gb: 0.0,
swap_total_gb: 0.0,
swap_used_gb: 0.0,
tmpfs: Vec::new(),
},
storage: StorageData {
drives: Vec::new(),
pools: Vec::new(),
},
},
services: Vec::new(),
backup: BackupData {
status: "unknown".to_string(),
last_run: None,
next_scheduled: None,
total_size_gb: None,
repository_health: None,
},
}
}
}

View File

@@ -1,8 +1,10 @@
pub mod agent_data;
pub mod cache;
pub mod error;
pub mod metrics;
pub mod protocol;
pub use agent_data::*;
pub use cache::*;
pub use error::*;
pub use metrics::*;

View File

@@ -82,13 +82,13 @@ impl MetricValue {
/// Health status for metrics
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
Inactive, // Lowest priority - treated as good
Ok, // Second lowest - also good
Unknown,
Offline,
Pending,
Warning,
Critical,
Inactive, // Lowest priority
Unknown, //
Offline, //
Pending, //
Ok, // 5th place - good status has higher priority than unknown states
Warning, //
Critical, // Highest priority
}
impl Status {

View File

@@ -1,13 +1,9 @@
use crate::metrics::Metric;
use crate::agent_data::AgentData;
use serde::{Deserialize, Serialize};
/// Message sent from agent to dashboard via ZMQ
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MetricMessage {
pub hostname: String,
pub timestamp: u64,
pub metrics: Vec<Metric>,
}
/// Always structured data - no legacy metrics support
pub type AgentMessage = AgentData;
/// Command output streaming message
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -20,15 +16,6 @@ pub struct CommandOutputMessage {
pub timestamp: u64,
}
impl MetricMessage {
pub fn new(hostname: String, metrics: Vec<Metric>) -> Self {
Self {
hostname,
timestamp: chrono::Utc::now().timestamp() as u64,
metrics,
}
}
}
impl CommandOutputMessage {
pub fn new(hostname: String, command_id: String, command_type: String, output_line: String, is_complete: bool) -> Self {
@@ -59,8 +46,8 @@ pub enum Command {
pub enum CommandResponse {
/// Acknowledgment of command
Ack,
/// Metrics response
Metrics(Vec<Metric>),
/// Agent data response
AgentData(AgentData),
/// Pong response to ping
Pong,
/// Error response
@@ -76,7 +63,7 @@ pub struct MessageEnvelope {
#[derive(Debug, Serialize, Deserialize)]
pub enum MessageType {
Metrics,
AgentData,
Command,
CommandResponse,
CommandOutput,
@@ -84,10 +71,10 @@ pub enum MessageType {
}
impl MessageEnvelope {
pub fn metrics(message: MetricMessage) -> Result<Self, crate::SharedError> {
pub fn agent_data(data: AgentData) -> Result<Self, crate::SharedError> {
Ok(Self {
message_type: MessageType::Metrics,
payload: serde_json::to_vec(&message)?,
message_type: MessageType::AgentData,
payload: serde_json::to_vec(&data)?,
})
}
@@ -119,11 +106,11 @@ impl MessageEnvelope {
})
}
pub fn decode_metrics(&self) -> Result<MetricMessage, crate::SharedError> {
pub fn decode_agent_data(&self) -> Result<AgentData, crate::SharedError> {
match self.message_type {
MessageType::Metrics => Ok(serde_json::from_slice(&self.payload)?),
MessageType::AgentData => Ok(serde_json::from_slice(&self.payload)?),
_ => Err(crate::SharedError::Protocol {
message: "Expected metrics message".to_string(),
message: "Expected agent data message".to_string(),
}),
}
}