cm-dashboard

Author	SHA1	Message	Date
Christoffer Martinsson	c77aa6eaaa	Fix Data_3 timeout by removing sequential SMART during pool detection Root cause: SMART data was collected TWICE: 1. Sequential collection during pool detection in get_drive_info_for_path() using problematic tokio::task::block_in_place() nesting 2. Parallel collection in get_smart_data_for_drives() (v0.1.223) The sequential collection happened FIRST during pool detection, causing sda (Data_3) to timeout due to: - Bad async nesting: block_in_place() wrapping block_on() - Sequential execution causing runtime issues - sda being third in sequence, runtime degraded by then Solution: Remove SMART collection from get_drive_info_for_path(). Pool drive temperatures are populated later from the parallel SMART collection which properly uses futures::join_all. Benefits: - Eliminates problematic async nesting - All SMART queries happen once in parallel only - sda/Data_3 should now show serial (ZDZ4VE0B) and temperature Bump version to v0.1.224	2025-11-30 00:14:25 +01:00
Christoffer Martinsson	8a0e68f0e3	Fix Data_3 timeout by parallelizing SMART collection Root cause: SMART data was collected sequentially, one drive at a time. With 5 drives taking ~500ms each, total collection time was 2.5+ seconds. When disk collector runs every 1 second, this caused overlapping collections creating resource contention. The last drive (sda/Data_3) would timeout due to the drive being accessed by the previous collection. Solution: Query all drives in parallel using futures::join_all. Now all drives get their SMART data collected simultaneously with independent 3-second timeouts, eliminating contention and reducing total collection time from 2.5+ seconds to ~500ms (the slowest single drive). Benefits: - All drives complete in ~500ms instead of 2.5+ seconds - No overlapping collections causing resource contention - Each drive gets full 3-second timeout window - sda/Data_3 should now show temperature and serial number Bump version to v0.1.223	2025-11-29 23:51:43 +01:00
Christoffer Martinsson	caba78004e	Fix empty Storage section by properly aliasing command types v0.1.220 broke disk collector by changing the import from std::process::Command to tokio::process::Command, but lines 193 and 767 explicitly used std::process::Command::new() which silently failed. Solution: Import both as aliases (TokioCommand/StdCommand) and use appropriate type for each operation - async commands use TokioCommand with run_command_with_timeout, sync commands use StdCommand with system timeout wrapper. Fixes: Empty Storage section after v0.1.220 deployment Bump version to v0.1.221	2025-11-29 21:29:33 +01:00
Christoffer Martinsson	77bf08a978	Fix blocking smartctl commands with proper async/timeout handling - Changed disk collector to use tokio::process::Command instead of std::process::Command - Updated run_command_with_timeout to properly kill processes on timeout - Fixes issue where smartctl hangs on problematic drives (/dev/sda) freezing entire agent - Timeout now force-kills hung processes using kill -9, preventing orphaned smartctl processes This resolves the issue where Data_3 showed unknown status because smartctl was hanging indefinitely trying to read from a problematic drive, blocking the entire collector. Bump version to v0.1.220 Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-29 21:09:04 +01:00
Christoffer Martinsson	374b126446	Reduce all command timeouts to 2-3 seconds max With 10-second host heartbeat timeout, all command timeouts must be significantly lower to ensure total collection time stays under 10 seconds. Changed timeouts: - smartctl: 10s → 3s (critical: multiple drives queried sequentially) - du: 5s → 2s - lsblk: 5s → 2s - systemctl list commands: 5s → 3s - systemctl show/is-active: 3s → 2s - docker commands: 5s → 3s - df, ip commands: 3s → 2s Total worst-case collection time now capped at more reasonable levels, preventing false host offline alerts from blocking operations.	2025-11-27 16:38:54 +01:00
Christoffer Martinsson	1e0510be81	Add comprehensive timeouts to all blocking system commands Fixes random host disconnections caused by blocking operations preventing timely ZMQ packet transmission. Changes: - Add run_command_with_timeout() wrapper using tokio for async command execution - Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives) - Apply 5s timeout to du, lsblk, systemctl list commands - Apply 3s timeout to systemctl show/is-active, df, ip commands - Apply 2s timeout to hostname command - Use system 'timeout' command for sync operations where async not needed Critical fixes: - smartctl: Failing drives could block for 30+ seconds per drive - du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds - systemctl/docker: Commands could block indefinitely during system issues With 1-second collection interval and 10-second heartbeat timeout, any blocking operation >10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.	2025-11-27 16:34:08 +01:00
Christoffer Martinsson	7a68da01f5	Remove debug logging for NVMe SMART collection	2025-11-27 15:40:16 +01:00
Christoffer Martinsson	5be67fed64	Add debug logging for NVMe SMART data collection	2025-11-27 15:00:48 +01:00
Christoffer Martinsson	cac836601b	Add NVMe device type flag for SMART data collection	2025-11-27 13:34:30 +01:00
Christoffer Martinsson	bd22ce265b	Use direct smartctl with CAP_SYS_RAWIO instead of sudo	2025-11-27 13:22:13 +01:00
Christoffer Martinsson	bbc8b7b1cb	Add info-level logging for SMART data collection debugging	2025-11-27 13:15:53 +01:00
Christoffer Martinsson	df104bf940	Remove debug prints and unused code - Remove all debug println statements - Remove unused service_tracker module - Remove unused struct fields and methods - Remove empty placeholder files (cpu.rs, memory.rs, defaults.rs) - Fix all compiler warnings - Clean build with zero warnings Version bump to 0.1.159	2025-11-25 12:19:04 +01:00
Christoffer Martinsson	d5ce36ee18	Add support for additional SMART attributes - Support Temperature_Case attribute for Intel SSDs - Support Media_Wearout_Indicator attribute for wear percentage - Parse wear value from column 3 (VALUE) for Media_Wearout_Indicator - Fixes temperature and wear display for Intel PHLA847000FL512DGN drives	2025-11-25 11:53:08 +01:00
Christoffer Martinsson	4f80701671	Fix NVMe serial display and improve pool health logic - Fix physical drive serial number display in dashboard - Improve pool health calculation for arrays with multiple disks - Support proper tree symbols for multiple parity drives - Read git commit hash from /var/lib/cm-dashboard/git-commit for Build display	2025-11-25 11:44:20 +01:00
Christoffer Martinsson	267654fda4	Improve NVMe serial parsing and restructure MergerFS display - Fix NVMe serial number parsing to handle whitespace variations - Move mount point to MergerFS header, remove drive count - Restructure data drives to same level as parity with Data_1, Data_2 labels - Remove "Total:" label from pool usage line - Update parity to use closing tree symbol as last item	2025-11-25 11:28:54 +01:00
Christoffer Martinsson	dc1105eefe	Display disk serial numbers instead of device names - Add serial_number field to DriveData structure - Collect serial numbers from SMART data for all drives - Display truncated serial numbers (last 8 chars) in dashboard - Fix parity drive label to show status icon before "Parity:" - Fix mount point label styling to match other labels	2025-11-25 11:06:54 +01:00
Christoffer Martinsson	c9d12793ef	Replace device names with serial numbers in MergerFS pool display Updates disk collector and dashboard to show drive serial numbers instead of device names (sdX) for MergerFS data/parity drives. Agent extracts serial numbers from SMART data and dashboard displays them when available, falling back to device names.	2025-11-25 10:30:37 +01:00
Christoffer Martinsson	8f80015273	Fix dashboard storage pool label styling Replace non-existent Typography::primary() with Typography::secondary() for MergerFS pool labels following existing UI patterns.	2025-11-25 10:16:26 +01:00
Christoffer Martinsson	7b11db990c	Restore complete MergerFS and SnapRAID functionality to disk collector Updated the disk collector to include all missing functionality from the previous string-based implementation while working with the new structured JSON data architecture: - MergerFS pool discovery from /proc/mounts parsing - SnapRAID parity drive detection via mount path heuristics - Drive categorization (data vs parity) based on path analysis - Numeric mergerfs reference resolution (1:2 -> /mnt/disk paths) - Pool health calculation based on member drive SMART status - Complete SMART data integration for temperatures and wear levels - Proper exclusion of pool member drives from physical drive grouping The implementation replicates the exact logic from the old code while adapting to structured AgentData output format. All mergerfs and snapraid monitoring capabilities are fully restored.	2025-11-25 08:37:32 +01:00
Christoffer Martinsson	66ab7a492d	Complete monitoring system restoration Fully restored CM Dashboard as a complete monitoring system with working status evaluation and email notifications. COMPLETED PHASES: ✅ Phase 1: Fixed storage display issues - Use lsblk instead of findmnt (eliminates /nix/store bind mount) - Fixed NVMe SMART parsing (Temperature: and Percentage Used:) - Added sudo to smartctl for permissions - Consistent filesystem and tmpfs sorting ✅ Phase 2a: Fixed missing NixOS build information - Added build_version field to AgentData - NixOS collector now populates build info - Dashboard shows actual build instead of "unknown" ✅ Phase 2b: Restored status evaluation system - Added status fields to all structured data types - CPU: load and temperature status evaluation - Memory: usage status evaluation - Storage: temperature, health, and filesystem usage status - All collectors now use their threshold configurations ✅ Phase 3: Restored notification system - Status change detection between collection cycles - Email alerts on status degradation (OK→Warning/Critical) - Detailed notification content with metric values - Full NotificationManager integration CORE FUNCTIONALITY RESTORED: - Real-time monitoring with proper status evaluation - Email notifications on threshold violations - Correct storage display (nvme0n1 T: 28°C W: 1%) - Complete status-aware infrastructure monitoring - Dashboard is now a monitoring system, not just data viewer The CM Dashboard monitoring system is fully operational.	2025-11-24 19:58:26 +01:00
Christoffer Martinsson	4d615a7f45	Fix mount point ordering consistency - Sort filesystems by mount point in disk collector for consistent display - Sort tmpfs mounts by mount point in memory collector - Eliminates random swapping of / and /boot order between refreshes - Eliminates random swapping of tmpfs mount order in RAM section Ensures predictable, alphabetical ordering for all mount points.	2025-11-24 19:44:37 +01:00
Christoffer Martinsson	fd7ad23205	Fix storage display issues and use dynamic versioning Phase 1 fixes for storage display: - Replace findmnt with lsblk to eliminate bind mount issues (/nix/store) - Add sudo to smartctl commands for permission access - Fix NVMe SMART parsing for Temperature: and Percentage Used: fields - Use dynamic version from CARGO_PKG_VERSION instead of hardcoded strings Storage display should now show correct mount points and temperature/wear. Status evaluation and notifications still need restoration in subsequent phases.	2025-11-24 19:26:09 +01:00
Christoffer Martinsson	2b2cb2da3e	Complete atomic migration to structured data architecture Implements clean structured data collection eliminating all string metric parsing bugs. Collectors now populate AgentData directly with type-safe field access. Key improvements: - Mount points preserved correctly (/ and /boot instead of root/boot) - Tmpfs discovery added to memory collector - Temperature data flows as typed f32 fields - Zero string parsing overhead - Complete removal of MetricCollectionManager bridge - Direct ZMQ transmission of structured JSON All functionality maintained: service tracking, notifications, status evaluation, and multi-host monitoring.	2025-11-24 18:53:31 +01:00
Christoffer Martinsson	41ded0170c	Add wear percentage display and NVMe temperature collection - Display wear percentage in storage headers for single physical drives - Remove redundant drive type indicators, show wear data instead - Fix wear metric parsing for physical drives (underscore count issue) - Add NVMe temperature parsing support (Temperature: format) - Add raw metrics debugging functionality for troubleshooting - Clean up physical drive display to remove redundant information	2025-11-23 20:29:24 +01:00
Christoffer Martinsson	53dbb43352	Fix SnapRAID parity association using directory-based discovery - Replace blanket parity drive inclusion with smart relationship detection - Only associate parity drives from same parent directory as data drives - Prevent incorrect exclusion of nvme0n1 physical drives from grouping - Maintain zero-configuration auto-discovery without hardcoded paths	2025-11-23 18:42:48 +01:00
Christoffer Martinsson	86501fd486	Fix display format to match CLAUDE.md specification - Use actual device names (sdb, sdc) instead of data_0, parity_0 - Fix physical drive naming to show device names instead of mount points - Update pool name extraction to handle new device-based naming - Ensure Drive: line shows temperature and wear data for physical drives	2025-11-23 18:13:35 +01:00
Christoffer Martinsson	192eea6e0c	Integrate SnapRAID parity drives into mergerfs pools - Add SnapRAID parity drive detection to mergerfs discovery - Remove Pool Status health line as discussed - Update drive display to always show wear data when available - Include /mnt/parity drives as part of mergerfs pool structure	2025-11-23 18:05:19 +01:00
Christoffer Martinsson	e47803b705	Fix mergerfs pool consolidation and naming - Improve pool name extraction in dashboard parsing - Use consistent mergerfs pool naming in agent - Add mount_point metric parsing to use actual mount paths - Fix pool consolidation to prevent duplicate entries	2025-11-23 17:35:23 +01:00
Christoffer Martinsson	439d0d9af6	Fix mergerfs numeric reference parsing for proper pool detection Add support for numeric mergerfs references like "1:2" by mapping them to actual mount points (/mnt/disk1, /mnt/disk2). This enables proper mergerfs pool detection and hides individual member drives as intended.	2025-11-23 17:27:45 +01:00
Christoffer Martinsson	2242b5ddfe	Make mergerfs detection more robust to prevent discovery failures Skip mergerfs pools with numeric device references (e.g., "1:2") instead of crashing. This allows regular drive detection to work even when mergerfs uses non-standard mount formats. Preserves existing functionality for standard mergerfs setups.	2025-11-23 17:19:15 +01:00
Christoffer Martinsson	9d0f42d55c	Fix filesystem usage_percent parsing and remove hardcoded status 1. Add missing _fs_ filter to usage_percent parsing in dashboard 2. Fix agent to use calculated fs_status instead of hardcoded Status::Ok This completes the disk collector auto-discovery by ensuring filesystem usage percentages and status indicators display correctly.	2025-11-23 16:47:20 +01:00
Christoffer Martinsson	006f27f7d9	Fix lsblk parsing for filesystem discovery Remove unused debug code and fix device name parsing to properly handle lsblk tree characters. This resolves the issue where only /boot filesystem was discovered instead of both /boot and /.	2025-11-23 16:09:48 +01:00
Christoffer Martinsson	07422cd0a7	Add debug logging for filesystem discovery	2025-11-23 15:26:49 +01:00
Christoffer Martinsson	7d96ca9fad	Fix disk collector filesystem discovery with debug logging Add debug logging to filesystem usage collection to identify why some mount points are being dropped during discovery. This should resolve the issue where total capacity shows incorrect values.	2025-11-23 15:15:56 +01:00
Christoffer Martinsson	1e7f1616aa	Complete disk collector rewrite with clean architecture Replaced complex disk collector with simple lsblk → df → group workflow. Supports both physical drives and mergerfs pools with unified metrics. Eliminates configuration complexity through pure auto-discovery. - Clean discovery pipeline using lsblk and df commands - Physical drive grouping with filesystem children - MergerFS pool detection with parity heuristics - Unified metric generation for consistent dashboard display - SMART data collection for temperature, wear, and health	2025-11-23 14:22:19 +01:00
Christoffer Martinsson	7a3ee3d5ba	Fix physical drive grouping logic for unified pool visualization Updated filesystem grouping to use extract_base_device method for proper partition-to-drive mapping. This ensures nvme0n1p1 and nvme0n1p2 are correctly grouped under nvme0n1 drive pool instead of separate pools.	2025-11-23 13:54:33 +01:00
Christoffer Martinsson	d68ecfbc64	Complete unified pool visualization with filesystem children - Implement filesystem children display under physical drive pools - Agent generates individual filesystem metrics for each mount point - Dashboard parses filesystem metrics and displays as tree children - Add filesystem usage, total, and available space metrics - Support target format: drive info + filesystem children hierarchy - Fix compilation warnings by properly using available_bytes calculation	2025-11-23 12:48:24 +01:00
Christoffer Martinsson	d1272a6c13	Implement unified pool visualization for single drives - Group single disk filesystems by physical drive during auto-discovery - Create physical drive pools with filesystem children - Display temperature, wear, and health at drive level - Provide consistent hierarchical storage visualization - Fix borrow checker issues in create_physical_drive_pool method - Add PhysicalDrive case to all StoragePoolType match statements	2025-11-23 12:10:42 +01:00
Christoffer Martinsson	33b3beb342	Implement storage auto-discovery system - Add automatic detection of mergerfs pools by parsing /proc/mounts - Implement smart heuristics for parity disk identification - Store discovered topology at agent startup for efficient monitoring - Eliminate need for manual storage pool configuration - Support zero-config storage visualization with backward compatibility - Clean up mount parsing and remove unused fields	2025-11-23 11:44:57 +01:00
Christoffer Martinsson	f9384d9df6	Implement enhanced storage pool visualization - Add support for mergerfs pool grouping with data and parity disk separation - Implement pool health monitoring (healthy/degraded/critical status) - Create hierarchical tree view for multi-disk storage arrays - Add automatic pool type detection and member disk association - Maintain backward compatibility for single disk configurations - Support future extension for RAID and ZFS pool types	2025-11-23 11:18:21 +01:00
Christoffer Martinsson	2910b7d875	Update version to 0.1.22 and fix system metric status calculation - Fix /tmp usage status to use proper thresholds instead of hardcoded Ok status - Fix wear level status to use configurable thresholds instead of hardcoded values - Add dedicated tmp_status field to SystemWidget for proper /tmp status display - Remove host-level hourglass icon during service operations - Implement immediate service status updates after start/stop/restart commands - Remove active users display and collection from NixOS section - Fix immediate host status aggregation transmission to dashboard	2025-10-28 13:21:56 +01:00
Christoffer Martinsson	8dffe18a23	Improve SATA SSD wear level calculation Some checks failed Build and Release / build-and-release (push) Failing after 1m24s Details - Support multiple SATA SSD wear attributes (SSD_Life_Left, Media_Wearout_Indicator, etc.) - Handle manufacturer differences in wear reporting - Proper parsing of SMART table format with VALUE column - Covers Samsung, Intel, Crucial and other common SSD types - NVMe Percentage Used support maintained	2025-10-25 22:32:09 +02:00
Christoffer Martinsson	0c544753f9	Move SMART configuration into disk config - Consolidate SMART thresholds into DiskConfig structure - Remove separate SmartConfig - disk collector handles all drive data - Update NixOS configuration to use disk.temperature_* settings - Remove hardcoded temperature thresholds in disk collector - Logical grouping: disk collector owns all disk/drive configuration	2025-10-25 22:29:26 +02:00
Christoffer Martinsson	4b54a59e35	Remove unused code and eliminate compiler warnings - Remove unused fields from CommandStatus variants - Clean up unused methods and unused collector fields - Fix lifetime syntax warning in SystemWidget - Delete unused cache module completely - Remove redundant render methods from widgets All agent and dashboard warnings eliminated while preserving panel switching and scrolling functionality.	2025-10-25 14:15:52 +02:00
Christoffer Martinsson	d193b90ba1	Fix device detection to properly parse lsblk output - Handle lsblk tree symbols (├─, └─) in device parsing - Extract base device names from partitions (nvme0n1p2 -> nvme0n1) - Support both NVMe and traditional device naming schemes - Fixes missing device lines in storage display	2025-10-23 19:16:33 +02:00
Christoffer Martinsson	ad298ac70c	Fix device detection, tree indentation, and hide Single storage type - Replace findmnt with lsblk for efficient device name detection - Fix tree indentation to align consistently with status icon text - Hide '(Single)' label for single disk storage pools - Device detection returns actual names (nvme0n1, sda) not UUID paths	2025-10-23 19:06:52 +02:00
Christoffer Martinsson	9f34c67bfa	Fix debug log reference to removed underlying_devices field	2025-10-23 18:56:16 +02:00
Christoffer Martinsson	5134c5320a	Fix disk collector to use dynamic device detection - Remove underlying_devices field from FilesystemConfig - Add device detection at startup using findmnt command - Store detected devices in HashMap for reuse during collection - Keep all existing functionality (StoragePool, DriveInfo, SMART data) - Detect devices only once at initialization, not every collection cycle - Fixes agent startup failure due to missing underlying_devices config	2025-10-23 18:50:40 +02:00
Christoffer Martinsson	9e80d6b654	Remove hardcoded /tmp autodetection and implement proper tmpfs monitoring - Remove /tmp autodetection from disk collector (57 lines removed) - Add tmpfs monitoring to memory collector with get_tmpfs_metrics() method - Generate memory_tmp_* metrics for proper RAM-based tmpfs monitoring - Fix type annotations in tmpfs parsing for compilation - System widget now correctly displays tmpfs usage in RAM section	2025-10-23 14:26:15 +02:00
Christoffer Martinsson	08d3454683	Enhance disk collector with individual drive health monitoring - Add StoragePool and DriveInfo structures for grouping drives by mount point - Implement SMART data collection for individual drives (health, temperature, wear) - Support for ext4, zfs, xfs, mergerfs, btrfs filesystem types - Generate individual drive metrics: disk_[pool]_[drive]_health/temperature/wear - Add storage_type and underlying_devices to filesystem configuration - Move hardcoded service directory mappings to NixOS configuration - Move hardcoded host-to-user mapping to NixOS configuration - Remove all unused code and fix compilation warnings - Clean implementation with zero warnings and no dead code Individual drives now show health status per storage pool: Storage root (ext4): nvme0n1 PASSED 42°C 5% wear Storage steampool (mergerfs): sda/sdb/sdc with individual health data	2025-10-22 19:59:25 +02:00
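
Several commits above (d193b90ba1, 006f27f7d9, 7a3ee3d5ba) describe stripping lsblk tree symbols from device names and mapping partitions to their base drive (nvme0n1p2 -> nvme0n1). The following is a minimal sketch of that idea using only the Rust standard library. The name extract_base_device appears in the log, but its real signature and body are not shown there, so the implementation here is an assumption; strip_tree_symbols is a hypothetical helper name.

```rust
/// Hypothetical helper: remove lsblk tree-drawing prefixes such as "├─" or "└─"
/// (and the ASCII variants "|-" and "`-") from the start of a device name.
fn strip_tree_symbols(raw: &str) -> &str {
    raw.trim_start_matches(|c: char| {
        matches!(c, '├' | '└' | '│' | '─' | '|' | '`' | '-') || c.is_whitespace()
    })
}

/// Assumed behaviour of the extract_base_device method named in the log:
/// map a partition name to the drive that contains it.
/// - NVMe / MMC style: "nvme0n1p2" -> "nvme0n1" (partition suffix is "p<digits>")
/// - Traditional style: "sda1" -> "sda" (partition suffix is trailing digits)
fn extract_base_device(name: &str) -> &str {
    if name.starts_with("nvme") || name.starts_with("mmcblk") {
        if let Some(pos) = name.rfind('p') {
            let base = &name[..pos];
            let suffix = &name[pos + 1..];
            let suffix_is_number =
                !suffix.is_empty() && suffix.bytes().all(|b| b.is_ascii_digit());
            let base_ends_with_digit =
                base.bytes().last().map_or(false, |b| b.is_ascii_digit());
            if suffix_is_number && base_ends_with_digit {
                return base;
            }
        }
        // No partition suffix found: the name is already a whole drive.
        return name;
    }
    name.trim_end_matches(|c: char| c.is_ascii_digit())
}
```

With these two functions, `extract_base_device(strip_tree_symbols("├─nvme0n1p2"))` yields `"nvme0n1"`, so both partitions of one NVMe drive group under the same key, while whole-drive names such as `"sda"` or `"nvme0n1"` pass through unchanged.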


58 Commits