Updated the disk collector to include all missing functionality from the
previous string-based implementation while working with the new structured
JSON data architecture:
- MergerFS pool discovery from /proc/mounts parsing
- SnapRAID parity drive detection via mount path heuristics
- Drive categorization (data vs parity) based on path analysis
- Numeric mergerfs reference resolution (1:2 -> /mnt/disk1, /mnt/disk2)
- Pool health calculation based on member drive SMART status
- Complete SMART data integration for temperatures and wear levels
- Proper exclusion of pool member drives from physical drive grouping
The implementation replicates the exact logic from the old code while
adapting to structured AgentData output format. All mergerfs and snapraid
monitoring capabilities are fully restored.
Replace timestamp parsing with direct display of start_time from backup TOML file to ensure timestamp always appears regardless of format. Remove empty line spacing above backup section for compact layout.
Changes:
- Remove parsed timestamp fields and use raw start_time string from TOML
- Display backup time directly from TOML file without parsing
- Remove blank line above backup section for tighter layout
- Simplify BackupData structure by removing last_run and next_scheduled fields
Version bump to v0.1.150
Replace standalone backup widget with compact backup section in system widget displaying disk serial, temperature, wear level, timing, and usage information.
Changes:
- Remove standalone backup widget and integrate into system widget
- Update backup collector to read TOML format from backup script
- Add BackupDiskData structure with serial, usage, temperature, wear fields
- Implement compact backup display matching specification format
- Add time formatting utilities for backup timing display
- Update backup data extraction from TOML with disk space parsing
Version bump to v0.1.149
Resolves nginx sites appearing only briefly during collection cycles by implementing proper caching of complete service data including sub-services.
Changes:
- Add cached_service_data field to store complete ServiceData with sub-services
- Modify collection logic to cache full service objects instead of basic ServiceInfo
- Update cache retrieval to use complete cached data preserving nginx site metrics
- Eliminate flickering of nginx sites between collection cycles
Version bump to v0.1.148
- Remove nginx_ prefix from site names in hierarchical structure
- Fix get_nginx_site_metrics to call correct internal method
- Implement same caching functionality as old working version
- Sites now stay visible continuously, with latency refreshed every 30s
- Preserve cached results between refresh cycles
- Remove duplicate status string fields from ServiceData and SubServiceData
- Use only Status enum as single source of truth for service status
- Agent calculates Status enum using calculate_service_status()
- Dashboard converts Status enum to display text for UI
- Implement flexible metrics system for sub-services with label/value/unit
- Fix status icon/text mismatches (inactive services now show gray circles)
- Ensure perfect alignment between service icons and status text
- Add nginx site metrics caching with configurable intervals matching original
- Implement complex nginx config parsing with brace counting and redirect detection
- Replace curl with reqwest HTTP client for proper timeout and redirect handling
- Fix docker container parsing to use comma format with proper status mapping
- Add sudo to directory size command for permission handling
- Change nginx URLs to use https protocol matching original
- Add advanced NixOS ExecStart parsing for argv[] format support
- Add nginx -T fallback functionality for config discovery
- Implement proper server block parsing with domain validation and brace tracking
- Add get_service_memory function matching original signature
All functionality now matches pre-refactor implementation architecture.
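The brace-tracking approach to server block parsing listed above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual parser: the function names `extract_server_blocks` and `server_names` are hypothetical, the domain check (contains a dot, is not `_`) is an assumed stand-in for the real validation, and comments or braces inside quoted strings are not handled.

```rust
/// Collect the text of each `server { ... }` block by counting braces.
/// Hypothetical helper: ignores comments and braces inside quoted strings.
pub fn extract_server_blocks(config: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut current = String::new();
    let mut depth: i32 = 0; // brace depth relative to the start of a server block

    for line in config.lines() {
        let trimmed = line.trim();
        if depth == 0 {
            // Outside a block: only a line like "server {" starts one.
            let starts_block = trimmed
                .strip_prefix("server")
                .map_or(false, |rest| rest.trim_start().starts_with('{'));
            if !starts_block {
                continue;
            }
        }
        for c in trimmed.chars() {
            match c {
                '{' => depth += 1,
                '}' => depth -= 1,
                _ => {}
            }
        }
        current.push_str(trimmed);
        current.push('\n');
        if depth == 0 {
            // Braces are balanced again, so the server block is complete.
            blocks.push(std::mem::take(&mut current));
        }
    }
    blocks
}

/// Return the plausible domain names listed in `server_name` directives.
pub fn server_names(block: &str) -> Vec<String> {
    block
        .lines()
        .filter_map(|line| line.trim().strip_prefix("server_name"))
        .flat_map(|rest| rest.trim_end_matches(';').split_whitespace())
        // Assumed validation: skip the catch-all "_" and names without a dot.
        .filter(|name| *name != "_" && name.contains('.'))
        .map(str::to_string)
        .collect()
}
```

Counting braces relative to the opening `server {` line lets nested `location` blocks stay inside the block they belong to, regardless of how deeply the server block itself is nested.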
- Enhanced directory size logic with minimum 0.001GB visibility and permission error logging
- Added nginx site monitoring with latency checks and NixOS config discovery
- Added docker container monitoring as sub-services
- Integrated sub-service collection for active nginx and docker services
- All missing features from original implementation now restored
Fixes missing services and 0B disk usage issues by restoring:
- Wildcard pattern matching for service filters (gitea*, redis*)
- Service disk usage calculation from directories and WorkingDirectory
- Proper Status::Inactive for inactive services
Services now properly discovered and show actual disk usage.
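A minimal sketch of the wildcard matching for service filters, assuming a pattern contains at most one `*` (as in `gitea*` and `redis*`). The function name `matches_filter` is hypothetical and the real matcher may support more pattern forms.

```rust
/// Match a service name against a filter with at most one `*` wildcard.
/// Patterns without `*` must match the name exactly.
pub fn matches_filter(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        Some((prefix, suffix)) => {
            // The length check stops prefix and suffix from overlapping.
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
        None => pattern == name,
    }
}
```

With this shape, `gitea*` matches both `gitea` and `gitea-runner`, while a plain `nginx` filter does not match `nginx-proxy`.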
Fully restored CM Dashboard as a complete monitoring system with working
status evaluation and email notifications.
COMPLETED PHASES:
✅ Phase 1: Fixed storage display issues
- Use lsblk instead of findmnt (eliminates /nix/store bind mount)
- Fixed NVMe SMART parsing (Temperature: and Percentage Used:)
- Added sudo to smartctl for permissions
- Consistent filesystem and tmpfs sorting
✅ Phase 2a: Fixed missing NixOS build information
- Added build_version field to AgentData
- NixOS collector now populates build info
- Dashboard shows actual build instead of "unknown"
✅ Phase 2b: Restored status evaluation system
- Added status fields to all structured data types
- CPU: load and temperature status evaluation
- Memory: usage status evaluation
- Storage: temperature, health, and filesystem usage status
- All collectors now use their threshold configurations
✅ Phase 3: Restored notification system
- Status change detection between collection cycles
- Email alerts on status degradation (OK→Warning/Critical)
- Detailed notification content with metric values
- Full NotificationManager integration
CORE FUNCTIONALITY RESTORED:
- Real-time monitoring with proper status evaluation
- Email notifications on threshold violations
- Correct storage display (nvme0n1 T: 28°C W: 1%)
- Complete status-aware infrastructure monitoring
- Dashboard is now a monitoring system, not just a data viewer
The CM Dashboard monitoring system is fully operational.
- Sort filesystems by mount point in disk collector for consistent display
- Sort tmpfs mounts by mount point in memory collector
- Eliminates random swapping of / and /boot order between refreshes
- Eliminates random swapping of tmpfs mount order in RAM section
Ensures predictable, alphabetical ordering for all mount points.
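The ordering fix amounts to sorting on the mount point before display. A minimal sketch, assuming a hypothetical `FilesystemUsage` struct (the real field layout may differ):

```rust
/// Hypothetical per-filesystem record; only the mount point matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct FilesystemUsage {
    pub mount_point: String,
    pub usage_percent: f32,
}

/// Sort filesystems by mount point so the display order is stable
/// between refreshes ("/" always sorts before "/boot").
pub fn sort_by_mount_point(filesystems: &mut [FilesystemUsage]) {
    filesystems.sort_by(|a, b| a.mount_point.cmp(&b.mount_point));
}
```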
Phase 1 fixes for storage display:
- Replace findmnt with lsblk to eliminate bind mount issues (/nix/store)
- Add sudo to smartctl commands for permission access
- Fix NVMe SMART parsing for Temperature: and Percentage Used: fields
- Use dynamic version from CARGO_PKG_VERSION instead of hardcoded strings
Storage display should now show correct mount points and temperature/wear.
Status evaluation and notifications still need restoration in subsequent phases.
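For illustration, parsing the two NVMe fields from smartctl text output might look like the sketch below. The struct and function names are hypothetical, and the code assumes lines of the form `Temperature: 28 Celsius` and `Percentage Used: 1%`.

```rust
/// Hypothetical container for the two NVMe values of interest.
#[derive(Debug, Default, PartialEq)]
pub struct NvmeSmart {
    pub temperature_celsius: Option<f32>,
    pub wear_percent: Option<f32>,
}

/// Extract temperature and wear from smartctl text output for an NVMe drive.
pub fn parse_nvme_smart(output: &str) -> NvmeSmart {
    let mut smart = NvmeSmart::default();
    for line in output.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("Temperature:") {
            // First token after the label is the value, e.g. "28" in "28 Celsius".
            smart.temperature_celsius =
                rest.split_whitespace().next().and_then(|v| v.parse().ok());
        } else if let Some(rest) = line.strip_prefix("Percentage Used:") {
            // Value carries a trailing percent sign, e.g. "1%".
            smart.wear_percent = rest.trim().trim_end_matches('%').parse().ok();
        }
    }
    smart
}
```

Matching on the exact `Temperature:` prefix means lines such as `Temperature Sensor 1:` are ignored rather than overwriting the main reading.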
Implements clean structured data collection eliminating all string metric
parsing bugs. Collectors now populate AgentData directly with type-safe
field access.
Key improvements:
- Mount points preserved correctly (/ and /boot instead of root/boot)
- Tmpfs discovery added to memory collector
- Temperature data flows as typed f32 fields
- No string parsing of metric values between agent and dashboard
- Complete removal of MetricCollectionManager bridge
- Direct ZMQ transmission of structured JSON
All functionality maintained: service tracking, notifications, status
evaluation, and multi-host monitoring.
Update storage display to match CLAUDE.md specification:
- Show drive temp/wear on main line: nvme0n1 T: 25°C W: 4%
- Display individual filesystems as sub-items: /: 55% 250.5GB/456.4GB
- Remove Total usage line in favor of filesystem breakdown
Clean up code warnings:
- Remove unused heartbeat methods and fields
- Remove unused backup widget fields and methods
- Add allow attributes for legacy methods
The agent heartbeat was sending empty AgentData every few seconds, causing
the dashboard to intermittently display zero values for all metrics. Since
the agent already transmits complete data every second, the heartbeat is
redundant. The dashboard will detect offline hosts via data timestamps.
Allow agent configuration without an explicit filesystems list by making
the field optional with serde default, enabling pure auto-discovery mode.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update last_collection timestamp even when collectors fail to prevent
immediate retry loops that cause data transmission gaps every 5 seconds.
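The idea can be sketched as recording the attempt time whether or not the collector succeeded. `TimedCollector` and its methods are hypothetical names for illustration; the agent's scheduling code may be structured differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical wrapper that tracks when a collector last ran.
pub struct TimedCollector {
    interval: Duration,
    last_collection: Option<Instant>,
}

impl TimedCollector {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_collection: None }
    }

    /// A collector is due if it never ran or the interval has elapsed.
    pub fn is_due(&self, now: Instant) -> bool {
        self.last_collection
            .map_or(true, |last| now.saturating_duration_since(last) >= self.interval)
    }

    /// Run `collect` if due. Returns None when the interval has not elapsed.
    pub fn run<T, E>(
        &mut self,
        now: Instant,
        collect: impl FnOnce() -> Result<T, E>,
    ) -> Option<Result<T, E>> {
        if !self.is_due(now) {
            return None;
        }
        let result = collect();
        // Record the attempt even on failure, so a failing collector waits
        // for the next interval instead of being retried immediately.
        self.last_collection = Some(now);
        Some(result)
    }
}
```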
Replace fragile string-based metrics with type-safe JSON data structures.
Agent converts all metrics to structured data, dashboard processes typed fields.
Changes:
- Add AgentData struct with CPU, memory, storage, services, backup fields
- Replace string parsing with direct field access throughout system
- Maintain UI compatibility via temporary metric bridge conversion
- Fix NVMe temperature display and eliminate string parsing bugs
- Update protocol to support structured data transmission over ZMQ
- Comprehensive metric type coverage: CPU, memory, storage, services, backup
Version bump to 0.1.131
- Display wear percentage in storage headers for single physical drives
- Remove redundant drive type indicators, show wear data instead
- Fix wear metric parsing for physical drives (underscore count issue)
- Add NVMe temperature parsing support (Temperature: format)
- Add raw metrics debugging functionality for troubleshooting
- Clean up physical drive display to remove redundant information
- Replace blanket parity drive inclusion with smart relationship detection
- Only associate parity drives from same parent directory as data drives
- Prevent incorrect exclusion of nvme0n1 physical drives from grouping
- Maintain zero-configuration auto-discovery without hardcoded paths
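A rough sketch of the relationship check, assuming a parity drive is recognized by `parity` appearing in the last path component and is only associated when it shares a parent directory with a data drive. The function name and the exact heuristic are assumptions.

```rust
use std::path::Path;

/// Keep only parity mount points that sit in the same parent directory
/// as at least one data mount point (hypothetical heuristic).
pub fn related_parity_mounts<'a>(data_mounts: &[&str], candidates: &[&'a str]) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|candidate| {
            let path = Path::new(candidate);
            // Assumed naming heuristic: the directory name mentions "parity".
            let looks_like_parity = path
                .file_name()
                .and_then(|name| name.to_str())
                .map_or(false, |name| name.contains("parity"));
            looks_like_parity
                && data_mounts
                    .iter()
                    .any(|data| Path::new(data).parent() == path.parent())
        })
        .collect()
}
```

Because nothing here names a specific directory, the check keeps the zero-configuration property: `/mnt/parity1` is associated with `/mnt/disk1`, while an unrelated `/srv/parity` is not.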
- Use actual device names (sdb, sdc) instead of data_0, parity_0
- Fix physical drive naming to show device names instead of mount points
- Update pool name extraction to handle new device-based naming
- Ensure Drive: line shows temperature and wear data for physical drives
- Add SnapRAID parity drive detection to mergerfs discovery
- Remove Pool Status health line as discussed
- Update drive display to always show wear data when available
- Include /mnt/parity drives as part of mergerfs pool structure
- Improve pool name extraction in dashboard parsing
- Use consistent mergerfs pool naming in agent
- Add mount_point metric parsing to use actual mount paths
- Fix pool consolidation to prevent duplicate entries
Add support for numeric mergerfs references like "1:2" by mapping them
to actual mount points (/mnt/disk1, /mnt/disk2). This enables proper
mergerfs pool detection and hides individual member drives as intended.
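A minimal sketch of that mapping, using the `/mnt/disk<N>` convention described above. The function name is hypothetical, and the sketch returns `None` for any source that is not purely numeric so that regular path-based sources can be handled elsewhere.

```rust
/// Map a numeric mergerfs source such as "1:2" to mount points
/// ("/mnt/disk1", "/mnt/disk2"). Returns None if any part is not a number.
pub fn resolve_numeric_sources(source: &str) -> Option<Vec<String>> {
    source
        .split(':')
        .map(|part| part.parse::<u32>().ok().map(|n| format!("/mnt/disk{n}")))
        // Collecting into Option<Vec<_>> fails as a whole on the first None.
        .collect()
}
```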
Skip mergerfs pools with numeric device references (e.g., "1:2")
instead of crashing. This allows regular drive detection to work
even when mergerfs uses non-standard mount formats.
Preserves existing functionality for standard mergerfs setups.
1. Add missing _fs_ filter to usage_percent parsing in dashboard
2. Fix agent to use calculated fs_status instead of hardcoded Status::Ok
This completes the disk collector auto-discovery by ensuring filesystem
usage percentages and status indicators display correctly.
Remove unused debug code and fix device name parsing to properly
handle lsblk tree characters. This resolves the issue where only
/boot filesystem was discovered instead of both /boot and /.
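In its default tree output, lsblk prefixes child devices with box-drawing characters such as `├─` and `└─`. A minimal sketch of stripping them, with a hypothetical function name:

```rust
/// Strip lsblk tree-drawing prefixes from a device name,
/// leaving names without a prefix unchanged.
pub fn clean_device_name(raw: &str) -> &str {
    // Device names start with an ASCII letter or digit; everything
    // before that (tree characters, spaces) is decoration.
    raw.trim_start_matches(|c: char| !c.is_ascii_alphanumeric())
}
```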
Add debug logging to filesystem usage collection to identify why
some mount points are being dropped during discovery. This should
help diagnose the issue where total capacity shows incorrect values.
Replaced complex disk collector with simple lsblk → df → group workflow.
Supports both physical drives and mergerfs pools with unified metrics.
Eliminates configuration complexity through pure auto-discovery.
- Clean discovery pipeline using lsblk and df commands
- Physical drive grouping with filesystem children
- MergerFS pool detection with parity heuristics
- Unified metric generation for consistent dashboard display
- SMART data collection for temperature, wear, and health
Updated filesystem grouping to use extract_base_device method for proper
partition-to-drive mapping. This ensures nvme0n1p1 and nvme0n1p2 are
correctly grouped under nvme0n1 drive pool instead of separate pools.
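One way the partition-to-drive mapping could work is sketched below. Only the name `extract_base_device` comes from the change itself; the body is an assumption that covers the common `sdXN`, `nvmeXnYpZ`, and `mmcblkXpY` naming schemes.

```rust
/// Map a partition name to its parent drive, e.g. "nvme0n1p1" -> "nvme0n1"
/// and "sda1" -> "sda". Whole-drive names are returned unchanged.
pub fn extract_base_device(device: &str) -> String {
    let name = device.strip_prefix("/dev/").unwrap_or(device);
    if name.starts_with("nvme") || name.starts_with("mmcblk") {
        // These drives end in a digit, so partitions use a "p<N>" suffix.
        if let Some(idx) = name.rfind('p') {
            let (base, part) = (&name[..idx], &name[idx + 1..]);
            let part_is_number = !part.is_empty() && part.chars().all(|c| c.is_ascii_digit());
            if part_is_number && base.ends_with(|c: char| c.is_ascii_digit()) {
                return base.to_string();
            }
        }
        name.to_string()
    } else {
        // sdX-style names: trailing digits are the partition number.
        name.trim_end_matches(|c: char| c.is_ascii_digit()).to_string()
    }
}
```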
- Implement filesystem children display under physical drive pools
- Agent generates individual filesystem metrics for each mount point
- Dashboard parses filesystem metrics and displays as tree children
- Add filesystem usage, total, and available space metrics
- Support target format: drive info + filesystem children hierarchy
- Fix compilation warnings by properly using available_bytes calculation
- Group single disk filesystems by physical drive during auto-discovery
- Create physical drive pools with filesystem children
- Display temperature, wear, and health at drive level
- Provide consistent hierarchical storage visualization
- Fix borrow checker issues in create_physical_drive_pool method
- Add PhysicalDrive case to all StoragePoolType match statements
- Add automatic detection of mergerfs pools by parsing /proc/mounts
- Implement smart heuristics for parity disk identification
- Store discovered topology at agent startup for efficient monitoring
- Eliminate need for manual storage pool configuration
- Support zero-config storage visualization with backward compatibility
- Clean up mount parsing and remove unused fields
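A minimal sketch of mergerfs discovery from a mounts table, assuming pools appear with the filesystem type `fuse.mergerfs` and a colon-separated source field. The `MergerfsPool` struct and function name are hypothetical; the function takes the file contents as a string so it can be exercised without reading /proc/mounts.

```rust
/// A mergerfs pool as discovered from a mounts table (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct MergerfsPool {
    pub mount_point: String,
    pub sources: Vec<String>,
}

/// Parse the text of /proc/mounts and return every mergerfs pool found.
pub fn discover_mergerfs_pools(mounts: &str) -> Vec<MergerfsPool> {
    mounts
        .lines()
        .filter_map(|line| {
            // Fields: source, mount point, filesystem type, options, ...
            let mut fields = line.split_whitespace();
            let source = fields.next()?;
            let mount_point = fields.next()?;
            let fstype = fields.next()?;
            if fstype != "fuse.mergerfs" {
                return None;
            }
            Some(MergerfsPool {
                mount_point: mount_point.to_string(),
                sources: source.split(':').map(str::to_string).collect(),
            })
        })
        .collect()
}
```

In the agent this would presumably be fed from `std::fs::read_to_string("/proc/mounts")` once at startup, matching the stored-topology approach described above.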
- Add support for mergerfs pool grouping with data and parity disk separation
- Implement pool health monitoring (healthy/degraded/critical status)
- Create hierarchical tree view for multi-disk storage arrays
- Add automatic pool type detection and member disk association
- Maintain backward compatibility for single disk configurations
- Support future extension for RAID and ZFS pool types
- Add disk wear percentage collection from SMART data in backup script
- Add backup_disk_wear_percent metric to backup collector with thresholds
- Display wear percentage in backup widget disk section
- Fix storage section overflow handling to use consistent "X more below" logic
- Update maintenance mode to return pending status instead of unknown
- Remove scroll offset fields from HostWidgets struct
- Replace scrolling with simple "X more below" indicators in all widgets
- Remove user-stopped service tracking from agent (now uses SSH control)
- Inactive services now consistently show Status::Inactive with empty circles
- Simplify widget render methods by removing scroll parameters
- Clean up unused imports and legacy scrolling infrastructure
- Fix journalctl command to use -fu for proper log following
- Add new Status::Inactive variant to enum for better service state representation
- Agent now assigns Status::Inactive instead of Status::Warning for inactive services
- Dashboard displays inactive services with empty circle (○) icon in gray color
- User-stopped services still show as Status::Ok with green filled circle
- Inactive services treated as OK for host status aggregation
- Improves visual clarity between active (●), inactive (○), and warning (◐) states
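The icon mapping and the aggregation rule can be sketched as below. The enum is reduced to four variants for illustration (the real `Status` has more), the `!` icon for critical comes from a separate change in this log, and the function names are hypothetical.

```rust
/// Reduced status enum for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Inactive,
    Warning,
    Critical,
}

/// Icon shown next to a service for each status.
pub fn status_icon(status: Status) -> &'static str {
    match status {
        Status::Ok => "●",
        Status::Inactive => "○",
        Status::Warning => "◐",
        Status::Critical => "!",
    }
}

fn severity(status: Status) -> u8 {
    match status {
        Status::Ok | Status::Inactive => 0,
        Status::Warning => 1,
        Status::Critical => 2,
    }
}

/// Host status is the worst service status, with Inactive counted as Ok.
pub fn aggregate_host_status(services: &[Status]) -> Status {
    services
        .iter()
        .copied()
        .map(|s| if s == Status::Inactive { Status::Ok } else { s })
        .max_by_key(|s| severity(*s))
        .unwrap_or(Status::Ok)
}
```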
- Remove all transitional icon infrastructure (CommandType, pending transitions)
- Clean up ZMQ command system remnants after SSH migration
- Add real-time log streaming for service start operations
- Show final logs and status for service stop operations
- Fix compilation warnings by removing unused methods
- Simplify UI architecture with pure SSH-based service control
Replace ZMQ-based service start/stop commands with SSH execution in tmux
popups. This provides better user feedback with real-time systemctl output
while eliminating blocking operations from the main message processing loop.
Changes:
- Service start/stop now use SSH with progress display
- Added backup functionality with 'B' key
- Preserved transitional icons (↑/↓) for immediate visual feedback
- Removed all ZMQ service control commands and handlers
- Updated configuration to include backup_alias setting
- All operations (rebuild, backup, services) now use consistent SSH interface
This ensures stable heartbeat processing while providing superior user
experience with live command output and service status feedback.
Simplifies host connection configuration by removing the tailscale_ip field,
connection_type preferences, and fallback retry logic. Connections now use
the ip field, with the hostname as fallback. Eliminates blocking TCP
connectivity tests that interfered with heartbeat processing.
This resolves intermittent host lost/found issues by removing the
connection retry timeouts that blocked the ZMQ message processing loop.
Implement maintenance_mode_file configuration option in NotificationConfig
to allow customizable file paths for suppressing email notifications.
Updates maintenance mode check to use configured path instead of hardcoded
/tmp/cm-maintenance file.
- Add dedicated heartbeat transmission every 5 seconds independent of metric collection
- Fix host offline detection by clearing metrics for disconnected hosts
- Move exclude_email_metrics to NotificationConfig for better organization
- Add cleanup_offline_hosts method to remove stale metrics after heartbeat timeout
- Ensure offline hosts show proper status icons and visual indicators
Version 0.1.63
- Add agent_heartbeat metric to agent transmission for reliable host detection
- Update dashboard to track heartbeat timestamps per host instead of general metrics
- Add configurable heartbeat_timeout_seconds to dashboard ZMQ config (default 10s)
- Remove unused timeout_ms from agent config and revert to non-blocking command reception
- Remove unused heartbeat_interval_ms from agent configuration
- Host disconnect detection now uses dedicated heartbeat metrics for improved reliability
- Bump version to 0.1.57
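Per-host heartbeat tracking with a timeout might look like the following sketch. `HeartbeatTracker` and its methods are hypothetical names; only the per-host timestamps and the configurable timeout come from the change described above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks the last heartbeat seen from each host (hypothetical type).
pub struct HeartbeatTracker {
    timeout: Duration,
    last_seen: HashMap<String, Instant>,
}

impl HeartbeatTracker {
    pub fn new(timeout: Duration) -> Self {
        Self { timeout, last_seen: HashMap::new() }
    }

    /// Record a heartbeat received from `host` at time `at`.
    pub fn record(&mut self, host: &str, at: Instant) {
        self.last_seen.insert(host.to_string(), at);
    }

    /// A host is online if its last heartbeat is within the timeout.
    /// Hosts that never sent a heartbeat count as offline.
    pub fn is_online(&self, host: &str, now: Instant) -> bool {
        self.last_seen
            .get(host)
            .map_or(false, |seen| now.saturating_duration_since(*seen) <= self.timeout)
    }
}
```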
- Add exclude_email_metrics field to AgentConfig for filtering email notifications
- Metrics matching excluded names skip notification processing but still appear in dashboard
- Optional field with serde(default) for backward compatibility
- Bump version to 0.1.56
- Add Status::Offline enum variant for disconnected hosts
- All configured hosts now always visible showing offline status when disconnected
- Add WakeOnLAN support using wake-on-lan Rust crate
- Implement w key binding to wake offline hosts with MAC addresses
- Simplify configuration to single [hosts] section with MAC addresses only
- Change critical status icon from ◯ to ! for better visibility
- Add proper MAC address parsing and error handling
- Silent WakeOnLAN operation with logging for success/failure
Configuration format:
[hosts]
hostname = { mac_address = "AA:BB:CC:DD:EE:FF" }
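For illustration, parsing the `mac_address` value and building a standard Wake-on-LAN magic packet (6 bytes of 0xFF followed by the MAC repeated 16 times) could look like this. The change itself relies on the wake-on-lan crate; the hand-rolled functions below are a hypothetical std-only sketch, not that crate's API.

```rust
/// Parse a colon-separated MAC address such as "AA:BB:CC:DD:EE:FF".
pub fn parse_mac(text: &str) -> Result<[u8; 6], String> {
    let mut mac = [0u8; 6];
    let mut parts = text.trim().split(':');
    for byte in mac.iter_mut() {
        let part = parts
            .next()
            .ok_or_else(|| format!("too few octets in {text:?}"))?;
        // Each octet must be exactly two hex digits.
        if part.len() != 2 || !part.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("invalid octet {part:?} in {text:?}"));
        }
        *byte = u8::from_str_radix(part, 16).map_err(|e| e.to_string())?;
    }
    if parts.next().is_some() {
        return Err(format!("too many octets in {text:?}"));
    }
    Ok(mac)
}

/// Build the 102-byte magic packet for the given MAC address.
pub fn magic_packet(mac: [u8; 6]) -> Vec<u8> {
    let mut packet = vec![0xFF; 6];
    for _ in 0..16 {
        packet.extend_from_slice(&mac);
    }
    packet
}
```

Returning a `Result` from the parser lets the caller log a malformed address and carry on, in line with the silent operation described above.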
Fix notification issues for better operational experience:
Startup Notification Suppression:
- Suppress notifications for transitions from Status::Unknown during agent/server startup
- Prevents notification spam when services transition from Unknown to Warning/Critical on restart
- Only real status changes (not initial discovery) trigger notifications
- Maintains alerting for actual service state changes after startup
Recovery Notification Refinement:
- Recovery notifications only sent when ALL services reach OK status
- Individual service recoveries suppressed if other services still have problems
- Ensures recovery notifications indicate complete system health restoration
- Prevents premature recovery notices when only some services have recovered
Result: Clean startup experience without false alerts and meaningful recovery
notifications that truly indicate full system health restoration.
Bump version to v0.1.48
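The two rules can be expressed as one decision function. This is a sketch under assumptions: the reduced `Status` enum, the `Decision` type, and the function name `decide` are all hypothetical, and treating an improvement that stops short of OK (for example Critical to Warning) as silent is a guess.

```rust
/// Reduced status enum; variant order doubles as severity order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Unknown,
    Ok,
    Warning,
    Critical,
}

/// What to do about a single status transition.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Silent,
    Alert,
    Recovery,
}

/// Decide whether a transition should notify. `all_services` holds the
/// current status of every service, including the one that changed.
pub fn decide(previous: Status, current: Status, all_services: &[Status]) -> Decision {
    // Transitions out of Unknown are initial discovery, not real changes.
    if previous == Status::Unknown || previous == current {
        return Decision::Silent;
    }
    match current {
        // Degradation: the new status is worse than the previous one.
        Status::Warning | Status::Critical if current > previous => Decision::Alert,
        // Recovery is only announced once every service is OK.
        Status::Ok if all_services.iter().all(|s| *s == Status::Ok) => Decision::Recovery,
        _ => Decision::Silent,
    }
}
```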