- Handle lsblk tree symbols (├─, └─) in device parsing
- Extract base device names from partitions (nvme0n1p2 -> nvme0n1)
- Support both NVMe and traditional device naming schemes
- Fixes missing device lines in storage display
- Replace findmnt with lsblk for efficient device name detection
- Fix tree indentation to align consistently with status icon text
- Hide '(Single)' label for single disk storage pools
- Device detection returns actual names (nvme0n1, sda) not UUID paths
- Remove underlying_devices field from FilesystemConfig
- Add device detection at startup using findmnt command
- Store detected devices in HashMap for reuse during collection
- Keep all existing functionality (StoragePool, DriveInfo, SMART data)
- Detect devices only once at initialization, not every collection cycle
- Fixes agent startup failure due to missing underlying_devices config
- Remove Active users line from NixOS section
- Add status icons to CPU and RAM sections
- Restructure layout with proper tree symbols and spacing
- Add empty lines between sections for better readability
- Remove Storage header, show each filesystem as top-level item
- Fix tree indentation to match specification
- CPU shows load averages with frequency as sub-item
- RAM shows usage with /tmp as sub-item with status icon
- Add mount_point field to StoragePool struct
- Create mapping from pool names to mount points
- Update display to show user-friendly mount points (/, /mnt/steampool)
- Keep device detection for SMART data (temperature, wear)
- Resolves disk name confusion on different hosts
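The device-name handling described in the entries above (stripping lsblk tree symbols, then reducing a partition to its base device) could look roughly like the sketch below. The function names `strip_tree_symbols` and `base_device_name` are hypothetical and not taken from the agent source, and the rules cover only the NVMe/MMC (`p<N>` suffix) and traditional (`sdXN`) naming schemes.

```rust
/// Strip lsblk tree-drawing prefixes such as "├─" and "└─" from a NAME entry.
/// (Hypothetical helper, not the agent's actual function.)
fn strip_tree_symbols(name: &str) -> &str {
    name.trim_start_matches(|c: char| matches!(c, '├' | '└' | '─' | '│' | ' '))
}

/// Reduce a partition name to its base device.
/// NVMe/MMC style: "nvme0n1p2" -> "nvme0n1"; traditional: "sda1" -> "sda".
/// Assumption: devices such as "loop0" or "md0" are out of scope for this sketch.
fn base_device_name(name: &str) -> String {
    let name = strip_tree_symbols(name);
    if name.starts_with("nvme") || name.starts_with("mmcblk") {
        // A partition is marked by a trailing "p<digits>" after a digit.
        if let Some(idx) = name.rfind('p') {
            let suffix = &name[idx + 1..];
            let is_partition = !suffix.is_empty()
                && suffix.chars().all(|c| c.is_ascii_digit())
                && name[..idx].ends_with(|c: char| c.is_ascii_digit());
            if is_partition {
                return name[..idx].to_string();
            }
        }
        return name.to_string();
    }
    // Traditional naming: drop the trailing partition number, if any.
    name.trim_end_matches(|c: char| c.is_ascii_digit()).to_string()
}
```

Whole-disk names pass through unchanged, so the same helper can be applied to every line of the lsblk output without first checking whether it is a partition.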
Implement agent version tracking to diagnose deployment issues:
- Add get_agent_hash() method to extract Nix store hash from executable path
- Collect system_agent_hash metric in NixOS collector
- Display "Agent Hash" in system panel under NixOS section
- Update metric filtering to include agent hash
This helps identify which version of the agent is actually running
when troubleshooting deployment or metric collection issues.
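As an illustration of the hash extraction described above, here is a minimal sketch. It assumes the executable lives under `/nix/store/<hash>-<name>/...`; the free function `agent_hash_from_path` is a hypothetical stand-in for `get_agent_hash()`, whose real signature is not shown in this log.

```rust
use std::path::Path;

/// Extract the Nix store hash from an executable path.
/// Assumed layout: "/nix/store/<hash>-<name>/bin/<exe>".
/// Returns None for paths outside the store or without a "<hash>-" prefix.
fn agent_hash_from_path(exe: &Path) -> Option<String> {
    let rest = exe.strip_prefix("/nix/store").ok()?;
    // First component after the store root, e.g. "<hash>-cm-dashboard-agent".
    let entry = rest.components().next()?.as_os_str().to_str()?;
    let (hash, _name) = entry.split_once('-')?;
    if hash.is_empty() {
        None
    } else {
        Some(hash.to_string())
    }
}
```

In the agent, the input would presumably come from `std::env::current_exe()`. A `None` result lets the collector skip the metric or report a placeholder.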
Update metric filtering to use exact metric names instead of prefix matching.
This resolves the issue where the build version showed 'unknown' even though
the agent was collecting the metric correctly.
- Strip codename part (e.g., '(Warbler)') from nixos-version output
- Display clean version format: '25.05.20251004.3bcc93c'
- Simplify parsing to use raw nixos-version output as requested
- Change from showing version to build format: 'hash dd/mm/yy H:M:S'
- Parse nixos-version output to extract short hash and format date
- Update system widget to display 'Build:' instead of 'Version:'
- Remove version/build_date fields in favor of single build string
- Follow TODO.md specification for NixOS section layout
- Add memory_tmp_usage_percent, memory_tmp_used_gb, memory_tmp_total_gb metric parsing
- Fix tmpfs display showing as —% —GB/—GB in dashboard
- System widget now properly receives and displays tmpfs metrics from memory collector
- Remove /tmp autodetection from disk collector (57 lines removed)
- Add tmpfs monitoring to memory collector with get_tmpfs_metrics() method
- Generate memory_tmp_* metrics for proper RAM-based tmpfs monitoring
- Fix type annotations in tmpfs parsing for compilation
- System widget now correctly displays tmpfs usage in RAM section
- Create NixOS collector for version and active users detection
- Add SystemWidget combining all system information in TODO.md layout
- Replace separate CPU/Memory widgets with unified system display
- Add tree structure for storage with drive temperature/wear info
- Support NixOS version, active users, load averages, memory usage
- Follow exact decimal formatting from specification
- Use config.interval_seconds instead of hardcoded 300 seconds
- Discovery now happens every 10 seconds (configurable) instead of 5 minutes
- Follows configuration-driven architecture requirements
- Restructure get_monitored_services to avoid nested write locks
- Split discover_services into discover_services_internal that returns data
- Update state in separate scope to prevent deadlock
- Fix borrow checker errors with clone() for status cache
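The lock restructuring in the last four entries can be sketched as follows. It assumes the collector keeps its state behind a `std::sync::RwLock`. The struct layout, field names, and the `units` parameter (standing in for parsed systemctl output) are illustrative, not the agent's actual types.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Shared collector state (field names are illustrative).
struct ServiceState {
    monitored: Vec<String>,
    status_cache: HashMap<String, String>,
}

struct SystemdCollector {
    state: RwLock<ServiceState>,
}

impl SystemdCollector {
    fn new() -> Self {
        SystemdCollector {
            state: RwLock::new(ServiceState {
                monitored: Vec::new(),
                status_cache: HashMap::new(),
            }),
        }
    }

    /// Discovery returns plain data and never touches the lock.
    fn discover_services_internal(
        &self,
        units: &[(&str, &str)],
    ) -> (Vec<String>, HashMap<String, String>) {
        let names = units.iter().map(|(n, _)| n.to_string()).collect();
        let cache = units
            .iter()
            .map(|(n, s)| (n.to_string(), s.to_string()))
            .collect();
        (names, cache)
    }

    fn get_monitored_services(&self, units: &[(&str, &str)]) -> Vec<String> {
        let (names, cache) = self.discover_services_internal(units);
        {
            // The write guard lives only inside this scope.
            let mut state = self.state.write().unwrap();
            state.monitored = names;
            state.status_cache = cache;
        }
        // Guard released above, so taking a read lock here cannot deadlock.
        let state = self.state.read().unwrap();
        state.monitored.clone()
    }
}
```

The point is that no code path acquires the lock while already holding a write guard. Discovery produces owned data first, and the state update happens in one short scope.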
Add detailed debug logging to track:
- Service discovery start
- Individual service parsing
- Final service count and list
- Empty results indication
This will help identify why cmbox disappeared from the dashboard.
Major performance optimization:
- Parse and cache service status during discovery from systemctl list-units
- Eliminate per-service systemctl is-active and show calls
- Reduce systemctl calls from 1+2N to just 1 call total
- For 10 services: 21 calls → 1 call (95% reduction)
- Add fallback to systemctl for cache misses
This completes the major systemctl call reduction goal from TODO.md.
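A rough sketch of caching status from a single `systemctl list-units` call is shown below. It assumes the usual column order (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION) and that some lines carry a leading marker such as `●`. The `CachedStatus` struct and `parse_list_units` function are hypothetical names, not the agent's.

```rust
use std::collections::HashMap;

/// Status fields taken from one list-units row (illustrative struct).
#[derive(Debug, Clone, PartialEq)]
struct CachedStatus {
    load: String,
    active: String,
    sub: String,
}

/// Parse `systemctl list-units --type=service --all` text into a cache
/// keyed by service name without the ".service" suffix.
fn parse_list_units(output: &str) -> HashMap<String, CachedStatus> {
    let mut cache = HashMap::new();
    for line in output.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // The unit name is the first field, or the second if a marker precedes it.
        let pos = match fields.iter().take(2).position(|f| f.ends_with(".service")) {
            Some(p) => p,
            None => continue, // header, legend, or blank line
        };
        if fields.len() < pos + 4 {
            continue;
        }
        let name = fields[pos].strip_suffix(".service").unwrap_or(fields[pos]);
        cache.insert(
            name.to_string(),
            CachedStatus {
                load: fields[pos + 1].to_string(),
                active: fields[pos + 2].to_string(),
                sub: fields[pos + 3].to_string(),
            },
        );
    }
    cache
}
```

A lookup that misses this cache would then fall back to a per-service systemctl call, as the entry above notes.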
Implement glob pattern matching for service filters:
- nginx* matches nginx, nginx-config-reload, etc.
- *backup matches any service ending with 'backup'
- docker*prune matches docker-weekly-prune, etc.
- Exact matches still work as before (backward compatible)
Addresses TODO.md requirement for '*' filtering support.
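The matching rules above can be illustrated with a small hand-written matcher. This is a sketch of one way to support `*` without a glob crate. The function name `matches_pattern` is hypothetical, and the agent may implement this differently.

```rust
/// Return true if `name` matches `pattern`, where '*' matches any run of
/// characters (including none). Patterns without '*' require exact equality.
fn matches_pattern(name: &str, pattern: &str) -> bool {
    if !pattern.contains('*') {
        return name == pattern;
    }
    // Splitting on '*' always yields at least two parts here.
    let parts: Vec<&str> = pattern.split('*').collect();
    let first = parts[0];
    let last = parts[parts.len() - 1];

    // The text before the first '*' must be a prefix of the name.
    if !name.starts_with(first) {
        return false;
    }
    let mut rest = &name[first.len()..];

    // The text after the last '*' must be a suffix of what remains.
    if rest.len() < last.len() || !rest.ends_with(last) {
        return false;
    }
    rest = &rest[..rest.len() - last.len()];

    // Any middle parts must appear in order within the remainder.
    for part in parts[1..parts.len() - 1].iter().copied() {
        match rest.find(part) {
            Some(i) => rest = &rest[i + part.len()..],
            None => return false,
        }
    }
    true
}
```

Because a pattern without `*` falls through to plain equality, existing exact-match filters keep working unchanged.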
Reduce from 2 systemctl commands to 1 by using only:
systemctl list-units --type=service --all
This captures all services (active, inactive, failed) in one call,
eliminating the redundant list-unit-files command.
Achieves the TODO.md goal of reducing systemctl calls.
Remove all sudo -u systemctl commands and user service processing.
Now only collects system services via systemctl list-units/list-unit-files.
Eliminates user service discovery completely as planned in TODO.md.
Change service matching logic from contains-based to exact equality.
Services now match only if service_name == pattern exactly.
This is the first step in the systemd collector optimization plan.
- Fix duplicate storage pool issue by clearing cache on agent startup
- Change storage pool header text to normal color for better readability
- Improve services panel tree icons with proper └─ symbols for last items
- Ensure fresh metrics data on each agent restart
Eliminate duplicate storage entries by removing old disk_count dependency.
Dashboard now uses pure auto-discovery of disk_{pool}_usage_percent metrics.
Fixes multiple storage instances (Storage 0, Storage 1, Storage root) being
displayed; only the proper tree structure format is now shown.
Add proper hierarchical tree display for storage pools and drives:
- Pool headers with status icons and type indication (Single/multi-drive)
- Individual drive lines with ├─ tree symbols and health status
- Usage summary with └─ end symbol and capacity status
- T: and W: prefixes for temperature and wear level metrics
- Themed status icons using StatusIcons::get_icon() with proper colors
- 2-space indentation for clean tree structure appearance
Replace flat storage display with beautiful tree format:
● Storage steampool (multi-drive):
├─ ● sdb T:35°C W:12%
├─ ● sdc T:38°C W:8%
└─ ● 78.1% 1250.3GB/1600.0GB
Uses agent-calculated status from NixOS-configured thresholds.
Update CLAUDE.md with complete implementation specification.
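To make the tree format above concrete, here is a minimal rendering sketch. The `Drive` and `Pool` structs are simplified stand-ins for the dashboard's real types, and the status icon is hard-coded as `●` rather than themed through `StatusIcons::get_icon()`.

```rust
/// Simplified drive record (illustrative, not the dashboard's DriveInfo).
struct Drive {
    name: String,
    temp_c: Option<f64>,
    wear_pct: Option<f64>,
}

/// Simplified pool record (illustrative, not the dashboard's StoragePool).
struct Pool {
    name: String,
    pool_type: String,
    drives: Vec<Drive>,
    usage_pct: f64,
    used_gb: f64,
    total_gb: f64,
}

/// Render one pool as a header, one "├─" line per drive, and a "└─" usage line.
fn render_pool(pool: &Pool) -> Vec<String> {
    let mut lines = vec![format!("● Storage {} ({}):", pool.name, pool.pool_type)];
    for drive in &pool.drives {
        let mut line = format!("├─ ● {}", drive.name);
        // T: and W: fields are shown only when the metric is available.
        if let Some(t) = drive.temp_c {
            line.push_str(&format!(" T:{:.0}°C", t));
        }
        if let Some(w) = drive.wear_pct {
            line.push_str(&format!(" W:{:.0}%", w));
        }
        lines.push(line);
    }
    lines.push(format!(
        "└─ ● {:.1}% {:.1}GB/{:.1}GB",
        pool.usage_pct, pool.used_gb, pool.total_gb
    ));
    lines
}
```

The usage summary is always the final line, so it alone takes the `└─` end symbol while every drive line uses `├─`.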
Restructure storage display to handle new individual metrics architecture:
- Parse disk_{pool}_* metrics instead of indexed disk_{index}_* format
- Support individual drive metrics disk_{pool}_{drive}_health/temperature/wear
- Display tree structure: "Storage {pool} ({type}): drive details"
- Show pool usage summary with individual drive health/temp/wear status
- Auto-discover storage pools and drives from metric patterns
- Maintain proper status aggregation from individual metrics
The dashboard now correctly displays the new enhanced disk collector output
with storage pools containing multiple drives and their individual metrics.
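Auto-discovery from metric names might work along these lines. The sketch assumes pools are identified by a `disk_{pool}_usage_percent` metric and drives by `disk_{pool}_{drive}_health`, `_temperature`, or `_wear`. The exact suffixes and the `discover_pools` function are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Map each discovered pool name to the set of drive names seen for it.
/// Assumption: one pool name is never a prefix of another followed by '_'.
fn discover_pools(metric_names: &[&str]) -> BTreeMap<String, BTreeSet<String>> {
    let mut pools: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();

    // Pass 1: every disk_{pool}_usage_percent metric announces a pool.
    for name in metric_names {
        if let Some(pool) = name
            .strip_prefix("disk_")
            .and_then(|rest| rest.strip_suffix("_usage_percent"))
        {
            pools.entry(pool.to_string()).or_insert_with(BTreeSet::new);
        }
    }

    // Pass 2: attach drives from disk_{pool}_{drive}_{health|temperature|wear}.
    for name in metric_names {
        for (pool, drives) in pools.iter_mut() {
            let prefix = format!("disk_{}_", pool);
            if let Some(rest) = name.strip_prefix(prefix.as_str()) {
                for suffix in ["_health", "_temperature", "_wear"] {
                    if let Some(drive) = rest.strip_suffix(suffix) {
                        if !drive.is_empty() {
                            drives.insert(drive.to_string());
                        }
                    }
                }
            }
        }
    }
    pools
}
```

Discovering pools first avoids having to guess where the pool name ends and the drive name begins, since both are separated by the same underscore.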
- Add StoragePool and DriveInfo structures for grouping drives by mount point
- Implement SMART data collection for individual drives (health, temperature, wear)
- Support ext4, zfs, xfs, mergerfs, and btrfs filesystem types
- Generate individual drive metrics: disk_[pool]_[drive]_health/temperature/wear
- Add storage_type and underlying_devices to filesystem configuration
- Move hardcoded service directory mappings to NixOS configuration
- Move hardcoded host-to-user mapping to NixOS configuration
- Remove all unused code and fix compilation warnings
- Clean implementation with zero warnings and no dead code
Individual drives now show health status per storage pool:
Storage root (ext4): nvme0n1 PASSED 42°C 5% wear
Storage steampool (mergerfs): sda/sdb/sdc with individual health data
- Dashboard now automatically looks for /etc/cm-dashboard/dashboard.toml
- No need to specify --config flag when using standard NixOS deployment
- Fallback to manual config path if default not found
- Update help text to reflect optional config parameter
- Simplifies dashboard usage - just run 'cm-dashboard' without arguments
- Remove all unused configuration options from dashboard config module
- Eliminate hardcoded defaults - dashboard now requires config file like agent
- Keep only actually used config: zmq.subscriber_ports and hosts.predefined_hosts
- Remove unused get_host_metrics function from metric store
- Clean up missing module imports (hosts, utils)
- Make dashboard fail fast if no configuration provided
- Align dashboard config approach with agent configuration pattern
- Remove all Default implementations from agent configuration structs
- Make configuration file required for agent startup
- Update NixOS module to generate complete agent.toml configuration
- Add comprehensive configuration options to NixOS module, including:
  - Service include/exclude patterns for systemd collector
  - All thresholds and intervals
  - ZMQ communication settings
  - Notification and cache configuration
- Agent now fails fast if no configuration provided
- Eliminates configuration drift between defaults and NixOS settings
- Use systemctl --user commands to discover user-level services
- Include both user unit files and loaded user units
- Gracefully handle cases where user commands fail (no user session)
- Treat user services the same as system services in filtering
- Enables monitoring of user-level Docker, development servers, etc.
- Add ark-permissions to exclusion list (maintenance service)
- Add sunshine to service_name_filters (game streaming server)
- Improves service discovery for game streaming infrastructure