46 Commits

Author SHA1 Message Date
4b7d08153c Implement per-host widget cache for instant host switching
Resolves widget data persistence issue where switching hosts left stale data
from the previous host displayed in widgets.

Key improvements:
- Add Clone derives to all widget structs (CpuWidget, MemoryWidget,
  ServicesWidget, BackupWidget)
- Create HostWidgets struct to cache widget states per hostname
- Update TuiApp with HashMap<String, HostWidgets> for per-host storage
- Fix borrowing issues by cloning hostname before mutable self borrow
- Implement instant widget state restoration when switching hosts

Tab key host switching now displays cached widget data for each host
without stale information persistence between switches.
2025-10-18 19:54:08 +02:00
46cc813a68 Implement Tab key host switching functionality
- Add KeyCode::Tab support to main dashboard event loop
- Add Tab key handling to TuiApp handle_input method
- Tab key now cycles to next host using existing navigate_host logic
- Host switching infrastructure was already implemented, just needed Tab key support
- Current host displayed in bold in title bar, other hosts shown normally
- Metrics filtered by selected host, full navigation working
2025-10-18 19:26:58 +02:00
125111ee99 Implement comprehensive backup monitoring and fix timestamp issues
- Add BackupCollector for reading TOML status files with disk space metrics
- Implement BackupWidget with disk usage display and service status details
- Fix backup script disk space parsing by adding missing capture_output=True
- Update backup widget to show actual disk usage instead of repository size
- Fix timestamp parsing to use backup completion time instead of start time
- Resolve timezone issues by using UTC timestamps in backup script
- Add disk identification metrics (product name, serial number) to backup status
- Enhance UI layout with proper backup monitoring integration
2025-10-18 18:33:41 +02:00
8a36472a3d Implement real-time process monitoring and fix UI hardcoded data
This commit addresses several key issues identified during development:

Major Changes:
- Replace hardcoded top CPU/RAM process display with real system data
- Add intelligent process monitoring to CpuCollector using ps command
- Fix disk metrics permission issues in systemd collector
- Optimize service collection to focus on status, memory, and disk only
- Update dashboard widgets to display live process information

Process Monitoring Implementation:
- Added collect_top_cpu_process() and collect_top_ram_process() methods
- Implemented ps-based monitoring with accurate CPU percentages
- Added filtering to prevent self-monitoring artifacts (ps commands)
- Enhanced error handling and validation for process data
- Dashboard now shows realistic values like "claude (PID 2974) 11.0%"

Service Collection Optimization:
- Removed CPU monitoring from systemd collector for efficiency
- Enhanced service directory permission error logging
- Simplified services widget to show essential metrics only
- Fixed service-to-directory mapping accuracy

UI and Dashboard Improvements:
- Reorganized dashboard layout with btop-inspired multi-panel design
- Updated system panel to include real top CPU/RAM process display
- Enhanced widget formatting and data presentation
- Removed placeholder/hardcoded data throughout the interface

Technical Details:
- Updated agent/src/collectors/cpu.rs with process monitoring
- Modified dashboard/src/ui/mod.rs for real-time process display
- Enhanced systemd collector error handling and disk metrics
- Updated CLAUDE.md documentation with implementation details
2025-10-16 23:55:05 +02:00
7a664ef0fb Remove refresh functionality that causes dashboard to hang
- Remove 'r' key handler that was causing hang on refresh
- Remove RefreshRequested event and check_refresh_request method
- Remove send_refresh_commands function and ZMQ command protocol
- Remove refresh_requested field from App struct
- Clean up status line text (refresh -> tick)

The refresh functionality was causing the dashboard to become unresponsive
when pressing 'r' key. This removes all refresh-related code to fix the issue.
2025-10-16 01:00:39 +02:00
6bc7f97375 Add refresh shortkey 'r' for on-demand metrics refresh
Implements ZMQ command protocol for dashboard-to-agent communication:
- Agents listen on port 6131 for REQ/REP commands
- Dashboard sends "refresh" command when 'r' key is pressed
- Agents force immediate collection of all metrics via force_refresh_all()
- Fresh data is broadcast immediately to dashboard
- Updated help text to show "r: Refresh all metrics"

Also includes metric-level caching architecture foundation for future
granular control over individual metric update frequencies.
2025-10-15 22:30:04 +02:00
efdd713f62 Improve dashboard display and fix service issues
- Remove unreachable descriptions from failed nginx sites
- Show complete site URLs instead of truncating at first dot
- Implement service-specific disk quotas (docker: 4GB, immich: 4GB, others: 1-2GB)
- Truncate process names to show only executable name without full path
- Display only highest C-state instead of all C-states for cleaner output
- Format system RAM as xxxMB/GB (totalGB) to match services format
2025-10-15 09:36:03 +02:00
a64464142c Remove nginx site accessibility filtering to monitor all sites
- Remove check_site_accessibility function and filtering logic
- Monitor ALL nginx sites from config regardless of current status
- Site status determined by measure_site_latency, not accessibility filter
- Fixes missing git.cmtec.se when backend is down (502 errors)
- Sites with errors now show as failed instead of being filtered out
2025-10-14 22:46:06 +02:00
0cb69ea8fa Consolidate HTTP checking and improve display formatting
- Change site latency timeout from 5s to 2s for faster error detection
- Replace curl with reqwest for external connectivity checks (consistent timeouts)
- Remove unused gitea-specific monitoring functionality
- Update dashboard: show 'unreachable' for latency > 2000ms, add arrows (→) between site and latency
- Add percentage signs to CPU metrics display
- All HTTP requests now use reqwest with 2-second timeouts
2025-10-14 22:24:22 +02:00
f3b6d12f68 Add top CPU and RAM process monitoring to System widget
- Implement get_top_cpu_process() and get_top_ram_process() functions in SystemCollector
- Add top_cpu_process and top_ram_process fields to SystemSummary data structure
- Update System widget to display top processes as description rows
- Show process name and percentage usage for highest CPU and RAM consumers
- Skip kernel threads and filter out processes with minimal usage (<0.1%)
2025-10-14 21:47:52 +02:00
77795c44d3 Implement nginx site status monitoring with unreachable detection
- Show 'unreachable' status for nginx sites that fail connection tests
- Set service status to error (red) for unreachable sites
- Display latency in milliseconds for responsive sites
- Properly count failed sites in service summary statistics
- Improve nginx site monitoring reliability and visibility
2025-10-14 20:19:39 +02:00
fd8aa0678e Implement nginx site latency monitoring and improve disk usage display
Agent improvements:
- Add reqwest dependency for HTTP latency testing
- Implement measure_site_latency() function for nginx sites
- Add latency_ms field to ServiceData structure
- Measure response times for nginx sites using HEAD requests
- Handle connection failures gracefully with 5-second timeout
- Use HTTPS for external sites, HTTP for localhost

Dashboard improvements:
- Add latency_ms field to ServiceInfo structure
- Display latency for nginx sites: "docker.cmtec.se 134ms"
- Only show latency for nginx sub-services, not other services
- Change disk usage "0" to "<1MB" for better readability

The Services widget now shows:
- Nginx sites with response times when measurable
- Cleaner disk usage formatting for small values
- Improved user experience with meaningful latency data
2025-10-14 19:38:36 +02:00
c6e8749ddd Implement logged-in users monitoring and improve widget formatting
Agent improvements:
- Add get_logged_in_users() function to SystemCollector using 'who' command
- Collect unique, sorted list of currently logged-in users
- Include logged_in_users field in system metrics JSON output
- Change C-state formatting to show 2 states per row instead of 4

Dashboard improvements:
- Update Backups widget to show "Archives: XX, ..." format
- System widget ready to display logged-in users with proper formatting

The System widget will now show:
- C-states formatted as 2 per row for better readability
- Logged-in users displayed as "Logged in: user" or "Logged in: X users (user1, user2)"
2025-10-14 19:23:26 +02:00
1ee398e648 Improve widget formatting and add logged-in users support
Services widget:
- Fix disk quota formatting with proper rounding instead of truncation
- Remove decimals from RAM quotas and use GB instead of G
- Change quota display to use GB consistently

Backups widget:
- Change GiB to GB for consistency
- Remove spaces between numbers and units
- Update disk usage format to match other widgets: used (totalGB)
- Remove percentage display for cleaner format

System widget:
- Add support for logged-in users in description lines
- Format C-states with "C-State:" prefix on first line, indent subsequent lines
- Add logged_in_users field to SystemSummary data structure

Documentation:
- Add example hash error output to NixOS update instructions
2025-10-14 18:59:31 +02:00
3e5e91f078 Remove SB column and improve widget formatting
Services widget:
- Remove SB (sandbox) column and related formatting function
- Fix quota formatting to show decimals when needed (1.5G not 1G)
- Remove spaces in unit display (128MB not 128 MB)

Storage widget:
- Change usage format to 23GB (932GB) for better readability

Documentation:
- Add NixOS configuration update process to CLAUDE.md
2025-10-14 18:40:12 +02:00
b0d3d85fb9 Improve services widget column headers and value formatting
- Update column headers to be more concise: RAM (GB) → RAM, CPU (%) → CPU, Disk (GB) → Disk
- Change sandbox column "no(ok)" to "-" for excluded services
- Implement smart unit formatting for memory and disk values (kB/MB/GB)
- Display quotas as (XG) format without decimals when limits exist
- Add format_bytes() helper for consistent unit display across metrics
2025-10-14 18:21:45 +02:00
c6d5a3f2a5 Add sandbox exclusion list for system services
Implement exclusion list for services that don't require sandboxing due
to their nature (SSH, Docker, system services). These services now show
"no(ok)" in SB column and maintain green status instead of warning.

Changes:
- Add is_sandbox_excluded field to ServiceData and ServiceInfo structs
- Add is_sandbox_excluded() method with system service exclusions:
  - sshd/ssh (needs system access for auth/shell)
  - docker (needs broad system access)
  - systemd services, dbus, NetworkManager, etc.
- Update status determination to accept excluded services as ok
- Update format_sandbox_value to show "no(ok)" for excluded services
- Update all ServiceData constructors with exclusion field

Service status logic:
- Sandboxed: Status=Running, SB="yes"
- Excluded: Status=Running, SB="no(ok)"
- Should be sandboxed but isn't: Status=Degraded, SB="no"

This provides clear distinction between services that legitimately don't
need sandboxing vs. those requiring security attention.
2025-10-14 11:35:42 +02:00
4fa2b079f1 Add sandbox column and security-based service status
Add new "SB" column to services widget showing systemd sandboxing status.
Service status now reflects security posture with unsandboxed services
showing as degraded/warning status.

Changes:
- Add is_sandboxed field to ServiceData and ServiceInfo structs
- Add check_service_sandbox method detecting systemd hardening features
- Add format_sandbox_value function showing "yes"/"no" for sandboxing
- Update service status determination to consider sandbox status:
  - Sandboxed + Running = "Running" (green/ok)
  - Unsandboxed + Running = "Degraded" (yellow/warning)
  - Failed services = "Stopped" (red/critical)
- Add "SB" column header to services widget

Services without proper NixOS hardening (PrivateTmp, ProtectSystem, etc.)
now show warning status to highlight security concerns.
2025-10-14 11:18:07 +02:00
17dda1ae67 Implement proper disk and memory quota detection
Replace misleading system total quotas with actual service-specific
quota detection. Services now only show quotas when real limits exist.

Changes:
- Add get_service_disk_quota method with filesystem quota detection
- Add check_filesystem_quota and docker storage quota helpers
- Remove automatic assignment of system totals as fake quotas
- Update dashboard formatting to show usage only when no quota exists

Display behavior:
- Services with real limits: "2.1/8.0" (usage/quota)
- Services without limits: "2.1" (usage only)

This provides accurate monitoring instead of misleading system capacity
values that suggested all services had massive quotas.
2025-10-14 11:01:04 +02:00
8de3d2ba79 Clean up services widget column headers and units
Standardize all services widget columns to show units in headers
and remove units from metric values for cleaner display.

Changes:
- Update column headers: "RAM (GB)", "CPU (%)", "Disk (GB)"
- Remove units from metric values:
  - RAM: "5.2/32.0" (no GB)
  - CPU: "2.5" (no %)
  - Disk: "1.5/500.0" (no GB)
- Simplify disk formatting to always show GB format

All columns now consistently display units in headers with
clean, uncluttered metric values.
2025-10-14 10:36:38 +02:00
4be1223a8d Update services widget memory display and formatting
Improve services widget to show consistent usage/total format for both
RAM and Disk columns, using system totals when no service quotas exist.

Changes:
- Change column header from "Memory (GB)" to "RAM (GB)"
- Remove "GB" units from memory values (units now in header)
- Add system memory total detection from /proc/meminfo
- Use system memory total as default quota for services without limits
- Services now show "5.2/32.0" format for both RAM and disk

Both RAM and Disk columns now consistently display usage/quota format
where quota is either service-specific limit or system total capacity.
2025-10-14 10:23:16 +02:00
630d2ff674 Add disk quota display to services widget
Implement disk quota/total display in services widget showing usage/quota
format. When services don't have specific disk quotas configured, use
system total disk capacity as the quota value.

Changes:
- Add disk_quota_gb field to ServiceData struct in agent
- Add disk_quota_gb field to ServiceInfo struct in dashboard
- Update format_disk_value to show usage/quota format
- Use system disk total capacity as default quota for services
- Rename DiskUsage.total_gb to total_capacity_gb for clarity

Services will now display disk usage as "5.2/500.0 GB" format where
500.0 GB is either the service's specific quota or system total capacity.
2025-10-14 10:14:24 +02:00
dca3642e46 Implement multi-host autoconnect with consolidated host configuration
- Add DEFAULT_HOSTS constant in config.rs for centralized host management
- Update ZMQ endpoint generation to connect to all configured hosts
- Implement graceful connection handling for unreachable endpoints
- Dashboard now auto-discovers and connects to available agents on cmbox, labbox, simonbox, steambox, srv01
2025-10-14 00:44:38 +02:00
c8655bf852 Update dashboard ZMQ endpoints to use hostnames for all CMTEC hosts 2025-10-13 23:39:46 +02:00
cd4764596f Implement comprehensive dashboard improvements and maintenance mode
- Storage widget: Restructure with Name/Temp/Wear/Usage columns, SMART details as descriptions
- Host navigation: Only cycle through connected hosts, no disconnected hosts
- Auto-discovery: Skip config files, use predefined CMTEC host list
- Maintenance mode: Suppress notifications during backup via /tmp/cm-maintenance file
- CPU thresholds: Update to warning ≥9.0, critical ≥10.0 for production use
- Agent-dashboard separation: Agent provides descriptions, dashboard displays only
2025-10-13 11:18:23 +02:00
bb69f0f31b Testing 2025-10-13 10:23:42 +02:00
42aaebf6a7 Testing 2025-10-13 09:57:43 +02:00
d76302e1c4 Testing 2025-10-13 08:45:12 +02:00
859df2dec1 Testing 2025-10-13 08:38:57 +02:00
5e8a0ce108 Testing 2025-10-13 08:31:18 +02:00
bab387c74d Refactor services widget with unified system metrics display
- Rename alerts widget to hosts widget for clarity
- Add sub_service field to ServiceInfo for display differentiation
- Integrate system metrics (CPU load, memory, temperature, disk) as service rows
- Convert nginx sites to individual sub-service rows with tree structure
- Remove nginx site checkmarks - status now shown via row indicators
- Update dashboard layout to display system and service data together
- Maintain description lines for connection counts and service details

Services widget now shows:
- System metrics as regular service rows with status
- Nginx sites as sub-services with ├─/└─ tree formatting
- Regular services with full resource data and descriptions
- Unified status indication across all row types
2025-10-13 08:10:38 +02:00
57b676ad25 Testing 2025-10-13 00:16:24 +02:00
9e344fb66d Testing 2025-10-12 22:31:46 +02:00
59bc3adad5 Testing 2025-10-12 19:57:05 +02:00
9c836e0862 Testing 2025-10-12 19:32:47 +02:00
c312916687 Testing 2025-10-12 18:55:15 +02:00
75910610e4 Testing 2025-10-12 18:39:03 +02:00
0656af17f2 Testing 2025-10-12 18:31:44 +02:00
b226217ba4 Testing 2025-10-12 18:03:32 +02:00
49aee702f2 Testing 2025-10-12 17:29:29 +02:00
088c42e55a Testing 2025-10-12 17:25:15 +02:00
bd6c14c8c1 Testing 2025-10-12 16:01:56 +02:00
656d410a7a Testing 2025-10-12 14:58:10 +02:00
2239badc8a Testing 2025-10-12 14:53:27 +02:00
2581435b10 Implement per-service disk usage monitoring
Replaced system-wide disk usage with accurate per-service tracking by scanning
service-specific directories. Services like sshd now correctly show minimal
disk usage instead of misleading system totals.

- Rename storage widget and add drive capacity/usage columns
- Move host display to main dashboard title for cleaner layout
- Replace separate alert displays with color-coded row highlighting
- Add per-service disk usage collection using du command
- Update services widget formatting to handle small disk values
- Restructure into workspace with dedicated agent and dashboard packages
2025-10-11 22:59:16 +02:00
82afe3d4f1 Restructure into workspace with dashboard and agent 2025-10-11 14:19:05 +02:00