63 Commits

Author SHA1 Message Date
28cfd5758f Fix service metrics not showing - remove cache check
The service_status_cache from discovery only has active_state with
all detailed metrics set to None. During collection, get_service_status()
was returning cached data instead of fetching fresh systemctl show data.

Now always fetch fresh data to populate memory_bytes, restart_count,
and uptime_seconds properly.
2025-11-28 23:15:51 +01:00
0e01813ff5 Add service metrics from systemctl (memory, uptime, restarts)
Shared:
- Add memory_bytes, restart_count, uptime_seconds to ServiceData

Agent:
- Add new fields to ServiceStatusInfo struct
- Fetch MemoryCurrent, NRestarts, ExecMainStartTimestamp from systemctl show
- Calculate uptime from start timestamp
- Parse and populate new fields in ServiceData
- Remove unused load_state and sub_state fields

Dashboard:
- Add memory_bytes, restart_count, uptime_seconds to ServiceInfo
- Update header: Service, Status, RAM, Uptime, ↻ (restarts)
- Format memory as MB/GB
- Format uptime as Xd Xh, Xh Xm, or Xm
- Show restart count with ! prefix if > 0 to indicate instability

All metrics obtained from single systemctl show call - zero overhead.
2025-11-28 23:06:13 +01:00
67b686f8c7 Remove RAM and disk collection for services
Complete removal of service resource metrics:

Agent:
- Remove memory_mb and disk_gb fields from ServiceData struct
- Remove get_service_memory_usage() method
- Remove get_service_disk_usage() method
- Remove get_directory_size() method
- Remove unused warn import

Dashboard:
- Remove memory_mb and disk_gb from ServiceInfo struct
- Remove memory/disk display from format_parent_service_line
- Remove memory/disk parsing in legacy metric path
- Remove unused format_disk_size() function

Service resource metrics were slow, unreliable, and never worked
properly since structured data migration. Will be handled differently
in the future.
2025-11-28 14:25:12 +01:00
2f94a4b853 Add service_type field to separate data from presentation
Changes:
- Add service_type field to SubServiceData: 'nginx_site', 'container', 'image'
- Agent sends pure data without display formatting
- Dashboard checks service_type to decide presentation
- Docker images now display without status icon (service_type='image')
- Remove unused image_size_str from docker images tuple

Clean separation: agent provides data, dashboard handles display logic.
2025-11-27 18:09:20 +01:00
fac0188c6f Change docker image display format and status
Changes:
- Rename docker images from 'image_node:18...' to 'I node:18...' for conciseness
- Change image status from 'active' to 'inactive' for neutral informational display
- Images now show with gray empty circle ○ instead of green filled circle ●

Docker images are static artifacts without meaningful operational status, so using inactive status provides neutral gray display that won't trigger alerts or affect service status aggregation.
2025-11-27 17:57:24 +01:00
374b126446 Reduce all command timeouts to 2-3 seconds max
With 10-second host heartbeat timeout, all command timeouts must be significantly lower to ensure total collection time stays under 10 seconds.

Changed timeouts:
- smartctl: 10s → 3s (critical: multiple drives queried sequentially)
- du: 5s → 2s
- lsblk: 5s → 2s
- systemctl list commands: 5s → 3s
- systemctl show/is-active: 3s → 2s
- docker commands: 5s → 3s
- df, ip commands: 3s → 2s

Total worst-case collection time now capped at more reasonable levels, preventing false host offline alerts from blocking operations.
2025-11-27 16:38:54 +01:00
1e0510be81 Add comprehensive timeouts to all blocking system commands
Fixes random host disconnections caused by blocking operations preventing timely ZMQ packet transmission.

Changes:
- Add run_command_with_timeout() wrapper using tokio for async command execution
- Apply 10s timeout to smartctl (prevents 30+ second hangs on failing drives)
- Apply 5s timeout to du, lsblk, systemctl list commands
- Apply 3s timeout to systemctl show/is-active, df, ip commands
- Apply 2s timeout to hostname command
- Use system 'timeout' command for sync operations where async not needed

Critical fixes:
- smartctl: Failing drives could block for 30+ seconds per drive
- du: Large directories (Docker, PostgreSQL) could block 10-30+ seconds
- systemctl/docker: Commands could block indefinitely during system issues

With 1-second collection interval and 10-second heartbeat timeout, any blocking operation >10s causes false "host offline" alerts. These timeouts ensure collection completes quickly even during system degradation.
2025-11-27 16:34:08 +01:00
6d6beb207d Parse Docker image sizes to MB and sort services alphabetically
All checks were successful
Build and Release / build-and-release (push) Successful in 1m18s
2025-11-27 15:57:38 +01:00
5dd8cadef3 Remove debug logging from Docker collection code
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
2025-11-27 12:50:20 +01:00
fefe30ec51 Remove sudo from docker commands - use docker group membership instead
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
Agent changes:
- Changed docker ps and docker images commands to run without sudo
- cm-agent user is already in docker group, so sudo is not needed
- Fixes "unable to change to root gid: Operation not permitted" error
- Systemd security restrictions were blocking sudo gid changes

This fixes Docker container and image collection on systems with
systemd security hardening enabled.

Updated to version 0.1.178
2025-11-27 12:35:38 +01:00
fb40cce748 Add stderr logging for Docker images command failure
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent changes:
- Log stderr output when docker images command fails
- This will show the actual error message (e.g., permission denied, docker not found)
- Helps diagnose why docker images collection is failing

Updated to version 0.1.177
2025-11-27 12:28:55 +01:00
eaa057b284 Change Docker collection logging from debug to info level
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Agent changes:
- Changed debug!() to info!() for Docker collection logs
- This allows logs to show with default RUST_LOG=info setting
- Added info import to tracing use statement

Now logs will be visible in journalctl without needing to change log level:
- "Collecting Docker sub-services for service: docker"
- "Found X Docker containers"
- "Found X Docker images"
- "Total Docker sub-services added: X"

Updated to version 0.1.176
2025-11-27 12:18:17 +01:00
f23a1b5cec Add debug logging for Docker container and image collection
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Agent changes:
- Added debug logging to Docker images collection function
- Log when Docker sub-services are being collected for a service
- Log count of containers and images found
- Log total sub-services added
- Show command failure details instead of silently returning empty vec

This will help diagnose why Docker images aren't showing up as sub-services
on some hosts. The logs will show if the docker commands are failing or if
the collection is working but data isn't being transmitted properly.

Updated to version 0.1.175
2025-11-27 12:04:51 +01:00
3f98f68b51 Show Docker images as sub-services under docker service
All checks were successful
Build and Release / build-and-release (push) Successful in 1m23s
Agent changes:
- Added get_docker_images() function to list all Docker images
- Use docker images to show stored images with repository:tag and size
- Display images as sub-services under docker service with size in parentheses
- Skip dangling images (<none>:<none>)
- Images shown with active status (always present when listed)

Example display:
● docker                      active     139M     1MB
  ├─ ● docker_gitea           active
  ├─ ○ docker_old-app         inactive
  ├─ ● image_nginx:latest     (142MB)
  ├─ ● image_postgres:15      (379MB)
  └─ ● image_gitea:latest     (256MB)

Updated to version 0.1.174
2025-11-27 11:43:35 +01:00
3d38a7a984 Show all Docker containers as sub-services with active/inactive status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Agent changes:
- Use docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as sub-services under the docker service
- Each container shown with proper status indicator

Example display:
● docker                 active     139M     1MB
  ├─ ● docker_gitea      active
  ├─ ○ docker_old-app    inactive
  └─ ● docker_immich     active

Updated to version 0.1.173
2025-11-27 10:56:15 +01:00
b0ee0242bd Show all Docker containers as top-level services with active/inactive status
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
Agent changes:
- Changed docker ps to docker ps -a to show ALL containers (running and stopped)
- Map container status: Up -> active, Exited/Created -> inactive, other -> failed
- Display Docker containers as individual top-level services instead of sub-services
- Each container shown as "docker_{container_name}" in service list

This provides better visibility of all containers and their status directly in the
services panel, making it easier to see stopped containers at a glance.

Updated to version 0.1.172
2025-11-27 10:51:47 +01:00
3858309a5d Fix Docker container detection with sudo permissions
Some checks failed
Build and Release / build-and-release (push) Failing after 1m19s
Update systemd collector to use sudo for docker ps command to resolve
permission issues when cm-agent user lacks docker group membership.
This ensures Docker containers are properly discovered and displayed
as sub-services under the docker service.

Version: 0.1.160
2025-11-25 12:40:27 +01:00
df104bf940 Remove debug prints and unused code
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Remove all debug println statements
- Remove unused service_tracker module
- Remove unused struct fields and methods
- Remove empty placeholder files (cpu.rs, memory.rs, defaults.rs)
- Fix all compiler warnings
- Clean build with zero warnings

Version bump to 0.1.159
2025-11-25 12:19:04 +01:00
d89b3ac881 Fix nginx sub-services persistent caching with complete service data storage
All checks were successful
Build and Release / build-and-release (push) Successful in 1m17s
Resolves nginx sites appearing only briefly during collection cycles by implementing proper caching of complete service data including sub-services.

Changes:
- Add cached_service_data field to store complete ServiceData with sub-services
- Modify collection logic to cache full service objects instead of basic ServiceInfo
- Update cache retrieval to use complete cached data preserving nginx site metrics
- Eliminate flickering of nginx sites between collection cycles

Version bump to v0.1.148
2025-11-24 23:24:00 +01:00
7f26991609 Fix nginx sub-services flickering with persistent caching
All checks were successful
Build and Release / build-and-release (push) Successful in 1m19s
- Remove nginx_ prefix from site names in hierarchical structure
- Fix get_nginx_site_metrics to call correct internal method
- Implement same caching functionality as old working version
- Sites now stay visible continuously with 30s latency updates
- Preserve cached results between refresh cycles
2025-11-24 23:01:51 +01:00
75ec190b93 Fix service status icon mismatch with single source of truth architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
- Remove duplicate status string fields from ServiceData and SubServiceData
- Use only Status enum as single source of truth for service status
- Agent calculates Status enum using calculate_service_status()
- Dashboard converts Status enum to display text for UI
- Implement flexible metrics system for sub-services with label/value/unit
- Fix status icon/text mismatches (inactive services now show gray circles)
- Ensure perfect alignment between service icons and status text
2025-11-24 22:43:22 +01:00
eb892096d9 Complete systemd collector restoration matching original architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 2m8s
- Add nginx site metrics caching with configurable intervals matching original
- Implement complex nginx config parsing with brace counting and redirect detection
- Replace curl with reqwest HTTP client for proper timeout and redirect handling
- Fix docker container parsing to use comma format with proper status mapping
- Add sudo to directory size command for permission handling
- Change nginx URLs to use https protocol matching original
- Add advanced NixOS ExecStart parsing for argv[] format support
- Add nginx -T fallback functionality for config discovery
- Implement proper server block parsing with domain validation and brace tracking
- Add get_service_memory function matching original signature

All functionality now matches pre-refactor implementation architecture.
2025-11-24 22:02:15 +01:00
c006625a3f Restore complete systemd collector functionality
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
- Enhanced directory size logic with minimum 0.001GB visibility and permission error logging
- Added nginx site monitoring with latency checks and NixOS config discovery
- Added docker container monitoring as sub-services
- Integrated sub-service collection for active nginx and docker services
- All missing features from original implementation now restored
2025-11-24 21:51:42 +01:00
9357e5f2a8 Properly restore systemd collector with original architecture
Some checks failed
Build and Release / build-and-release (push) Failing after 1m16s
- Restore service discovery caching with configurable intervals
- Add excluded services filtering logic
- Implement complete wildcard pattern matching (*prefix, suffix*, glob)
- Add ServiceStatusInfo caching from systemctl commands
- Restore cached service status retrieval to avoid repeated systemctl calls
- Add proper systemctl command error handling

All functionality now matches pre-refactor implementation.
2025-11-24 21:36:15 +01:00
b120f95f8a Restore service discovery and disk usage calculation
Some checks failed
Build and Release / build-and-release (push) Failing after 1m2s
Fixes missing services and 0B disk usage issues by restoring:
- Wildcard pattern matching for service filters (gitea*, redis*)
- Service disk usage calculation from directories and WorkingDirectory
- Proper Status::Inactive for inactive services

Services now properly discovered and show actual disk usage.
2025-11-24 20:25:08 +01:00
2b2cb2da3e Complete atomic migration to structured data architecture
All checks were successful
Build and Release / build-and-release (push) Successful in 1m7s
Implements clean structured data collection eliminating all string metric
parsing bugs. Collectors now populate AgentData directly with type-safe
field access.

Key improvements:
- Mount points preserved correctly (/ and /boot instead of root/boot)
- Tmpfs discovery added to memory collector
- Temperature data flows as typed f32 fields
- Zero string parsing overhead
- Complete removal of MetricCollectionManager bridge
- Direct ZMQ transmission of structured JSON

All functionality maintained: service tracking, notifications, status
evaluation, and multi-host monitoring.
2025-11-24 18:53:31 +01:00
c0f7a97a6f Remove all scrolling code and user-stopped tracking logic
All checks were successful
Build and Release / build-and-release (push) Successful in 2m36s
- Remove scroll offset fields from HostWidgets struct
- Replace scrolling with simple "X more below" indicators in all widgets
- Remove user-stopped service tracking from agent (now uses SSH control)
- Inactive services now consistently show Status::Inactive with empty circles
- Simplify widget render methods by removing scroll parameters
- Clean up unused imports and legacy scrolling infrastructure
- Fix journalctl command to use -fu for proper log following
2025-11-19 08:32:42 +01:00
d11aa11f99 Add Status::Inactive for inactive services with empty circle display
All checks were successful
Build and Release / build-and-release (push) Successful in 1m12s
- Add new Status::Inactive variant to enum for better service state representation
- Agent now assigns Status::Inactive instead of Status::Warning for inactive services
- Dashboard displays inactive services with empty circle (○) icon in gray color
- User-stopped services still show as Status::Ok with green filled circle
- Inactive services treated as OK for host status aggregation
- Improves visual clarity between active (●), inactive (○), and warning (◐) states
2025-11-18 17:54:51 +01:00
bd20f0cae1 Fix user-stopped flag timing and service transition handling
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Correct user-stopped service behavior during startup transitions:

User-Stopped Flag Timing Fix:
- Clear user-stopped flag only when service actually becomes active, not when start command succeeds
- Remove premature flag clearing from service control handler
- Add automatic flag clearing when service status metrics show active state
- Services retain user-stopped status during activating/transitioning states

Service Transition Handling:
- User-stopped services in activating state now report Status::OK instead of Status::Pending
- Prevents host warnings during legitimate service startup transitions
- Maintains accurate status reporting throughout service lifecycle
- Failed service starts preserve user-stopped flags correctly

Journalctl Popup Fix:
- Fix terminal corruption when using J key for service logs
- Correct command quoting to prevent tmux popup interference
- Stable popup display without dashboard interface corruption

Result: Clean service startup experience with no false warnings and proper
user-stopped tracking throughout the entire service lifecycle.

Bump version to v0.1.47
2025-10-30 12:05:54 +01:00
c56e9d7be2 Implement user-stopped service tracking system
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
Add comprehensive tracking for services stopped via dashboard to prevent
false alerts when users intentionally stop services.

Features:
- User-stopped services report Status::Ok instead of Warning
- Persistent storage survives agent restarts
- Dashboard sends UserStart/UserStop commands
- Agent tracks and syncs user-stopped state globally
- Systemd collector respects user-stopped flags

Implementation:
- New service_tracker module with persistent JSON storage
- Enhanced ServiceAction enum with UserStart/UserStop variants
- Global singleton tracker accessible by collectors
- Service status logic updated to check user-stopped flag
- Dashboard version now uses CARGO_PKG_VERSION automatically

Bump version to v0.1.43
2025-10-30 10:42:56 +01:00
6509a2b91a Make nginx site latency thresholds configurable and simplify status logic
All checks were successful
Build and Release / build-and-release (push) Successful in 4m25s
- Replace hardcoded 500ms/2000ms thresholds with configurable nginx_latency_critical_ms
- Simplify status logic to only OK or Critical (no Warning status)
- Add validation for nginx latency threshold configuration
- Re-enable nginx site collection with configurable thresholds
- Resolves issue where sites showed critical at 2000ms despite 30s timeout setting
- Bump version to v0.1.38
2025-10-28 21:24:34 +01:00
e890c5e810 Fix service status detection with combined discovery and status approach
All checks were successful
Build and Release / build-and-release (push) Successful in 2m9s
Enhanced service discovery to properly show status for all services:

Changes:
- Use systemctl list-unit-files for complete service discovery (finds all services)
- Use systemctl list-units --all for batch runtime status fetching
- Combine both datasets to get comprehensive service list with correct status
- Services found in unit-files but not runtime are marked as inactive (Warning status)
- Eliminates 'unknown' status issue while maintaining complete service visibility

Now inactive services show as Warning (yellow ◐) and active services show as Ok (green ●)
instead of all services showing as unknown (? icon).
2025-10-28 15:56:47 +01:00
078c30a592 Fix service discovery to show all configured services regardless of state
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
Changed service discovery from 'systemctl list-units --all' to 'systemctl list-unit-files'
to ensure ALL service unit files are discovered, including services that have never been started.

Changes:
- Updated systemctl command to use list-unit-files instead of list-units --all
- Modified parsing logic to handle unit file format (2 fields vs 4 fields)
- Set placeholder values in discovery cache, actual runtime status fetched during collection
- This ensures all configured services (like inactive ARK servers) appear in dashboard

The issue was that list-units --all only shows services systemd has loaded/attempted to load,
but list-unit-files shows ALL service unit files regardless of their runtime state.
2025-10-28 15:41:58 +01:00
627c533724 Update to v0.1.18 with per-collector intervals and tmux check
All checks were successful
Build and Release / build-and-release (push) Successful in 2m7s
- Implement per-collector interval timing respecting NixOS config
- Remove all hardcoded timeout/interval values and make configurable
- Add tmux session requirement check for TUI mode (bypassed for headless)
- Update agent to send config hash in Build field instead of nixos version
- Add nginx check interval, HTTP timeouts, and ZMQ transmission interval configs
- Update NixOS configuration with new configurable values

Breaking changes:
- Build field now shows nix store config hash (8 chars) instead of nixos version
- All intervals now follow individual collector configuration instead of global

New configuration fields:
- systemd.nginx_check_interval_seconds
- systemd.http_timeout_seconds
- systemd.http_connect_timeout_seconds
- zmq.transmission_interval_seconds
2025-10-28 10:08:25 +01:00
4b54a59e35 Remove unused code and eliminate compiler warnings
- Remove unused fields from CommandStatus variants
- Clean up unused methods and unused collector fields
- Fix lifetime syntax warning in SystemWidget
- Delete unused cache module completely
- Remove redundant render methods from widgets

All agent and dashboard warnings eliminated while preserving
panel switching and scrolling functionality.
2025-10-25 14:15:52 +02:00
c99e0bd8ee Remove hardcoded discovery interval in systemd collector
- Use config.interval_seconds instead of hardcoded 300 seconds
- Discovery now happens every 10 seconds (configurable) instead of 5 minutes
- Follows configuration-driven architecture requirements
2025-10-23 13:20:48 +02:00
0f12438ab4 Fix RwLock deadlock in systemd collector Phase 4
- Restructure get_monitored_services to avoid nested write locks
- Split discover_services into discover_services_internal that returns data
- Update state in separate scope to prevent deadlock
- Fix borrow checker errors with clone() for status cache
2025-10-23 13:12:53 +02:00
7607e971b8 Add debug logging to diagnose Phase 4 service discovery issue
Add detailed debug logging to track:
- Service discovery start
- Individual service parsing
- Final service count and list
- Empty results indication

This will help identify why cmbox disappeared from dashboard.
2025-10-23 12:57:10 +02:00
da6f3c3855 Phase 4: Cache service status from discovery to eliminate per-service calls
Major performance optimization:
- Parse and cache service status during discovery from systemctl list-units
- Eliminate per-service systemctl is-active and show calls
- Reduce systemctl calls from 1+2N to just 1 call total
- For 10 services: 21 calls → 1 call (95% reduction)
- Add fallback to systemctl for cache misses

This completes the major systemctl call reduction goal from TODO.md.
2025-10-23 12:51:17 +02:00
174b27f31a Phase 3: Add wildcard support for service pattern matching
Implement glob pattern matching for service filters:
- nginx* matches nginx, nginx-config-reload, etc.
- *backup matches any service ending with 'backup'
- docker*prune matches docker-weekly-prune, etc.
- Exact matches still work as before (backward compatible)

Addresses TODO.md requirement for '*' filtering support.
2025-10-23 12:37:16 +02:00
dc11538ae9 Phase 2b: Optimize to single systemctl command
Reduce from 2 systemctl commands to 1 by using only:
systemctl list-units --type=service --all

This captures all services (active, inactive, failed) in one call,
eliminating the redundant list-unit-files command.
Achieves the TODO.md goal of reducing systemctl calls.
2025-10-23 12:34:54 +02:00
9133e18090 Phase 2: Remove user service collection logic
Remove all sudo -u systemctl commands and user service processing.
Now only collects system services via systemctl list-units/list-unit-files.
Eliminates user service discovery completely as planned in TODO.md.
2025-10-23 12:32:19 +02:00
616fad2c5d Phase 1: Implement exact name filtering for service matching
Change service matching logic from contains-based to exact equality.
Services now match only if service_name == pattern exactly.
This is the first step in the systemd collector optimization plan.
2025-10-23 12:22:26 +02:00
08d3454683 Enhance disk collector with individual drive health monitoring
- Add StoragePool and DriveInfo structures for grouping drives by mount point
- Implement SMART data collection for individual drives (health, temperature, wear)
- Support for ext4, zfs, xfs, mergerfs, btrfs filesystem types
- Generate individual drive metrics: disk_[pool]_[drive]_health/temperature/wear
- Add storage_type and underlying_devices to filesystem configuration
- Move hardcoded service directory mappings to NixOS configuration
- Move hardcoded host-to-user mapping to NixOS configuration
- Remove all unused code and fix compilation warnings
- Clean implementation with zero warnings and no dead code

Individual drives now show health status per storage pool:
Storage root (ext4): nvme0n1 PASSED 42°C 5% wear
Storage steampool (mergerfs): sda/sdb/sdc with individual health data
2025-10-22 19:59:25 +02:00
34822bd835 Fix systemd collector to use Status::Pending for transitional states 2025-10-21 19:08:58 +02:00
a937032eb1 Remove hardcoded defaults, require configuration file
- Remove all Default implementations from agent configuration structs
- Make configuration file required for agent startup
- Update NixOS module to generate complete agent.toml configuration
- Add comprehensive configuration options to NixOS module including:
  - Service include/exclude patterns for systemd collector
  - All thresholds and intervals
  - ZMQ communication settings
  - Notification and cache configuration
- Agent now fails fast if no configuration provided
- Eliminates configuration drift between defaults and NixOS settings
2025-10-21 00:01:26 +02:00
1e8da8c187 Add user service discovery to systemd collector
- Use systemctl --user commands to discover user-level services
- Include both user unit files and loaded user units
- Gracefully handle cases where user commands fail (no user session)
- Treat user services same as system services in filtering
- Enables monitoring of user-level Docker, development servers, etc.
2025-10-20 23:11:11 +02:00
1cc31ec26a Update service filters for better discovery
- Add ark-permissions to exclusion list (maintenance service)
- Add sunshine to service_name_filters (game streaming server)
- Improves service discovery for game streaming infrastructure
2025-10-20 23:01:03 +02:00
b580cfde8c Add more services to exclusion list
- Add docker-prune (cleanup services don't need monitoring)
- Add sshd-unix-local@ and sshd@ (SSH instance services)
- Add docker-registry-gar (Google Artifact Registry services)
- Keep main sshd service monitored while excluding per-connection instances
2025-10-20 22:51:15 +02:00
5886426dac Fix service discovery to detect all services regardless of state
- Use systemctl list-unit-files and list-units --all to find inactive services
- Parse both outputs to ensure all services are discovered
- Remove special SSH detection logic since sshd is in service filters
- Rename interesting_services to service_name_filters for clarity
- Now detects services in any state: active, inactive, failed, dead, etc.
2025-10-20 22:41:21 +02:00