Compare commits

..

15 Commits

Author SHA1 Message Date
5deb8cf8d8 Bump version to v0.1.206
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
2025-11-28 23:07:20 +01:00
0e01813ff5 Add service metrics from systemctl (memory, uptime, restarts)
Shared:
- Add memory_bytes, restart_count, uptime_seconds to ServiceData

Agent:
- Add new fields to ServiceStatusInfo struct
- Fetch MemoryCurrent, NRestarts, ExecMainStartTimestamp from systemctl show
- Calculate uptime from start timestamp
- Parse and populate new fields in ServiceData
- Remove unused load_state and sub_state fields

Dashboard:
- Add memory_bytes, restart_count, uptime_seconds to ServiceInfo
- Update header: Service, Status, RAM, Uptime, ↻ (restarts)
- Format memory as MB/GB
- Format uptime as Xd Xh, Xh Xm, or Xm
- Show restart count with ! prefix if > 0 to indicate instability

All metrics obtained from single systemctl show call - zero overhead.
2025-11-28 23:06:13 +01:00
c3c9507a42 Bump version to v0.1.205
All checks were successful
Build and Release / build-and-release (push) Successful in 1m22s
2025-11-28 22:37:28 +01:00
4d77ffe17e Remove RAM and Disk columns from services widget header
Changed header from 4 columns to 2 columns:
- Before: Service, Status, RAM, Disk
- After: Service, Status

Matches the removal of memory_mb and disk_gb fields.
2025-11-28 22:37:14 +01:00
14f74b4cac Bump version to v0.1.204
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
2025-11-28 14:33:19 +01:00
67b686f8c7 Remove RAM and disk collection for services
Complete removal of service resource metrics:

Agent:
- Remove memory_mb and disk_gb fields from ServiceData struct
- Remove get_service_memory_usage() method
- Remove get_service_disk_usage() method
- Remove get_directory_size() method
- Remove unused warn import

Dashboard:
- Remove memory_mb and disk_gb from ServiceInfo struct
- Remove memory/disk display from format_parent_service_line
- Remove memory/disk parsing in legacy metric path
- Remove unused format_disk_size() function

Service resource metrics were slow, unreliable, and never worked
properly since structured data migration. Will be handled differently
in the future.
2025-11-28 14:25:12 +01:00
e3996fdb84 Fix compilation errors from command receiver removal
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
- Remove AgentCommand import from agent.rs
- Remove handle_commands() method
- Remove command handling from main loop
- Remove command_port validation checks
2025-11-28 13:01:36 +01:00
f94ca60e69 Bump version to v0.1.203
Some checks failed
Build and Release / build-and-release (push) Failing after 1m36s
2025-11-28 12:53:56 +01:00
c19ff56df8 Remove unused ZMQ command receiver (port 6131)
Service control migrated to SSH, command receiver no longer needed.
- Remove command_receiver Socket from ZmqHandler
- Remove try_receive_command method
- Remove AgentCommand enum
- Remove command_port from ZmqConfig
2025-11-28 12:52:43 +01:00
fe2f604703 Bump version to v0.1.202
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
2025-11-28 12:45:25 +01:00
8bfd416327 Revert to v0.1.192 - fix agent hang issue
Some checks failed
Build and Release / build-and-release (push) Failing after 1m8s
2025-11-28 12:42:24 +01:00
85c6c624fb Revert D-Bus usage, use systemctl commands only
All checks were successful
Build and Release / build-and-release (push) Successful in 1m20s
- Remove zbus dependency from agent
- Replace D-Bus Connection calls with systemctl show commands
- Fix agent hang by eliminating blocking D-Bus operations
- get_unit_property now uses systemctl show with property flags
- Memory, disk usage, and nginx config queries use systemctl
- Simpler, more reliable service monitoring
2025-11-28 12:15:04 +01:00
eab3f17428 Fix agent hang by reverting service discovery to systemctl
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
The D-Bus ListUnits call in discover_services_internal() was causing
the agent to hang on startup.

**Root cause:**
- D-Bus ListUnits call with complex tuple destructuring hung indefinitely
- Agent never completed first collection cycle
- No collector output in logs

**Fix:**
- Revert discover_services_internal() to use systemctl list-units/list-unit-files
- Keep D-Bus-based property queries (WorkingDirectory, MemoryCurrent, ExecStart)
- Hybrid approach: systemctl for discovery, D-Bus for individual queries

**External commands still used:**
- systemctl list-units, list-unit-files (service discovery)
- smartctl (SMART data)
- sudo du (directory sizes)
- nginx -T (config fallback)

Version bump: 0.1.198 → 0.1.199
2025-11-28 11:57:31 +01:00
7ad149bbe4 Replace all systemctl commands with zbus D-Bus API
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
Complete migration from systemctl subprocess calls to native D-Bus communication:

**Removed systemctl commands:**
- systemctl is-active (fallback) - use D-Bus cache from ListUnits
- systemctl show --property=LoadState,ActiveState,SubState - use D-Bus cache
- systemctl show --property=WorkingDirectory - use D-Bus Properties.Get
- systemctl show --property=MemoryCurrent - use D-Bus Properties.Get
- systemctl show nginx --property=ExecStart - use D-Bus Properties.Get

**Implementation details:**
- Added get_unit_property() helper for D-Bus property access
- Made get_nginx_site_metrics() async to support D-Bus calls
- Made get_nginx_sites_internal() async
- Made discover_nginx_sites() async
- Made get_nginx_config_from_systemd() async
- Fixed RwLock guard Send issues by using scoped locks

**Remaining external commands:**
- smartctl (disk.rs) - No Rust alternative for SMART data
- sudo du (systemd.rs) - Directory size measurement
- nginx -T (systemd.rs) - Nginx config fallback
- timeout hostname (nixos.rs) - Rare fallback only

Version bump: 0.1.197 → 0.1.198
2025-11-28 11:46:28 +01:00
b444c88ea0 Replace external commands with native Rust APIs
All checks were successful
Build and Release / build-and-release (push) Successful in 1m54s
Significant performance improvements by eliminating subprocess spawning:

- Replace 'ip' commands with rtnetlink for network interface discovery
- Replace 'docker ps/images' with bollard Docker API client
- Replace 'systemctl list-units' with zbus D-Bus for systemd interaction
- Replace 'df' with statvfs() syscall for filesystem statistics
- Replace 'lsblk' with /proc/mounts parsing

Add interval-based caching to collectors:
- DiskCollector now respects interval_seconds configuration
- SystemdCollector now respects interval_seconds configuration
- CpuCollector now respects interval_seconds configuration

Remove unused command communication infrastructure:
- Remove port 6131 ZMQ command receiver
- Clean up unused AgentCommand types

Dependencies added:
- rtnetlink = "0.14"
- netlink-packet-route = "0.19"
- bollard = "0.17"
- zbus = "4.0"
- nix (fs features for statvfs)
2025-11-28 11:27:33 +01:00
12 changed files with 165 additions and 374 deletions

View File

@@ -156,56 +156,6 @@ Complete migration from string-based metrics to structured JSON data. Eliminates
- ✅ Backward compatibility via bridge conversion to existing UI widgets - ✅ Backward compatibility via bridge conversion to existing UI widgets
- ✅ All string parsing bugs eliminated - ✅ All string parsing bugs eliminated
### Cached Collector Architecture (🚧 PLANNED)
**Problem:** Blocking collectors prevent timely ZMQ transmission, causing false "host offline" alerts.
**Previous (Sequential Blocking):**
```
Every 1 second:
└─ collect_all_data() [BLOCKS for 2-10+ seconds]
├─ CPU (fast: 10ms)
├─ Memory (fast: 20ms)
├─ Disk SMART (slow: 3s per drive × 4 drives = 12s)
├─ Service disk usage (slow: 2-8s per service)
└─ Docker (medium: 500ms)
└─ send_via_zmq() [Only after ALL collection completes]
Result: If any collector takes >10s → "host offline" false alert
```
**New (Cached Independent Collectors):**
```
Shared Cache: Arc<RwLock<AgentData>>
Background Collectors (independent async tasks):
├─ Fast collectors (CPU, RAM, Network)
│ └─ Update cache every 1 second
├─ Medium collectors (Services, Docker)
│ └─ Update cache every 5 seconds
└─ Slow collectors (Disk usage, SMART data)
└─ Update cache every 60 seconds
ZMQ Sender (separate async task):
Every 1 second:
└─ Read current cache
└─ Send via ZMQ [Always instant, never blocked]
```
**Benefits:**
- ✅ ZMQ sends every 1 second regardless of collector speed
- ✅ No false "host offline" alerts from slow collectors
- ✅ Different update rates for different metrics (CPU=1s, SMART=60s)
- ✅ System stays responsive even with slow operations
- ✅ Slow collectors can use longer timeouts without blocking
**Implementation:**
- Shared `AgentData` cache wrapped in `Arc<RwLock<>>`
- Each collector spawned as independent tokio task
- Collectors update their section of cache at their own rate
- ZMQ sender reads cache every 1s and transmits
- Stale data acceptable for slow-changing metrics (disk usage, SMART)
### Maintenance Mode ### Maintenance Mode
- Agent checks for `/tmp/cm-maintenance` file before sending notifications - Agent checks for `/tmp/cm-maintenance` file before sending notifications

57
Cargo.lock generated
View File

@@ -1,5 +1,6 @@
# This file is automatically @generated by Cargo. # This file is automatically @generated by Cargo.
# It is not intended for manual editing. # It is not intended for manual editing.
# This file is automatically generated by Cargo.
version = 4 version = 4
[[package]] [[package]]
@@ -164,9 +165,9 @@ checksum = "df8670b8c7b9dae1793364eafadf7239c40d669904660c5960d74cfd80b46a53"
[[package]] [[package]]
name = "cc" name = "cc"
version = "1.2.47" version = "1.2.46"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cd405d82c84ff7f35739f175f67d8b9fb7687a0e84ccdc78bd3568839827cf07" checksum = "b97463e1064cb1b1c1384ad0a0b9c8abd0988e2a91f52606c80ef14aadb63e36"
dependencies = [ dependencies = [
"find-msvc-tools", "find-msvc-tools",
"jobserver", "jobserver",
@@ -238,9 +239,9 @@ dependencies = [
[[package]] [[package]]
name = "clap" name = "clap"
version = "4.5.53" version = "4.5.52"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c9e340e012a1bf4935f5282ed1436d1489548e8f72308207ea5df0e23d2d03f8" checksum = "aa8120877db0e5c011242f96806ce3c94e0737ab8108532a76a3300a01db2ab8"
dependencies = [ dependencies = [
"clap_builder", "clap_builder",
"clap_derive", "clap_derive",
@@ -248,9 +249,9 @@ dependencies = [
[[package]] [[package]]
name = "clap_builder" name = "clap_builder"
version = "4.5.53" version = "4.5.52"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d76b5d13eaa18c901fd2f7fca939fefe3a0727a953561fefdf3b2922b8569d00" checksum = "02576b399397b659c26064fbc92a75fede9d18ffd5f80ca1cd74ddab167016e1"
dependencies = [ dependencies = [
"anstream", "anstream",
"anstyle", "anstyle",
@@ -278,7 +279,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
[[package]] [[package]]
name = "cm-dashboard" name = "cm-dashboard"
version = "0.1.196" version = "0.1.205"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"chrono", "chrono",
@@ -300,7 +301,7 @@ dependencies = [
[[package]] [[package]]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.196" version = "0.1.205"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"async-trait", "async-trait",
@@ -323,7 +324,7 @@ dependencies = [
[[package]] [[package]]
name = "cm-dashboard-shared" name = "cm-dashboard-shared"
version = "0.1.196" version = "0.1.205"
dependencies = [ dependencies = [
"chrono", "chrono",
"serde", "serde",
@@ -663,9 +664,9 @@ dependencies = [
[[package]] [[package]]
name = "hashbrown" name = "hashbrown"
version = "0.16.1" version = "0.16.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "841d1cc9bed7f9236f321df977030373f4a4163ae1a7dbfe1a51a2c1a51d9100" checksum = "5419bdc4f6a9207fbeba6d11b604d481addf78ecd10c11ad51e76c2f6482748d"
[[package]] [[package]]
name = "heck" name = "heck"
@@ -878,12 +879,12 @@ dependencies = [
[[package]] [[package]]
name = "indexmap" name = "indexmap"
version = "2.12.1" version = "2.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0ad4bb2b565bca0645f4d68c5c9af97fba094e9791da685bf83cb5f3ce74acf2" checksum = "6717a8d2a5a929a1a2eb43a12812498ed141a0bcfb7e8f7844fbdbe4303bba9f"
dependencies = [ dependencies = [
"equivalent", "equivalent",
"hashbrown 0.16.1", "hashbrown 0.16.0",
] ]
[[package]] [[package]]
@@ -1638,9 +1639,9 @@ dependencies = [
[[package]] [[package]]
name = "signal-hook-registry" name = "signal-hook-registry"
version = "1.4.7" version = "1.4.6"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7664a098b8e616bdfcc2dc0e9ac44eb231eedf41db4e9fe95d8d32ec728dedad" checksum = "b2a4719bff48cee6b39d12c020eeb490953ad2443b7055bd0b21fca26bd8c28b"
dependencies = [ dependencies = [
"libc", "libc",
] ]
@@ -1732,9 +1733,9 @@ dependencies = [
[[package]] [[package]]
name = "syn" name = "syn"
version = "2.0.111" version = "2.0.110"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "390cc9a294ab71bdb1aa2e99d13be9c753cd2d7bd6560c77118597410c4d2e87" checksum = "a99801b5bd34ede4cf3fc688c5919368fea4e4814a4664359503e6015b280aea"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
"quote", "quote",
@@ -1961,9 +1962,9 @@ dependencies = [
[[package]] [[package]]
name = "tracing-attributes" name = "tracing-attributes"
version = "0.1.31" version = "0.1.30"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7490cfa5ec963746568740651ac6781f701c9c5ea257c58e057f3ba8cf69e8da" checksum = "81383ab64e72a7a8b8e13130c49e3dab29def6d0c7d76a03087b3cf71c5c6903"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
"quote", "quote",
@@ -1972,9 +1973,9 @@ dependencies = [
[[package]] [[package]]
name = "tracing-core" name = "tracing-core"
version = "0.1.35" version = "0.1.34"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7a04e24fab5c89c6a36eb8558c9656f30d81de51dfa4d3b45f26b21d61fa0a6c" checksum = "b9d12581f227e93f094d3af2ae690a574abb8a2b9b7a96e7cfe9647b2b617678"
dependencies = [ dependencies = [
"once_cell", "once_cell",
"valuable", "valuable",
@@ -2512,9 +2513,9 @@ checksum = "d6bbff5f0aada427a1e5a6da5f1f98158182f26556f345ac9e04d36d0ebed650"
[[package]] [[package]]
name = "winnow" name = "winnow"
version = "0.7.14" version = "0.7.13"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5a5364e9d77fcdeeaa6062ced926ee3381faa2ee02d3eb83a5c27a8825540829" checksum = "21a0236b59786fed61e2a80582dd500fe61f18b5dca67a4a067d0bc9039339cf"
dependencies = [ dependencies = [
"memchr", "memchr",
] ]
@@ -2566,18 +2567,18 @@ dependencies = [
[[package]] [[package]]
name = "zerocopy" name = "zerocopy"
version = "0.8.30" version = "0.8.27"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4ea879c944afe8a2b25fef16bb4ba234f47c694565e97383b36f3a878219065c" checksum = "0894878a5fa3edfd6da3f88c4805f4c8558e2b996227a3d864f47fe11e38282c"
dependencies = [ dependencies = [
"zerocopy-derive", "zerocopy-derive",
] ]
[[package]] [[package]]
name = "zerocopy-derive" name = "zerocopy-derive"
version = "0.8.30" version = "0.8.27"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cf955aa904d6040f70dc8e9384444cb1030aed272ba3cb09bbc4ab9e7c1f34f5" checksum = "88d2b8d9c68ad2b9e4340d7832716a4d21a22a1154777ad56ea55c51a9cf3831"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
"quote", "quote",

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard-agent" name = "cm-dashboard-agent"
version = "0.1.196" version = "0.1.206"
edition = "2021" edition = "2021"
[dependencies] [dependencies]

View File

@@ -4,7 +4,7 @@ use std::time::Duration;
use tokio::time::interval; use tokio::time::interval;
use tracing::{debug, error, info}; use tracing::{debug, error, info};
use crate::communication::{AgentCommand, ZmqHandler}; use crate::communication::ZmqHandler;
use crate::config::AgentConfig; use crate::config::AgentConfig;
use crate::collectors::{ use crate::collectors::{
Collector, Collector,
@@ -134,12 +134,6 @@ impl Agent {
// NOTE: With structured data, we might need to implement status tracking differently // NOTE: With structured data, we might need to implement status tracking differently
// For now, we skip this until status evaluation is migrated // For now, we skip this until status evaluation is migrated
} }
// Handle incoming commands (check periodically)
_ = tokio::time::sleep(Duration::from_millis(100)) => {
if let Err(e) = self.handle_commands().await {
error!("Error handling commands: {}", e);
}
}
_ = &mut shutdown_rx => { _ = &mut shutdown_rx => {
info!("Shutdown signal received, stopping agent loop"); info!("Shutdown signal received, stopping agent loop");
break; break;
@@ -259,36 +253,4 @@ impl Agent {
Ok(()) Ok(())
} }
/// Handle incoming commands from dashboard
async fn handle_commands(&mut self) -> Result<()> {
// Try to receive a command (non-blocking)
if let Ok(Some(command)) = self.zmq_handler.try_receive_command() {
info!("Received command: {:?}", command);
match command {
AgentCommand::CollectNow => {
info!("Received immediate collection request");
if let Err(e) = self.collect_and_broadcast().await {
error!("Failed to collect on demand: {}", e);
}
}
AgentCommand::SetInterval { seconds } => {
info!("Received interval change request: {}s", seconds);
// Note: This would require more complex handling to update the interval
// For now, just acknowledge
}
AgentCommand::ToggleCollector { name, enabled } => {
info!("Received collector toggle request: {} -> {}", name, enabled);
// Note: This would require more complex handling to enable/disable collectors
// For now, just acknowledge
}
AgentCommand::Ping => {
info!("Received ping command");
// Maybe send back a pong or status
}
}
}
Ok(())
}
} }

View File

@@ -4,7 +4,7 @@ use cm_dashboard_shared::{AgentData, ServiceData, SubServiceData, SubServiceMetr
use std::process::Command; use std::process::Command;
use std::sync::RwLock; use std::sync::RwLock;
use std::time::Instant; use std::time::Instant;
use tracing::{debug, warn}; use tracing::debug;
use super::{Collector, CollectorError}; use super::{Collector, CollectorError};
use crate::config::SystemdConfig; use crate::config::SystemdConfig;
@@ -43,9 +43,10 @@ struct ServiceCacheState {
/// Cached service status information from systemctl list-units /// Cached service status information from systemctl list-units
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
struct ServiceStatusInfo { struct ServiceStatusInfo {
load_state: String,
active_state: String, active_state: String,
sub_state: String, memory_bytes: Option<u64>,
restart_count: Option<u32>,
start_timestamp: Option<u64>,
} }
impl SystemdCollector { impl SystemdCollector {
@@ -86,14 +87,20 @@ impl SystemdCollector {
let mut complete_service_data = Vec::new(); let mut complete_service_data = Vec::new();
for service_name in &monitored_services { for service_name in &monitored_services {
match self.get_service_status(service_name) { match self.get_service_status(service_name) {
Ok((active_status, _detailed_info)) => { Ok(status_info) => {
let memory_mb = self.get_service_memory_usage(service_name).await.unwrap_or(0.0);
let disk_gb = self.get_service_disk_usage(service_name).await.unwrap_or(0.0);
let mut sub_services = Vec::new(); let mut sub_services = Vec::new();
// Calculate uptime if we have start timestamp
let uptime_seconds = status_info.start_timestamp.and_then(|start| {
let now = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.ok()?
.as_secs();
Some(now.saturating_sub(start))
});
// Sub-service metrics for specific services (always include cached results) // Sub-service metrics for specific services (always include cached results)
if service_name.contains("nginx") && active_status == "active" { if service_name.contains("nginx") && status_info.active_state == "active" {
let nginx_sites = self.get_nginx_site_metrics(); let nginx_sites = self.get_nginx_site_metrics();
for (site_name, latency_ms) in nginx_sites { for (site_name, latency_ms) in nginx_sites {
let site_status = if latency_ms >= 0.0 && latency_ms < self.config.nginx_latency_critical_ms { let site_status = if latency_ms >= 0.0 && latency_ms < self.config.nginx_latency_critical_ms {
@@ -118,7 +125,7 @@ impl SystemdCollector {
} }
} }
if service_name.contains("docker") && active_status == "active" { if service_name.contains("docker") && status_info.active_state == "active" {
let docker_containers = self.get_docker_containers(); let docker_containers = self.get_docker_containers();
for (container_name, container_status) in docker_containers { for (container_name, container_status) in docker_containers {
// For now, docker containers have no additional metrics // For now, docker containers have no additional metrics
@@ -155,11 +162,12 @@ impl SystemdCollector {
// Create complete service data // Create complete service data
let service_data = ServiceData { let service_data = ServiceData {
name: service_name.clone(), name: service_name.clone(),
memory_mb,
disk_gb,
user_stopped: false, // TODO: Integrate with service tracker user_stopped: false, // TODO: Integrate with service tracker
service_status: self.calculate_service_status(service_name, &active_status), service_status: self.calculate_service_status(service_name, &status_info.active_state),
sub_services, sub_services,
memory_bytes: status_info.memory_bytes,
restart_count: status_info.restart_count,
uptime_seconds,
}; };
// Add to AgentData and cache // Add to AgentData and cache
@@ -295,14 +303,13 @@ impl SystemdCollector {
let fields: Vec<&str> = line.split_whitespace().collect(); let fields: Vec<&str> = line.split_whitespace().collect();
if fields.len() >= 4 && fields[0].ends_with(".service") { if fields.len() >= 4 && fields[0].ends_with(".service") {
let service_name = fields[0].trim_end_matches(".service"); let service_name = fields[0].trim_end_matches(".service");
let load_state = fields.get(1).unwrap_or(&"unknown").to_string();
let active_state = fields.get(2).unwrap_or(&"unknown").to_string(); let active_state = fields.get(2).unwrap_or(&"unknown").to_string();
let sub_state = fields.get(3).unwrap_or(&"unknown").to_string();
status_cache.insert(service_name.to_string(), ServiceStatusInfo { status_cache.insert(service_name.to_string(), ServiceStatusInfo {
load_state,
active_state, active_state,
sub_state, memory_bytes: None,
restart_count: None,
start_timestamp: None,
}); });
} }
} }
@@ -311,9 +318,10 @@ impl SystemdCollector {
for service_name in &all_service_names { for service_name in &all_service_names {
if !status_cache.contains_key(service_name) { if !status_cache.contains_key(service_name) {
status_cache.insert(service_name.to_string(), ServiceStatusInfo { status_cache.insert(service_name.to_string(), ServiceStatusInfo {
load_state: "not-loaded".to_string(),
active_state: "inactive".to_string(), active_state: "inactive".to_string(),
sub_state: "dead".to_string(), memory_bytes: None,
restart_count: None,
start_timestamp: None,
}); });
} }
} }
@@ -346,35 +354,63 @@ impl SystemdCollector {
} }
/// Get service status from cache (if available) or fallback to systemctl /// Get service status from cache (if available) or fallback to systemctl
fn get_service_status(&self, service: &str) -> Result<(String, String)> { fn get_service_status(&self, service: &str) -> Result<ServiceStatusInfo> {
// Try to get status from cache first // Try to get status from cache first
if let Ok(state) = self.state.read() { if let Ok(state) = self.state.read() {
if let Some(cached_info) = state.service_status_cache.get(service) { if let Some(cached_info) = state.service_status_cache.get(service) {
let active_status = cached_info.active_state.clone(); return Ok(cached_info.clone());
let detailed_info = format!(
"LoadState={}\nActiveState={}\nSubState={}",
cached_info.load_state,
cached_info.active_state,
cached_info.sub_state
);
return Ok((active_status, detailed_info));
} }
} }
// Fallback to systemctl if not in cache (with 2 second timeout) // Fallback to systemctl if not in cache (with 2 second timeout)
let output = Command::new("timeout") let output = Command::new("timeout")
.args(&["2", "systemctl", "is-active", &format!("{}.service", service)]) .args(&[
"2",
"systemctl",
"show",
&format!("{}.service", service),
"--property=LoadState,ActiveState,SubState,MemoryCurrent,NRestarts,ExecMainStartTimestamp"
])
.output()?; .output()?;
let active_status = String::from_utf8(output.stdout)?.trim().to_string(); let output_str = String::from_utf8(output.stdout)?;
// Get more detailed info (with 2 second timeout) // Parse properties
let output = Command::new("timeout") let mut active_state = String::new();
.args(&["2", "systemctl", "show", &format!("{}.service", service), "--property=LoadState,ActiveState,SubState"]) let mut memory_bytes = None;
.output()?; let mut restart_count = None;
let mut start_timestamp = None;
let detailed_info = String::from_utf8(output.stdout)?; for line in output_str.lines() {
Ok((active_status, detailed_info)) if let Some(value) = line.strip_prefix("ActiveState=") {
active_state = value.to_string();
} else if let Some(value) = line.strip_prefix("MemoryCurrent=") {
if value != "[not set]" {
memory_bytes = value.parse().ok();
}
} else if let Some(value) = line.strip_prefix("NRestarts=") {
restart_count = value.parse().ok();
} else if let Some(value) = line.strip_prefix("ExecMainStartTimestamp=") {
if value != "[not set]" && !value.is_empty() {
// Parse timestamp to seconds since epoch
if let Ok(output) = Command::new("date")
.args(&["+%s", "-d", value])
.output()
{
if let Ok(timestamp_str) = String::from_utf8(output.stdout) {
start_timestamp = timestamp_str.trim().parse().ok();
}
}
}
}
}
Ok(ServiceStatusInfo {
active_state,
memory_bytes,
restart_count,
start_timestamp,
})
} }
/// Check if service name matches pattern (supports wildcards like nginx*) /// Check if service name matches pattern (supports wildcards like nginx*)
@@ -416,80 +452,6 @@ impl SystemdCollector {
true true
} }
/// Get disk usage for a specific service
async fn get_service_disk_usage(&self, service_name: &str) -> Result<f32, CollectorError> {
// Check if this service has configured directory paths
if let Some(dirs) = self.config.service_directories.get(service_name) {
// Service has configured paths - use the first accessible one
for dir in dirs {
if let Some(size) = self.get_directory_size(dir).await {
return Ok(size);
}
}
// If configured paths failed, return 0
return Ok(0.0);
}
// No configured path - try to get WorkingDirectory from systemctl (with 2 second timeout)
let output = Command::new("timeout")
.args(&["2", "systemctl", "show", &format!("{}.service", service_name), "--property=WorkingDirectory"])
.output()
.map_err(|e| CollectorError::SystemRead {
path: format!("WorkingDirectory for {}", service_name),
error: e.to_string(),
})?;
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
if line.starts_with("WorkingDirectory=") && !line.contains("[not set]") {
let dir = line.strip_prefix("WorkingDirectory=").unwrap_or("");
if !dir.is_empty() && dir != "/" {
return Ok(self.get_directory_size(dir).await.unwrap_or(0.0));
}
}
}
Ok(0.0)
}
/// Get size of a directory in GB (with 2 second timeout)
async fn get_directory_size(&self, path: &str) -> Option<f32> {
use super::run_command_with_timeout;
// Use -s (summary) and --apparent-size for speed, 2 second timeout
let mut cmd = Command::new("sudo");
cmd.args(&["du", "-s", "--apparent-size", "--block-size=1", path]);
let output = run_command_with_timeout(cmd, 2).await.ok()?;
if !output.status.success() {
// Log permission errors for debugging but don't spam logs
let stderr = String::from_utf8_lossy(&output.stderr);
if stderr.contains("Permission denied") {
debug!("Permission denied accessing directory: {}", path);
} else if stderr.contains("timed out") {
warn!("Directory size check timed out for {}", path);
} else {
debug!("Failed to get size for directory {}: {}", path, stderr);
}
return None;
}
let output_str = String::from_utf8(output.stdout).ok()?;
let size_str = output_str.split_whitespace().next()?;
if let Ok(size_bytes) = size_str.parse::<u64>() {
let size_gb = size_bytes as f32 / (1024.0 * 1024.0 * 1024.0);
// Return size even if very small (minimum 0.001 GB = 1MB for visibility)
if size_gb > 0.0 {
Some(size_gb.max(0.001))
} else {
None
}
} else {
None
}
}
/// Calculate service status, taking user-stopped services into account /// Calculate service status, taking user-stopped services into account
fn calculate_service_status(&self, service_name: &str, active_status: &str) -> Status { fn calculate_service_status(&self, service_name: &str, active_status: &str) -> Status {
match active_status.to_lowercase().as_str() { match active_status.to_lowercase().as_str() {
@@ -507,33 +469,6 @@ impl SystemdCollector {
} }
} }
/// Get memory usage for a specific service
async fn get_service_memory_usage(&self, service_name: &str) -> Result<f32, CollectorError> {
let output = Command::new("systemctl")
.args(&["show", &format!("{}.service", service_name), "--property=MemoryCurrent"])
.output()
.map_err(|e| CollectorError::SystemRead {
path: format!("memory usage for {}", service_name),
error: e.to_string(),
})?;
let output_str = String::from_utf8_lossy(&output.stdout);
for line in output_str.lines() {
if line.starts_with("MemoryCurrent=") {
if let Some(mem_str) = line.strip_prefix("MemoryCurrent=") {
if mem_str != "[not set]" {
if let Ok(memory_bytes) = mem_str.parse::<u64>() {
return Ok(memory_bytes as f32 / (1024.0 * 1024.0)); // Convert to MB
}
}
}
}
}
Ok(0.0)
}
/// Check if service collection cache should be updated /// Check if service collection cache should be updated
fn should_update_cache(&self) -> bool { fn should_update_cache(&self) -> bool {
let state = self.state.read().unwrap(); let state = self.state.read().unwrap();

View File

@@ -5,10 +5,9 @@ use zmq::{Context, Socket, SocketType};
use crate::config::ZmqConfig; use crate::config::ZmqConfig;
/// ZMQ communication handler for publishing metrics and receiving commands /// ZMQ communication handler for publishing metrics
pub struct ZmqHandler { pub struct ZmqHandler {
publisher: Socket, publisher: Socket,
command_receiver: Socket,
} }
impl ZmqHandler { impl ZmqHandler {
@@ -26,20 +25,8 @@ impl ZmqHandler {
publisher.set_sndhwm(1000)?; // High water mark for outbound messages publisher.set_sndhwm(1000)?; // High water mark for outbound messages
publisher.set_linger(1000)?; // Linger time on close publisher.set_linger(1000)?; // Linger time on close
// Create command receiver socket (PULL socket to receive commands from dashboard)
let command_receiver = context.socket(SocketType::PULL)?;
let cmd_bind_address = format!("tcp://{}:{}", config.bind_address, config.command_port);
command_receiver.bind(&cmd_bind_address)?;
info!("ZMQ command receiver bound to {}", cmd_bind_address);
// Set non-blocking mode for command receiver
command_receiver.set_rcvtimeo(0)?; // Non-blocking receive
command_receiver.set_linger(1000)?;
Ok(Self { Ok(Self {
publisher, publisher,
command_receiver,
}) })
} }
@@ -65,36 +52,4 @@ impl ZmqHandler {
Ok(()) Ok(())
} }
/// Try to receive a command (non-blocking)
pub fn try_receive_command(&self) -> Result<Option<AgentCommand>> {
match self.command_receiver.recv_bytes(zmq::DONTWAIT) {
Ok(bytes) => {
debug!("Received command message ({} bytes)", bytes.len());
let command: AgentCommand = serde_json::from_slice(&bytes)
.map_err(|e| anyhow::anyhow!("Failed to deserialize command: {}", e))?;
debug!("Parsed command: {:?}", command);
Ok(Some(command))
}
Err(zmq::Error::EAGAIN) => {
// No message available (non-blocking)
Ok(None)
}
Err(e) => Err(anyhow::anyhow!("ZMQ receive error: {}", e)),
}
}
}
/// Commands that can be sent to the agent
#[derive(Debug, Clone, serde::Deserialize, serde::Serialize)]
pub enum AgentCommand {
/// Request immediate metric collection
CollectNow,
/// Change collection interval
SetInterval { seconds: u64 },
/// Enable/disable a collector
ToggleCollector { name: String, enabled: bool },
/// Request status/health check
Ping,
} }

View File

@@ -20,7 +20,6 @@ pub struct AgentConfig {
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ZmqConfig { pub struct ZmqConfig {
pub publisher_port: u16, pub publisher_port: u16,
pub command_port: u16,
pub bind_address: String, pub bind_address: String,
pub transmission_interval_seconds: u64, pub transmission_interval_seconds: u64,
/// Heartbeat transmission interval in seconds for host connectivity detection /// Heartbeat transmission interval in seconds for host connectivity detection

View File

@@ -7,14 +7,6 @@ pub fn validate_config(config: &AgentConfig) -> Result<()> {
bail!("ZMQ publisher port cannot be 0"); bail!("ZMQ publisher port cannot be 0");
} }
if config.zmq.command_port == 0 {
bail!("ZMQ command port cannot be 0");
}
if config.zmq.publisher_port == config.zmq.command_port {
bail!("ZMQ publisher and command ports cannot be the same");
}
if config.zmq.bind_address.is_empty() { if config.zmq.bind_address.is_empty() {
bail!("ZMQ bind address cannot be empty"); bail!("ZMQ bind address cannot be empty");
} }

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard" name = "cm-dashboard"
version = "0.1.196" version = "0.1.206"
edition = "2021" edition = "2021"
[dependencies] [dependencies]

View File

@@ -28,11 +28,12 @@ pub struct ServicesWidget {
#[derive(Clone)] #[derive(Clone)]
struct ServiceInfo { struct ServiceInfo {
memory_mb: Option<f32>,
disk_gb: Option<f32>,
metrics: Vec<(String, f32, Option<String>)>, // (label, value, unit) metrics: Vec<(String, f32, Option<String>)>, // (label, value, unit)
widget_status: Status, widget_status: Status,
service_type: String, // "nginx_site", "container", "image", or empty for parent services service_type: String, // "nginx_site", "container", "image", or empty for parent services
memory_bytes: Option<u64>,
restart_count: Option<u32>,
uptime_seconds: Option<u64>,
} }
impl ServicesWidget { impl ServicesWidget {
@@ -52,8 +53,6 @@ impl ServicesWidget {
if metric_name.starts_with("service_") { if metric_name.starts_with("service_") {
if let Some(end_pos) = metric_name if let Some(end_pos) = metric_name
.rfind("_status") .rfind("_status")
.or_else(|| metric_name.rfind("_memory_mb"))
.or_else(|| metric_name.rfind("_disk_gb"))
.or_else(|| metric_name.rfind("_latency_ms")) .or_else(|| metric_name.rfind("_latency_ms"))
{ {
let service_part = &metric_name[8..end_pos]; // Remove "service_" prefix let service_part = &metric_name[8..end_pos]; // Remove "service_" prefix
@@ -76,36 +75,8 @@ impl ServicesWidget {
None None
} }
/// Format disk size with appropriate units (kB/MB/GB)
fn format_disk_size(size_gb: f32) -> String {
let size_mb = size_gb * 1024.0; // Convert GB to MB
if size_mb >= 1024.0 {
// Show as GB
format!("{:.1}GB", size_gb)
} else if size_mb >= 1.0 {
// Show as MB
format!("{:.0}MB", size_mb)
} else if size_mb >= 0.001 {
// Convert to kB
let size_kb = size_mb * 1024.0;
format!("{:.0}kB", size_kb)
} else {
// Show very small sizes as bytes
let size_bytes = size_mb * 1024.0 * 1024.0;
format!("{:.0}B", size_bytes)
}
}
/// Format parent service line - returns text without icon for span formatting /// Format parent service line - returns text without icon for span formatting
fn format_parent_service_line(&self, name: &str, info: &ServiceInfo) -> String { fn format_parent_service_line(&self, name: &str, info: &ServiceInfo) -> String {
let memory_str = info
.memory_mb
.map_or("0M".to_string(), |m| format!("{:.0}M", m));
let disk_str = info
.disk_gb
.map_or("0".to_string(), |d| Self::format_disk_size(d));
// Truncate long service names to fit layout (account for icon space) // Truncate long service names to fit layout (account for icon space)
let short_name = if name.len() > 22 { let short_name = if name.len() > 22 {
format!("{}...", &name[..19]) format!("{}...", &name[..19])
@@ -116,7 +87,7 @@ impl ServicesWidget {
// Convert Status enum to display text // Convert Status enum to display text
let status_str = match info.widget_status { let status_str = match info.widget_status {
Status::Ok => "active", Status::Ok => "active",
Status::Inactive => "inactive", Status::Inactive => "inactive",
Status::Critical => "failed", Status::Critical => "failed",
Status::Pending => "pending", Status::Pending => "pending",
Status::Warning => "warning", Status::Warning => "warning",
@@ -124,9 +95,43 @@ impl ServicesWidget {
Status::Offline => "offline", Status::Offline => "offline",
}; };
// Format memory
let memory_str = info.memory_bytes.map_or("-".to_string(), |bytes| {
let mb = bytes as f64 / (1024.0 * 1024.0);
if mb >= 1000.0 {
format!("{:.1}G", mb / 1024.0)
} else {
format!("{:.0}M", mb)
}
});
// Format uptime
let uptime_str = info.uptime_seconds.map_or("-".to_string(), |secs| {
let days = secs / 86400;
let hours = (secs % 86400) / 3600;
let mins = (secs % 3600) / 60;
if days > 0 {
format!("{}d{}h", days, hours)
} else if hours > 0 {
format!("{}h{}m", hours, mins)
} else {
format!("{}m", mins)
}
});
// Format restarts (show "!" if > 0 to indicate instability)
let restart_str = info.restart_count.map_or("-".to_string(), |count| {
if count > 0 {
format!("!{}", count)
} else {
"0".to_string()
}
});
format!( format!(
"{:<23} {:<10} {:<8} {:<8}", "{:<23} {:<10} {:<8} {:<8} {:<5}",
short_name, status_str, memory_str, disk_str short_name, status_str, memory_str, uptime_str, restart_str
) )
} }
@@ -309,11 +314,12 @@ impl Widget for ServicesWidget {
for service in &agent_data.services { for service in &agent_data.services {
// Store parent service // Store parent service
let parent_info = ServiceInfo { let parent_info = ServiceInfo {
memory_mb: Some(service.memory_mb),
disk_gb: Some(service.disk_gb),
metrics: Vec::new(), // Parent services don't have custom metrics metrics: Vec::new(), // Parent services don't have custom metrics
widget_status: service.service_status, widget_status: service.service_status,
service_type: String::new(), // Parent services have no type service_type: String::new(), // Parent services have no type
memory_bytes: service.memory_bytes,
restart_count: service.restart_count,
uptime_seconds: service.uptime_seconds,
}; };
self.parent_services.insert(service.name.clone(), parent_info); self.parent_services.insert(service.name.clone(), parent_info);
@@ -327,11 +333,12 @@ impl Widget for ServicesWidget {
.collect(); .collect();
let sub_info = ServiceInfo { let sub_info = ServiceInfo {
memory_mb: None, // Not used for sub-services
disk_gb: None, // Not used for sub-services
metrics, metrics,
widget_status: sub_service.service_status, widget_status: sub_service.service_status,
service_type: sub_service.service_type.clone(), service_type: sub_service.service_type.clone(),
memory_bytes: None, // Sub-services don't have individual metrics yet
restart_count: None,
uptime_seconds: None,
}; };
sub_list.push((sub_service.name.clone(), sub_info)); sub_list.push((sub_service.name.clone(), sub_info));
} }
@@ -371,23 +378,16 @@ impl ServicesWidget {
self.parent_services self.parent_services
.entry(parent_service) .entry(parent_service)
.or_insert(ServiceInfo { .or_insert(ServiceInfo {
memory_mb: None,
disk_gb: None,
metrics: Vec::new(), metrics: Vec::new(),
widget_status: Status::Unknown, widget_status: Status::Unknown,
service_type: String::new(), service_type: String::new(),
memory_bytes: None,
restart_count: None,
uptime_seconds: None,
}); });
if metric.name.ends_with("_status") { if metric.name.ends_with("_status") {
service_info.widget_status = metric.status; service_info.widget_status = metric.status;
} else if metric.name.ends_with("_memory_mb") {
if let Some(memory) = metric.value.as_f32() {
service_info.memory_mb = Some(memory);
}
} else if metric.name.ends_with("_disk_gb") {
if let Some(disk) = metric.value.as_f32() {
service_info.disk_gb = Some(disk);
}
} }
} }
Some(sub_name) => { Some(sub_name) => {
@@ -407,11 +407,12 @@ impl ServicesWidget {
sub_service_list.push(( sub_service_list.push((
sub_name.clone(), sub_name.clone(),
ServiceInfo { ServiceInfo {
memory_mb: None,
disk_gb: None,
metrics: Vec::new(), metrics: Vec::new(),
widget_status: Status::Unknown, widget_status: Status::Unknown,
service_type: String::new(), // Unknown type in legacy path service_type: String::new(), // Unknown type in legacy path
memory_bytes: None,
restart_count: None,
uptime_seconds: None,
}, },
)); ));
&mut sub_service_list.last_mut().unwrap().1 &mut sub_service_list.last_mut().unwrap().1
@@ -419,14 +420,6 @@ impl ServicesWidget {
if metric.name.ends_with("_status") { if metric.name.ends_with("_status") {
sub_service_info.widget_status = metric.status; sub_service_info.widget_status = metric.status;
} else if metric.name.ends_with("_memory_mb") {
if let Some(memory) = metric.value.as_f32() {
sub_service_info.memory_mb = Some(memory);
}
} else if metric.name.ends_with("_disk_gb") {
if let Some(disk) = metric.value.as_f32() {
sub_service_info.disk_gb = Some(disk);
}
} }
} }
} }
@@ -485,8 +478,8 @@ impl ServicesWidget {
// Header // Header
let header = format!( let header = format!(
"{:<25} {:<10} {:<8} {:<8}", "{:<25} {:<10} {:<8} {:<8} {:<5}",
"Service:", "Status:", "RAM:", "Disk:" "Service:", "Status:", "RAM:", "Uptime:", ":"
); );
let header_para = Paragraph::new(header).style(Typography::muted()); let header_para = Paragraph::new(header).style(Typography::muted());
frame.render_widget(header_para, content_chunks[0]); frame.render_widget(header_para, content_chunks[0]);

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "cm-dashboard-shared" name = "cm-dashboard-shared"
version = "0.1.196" version = "0.1.206"
edition = "2021" edition = "2021"
[dependencies] [dependencies]

View File

@@ -136,11 +136,15 @@ pub struct PoolDriveData {
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ServiceData { pub struct ServiceData {
pub name: String, pub name: String,
pub memory_mb: f32,
pub disk_gb: f32,
pub user_stopped: bool, pub user_stopped: bool,
pub service_status: Status, pub service_status: Status,
pub sub_services: Vec<SubServiceData>, pub sub_services: Vec<SubServiceData>,
/// Memory usage in bytes (from MemoryCurrent)
pub memory_bytes: Option<u64>,
/// Number of service restarts (from NRestarts)
pub restart_count: Option<u32>,
/// Uptime in seconds (calculated from ExecMainStartTimestamp)
pub uptime_seconds: Option<u64>,
} }
/// Sub-service data (nginx sites, docker containers, etc.) /// Sub-service data (nginx sites, docker containers, etc.)