Compare commits

...

9 Commits

Author SHA1 Message Date
db0e41a7d3 Remove blocking CollectNow commands to fix heartbeat stability
All checks were successful
Build and Release / build-and-release (push) Successful in 1m9s
Eliminates automatic CollectNow command sending on host connection which
was blocking the main message processing loop for up to 5 seconds per
command. Since agents transmit cached data every 2 seconds anyway, the
CollectNow optimization provided minimal benefit while causing heartbeat
detection issues. Also removes unused send_command wrapper method.

This should completely resolve intermittent host connection dropping.
2025-11-15 11:41:58 +01:00
ec460496d8 Remove blocking TCP connectivity tests for fast startup
All checks were successful
Build and Release / build-and-release (push) Successful in 1m10s
Eliminates test_tcp_connectivity function that was causing 5-10 second
startup delays. ZMQ connections are non-blocking and we rely entirely
on heartbeat mechanism for connectivity detection. This restores fast
dashboard startup time.
2025-11-15 11:09:49 +01:00
33e700529e Bump version to 0.1.71
All checks were successful
Build and Release / build-and-release (push) Successful in 1m30s
Version bump for release with fixed automated NixOS configuration
update workflow that uses the correct file path.
2025-11-15 10:25:08 +01:00
d644b7d40a Fix NixOS config path in automated release workflow
Update release.yml to use correct path hosts/services/cm-dashboard.nix
instead of hosts/common/cm-dashboard.nix. Also update documentation
in CLAUDE.md and README.md to reflect the correct file location.
2025-11-15 10:21:30 +01:00
f635ba9c75 Remove Tailscale and connection type complexity
Some checks failed
Build and Release / build-and-release (push) Has been cancelled
Simplifies host connection configuration by removing tailscale_ip field,
connection_type preferences, and fallback retry logic. Now uses only the
ip field or hostname as fallback. Eliminates blocking TCP connectivity
tests that interfered with heartbeat processing.

This resolves intermittent host lost/found issues by removing the
connection retry timeouts that blocked the ZMQ message processing loop.
2025-11-15 10:04:47 +01:00
76b6e3373e Change auto connection type to prioritize local IP first
All checks were successful
Build and Release / build-and-release (push) Successful in 2m36s
Update the auto connection type logic to try local network connections
before falling back to Tailscale. This provides better performance by
using faster local connections when available while maintaining Tailscale
as a reliable fallback.

Changes:
- Auto connection priority: local → tailscale → hostname (was tailscale → local)
- Fallback retry order updated to match new priority
- Supports omitting IP field in config for hosts without static local IP
2025-11-13 12:52:46 +01:00
0a13cab897 Add detected IP display in dashboard Agent row
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Display the connection IP address that the dashboard is configured to use
for each host below the Agent version information. Shows which network
path (local/Tailscale) is being used for connections based on host
configuration.

Features:
- Display detected IP below Agent row in system widget
- Uses existing host configuration connection logic
- Shows actual IP being used for dashboard connections
2025-11-13 11:26:58 +01:00
d33ec5d225 Add Tailscale network support for host connections
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
Implement configurable network routing for both local and Tailscale networks.
Dashboard now supports intelligent connection selection with automatic fallback
between network types. Add IP configuration fields and connection routing logic
for ZMQ and SSH operations.

Features:
- Host configuration with local and Tailscale IP addresses
- Configurable connection types (local/tailscale/auto)
- Automatic fallback between network connections
- Updated ZMQ connection logic with retry support
- SSH command routing through configured IP addresses
2025-11-13 10:08:17 +01:00
d31c2384df Add configurable maintenance mode file support
All checks were successful
Build and Release / build-and-release (push) Successful in 1m32s
Implement maintenance_mode_file configuration option in NotificationConfig
to allow customizable file paths for suppressing email notifications.
Updates maintenance mode check to use configured path instead of hardcoded
/tmp/cm-maintenance file.
2025-11-10 07:48:15 +01:00
15 changed files with 114 additions and 81 deletions

View File

@@ -113,13 +113,13 @@ jobs:
NIX_HASH="sha256-$(python3 -c "import base64, binascii; print(base64.b64encode(binascii.unhexlify('$NEW_HASH')).decode())")"
# Update the NixOS configuration
sed -i "s|version = \"v[^\"]*\"|version = \"$VERSION\"|" hosts/common/cm-dashboard.nix
sed -i "s|sha256 = \"sha256-[^\"]*\"|sha256 = \"$NIX_HASH\"|" hosts/common/cm-dashboard.nix
sed -i "s|version = \"v[^\"]*\"|version = \"$VERSION\"|" hosts/services/cm-dashboard.nix
sed -i "s|sha256 = \"sha256-[^\"]*\"|sha256 = \"$NIX_HASH\"|" hosts/services/cm-dashboard.nix
# Commit and push changes
git config user.name "Gitea Actions"
git config user.email "actions@gitea.cmtec.se"
git add hosts/common/cm-dashboard.nix
git add hosts/services/cm-dashboard.nix
git commit -m "Auto-update cm-dashboard to $VERSION
- Update version to $VERSION with automated release

View File

@@ -115,7 +115,7 @@ This automatically:
- Uploads binaries via Gitea API
### NixOS Configuration Updates
Edit `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
Edit `~/projects/nixosbox/hosts/services/cm-dashboard.nix`:
```nix
version = "v0.1.X";

6
Cargo.lock generated
View File

@@ -270,7 +270,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
[[package]]
name = "cm-dashboard"
version = "0.1.64"
version = "0.1.72"
dependencies = [
"anyhow",
"chrono",
@@ -292,7 +292,7 @@ dependencies = [
[[package]]
name = "cm-dashboard-agent"
version = "0.1.64"
version = "0.1.72"
dependencies = [
"anyhow",
"async-trait",
@@ -315,7 +315,7 @@ dependencies = [
[[package]]
name = "cm-dashboard-shared"
version = "0.1.64"
version = "0.1.72"
dependencies = [
"chrono",
"serde",

View File

@@ -329,7 +329,7 @@ This triggers automated:
- Tarball upload to Gitea
### NixOS Integration
Update `~/projects/nixosbox/hosts/common/cm-dashboard.nix`:
Update `~/projects/nixosbox/hosts/services/cm-dashboard.nix`:
```nix
version = "v0.1.43";

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-agent"
version = "0.1.65"
version = "0.1.73"
edition = "2021"
[dependencies]

View File

@@ -351,36 +351,40 @@ impl Agent {
_ => {}
}
let output = tokio::process::Command::new("sudo")
.arg("systemctl")
.arg(action_str)
.arg(format!("{}.service", service_name))
.output()
.await?;
// Spawn the systemctl command asynchronously to avoid blocking the agent
let service_name_clone = service_name.to_string();
let action_str_clone = action_str.to_string();
tokio::spawn(async move {
let result = tokio::process::Command::new("sudo")
.arg("systemctl")
.arg(&action_str_clone)
.arg(format!("{}.service", service_name_clone))
.output()
.await;
match result {
Ok(output) => {
if output.status.success() {
info!("Service {} {} completed successfully", service_name, action_str);
info!("Service {} {} completed successfully", service_name_clone, action_str_clone);
if !output.stdout.is_empty() {
debug!("stdout: {}", String::from_utf8_lossy(&output.stdout));
}
// Note: User-stopped flag will be cleared by systemd collector
// when service actually reaches 'active' state, not here
} else {
let stderr = String::from_utf8_lossy(&output.stderr);
error!("Service {} {} failed: {}", service_name, action_str, stderr);
return Err(anyhow::anyhow!("systemctl {} {} failed: {}", action_str, service_name, stderr));
error!("Service {} {} failed: {}", service_name_clone, action_str_clone, stderr);
}
}
Err(e) => {
error!("Failed to execute systemctl {} {}: {}", action_str_clone, service_name_clone, e);
}
}
});
// Force refresh metrics after service control to update service status
if matches!(action, ServiceAction::Start | ServiceAction::Stop | ServiceAction::UserStart | ServiceAction::UserStop) {
info!("Triggering immediate metric refresh after service control");
if let Err(e) = self.collect_metrics_only().await {
error!("Failed to refresh metrics after service control: {}", e);
} else {
info!("Service status refreshed immediately after {} {}", action_str, service_name);
}
}
info!("Service {} {} command initiated (non-blocking)", service_name, action_str);
// Note: Service status will be updated by the normal metric collection cycle
// once the systemctl operation completes
Ok(())
}

View File

@@ -150,6 +150,9 @@ pub struct NotificationConfig {
/// List of metric names to exclude from email notifications
#[serde(default)]
pub exclude_email_metrics: Vec<String>,
/// Path to maintenance mode file that suppresses email notifications when present
#[serde(default = "default_maintenance_mode_file")]
pub maintenance_mode_file: String,
}
@@ -157,6 +160,10 @@ fn default_heartbeat_interval_seconds() -> u64 {
5
}
fn default_maintenance_mode_file() -> String {
"/tmp/cm-maintenance".to_string()
}
impl AgentConfig {
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self> {
loader::load_config(path)

View File

@@ -59,6 +59,6 @@ impl NotificationManager {
}
fn is_maintenance_mode(&self) -> bool {
std::fs::metadata("/tmp/cm-maintenance").is_ok()
std::fs::metadata(&self.config.maintenance_mode_file).is_ok()
}
}

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard"
version = "0.1.65"
version = "0.1.73"
edition = "2021"
[dependencies]

View File

@@ -67,11 +67,8 @@ impl Dashboard {
}
};
// Connect to configured hosts from configuration
let hosts: Vec<String> = config.hosts.keys().cloned().collect();
// Try to connect to hosts but don't fail if none are available
match zmq_consumer.connect_to_predefined_hosts(&hosts).await {
match zmq_consumer.connect_to_predefined_hosts(&config.hosts).await {
Ok(_) => info!("Successfully connected to ZMQ hosts"),
Err(e) => {
warn!(
@@ -137,12 +134,6 @@ impl Dashboard {
})
}
/// Send a command to a specific agent
pub async fn send_command(&mut self, hostname: &str, command: AgentCommand) -> Result<()> {
self.zmq_command_sender
.send_command(hostname, command)
.await
}
pub async fn run(&mut self) -> Result<()> {
info!("Starting dashboard main loop");
@@ -215,35 +206,19 @@ impl Dashboard {
metric_message.metrics.len()
);
// Check if this is the first time we've seen this host
// Track first contact with host (no command needed - agent sends data every 2s)
let is_new_host = !self
.initial_commands_sent
.contains(&metric_message.hostname);
if is_new_host {
info!(
"First contact with host {}, sending initial CollectNow command",
metric_message.hostname
);
// Send CollectNow command for immediate refresh
if let Err(e) = self
.send_command(&metric_message.hostname, AgentCommand::CollectNow)
.await
{
error!(
"Failed to send initial CollectNow command to {}: {}",
metric_message.hostname, e
);
} else {
info!(
"✓ Sent initial CollectNow command to {}",
"First contact with host {} - data will update automatically",
metric_message.hostname
);
self.initial_commands_sent
.insert(metric_message.hostname.clone());
}
}
// Update metric store
self.metric_store

View File

@@ -84,13 +84,14 @@ impl ZmqConsumer {
}
}
/// Connect to predefined hosts
pub async fn connect_to_predefined_hosts(&mut self, hosts: &[String]) -> Result<()> {
/// Connect to predefined hosts using their configuration
pub async fn connect_to_predefined_hosts(&mut self, hosts: &std::collections::HashMap<String, crate::config::HostDetails>) -> Result<()> {
let default_port = self.config.subscriber_ports[0];
for hostname in hosts {
// Try to connect, but don't fail if some hosts are unreachable
if let Err(e) = self.connect_to_host(hostname, default_port).await {
for (hostname, host_details) in hosts {
// Try to connect using configured IP, but don't fail if some hosts are unreachable
if let Err(e) = self.connect_to_host_with_details(hostname, host_details, default_port).await {
warn!("Could not connect to {}: {}", hostname, e);
}
}
@@ -104,6 +105,15 @@ impl ZmqConsumer {
Ok(())
}
/// Connect to a host using its configuration details
pub async fn connect_to_host_with_details(&mut self, hostname: &str, host_details: &crate::config::HostDetails, port: u16) -> Result<()> {
// Get primary connection IP only - no fallbacks
let primary_ip = host_details.get_connection_ip(hostname);
// Connect directly without fallback attempts
self.connect_to_host(&primary_ip, port).await
}
/// Receive command output from any connected agent (non-blocking)
pub async fn receive_command_output(&mut self) -> Result<Option<CommandOutputMessage>> {
match self.subscriber.recv_bytes(zmq::DONTWAIT) {

View File

@@ -29,6 +29,17 @@ fn default_heartbeat_timeout_seconds() -> u64 {
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HostDetails {
pub mac_address: Option<String>,
/// Primary IP address (local network)
pub ip: Option<String>,
}
impl HostDetails {
/// Get the IP address for connection (uses ip field or hostname as fallback)
pub fn get_connection_ip(&self, hostname: &str) -> String {
self.ip.as_ref().unwrap_or(&hostname.to_string()).clone()
}
}
/// System configuration

View File

@@ -251,12 +251,14 @@ impl TuiApp {
KeyCode::Char('r') => {
// System rebuild command - works on any panel for current host
if let Some(hostname) = self.current_host.clone() {
let connection_ip = self.get_connection_ip(&hostname);
// Create command that shows logo, rebuilds, and waits for user input
let logo_and_rebuild = format!(
"bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {}\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'",
"bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {} ({})\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'",
hostname,
connection_ip,
self.config.ssh.rebuild_user,
hostname,
connection_ip,
self.config.ssh.rebuild_alias
);
@@ -289,10 +291,11 @@ impl TuiApp {
KeyCode::Char('J') => {
// Show service logs via journalctl in tmux split window
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
let connection_ip = self.get_connection_ip(&hostname);
let journalctl_command = format!(
"bash -c \"ssh -tt {}@{} 'sudo journalctl -u {}.service -f --no-pager -n 50'; exit\"",
self.config.ssh.rebuild_user,
hostname,
connection_ip,
service_name
);
@@ -312,10 +315,11 @@ impl TuiApp {
// Check if this service has a custom log file configured
if let Some(host_logs) = self.config.service_logs.get(&hostname) {
if let Some(log_config) = host_logs.iter().find(|config| config.service_name == service_name) {
let connection_ip = self.get_connection_ip(&hostname);
let tail_command = format!(
"bash -c \"ssh -tt {}@{} 'sudo tail -n 50 -f {}'; exit\"",
self.config.ssh.rebuild_user,
hostname,
connection_ip,
log_config.log_file_path
);
@@ -368,10 +372,11 @@ impl TuiApp {
KeyCode::Char('t') => {
// Open SSH terminal session in tmux window
if let Some(hostname) = self.current_host.clone() {
let connection_ip = self.get_connection_ip(&hostname);
let ssh_command = format!(
"ssh -tt {}@{}",
self.config.ssh.rebuild_user,
hostname
connection_ip
);
std::process::Command::new("tmux")
@@ -824,8 +829,10 @@ impl TuiApp {
let host_widgets = self.get_or_create_host_widgets(&hostname);
host_widgets.system_scroll_offset
};
// Clone the config to avoid borrowing issues
let config = self.config.clone();
let host_widgets = self.get_or_create_host_widgets(&hostname);
host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset, &hostname);
host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset, &hostname, Some(&config));
}
}
@@ -917,6 +924,15 @@ impl TuiApp {
}
/// Parse MAC address string (e.g., "AA:BB:CC:DD:EE:FF") to [u8; 6]
/// Get the connection IP for a hostname based on host configuration
fn get_connection_ip(&self, hostname: &str) -> String {
if let Some(host_details) = self.config.hosts.get(hostname) {
host_details.get_connection_ip(hostname)
} else {
hostname.to_string()
}
}
fn parse_mac_address(mac_str: &str) -> Result<[u8; 6], &'static str> {
let parts: Vec<&str> = mac_str.split(':').collect();
if parts.len() != 6 {

View File

@@ -439,7 +439,7 @@ impl Widget for SystemWidget {
impl SystemWidget {
/// Render with scroll offset support
pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize, hostname: &str) {
pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize, hostname: &str, config: Option<&crate::config::DashboardConfig>) {
let mut lines = Vec::new();
// NixOS section
@@ -457,6 +457,16 @@ impl SystemWidget {
Span::styled(format!("Agent: {}", agent_version_text), Typography::secondary())
]));
// Display detected connection IP
if let Some(config) = config {
if let Some(host_details) = config.hosts.get(hostname) {
let detected_ip = host_details.get_connection_ip(hostname);
lines.push(Line::from(vec![
Span::styled(format!("IP: {}", detected_ip), Typography::secondary())
]));
}
}
// CPU section
lines.push(Line::from(vec![

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-shared"
version = "0.1.65"
version = "0.1.73"
edition = "2021"
[dependencies]