Compare commits

8 Commits

Author SHA1 Message Date
76b6e3373e Change auto connection type to prioritize local IP first
All checks were successful
Build and Release / build-and-release (push) Successful in 2m36s
Update the auto connection type logic to try local network connections
before falling back to Tailscale. This provides better performance by
using faster local connections when available while maintaining Tailscale
as a reliable fallback.

Changes:
- Auto connection priority: local → tailscale → hostname (was tailscale → local)
- Fallback retry order updated to match new priority
- Supports omitting IP field in config for hosts without static local IP
2025-11-13 12:52:46 +01:00
0a13cab897 Add detected IP display in dashboard Agent row
All checks were successful
Build and Release / build-and-release (push) Successful in 1m8s
Display the connection IP address that the dashboard is configured to use
for each host below the Agent version information. Shows which network
path (local/Tailscale) is being used for connections based on host
configuration.

Features:
- Display detected IP below Agent row in system widget
- Uses existing host configuration connection logic
- Shows actual IP being used for dashboard connections
2025-11-13 11:26:58 +01:00
d33ec5d225 Add Tailscale network support for host connections
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
Implement configurable network routing for both local and Tailscale networks.
Dashboard now supports intelligent connection selection with automatic fallback
between network types. Add IP configuration fields and connection routing logic
for ZMQ and SSH operations.

Features:
- Host configuration with local and Tailscale IP addresses
- Configurable connection types (local/tailscale/auto)
- Automatic fallback between network connections
- Updated ZMQ connection logic with retry support
- SSH command routing through configured IP addresses
2025-11-13 10:08:17 +01:00
d31c2384df Add configurable maintenance mode file support
All checks were successful
Build and Release / build-and-release (push) Successful in 1m32s
Implement maintenance_mode_file configuration option in NotificationConfig
to allow customizable file paths for suppressing email notifications.
Updates maintenance mode check to use configured path instead of hardcoded
/tmp/cm-maintenance file.
2025-11-10 07:48:15 +01:00
c8db463204 Add interactive SSH terminal session functionality
All checks were successful
Build and Release / build-and-release (push) Successful in 1m31s
- Press 't' to open SSH session to current host in tmux split
- Uses 30% vertical split consistent with logs and rebuild commands
- Auto-closes tmux window when SSH session ends
- Provides direct host administration access from dashboard
- Uses same SSH configuration as rebuild operations

Version 0.1.65
2025-11-09 11:39:43 +01:00
e8e50ef9bb Replace empty panels with offline host message for better UX
All checks were successful
Build and Release / build-and-release (push) Successful in 2m33s
- Hide all system/backup/service panels when host is offline
- Show centered wake-up message with host status
- Display "Press 'w' to wake up host" if MAC address configured
- Provide clear visual indication when hosts are unreachable
- Improve user experience by removing confusing empty panels

Version 0.1.64
2025-11-08 18:28:28 +01:00
0faed9309e Improve host disconnection detection and fix notification exclusions
All checks were successful
Build and Release / build-and-release (push) Successful in 1m34s
- Add dedicated heartbeat transmission every 5 seconds independent of metric collection
- Fix host offline detection by clearing metrics for disconnected hosts
- Move exclude_email_metrics to NotificationConfig for better organization
- Add cleanup_offline_hosts method to remove stale metrics after heartbeat timeout
- Ensure offline hosts show proper status icons and visual indicators

Version 0.1.63
2025-11-08 11:33:32 +01:00
c980346d05 Fix heartbeat detection to properly detect offline hosts
All checks were successful
Build and Release / build-and-release (push) Successful in 2m34s
- Add independent heartbeat checking timer (1 second interval) separate from metric reception
- Move get_connected_hosts() call outside metric receive condition to run periodically
- Remove duplicate update_hosts() call from metric processing to avoid redundancy
- Ensure offline host detection works even when no new metrics are received
- Fix issue where hosts going offline were never detected due to conditional heartbeat check
- Heartbeat timeouts now properly detected within configured timeout + 1 second
- Bump version to 0.1.62
2025-11-07 14:27:03 +01:00
13 changed files with 323 additions and 34 deletions

6
Cargo.lock generated
View File

@@ -270,7 +270,7 @@ checksum = "a1d728cc89cf3aee9ff92b05e62b19ee65a02b5702cff7d5a377e32c6ae29d8d"
[[package]]
name = "cm-dashboard"
-version = "0.1.60"
+version = "0.1.68"
dependencies = [
"anyhow",
"chrono",
@@ -292,7 +292,7 @@ dependencies = [
[[package]]
name = "cm-dashboard-agent"
-version = "0.1.60"
+version = "0.1.68"
dependencies = [
"anyhow",
"async-trait",
@@ -315,7 +315,7 @@ dependencies = [
[[package]]
name = "cm-dashboard-shared"
-version = "0.1.60"
+version = "0.1.68"
dependencies = [
"chrono",
"serde",

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-agent"
-version = "0.1.61"
+version = "0.1.69"
edition = "2021"
[dependencies]

View File

@@ -78,10 +78,11 @@ impl Agent {
info!("Initial metric collection completed - all data cached and ready");
}
-// Separate intervals for collection, transmission, and email notifications
+// Separate intervals for collection, transmission, heartbeat, and email notifications
let mut collection_interval =
interval(Duration::from_secs(self.config.collection_interval_seconds));
let mut transmission_interval = interval(Duration::from_secs(self.config.zmq.transmission_interval_seconds));
+let mut heartbeat_interval = interval(Duration::from_secs(self.config.zmq.heartbeat_interval_seconds));
let mut notification_interval = interval(Duration::from_secs(self.config.notifications.aggregation_interval_seconds));
loop {
@@ -98,6 +99,12 @@ impl Agent {
error!("Failed to broadcast metrics: {}", e);
}
}
_ = heartbeat_interval.tick() => {
// Send standalone heartbeat for host connectivity detection
if let Err(e) = self.send_heartbeat().await {
error!("Failed to send heartbeat: {}", e);
}
}
_ = notification_interval.tick() => {
// Process batched email notifications (separate from dashboard updates)
if let Err(e) = self.host_status_manager.process_pending_notifications(&mut self.notification_manager).await {
@@ -206,7 +213,7 @@ impl Agent {
let mut status_changed = false;
for metric in metrics {
// Filter excluded metrics from email notification processing only
-if self.config.exclude_email_metrics.contains(&metric.name) {
+if self.config.notifications.exclude_email_metrics.contains(&metric.name) {
debug!("Excluding metric '{}' from email notification processing", metric.name);
continue;
}
@@ -252,6 +259,19 @@ impl Agent {
)
}
/// Send standalone heartbeat for connectivity detection
async fn send_heartbeat(&mut self) -> Result<()> {
let heartbeat_metric = self.get_heartbeat_metric();
let message = MetricMessage::new(
self.hostname.clone(),
vec![heartbeat_metric],
);
self.zmq_handler.publish_metrics(&message).await?;
debug!("Sent standalone heartbeat for connectivity detection");
Ok(())
}
async fn handle_commands(&mut self) -> Result<()> {
// Try to receive commands (non-blocking)
match self.zmq_handler.try_receive_command() {

View File

@@ -17,9 +17,6 @@ pub struct AgentConfig {
pub notifications: NotificationConfig,
pub status_aggregation: HostStatusConfig,
pub collection_interval_seconds: u64,
-/// List of metric names to exclude from email notifications
-#[serde(default)]
-pub exclude_email_metrics: Vec<String>,
}
/// ZMQ communication configuration
@@ -29,6 +26,9 @@ pub struct ZmqConfig {
pub command_port: u16,
pub bind_address: String,
pub transmission_interval_seconds: u64,
/// Heartbeat transmission interval in seconds for host connectivity detection
#[serde(default = "default_heartbeat_interval_seconds")]
pub heartbeat_interval_seconds: u64,
}
/// Collector configuration
@@ -147,9 +147,23 @@ pub struct NotificationConfig {
pub rate_limit_minutes: u64,
/// Email notification batching interval in seconds (default: 60)
pub aggregation_interval_seconds: u64,
/// List of metric names to exclude from email notifications
#[serde(default)]
pub exclude_email_metrics: Vec<String>,
/// Path to maintenance mode file that suppresses email notifications when present
#[serde(default = "default_maintenance_mode_file")]
pub maintenance_mode_file: String,
}
fn default_heartbeat_interval_seconds() -> u64 {
5
}
fn default_maintenance_mode_file() -> String {
"/tmp/cm-maintenance".to_string()
}
impl AgentConfig {
pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Self> {
loader::load_config(path)

View File

@@ -59,6 +59,6 @@ impl NotificationManager {
}
fn is_maintenance_mode(&self) -> bool {
-std::fs::metadata("/tmp/cm-maintenance").is_ok()
+std::fs::metadata(&self.config.maintenance_mode_file).is_ok()
}
}
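
The check in the hunk above treats the mere presence of the configured path as the maintenance signal. A minimal sketch of that behaviour as a free function, assuming nothing beyond the standard library; `is_maintenance_mode_at` is a hypothetical name used for illustration and is not part of the agent:

```rust
use std::fs;
use std::path::Path;

/// Returns true when anything exists at `path` (hypothetical helper).
/// `fs::metadata` follows symlinks and succeeds for files and directories
/// alike, so a directory at that path would also count as maintenance mode.
pub fn is_maintenance_mode_at(path: &Path) -> bool {
    fs::metadata(path).is_ok()
}
```

With the default path from `default_maintenance_mode_file`, creating `/tmp/cm-maintenance` suppresses email notifications until the file is removed.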

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard"
-version = "0.1.61"
+version = "0.1.69"
edition = "2021"
[dependencies]

View File

@@ -67,11 +67,8 @@ impl Dashboard {
}
};
-// Connect to configured hosts from configuration
-let hosts: Vec<String> = config.hosts.keys().cloned().collect();
// Try to connect to hosts but don't fail if none are available
-match zmq_consumer.connect_to_predefined_hosts(&hosts).await {
+match zmq_consumer.connect_to_predefined_hosts(&config.hosts).await {
Ok(_) => info!("Successfully connected to ZMQ hosts"),
Err(e) => {
warn!(
@@ -149,6 +146,8 @@ impl Dashboard {
let mut last_metrics_check = Instant::now();
let metrics_check_interval = Duration::from_millis(100); // Check for metrics every 100ms
let mut last_heartbeat_check = Instant::now();
let heartbeat_check_interval = Duration::from_secs(1); // Check for host connectivity every 1 second
loop {
// Handle terminal events (keyboard input) only if not headless
@@ -254,14 +253,8 @@ impl Dashboard {
}
}
-// Update TUI with new hosts and metrics (only if not headless)
+// Update TUI with new metrics (only if not headless)
if let Some(ref mut tui_app) = self.tui_app {
-let connected_hosts = self
-.metric_store
-.get_connected_hosts(Duration::from_secs(self.config.zmq.heartbeat_timeout_seconds));
-tui_app.update_hosts(connected_hosts);
tui_app.update_metrics(&self.metric_store);
}
}
@@ -280,6 +273,20 @@ impl Dashboard {
last_metrics_check = Instant::now();
}
// Check for host connectivity changes (heartbeat timeouts) periodically
if last_heartbeat_check.elapsed() >= heartbeat_check_interval {
let timeout = Duration::from_secs(self.config.zmq.heartbeat_timeout_seconds);
// Clean up metrics for offline hosts
self.metric_store.cleanup_offline_hosts(timeout);
if let Some(ref mut tui_app) = self.tui_app {
let connected_hosts = self.metric_store.get_connected_hosts(timeout);
tui_app.update_hosts(connected_hosts);
}
last_heartbeat_check = Instant::now();
}
// Render TUI (only if not headless)
if !self.headless {
if let Some(ref mut terminal) = self.terminal {
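
The `last_heartbeat_check.elapsed() >= heartbeat_check_interval` pattern in the hunk above can be isolated into a small timer for illustration. This is a sketch only: `PeriodicCheck` and `worst_case_detection` are hypothetical names, and the current time is passed in explicitly so the behaviour is easy to test. It also shows why a timeout is noticed at most roughly one check interval late, matching the "timeout + 1 second" bound stated in the commit message.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper mirroring the `last_*_check.elapsed() >= interval` pattern.
pub struct PeriodicCheck {
    last_run: Instant,
    interval: Duration,
}

impl PeriodicCheck {
    pub fn new(start: Instant, interval: Duration) -> Self {
        PeriodicCheck { last_run: start, interval: interval }
    }

    /// Returns true (and re-arms the timer) once `interval` has passed since the last run.
    pub fn due(&mut self, now: Instant) -> bool {
        if now.saturating_duration_since(self.last_run) >= self.interval {
            self.last_run = now;
            true
        } else {
            false
        }
    }
}

/// Upper bound on detection latency: a heartbeat timeout that expires just
/// after one check is only noticed at the next check.
pub fn worst_case_detection(timeout: Duration, check_interval: Duration) -> Duration {
    timeout + check_interval
}
```

Because the heartbeat check no longer depends on a metric arriving, it also fires while a host is silent, which is exactly the case it needs to detect.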

View File

@@ -84,13 +84,13 @@ impl ZmqConsumer {
}
}
-/// Connect to predefined hosts
-pub async fn connect_to_predefined_hosts(&mut self, hosts: &[String]) -> Result<()> {
+/// Connect to predefined hosts using their configuration
+pub async fn connect_to_predefined_hosts(&mut self, hosts: &std::collections::HashMap<String, crate::config::HostDetails>) -> Result<()> {
let default_port = self.config.subscriber_ports[0];
-for hostname in hosts {
-// Try to connect, but don't fail if some hosts are unreachable
-if let Err(e) = self.connect_to_host(hostname, default_port).await {
+for (hostname, host_details) in hosts {
+// Try to connect using configured IP, but don't fail if some hosts are unreachable
+if let Err(e) = self.connect_to_host_with_details(hostname, host_details, default_port).await {
warn!("Could not connect to {}: {}", hostname, e);
}
}
@@ -104,6 +104,29 @@ impl ZmqConsumer {
Ok(())
}
/// Connect to a host using its configuration details with fallback support
pub async fn connect_to_host_with_details(&mut self, hostname: &str, host_details: &crate::config::HostDetails, port: u16) -> Result<()> {
// Get primary connection IP
let primary_ip = host_details.get_connection_ip(hostname);
// Try primary connection
if let Ok(()) = self.connect_to_host(&primary_ip, port).await {
info!("Connected to {} via primary address: {}", hostname, primary_ip);
return Ok(());
}
// Try fallback IPs if primary fails
let fallbacks = host_details.get_fallback_ips(hostname);
for fallback_ip in fallbacks {
if let Ok(()) = self.connect_to_host(&fallback_ip, port).await {
info!("Connected to {} via fallback address: {}", hostname, fallback_ip);
return Ok(());
}
}
Err(anyhow::anyhow!("Failed to connect to {} using all available addresses", hostname))
}
/// Receive command output from any connected agent (non-blocking)
pub async fn receive_command_output(&mut self) -> Result<Option<CommandOutputMessage>> {
match self.subscriber.recv_bytes(zmq::DONTWAIT) {

View File

@@ -29,6 +29,78 @@ fn default_heartbeat_timeout_seconds() -> u64 {
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct HostDetails {
pub mac_address: Option<String>,
/// Primary IP address (local network)
pub ip: Option<String>,
/// Tailscale network IP address
pub tailscale_ip: Option<String>,
/// Preferred connection type: "local", "tailscale", or "auto" (fallback)
#[serde(default = "default_connection_type")]
pub connection_type: String,
}
fn default_connection_type() -> String {
"auto".to_string()
}
impl HostDetails {
/// Get the preferred IP address for connection based on connection_type
pub fn get_connection_ip(&self, hostname: &str) -> String {
match self.connection_type.as_str() {
"tailscale" => {
if let Some(ref ts_ip) = self.tailscale_ip {
ts_ip.clone()
} else {
// Fallback to local IP or hostname
self.ip.clone().unwrap_or_else(|| hostname.to_string())
}
}
"local" => {
if let Some(ref local_ip) = self.ip {
local_ip.clone()
} else {
hostname.to_string()
}
}
_ => {
// "auto" (and any unrecognized value): try local first, then tailscale, then hostname
if let Some(ref local_ip) = self.ip {
local_ip.clone()
} else if let Some(ref ts_ip) = self.tailscale_ip {
ts_ip.clone()
} else {
hostname.to_string()
}
}
}
}
/// Get fallback IP addresses for connection retry
pub fn get_fallback_ips(&self, hostname: &str) -> Vec<String> {
let mut fallbacks = Vec::new();
// Add all available IPs except the primary one
let primary = self.get_connection_ip(hostname);
// Add fallbacks in priority order: local first, then tailscale
if let Some(ref local_ip) = self.ip {
if local_ip != &primary {
fallbacks.push(local_ip.clone());
}
}
if let Some(ref ts_ip) = self.tailscale_ip {
if ts_ip != &primary {
fallbacks.push(ts_ip.clone());
}
}
// Always include hostname as final fallback if not already primary
if hostname != primary {
fallbacks.push(hostname.to_string());
}
fallbacks
}
}
/// System configuration
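
The selection and fallback rules in the hunk above can be condensed into a single ordered candidate list. The following is a sketch under stated assumptions, not the dashboard's code: `HostSketch`, `primary`, and `candidates` are hypothetical names, serde and the MAC address field are left out, and repeated addresses are skipped, which the original `get_fallback_ips` does not do in every edge case.

```rust
/// Simplified, hypothetical stand-in for `HostDetails` (no serde, no MAC address).
#[derive(Debug, Clone, Default)]
pub struct HostSketch {
    pub ip: Option<String>,
    pub tailscale_ip: Option<String>,
    pub connection_type: String,
}

impl HostSketch {
    /// Primary address, following the same priority rules as `get_connection_ip`.
    pub fn primary(&self, hostname: &str) -> String {
        let chosen = match self.connection_type.as_str() {
            "tailscale" => self.tailscale_ip.clone().or_else(|| self.ip.clone()),
            "local" => self.ip.clone(),
            // "auto" and any unrecognized value: local first, then Tailscale
            _ => self.ip.clone().or_else(|| self.tailscale_ip.clone()),
        };
        chosen.unwrap_or_else(|| hostname.to_string())
    }

    /// Primary address followed by the fallbacks (local, Tailscale, hostname),
    /// skipping any address that is already in the list.
    pub fn candidates(&self, hostname: &str) -> Vec<String> {
        let mut out = vec![self.primary(hostname)];
        let rest = vec![
            self.ip.clone(),
            self.tailscale_ip.clone(),
            Some(hostname.to_string()),
        ];
        for addr in rest.into_iter().flatten() {
            if !out.contains(&addr) {
                out.push(addr);
            }
        }
        out
    }
}
```

A caller such as `connect_to_host_with_details` could then walk `candidates(hostname)` in order and stop at the first address that connects.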

View File

@@ -109,6 +109,28 @@ impl MetricStore {
.collect()
}
/// Clean up data for offline hosts
pub fn cleanup_offline_hosts(&mut self, timeout: Duration) {
let now = Instant::now();
let mut hosts_to_cleanup = Vec::new();
// Find hosts that are offline (no recent heartbeat)
for (hostname, &last_heartbeat) in &self.last_heartbeat {
if now.duration_since(last_heartbeat) > timeout {
hosts_to_cleanup.push(hostname.clone());
}
}
// Clear metrics for offline hosts
for hostname in hosts_to_cleanup {
if let Some(metrics) = self.current_metrics.remove(&hostname) {
info!("Cleared {} metrics for offline host: {}", metrics.len(), hostname);
}
// Keep heartbeat timestamp for reconnection detection
// Don't remove from last_heartbeat to track when host was last seen
}
}
/// Cleanup old data and enforce limits
fn cleanup_host_data(&mut self, hostname: &str) {
let now = Instant::now();
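
The cleanup rule added in the hunk above (drop metrics for hosts whose last heartbeat is older than the timeout, but keep the heartbeat timestamp) can be sketched in isolation. `StoreSketch` and `cleanup_offline_hosts_at` are hypothetical names, metrics are reduced to plain strings, and `now` is a parameter so the example is deterministic:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical, reduced stand-in for `MetricStore`.
#[derive(Default)]
pub struct StoreSketch {
    pub last_heartbeat: HashMap<String, Instant>,
    pub current_metrics: HashMap<String, Vec<String>>,
}

impl StoreSketch {
    /// Removes metrics for every host whose last heartbeat is older than
    /// `timeout` and returns the affected hostnames in sorted order.
    /// Heartbeat timestamps are kept so a later reconnect can be recognised.
    pub fn cleanup_offline_hosts_at(&mut self, now: Instant, timeout: Duration) -> Vec<String> {
        let mut offline: Vec<String> = self
            .last_heartbeat
            .iter()
            .filter(|(_, seen)| now.saturating_duration_since(**seen) > timeout)
            .map(|(host, _)| host.clone())
            .collect();
        offline.sort();
        for host in &offline {
            self.current_metrics.remove(host);
        }
        offline
    }
}
```

Collecting the hostnames first, as the original does, avoids mutating `current_metrics` while `last_heartbeat` is still borrowed for iteration.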

View File

@@ -251,12 +251,14 @@ impl TuiApp {
KeyCode::Char('r') => {
// System rebuild command - works on any panel for current host
if let Some(hostname) = self.current_host.clone() {
+let connection_ip = self.get_connection_ip(&hostname);
// Create command that shows logo, rebuilds, and waits for user input
let logo_and_rebuild = format!(
-"bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {}\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'",
+"bash -c 'cat << \"EOF\"\nNixOS System Rebuild\nTarget: {} ({})\n\nEOF\nssh -tt {}@{} \"bash -ic {}\"\necho\necho \"========================================\"\necho \"Rebuild completed. Press any key to close...\"\necho \"========================================\"\nread -n 1 -s\nexit'",
hostname,
+connection_ip,
self.config.ssh.rebuild_user,
-hostname,
+connection_ip,
self.config.ssh.rebuild_alias
);
@@ -289,10 +291,11 @@ impl TuiApp {
KeyCode::Char('J') => {
// Show service logs via journalctl in tmux split window
if let (Some(service_name), Some(hostname)) = (self.get_selected_service(), self.current_host.clone()) {
+let connection_ip = self.get_connection_ip(&hostname);
let journalctl_command = format!(
"bash -c \"ssh -tt {}@{} 'sudo journalctl -u {}.service -f --no-pager -n 50'; exit\"",
self.config.ssh.rebuild_user,
-hostname,
+connection_ip,
service_name
);
@@ -312,10 +315,11 @@ impl TuiApp {
// Check if this service has a custom log file configured
if let Some(host_logs) = self.config.service_logs.get(&hostname) {
if let Some(log_config) = host_logs.iter().find(|config| config.service_name == service_name) {
+let connection_ip = self.get_connection_ip(&hostname);
let tail_command = format!(
"bash -c \"ssh -tt {}@{} 'sudo tail -n 50 -f {}'; exit\"",
self.config.ssh.rebuild_user,
-hostname,
+connection_ip,
log_config.log_file_path
);
@@ -365,6 +369,26 @@ impl TuiApp {
}
}
}
KeyCode::Char('t') => {
// Open SSH terminal session in tmux window
if let Some(hostname) = self.current_host.clone() {
let connection_ip = self.get_connection_ip(&hostname);
let ssh_command = format!(
"ssh -tt {}@{}",
self.config.ssh.rebuild_user,
connection_ip
);
std::process::Command::new("tmux")
.arg("split-window")
.arg("-v")
.arg("-p")
.arg("30") // Use 30% like other commands
.arg(&ssh_command)
.spawn()
.ok(); // Ignore errors, tmux will handle them
}
}
KeyCode::Tab => {
// Tab cycles to next host
self.navigate_host(1);
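
The `t` key handler above builds its tmux invocation inline. As a sketch, the argument list can be produced by a separate function so it can be checked without spawning tmux; `tmux_ssh_args` is a hypothetical helper, while the flags are the ones used in the hunk:

```rust
/// Builds the arguments for `tmux split-window` that open an SSH session
/// in a 30% vertical split (hypothetical helper, not part of the dashboard).
pub fn tmux_ssh_args(user: &str, address: &str) -> Vec<String> {
    vec![
        "split-window".to_string(),
        "-v".to_string(),
        "-p".to_string(),
        "30".to_string(),
        // The final argument is the shell command tmux runs in the new pane
        format!("ssh -tt {}@{}", user, address),
    ]
}
```

The result can be passed straight to `std::process::Command::new("tmux").args(...)`, with `address` taken from `get_connection_ip`.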
@@ -558,6 +582,21 @@ impl TuiApp {
])
.split(main_chunks[1]); // main_chunks[1] is now the content area (between title and statusbar)
// Check if current host is offline
let current_host_offline = if let Some(hostname) = self.current_host.clone() {
self.calculate_host_status(&hostname, metric_store) == Status::Offline
} else {
true // No host selected is considered offline
};
// If host is offline, render wake-up message instead of panels
if current_host_offline {
self.render_offline_host_message(frame, main_chunks[1]);
self.render_btop_title(frame, main_chunks[0], metric_store);
self.render_statusbar(frame, main_chunks[2]);
return;
}
// Check if backup panel should be shown
let show_backup = if let Some(hostname) = self.current_host.clone() {
let host_widgets = self.get_or_create_host_widgets(&hostname);
@@ -790,8 +829,10 @@ impl TuiApp {
let host_widgets = self.get_or_create_host_widgets(&hostname);
host_widgets.system_scroll_offset
};
+// Clone the config to avoid borrowing issues
+let config = self.config.clone();
let host_widgets = self.get_or_create_host_widgets(&hostname);
-host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset, &hostname);
+host_widgets.system_widget.render_with_scroll(frame, inner_area, scroll_offset, &hostname, Some(&config));
}
}
@@ -811,7 +852,87 @@ impl TuiApp {
}
}
/// Render offline host message with wake-up option
fn render_offline_host_message(&self, frame: &mut Frame, area: Rect) {
use ratatui::layout::Alignment;
use ratatui::style::Modifier;
use ratatui::text::{Line, Span};
use ratatui::widgets::{Block, Borders, Paragraph};
// Get hostname for message
let hostname = self.current_host.as_ref()
.map(|h| h.as_str())
.unwrap_or("Unknown");
// Check if host has MAC address for wake-on-LAN
let has_mac = self.current_host.as_ref()
.and_then(|hostname| self.config.hosts.get(hostname))
.and_then(|details| details.mac_address.as_ref())
.is_some();
// Create message content
let mut lines = vec![
Line::from(Span::styled(
format!("Host '{}' is offline", hostname),
Style::default().fg(Theme::muted_text()).add_modifier(Modifier::BOLD),
)),
Line::from(""),
];
if has_mac {
lines.push(Line::from(Span::styled(
"Press 'w' to wake up host",
Style::default().fg(Theme::primary_text()).add_modifier(Modifier::BOLD),
)));
} else {
lines.push(Line::from(Span::styled(
"No MAC address configured - cannot wake up",
Style::default().fg(Theme::muted_text()),
)));
}
// Create centered message
let message = Paragraph::new(lines)
.block(Block::default()
.borders(Borders::ALL)
.border_style(Style::default().fg(Theme::muted_text()))
.title(" Offline Host ")
.title_style(Style::default().fg(Theme::muted_text()).add_modifier(Modifier::BOLD)))
.style(Style::default().bg(Theme::background()).fg(Theme::primary_text()))
.alignment(Alignment::Center);
// Center the message in the available area
let popup_area = ratatui::layout::Layout::default()
.direction(Direction::Vertical)
.constraints([
Constraint::Percentage(40),
Constraint::Length(6),
Constraint::Percentage(40),
])
.split(area)[1];
let popup_area = ratatui::layout::Layout::default()
.direction(Direction::Horizontal)
.constraints([
Constraint::Percentage(25),
Constraint::Percentage(50),
Constraint::Percentage(25),
])
.split(popup_area)[1];
frame.render_widget(message, popup_area);
}
/// Get the connection IP for a hostname based on host configuration
fn get_connection_ip(&self, hostname: &str) -> String {
if let Some(host_details) = self.config.hosts.get(hostname) {
host_details.get_connection_ip(hostname)
} else {
hostname.to_string()
}
}
/// Parse MAC address string (e.g., "AA:BB:CC:DD:EE:FF") to [u8; 6]
fn parse_mac_address(mac_str: &str) -> Result<[u8; 6], &'static str> {
let parts: Vec<&str> = mac_str.split(':').collect();
if parts.len() != 6 {

View File

@@ -439,7 +439,7 @@ impl Widget for SystemWidget {
impl SystemWidget {
/// Render with scroll offset support
-pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize, hostname: &str) {
+pub fn render_with_scroll(&mut self, frame: &mut Frame, area: Rect, scroll_offset: usize, hostname: &str, config: Option<&crate::config::DashboardConfig>) {
let mut lines = Vec::new();
// NixOS section
@@ -457,6 +457,16 @@ impl SystemWidget {
Span::styled(format!("Agent: {}", agent_version_text), Typography::secondary())
]));
// Display detected connection IP
if let Some(config) = config {
if let Some(host_details) = config.hosts.get(hostname) {
let detected_ip = host_details.get_connection_ip(hostname);
lines.push(Line::from(vec![
Span::styled(format!("IP: {}", detected_ip), Typography::secondary())
]));
}
}
// CPU section
lines.push(Line::from(vec![

View File

@@ -1,6 +1,6 @@
[package]
name = "cm-dashboard-shared"
-version = "0.1.61"
+version = "0.1.69"
edition = "2021"
[dependencies]