Updated readme
parent 0417e2c1f1
commit 1315ba1315
91 README.md
@ -5,6 +5,7 @@ A real-time infrastructure monitoring system with intelligent status aggregation
## Current Implementation

This is a complete rewrite implementing an **individual metrics architecture** where:

- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
- **Dashboard** subscribes to specific metrics and composes widgets
- **Status Aggregation** provides intelligent email notifications with batching
@ -14,39 +15,39 @@ This is a complete rewrite implementing an **individual metrics architecture** w
```
cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
│CPU: ││Service: Status: RAM: Disk: │
│● Load: 0.10 0.52 0.88 • 400.0 MHz ││● docker active 27M 496MB │
│RAM: ││● docker-registry active 19M 496MB │
│● Used: 30% 2.3GB/7.6GB ││● gitea active 579M 2.6GB │
│● tmp: 0.0% 0B/2.0GB ││● gitea-runner-default active 11M 2.6GB │
│Disk nvme0n1: ││● haasp-core active 9M 1MB │
│● Health: PASSED ││● haasp-mqtt active 3M 1MB │
│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid active 10M 1MB │
│● Usage @boot: 5.9% • 0.1/1.0 GB ││● immich-server active 240M 45.1GB │
│ ││● mosquitto active 1M 1MB │
│ ││● mysql active 38M 225MB │
│ ││● nginx active 28M 24MB │
│ ││ ├─ ● gitea.cmtec.se 51ms │
│ ││ ├─ ● haasp.cmtec.se 43ms │
│ ││ ├─ ● haasp.net 43ms │
│ ││ ├─ ● pages.cmtec.se 45ms │
└────────────────────────────────────┘│ ├─ ● photos.cmtec.se 41ms │
┌backup──────────────────────────────┐│ ├─ ● unifi.cmtec.se 46ms │
│Latest backup: ││ ├─ ● vault.cmtec.se 47ms │
│● Status: OK ││ ├─ ● www.kryddorten.se 81ms │
│Duration: 54s • Last: 4h ago ││ ├─ ● www.mariehall2.se 86ms │
│Disk usage: 48.2GB/915.8GB ││● postgresql active 112M 357MB │
│P/N: Samsung SSD 870 QVO 1TB ││● redis-immich active 8M 45.1GB │
│S/N: S5RRNF0W800639Y ││● sshd active 2M 0 │
│● gitea 2 archives 2.7GB ││● unifi active 594M 495MB │
│● immich 2 archives 45.0GB ││● vaultwarden active 12M 1MB │
│● kryddorten 2 archives 67.6MB ││ │
│● mariehall2 2 archives 321.8MB ││ │
│● nixosbox 2 archives 4.5MB ││ │
│● unifi 2 archives 2.9MB ││ │
│● vaultwarden 2 archives 305kB ││ │
└────────────────────────────────────┘└─────────────────────────────────────────────────┘
```

**Navigation**: `←→` switch hosts, `r` refresh, `q` quit
@ -54,7 +55,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
## Features

- **Real-time monitoring** - Dashboard updates every 1-2 seconds
- **Individual metric collection** - Granular data for flexible dashboard composition
- **Intelligent status aggregation** - Host-level status calculated from all services
- **Smart email notifications** - Batched, detailed alerts with service groupings
- **Persistent state** - Prevents false notifications on restarts
@ -66,7 +67,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
### Core Components

- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
- **Shared** (`cm-dashboard-shared`) - Common types and protocol
- **Status Aggregation** - Intelligent batching and notification management
- **Persistent Cache** - Maintains state across restarts
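The batching done by the Status Aggregation component can be pictured as a buffer that collects status changes and releases them together once a time window has passed. The sketch below is only an illustration under stated assumptions: `NotificationBatch`, its fields, and its methods are hypothetical names, not the project's API, and the 30 second window used in the usage note is taken from the update intervals listed elsewhere in this README.

```rust
use std::time::{Duration, Instant};

/// Hypothetical batch of status-change messages waiting to go out in one email.
/// None of these names come from the project itself.
pub struct NotificationBatch {
    window: Duration,
    opened_at: Option<Instant>,
    pending: Vec<String>,
}

impl NotificationBatch {
    pub fn new(window: Duration) -> Self {
        Self { window, opened_at: None, pending: Vec::new() }
    }

    /// Queues a change; the first queued change starts the batching window.
    pub fn push(&mut self, now: Instant, change: impl Into<String>) {
        self.opened_at.get_or_insert(now);
        self.pending.push(change.into());
    }

    /// Returns all queued changes once the window has elapsed, otherwise `None`.
    /// Also returns `None` when nothing has been queued.
    pub fn flush_if_due(&mut self, now: Instant) -> Option<Vec<String>> {
        let opened = self.opened_at?;
        if now.duration_since(opened) < self.window {
            return None;
        }
        self.opened_at = None;
        Some(std::mem::take(&mut self.pending))
    }
}
```

With `NotificationBatch::new(Duration::from_secs(30))`, two changes pushed five seconds apart would be returned together by the first `flush_if_due` call made 30 seconds or more after the first push, so a burst of related failures produces one email rather than several.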
@ -74,7 +75,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
### Status Levels

- **🟢 Ok** - Service running normally
- **🔵 Pending** - Service starting/stopping/reloading
- **🟡 Warning** - Service issues (high load, memory, disk usage)
- **🔴 Critical** - Service failed or critical thresholds exceeded
- **❓ Unknown** - Service state cannot be determined
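The five levels above map naturally onto an enum, with the host-level status taken as the most severe status among its services. This is a minimal sketch, not the project's code: the variant names follow the list above, while the severity ranking (in particular where `Unknown` and `Pending` sit) and the `aggregate` function are assumptions made for illustration.

```rust
/// Status levels from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

impl Status {
    /// Hypothetical severity rank: higher means worse.
    /// Placing Unknown between Pending and Warning is an assumption;
    /// the real project may order the levels differently.
    fn severity(self) -> u8 {
        match self {
            Status::Ok => 0,
            Status::Pending => 1,
            Status::Unknown => 2,
            Status::Warning => 3,
            Status::Critical => 4,
        }
    }
}

/// Hypothetical aggregation: a host is as bad as its worst service.
/// An empty slice yields `Status::Unknown` because nothing was observed.
pub fn aggregate(services: &[Status]) -> Status {
    services
        .iter()
        .copied()
        .max_by_key(|s| s.severity())
        .unwrap_or(Status::Unknown)
}
```

Under this ranking, a host with services in `Ok`, `Pending`, and `Warning` aggregates to `Warning`, and a single `Critical` service makes the whole host `Critical`.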
@ -98,7 +99,7 @@ cargo build --workspace
# Start agent (requires configuration file)
./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml

# Start dashboard
./target/debug/cm-dashboard --config /path/to/dashboard.toml
```

@ -199,30 +200,35 @@ theme = "dark"
The agent implements several specialized collectors:

### CPU Collector (`cpu.rs`)

- Load average (1, 5, 15 minute)
- CPU temperature monitoring
- Real-time process monitoring (top CPU consumers)
- Status calculation with configurable thresholds

### Memory Collector (`memory.rs`)

- RAM usage (total, used, available)
- Swap monitoring
- Real-time process monitoring (top RAM consumers)
- Memory pressure detection

### Disk Collector (`disk.rs`)

- Filesystem usage per mount point
- SMART health monitoring
- Temperature and wear tracking
- Configurable filesystem monitoring

### Systemd Collector (`systemd.rs`)

- Service status monitoring (`active`, `inactive`, `failed`)
- Memory usage per service
- Service filtering and exclusions
- Handles transitional states (`Status::Pending`)

### Backup Collector (`backup.rs`)

- Reads TOML status files from backup systems
- Archive age verification
- Disk usage tracking
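The collectors above mention status calculation with configurable thresholds. One plausible shape for that logic, shown here for a single numeric metric such as the 1-minute load average, is a pair of warning and critical limits. The `Thresholds` struct, its field names, and the `status_for` function are hypothetical and only illustrate the idea; the actual collectors may use different types and comparison rules.

```rust
/// Reduced status set for this sketch; the project defines more levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical threshold pair; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub warning: f32,
    pub critical: f32,
}

/// Maps a measured value to a status: at or above `critical` is Critical,
/// at or above `warning` is Warning, anything lower is Ok.
pub fn status_for(value: f32, t: Thresholds) -> Status {
    if value >= t.critical {
        Status::Critical
    } else if value >= t.warning {
        Status::Warning
    } else {
        Status::Ok
    }
}
```

With example limits of `warning: 5.0` and `critical: 8.0` (made-up values, not project defaults), a load of `0.10` maps to `Status::Ok` and a load of `9.5` maps to `Status::Critical`.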
@ -270,6 +276,7 @@ Generated at 2025-10-21 19:42:42 CET
The system follows a **metrics-first architecture**:

### Agent Side

```rust
// Agent collects individual metrics
vec![
@ -280,6 +287,7 @@ vec![
```

### Dashboard Side

```rust
// Widgets subscribe to specific metrics
impl Widget for CpuWidget {
@ -337,7 +345,7 @@ cm-dashboard/
# Debug build
cargo build --workspace

# Release build
cargo build --workspace --release

# Run tests
@ -395,11 +403,12 @@ sudo nixos-rebuild switch --flake .
- **CPU/Memory**: 2 seconds (real-time monitoring)
- **Disk usage**: 300 seconds (5 minutes)
- **Systemd services**: 10 seconds
- **SMART health**: 600 seconds (10 minutes)
- **Backup status**: 60 seconds (1 minute)
- **Email notifications**: 30 seconds (batched)
- **Dashboard updates**: 1 second (real-time display)
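The collection intervals above could be expressed as `Duration` constants with a small helper that decides when a collector is due to run again. This is a sketch under assumptions: the constant names and the `Schedule` type are hypothetical, and the project may instead read these values from its configuration file.

```rust
use std::time::{Duration, Instant};

/// Collection intervals from the list above, as `Duration` constants.
/// The constant names are illustrative, not taken from the project.
pub const CPU_MEMORY: Duration = Duration::from_secs(2);
pub const DISK_USAGE: Duration = Duration::from_secs(300);
pub const SYSTEMD: Duration = Duration::from_secs(10);
pub const SMART: Duration = Duration::from_secs(600);
pub const BACKUP: Duration = Duration::from_secs(60);

/// Hypothetical per-collector schedule that tracks the last run time.
pub struct Schedule {
    interval: Duration,
    last_run: Option<Instant>,
}

impl Schedule {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_run: None }
    }

    /// Returns true (and records `now`) when the task has never run
    /// or at least `interval` has passed since the last run.
    pub fn poll(&mut self, now: Instant) -> bool {
        let due = match self.last_run {
            None => true,
            Some(last) => now.duration_since(last) >= self.interval,
        };
        if due {
            self.last_run = Some(now);
        }
        due
    }
}
```

An agent loop built this way would keep one `Schedule` per collector and call `poll` on each tick, so the CPU collector runs every 2 seconds while the SMART check runs only every 10 minutes.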

## License

MIT License - see LICENSE file for details