Updated readme
parent 0417e2c1f1
commit 1315ba1315
README.md (91 lines changed)
@@ -5,6 +5,7 @@ A real-time infrastructure monitoring system with intelligent status aggregation
## Current Implementation

This is a complete rewrite implementing an **individual metrics architecture** where:

- **Agent** collects individual metrics (e.g., `cpu_load_1min`, `memory_usage_percent`) and calculates status
- **Dashboard** subscribes to specific metrics and composes widgets
- **Status Aggregation** provides intelligent email notifications with batching
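
The split between agent and dashboard described above can be illustrated with a minimal sketch. The `Metric` struct, its fields, and the `select` helper are hypothetical names chosen for illustration; the README does not show the actual types in `cm-dashboard-shared`.

```rust
/// Status levels as named in this README.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

/// Hypothetical shape of one individual metric as sent by the agent.
/// The status is calculated on the agent side and travels with the value.
#[derive(Debug, Clone, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: f64,
    pub status: Status,
}

/// Dashboard side: keep only the metrics a widget has subscribed to.
pub fn select<'a>(metrics: &'a [Metric], wanted: &[&str]) -> Vec<&'a Metric> {
    metrics
        .iter()
        .filter(|m| wanted.contains(&m.name.as_str()))
        .collect()
}
```

A widget interested only in CPU load would call `select(&metrics, &["cpu_load_1min"])` and ignore everything else the agent publishes.
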
@@ -14,39 +15,39 @@ This is a complete rewrite implementing an **individual metrics architecture** w

```
cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
┌system──────────────────────────────┐┌services─────────────────────────────────────────┐
│CPU:                                ││Service:                  Status:   RAM:   Disk: │
│● Load: 0.10 0.52 0.88 • 400.0 MHz  ││● docker                  active    27M    496MB │
│RAM:                                ││● docker-registry         active    19M    496MB │
│● Used: 30% 2.3GB/7.6GB             ││● gitea                   active    579M   2.6GB │
│● tmp: 0.0% 0B/2.0GB                ││● gitea-runner-default    active    11M    2.6GB │
│Disk nvme0n1:                       ││● haasp-core              active    9M     1MB   │
│● Health: PASSED                    ││● haasp-mqtt              active    3M     1MB   │
│● Usage @root: 8.3% • 75.4/906.2 GB ││● haasp-webgrid           active    10M    1MB   │
│● Usage @boot: 5.9% • 0.1/1.0 GB    ││● immich-server           active    240M   45.1GB│
│                                    ││● mosquitto               active    1M     1MB   │
│                                    ││● mysql                   active    38M    225MB │
│                                    ││● nginx                   active    28M    24MB  │
│                                    ││  ├─ ● gitea.cmtec.se     51ms                   │
│                                    ││  ├─ ● haasp.cmtec.se     43ms                   │
│                                    ││  ├─ ● haasp.net          43ms                   │
│                                    ││  ├─ ● pages.cmtec.se     45ms                   │
└────────────────────────────────────┘│  ├─ ● photos.cmtec.se    41ms                   │
┌backup──────────────────────────────┐│  ├─ ● unifi.cmtec.se     46ms                   │
│Latest backup:                      ││  ├─ ● vault.cmtec.se     47ms                   │
│● Status: OK                        ││  ├─ ● www.kryddorten.se  81ms                   │
│Duration: 54s • Last: 4h ago        ││  ├─ ● www.mariehall2.se  86ms                   │
│Disk usage: 48.2GB/915.8GB          ││● postgresql              active    112M   357MB │
│P/N: Samsung SSD 870 QVO 1TB        ││● redis-immich            active    8M     45.1GB│
│S/N: S5RRNF0W800639Y                ││● sshd                    active    2M     0     │
│● gitea 2 archives 2.7GB            ││● unifi                   active    594M   495MB │
│● immich 2 archives 45.0GB          ││● vaultwarden             active    12M    1MB   │
│● kryddorten 2 archives 67.6MB      ││                                                 │
│● mariehall2 2 archives 321.8MB     ││                                                 │
│● nixosbox 2 archives 4.5MB         ││                                                 │
│● unifi 2 archives 2.9MB            ││                                                 │
│● vaultwarden 2 archives 305kB      ││                                                 │
└────────────────────────────────────┘└─────────────────────────────────────────────────┘
```

**Navigation**: `←→` switch hosts, `r` refresh, `q` quit
@@ -54,7 +55,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
## Features

- **Real-time monitoring** - Dashboard updates every 1-2 seconds
- **Individual metric collection** - Granular data for flexible dashboard composition
- **Intelligent status aggregation** - Host-level status calculated from all services
- **Smart email notifications** - Batched, detailed alerts with service groupings
- **Persistent state** - Prevents false notifications on restarts
@@ -66,7 +67,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
### Core Components

- **Agent** (`cm-dashboard-agent`) - Collects metrics and sends via ZMQ
- **Dashboard** (`cm-dashboard`) - Real-time TUI display consuming metrics
- **Shared** (`cm-dashboard-shared`) - Common types and protocol
- **Status Aggregation** - Intelligent batching and notification management
- **Persistent Cache** - Maintains state across restarts
@@ -74,7 +75,7 @@ cm-dashboard • ● cmbox ● srv01 ● srv02 ● steambox
### Status Levels

- **🟢 Ok** - Service running normally
- **🔵 Pending** - Service starting/stopping/reloading
- **🟡 Warning** - Service issues (high load, memory, disk usage)
- **🔴 Critical** - Service failed or critical thresholds exceeded
- **❓ Unknown** - Service state cannot be determined
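
One way to derive a host-level status from these levels is to take the most severe status among all services. The sketch below assumes a severity ordering (in particular where `Pending` and `Unknown` rank) that the README does not specify; the `aggregate` function is a hypothetical name, not the project's API.

```rust
/// Status levels from the README, declared from least to most severe.
/// The relative rank of `Pending` and `Unknown` is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Ok,
    Pending,
    Unknown,
    Warning,
    Critical,
}

/// Host-level status as the worst status among its services.
/// An empty slice yields `Unknown`, since nothing can be determined.
pub fn aggregate(services: &[Status]) -> Status {
    services.iter().copied().max().unwrap_or(Status::Unknown)
}
```

Deriving `Ord` on the enum keeps the aggregation to a single `max()` call; a different severity policy only requires reordering the variants.
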
@@ -98,7 +99,7 @@ cargo build --workspace
# Start agent (requires configuration file)
./target/debug/cm-dashboard-agent --config /etc/cm-dashboard/agent.toml

# Start dashboard
./target/debug/cm-dashboard --config /path/to/dashboard.toml
```

@@ -199,30 +200,35 @@ theme = "dark"
The agent implements several specialized collectors:

### CPU Collector (`cpu.rs`)

- Load average (1, 5, 15 minute)
- CPU temperature monitoring
- Real-time process monitoring (top CPU consumers)
- Status calculation with configurable thresholds
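
As a rough illustration of status calculation with configurable thresholds, the sketch below compares the 1-minute load average against a warning and a critical limit. The `Thresholds` struct, its field names, and the example values are assumptions; the actual configuration keys used by `cpu.rs` are not shown in this README.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Warning,
    Critical,
}

/// Hypothetical threshold pair, as it might be loaded from the agent config.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub warning: f64,
    pub critical: f64,
}

/// Classify a load average: at or above `critical` is Critical,
/// at or above `warning` is Warning, anything lower is Ok.
pub fn load_status(load_1min: f64, t: Thresholds) -> Status {
    if load_1min >= t.critical {
        Status::Critical
    } else if load_1min >= t.warning {
        Status::Warning
    } else {
        Status::Ok
    }
}
```
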

### Memory Collector (`memory.rs`)

- RAM usage (total, used, available)
- Swap monitoring
- Real-time process monitoring (top RAM consumers)
- Memory pressure detection

### Disk Collector (`disk.rs`)

- Filesystem usage per mount point
- SMART health monitoring
- Temperature and wear tracking
- Configurable filesystem monitoring

### Systemd Collector (`systemd.rs`)

- Service status monitoring (`active`, `inactive`, `failed`)
- Memory usage per service
- Service filtering and exclusions
- Handles transitional states (`Status::Pending`)
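
The mapping from systemd states to status levels might look like the sketch below. The state strings are the standard systemd `ActiveState` values; how `systemd.rs` actually treats `inactive`, and the function name `status_from_active_state`, are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    Pending,
    Warning,
    Critical,
    Unknown,
}

/// Map a systemd `ActiveState` string to a status level.
pub fn status_from_active_state(state: &str) -> Status {
    match state {
        "active" => Status::Ok,
        // Transitional states are reported as Pending rather than as failures.
        "activating" | "deactivating" | "reloading" => Status::Pending,
        // Assumption: a monitored service that is stopped deserves a warning.
        "inactive" => Status::Warning,
        "failed" => Status::Critical,
        _ => Status::Unknown,
    }
}
```

Treating transitional states separately avoids raising an alert for a service that is merely restarting.
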

### Backup Collector (`backup.rs`)

- Reads TOML status files from backup systems
- Archive age verification
- Disk usage tracking
@@ -270,6 +276,7 @@ Generated at 2025-10-21 19:42:42 CET
The system follows a **metrics-first architecture**:

### Agent Side

```rust
// Agent collects individual metrics
vec![
@@ -280,6 +287,7 @@ vec![
```

### Dashboard Side

```rust
// Widgets subscribe to specific metrics
impl Widget for CpuWidget {
@@ -337,7 +345,7 @@ cm-dashboard/
# Debug build
cargo build --workspace

# Release build
cargo build --workspace --release

# Run tests
@@ -395,11 +403,12 @@ sudo nixos-rebuild switch --flake .
- **CPU/Memory**: 2 seconds (real-time monitoring)
- **Disk usage**: 300 seconds (5 minutes)
- **Systemd services**: 10 seconds
- **SMART health**: 600 seconds (10 minutes)
- **Backup status**: 60 seconds (1 minute)
- **Email notifications**: 30 seconds (batched)
- **Dashboard updates**: 1 second (real-time display)
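
The intervals listed above could be gathered in a single structure so that a scheduler can check which collector is due. This is a sketch only: the `Intervals` struct, its field names, and `is_due` are hypothetical, while the durations are the ones stated in the list.

```rust
use std::time::Duration;

/// Collection intervals from the README, gathered in one place.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Intervals {
    pub cpu_memory: Duration,
    pub disk_usage: Duration,
    pub systemd: Duration,
    pub smart_health: Duration,
    pub backup_status: Duration,
    pub email_batch: Duration,
    pub dashboard_refresh: Duration,
}

impl Default for Intervals {
    fn default() -> Self {
        Self {
            cpu_memory: Duration::from_secs(2),
            disk_usage: Duration::from_secs(300),
            systemd: Duration::from_secs(10),
            smart_health: Duration::from_secs(600),
            backup_status: Duration::from_secs(60),
            email_batch: Duration::from_secs(30),
            dashboard_refresh: Duration::from_secs(1),
        }
    }
}

/// True when at least `interval` has passed since the collector last ran.
pub fn is_due(since_last_run: Duration, interval: Duration) -> bool {
    since_last_run >= interval
}
```
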

## License

MIT License - see LICENSE file for details