Testing
This commit is contained in:
106
CLAUDE.md
106
CLAUDE.md
@@ -184,25 +184,103 @@ Keys: [←→] hosts [r]efresh [q]uit
|
||||
Keys: [Enter] details [r]efresh [s]ort [f]ilter [q]uit
|
||||
```
|
||||
|
||||
## Development Status
|
||||
## Architecture Principles - CRITICAL
|
||||
|
||||
### Immediate TODOs
|
||||
### Agent-Dashboard Separation of Concerns
|
||||
|
||||
- Refactor all dashboard widgets to use a shared table/layout helper so icons, padding, and titles remain consistent across panels
|
||||
**AGENT IS SINGLE SOURCE OF TRUTH FOR ALL STATUS CALCULATIONS**
|
||||
- Agent calculates status ("ok"/"warning"/"critical"/"unknown") using defined thresholds
|
||||
- Agent sends status to dashboard via ZMQ
|
||||
- Dashboard NEVER calculates status - only displays what agent provides
|
||||
|
||||
- Investigate why the backup metrics agent is not publishing data to the dashboard
|
||||
- Resize the services widget so it can display more services without truncation
|
||||
- Remove the dedicated status widget and redistribute the layout space
|
||||
- Add responsive scaling within each widget so columns and content adapt dynamically
|
||||
**Data Flow Architecture:**
|
||||
```
|
||||
Agent (calculations + thresholds) → Status → Dashboard (display only) → TableBuilder (colors)
|
||||
```
|
||||
|
||||
### Phase 3: Advanced Features 🚧 IN PROGRESS
|
||||
**Status Handling Rules:**
|
||||
- Agent provides status → Dashboard uses agent status
|
||||
- Agent doesn't provide status → Dashboard shows "unknown" (NOT "ok")
|
||||
- Dashboard widgets NEVER contain hardcoded thresholds
|
||||
- TableBuilder converts status to colors for display
|
||||
|
||||
- [x] ZMQ gossip network implementation
|
||||
- [x] Comprehensive error handling
|
||||
- [x] Performance optimizations
|
||||
- [ ] Predictive analytics for wear levels
|
||||
- [ ] Custom alert rules engine
|
||||
- [ ] Historical data export capabilities
|
||||
### Current Agent Thresholds (as of 2025-10-12)
|
||||
|
||||
**CPU Load (service.rs:392-400):**
|
||||
- Warning: ≥ 2.0 (testing value, was 5.0)
|
||||
- Critical: ≥ 4.0 (testing value, was 8.0)
|
||||
|
||||
**CPU Temperature (service.rs:412-420):**
|
||||
- Warning: ≥ 70.0°C
|
||||
- Critical: ≥ 80.0°C
|
||||
|
||||
**Memory Usage (service.rs:402-410):**
|
||||
- Warning: ≥ 80%
|
||||
- Critical: ≥ 95%
|
||||
|
||||
### Email Notifications
|
||||
|
||||
**System Configuration:**
|
||||
- From: `{hostname}@cmtec.se` (e.g., cmbox@cmtec.se)
|
||||
- To: `cm@cmtec.se`
|
||||
- SMTP: localhost:25 (postfix)
|
||||
- Timezone: Europe/Stockholm (not UTC)
|
||||
|
||||
**Notification Triggers:**
|
||||
- Status degradation: any → "warning" or "critical"
|
||||
- Recovery: "warning"/"critical" → "ok"
|
||||
- Rate limiting: configurable (set to 0 for testing, 30 minutes for production)
|
||||
|
||||
**Monitored Components:**
|
||||
- system.cpu (load status)
|
||||
- system.cpu_temp (temperature status)
|
||||
- system.memory (usage status)
|
||||
- system.services (service health status)
|
||||
- storage.smart (drive health)
|
||||
- backup.overall (backup status)
|
||||
|
||||
### Pure Auto-Discovery Implementation
|
||||
|
||||
**Agent Configuration:**
|
||||
- No config files required
|
||||
- Auto-detects storage devices, services, backup systems
|
||||
- Runtime discovery of system capabilities
|
||||
- CLI: `cm-dashboard-agent [-v]` (only verbose flag)
|
||||
|
||||
**Service Discovery:**
|
||||
- Scans running systemd services
|
||||
- Filters by predefined interesting patterns (gitea, nginx, docker, etc.)
|
||||
- No host-specific hardcoded service lists
|
||||
|
||||
### Current Implementation Status
|
||||
|
||||
**Completed:**
|
||||
- [x] Pure auto-discovery agent (no config files)
|
||||
- [x] Agent-side status calculations with defined thresholds
|
||||
- [x] Dashboard displays agent status (no dashboard calculations)
|
||||
- [x] Email notifications with Stockholm timezone
|
||||
- [x] CPU temperature monitoring and notifications
|
||||
- [x] ZMQ message format standardization
|
||||
- [x] Removed all hardcoded dashboard thresholds
|
||||
|
||||
**Testing Configuration (REVERT FOR PRODUCTION):**
|
||||
- CPU thresholds lowered to 2.0/4.0 for easy testing
|
||||
- Email rate limiting disabled (0 minutes)
|
||||
|
||||
### Development Guidelines
|
||||
|
||||
**When Adding New Metrics:**
|
||||
1. Agent calculates status with thresholds
|
||||
2. Agent adds `{metric}_status` field to JSON output
|
||||
3. Dashboard data structure adds `{metric}_status: Option<String>`
|
||||
4. Dashboard uses `status_level_from_agent_status()` for display
|
||||
5. Agent adds notification monitoring for status changes
|
||||
|
||||
**NEVER:**
|
||||
- Add hardcoded thresholds to dashboard widgets
|
||||
- Calculate status in dashboard with different thresholds than agent
|
||||
- Use "ok" as default when agent status is missing (use "unknown")
|
||||
- Calculate colors in widgets (TableBuilder's responsibility)
|
||||
|
||||
# Important Communication Guidelines
|
||||
|
||||
|
||||
Reference in New Issue
Block a user