Running a hosting business without monitoring is guessing. You need to know when servers are overloaded, when disks are filling up, and when customers are having problems - before they tell you.
What to Monitor
Infrastructure Level
- CPU usage - Per-core utilization, load averages
- Memory - Used, cached, swap usage
- Disk - Space usage, I/O throughput and latency
- Network - Bandwidth usage, packet loss, latency
- Uptime - Service availability checks
Application Level
- Pterodactyl Panel - Web response time, queue processing
- Game servers - Process status, player counts, tick rates
- Database - Query performance, connection counts
- WHMCS - Cron execution, payment processing
Business Level
- Active customers - Signups, cancellations, churn rate
- Support volume - Ticket count, response times
- Revenue - MRR, average revenue per user
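The business metrics above reduce to simple ratios. As a minimal sketch, assume monthly churn is cancellations divided by customers at the start of the month, and average revenue per user is MRR divided by active customers. The function names and sample numbers are illustrative and do not come from WHMCS or any other billing tool:

```shell
# Hypothetical helpers; feed them numbers exported from your billing system.

# churn_rate CANCELLATIONS CUSTOMERS_AT_START: churn as a percentage.
churn_rate() {
    awk -v c="$1" -v n="$2" 'BEGIN {
        if (n <= 0) { print "n/a"; exit }
        printf "%.2f\n", (c / n) * 100
    }'
}

# arpu MRR ACTIVE_CUSTOMERS: average revenue per user.
arpu() {
    awk -v mrr="$1" -v n="$2" 'BEGIN {
        if (n <= 0) { print "n/a"; exit }
        printf "%.2f\n", mrr / n
    }'
}
```

For example, `churn_rate 6 240` covers a month with 6 cancellations out of 240 customers, and `arpu 3600 240` covers 3600 in MRR spread across the same 240 customers.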
Monitoring Tools
Uptime Kuma (Free, Self-Hosted)
Lightweight uptime monitor with a clean dashboard:
- HTTP/HTTPS monitoring
- TCP port checking
- DNS monitoring
- Alert notifications (Discord, Telegram, email)
Installation is simple via Docker:
docker run -d --restart=always --name uptime-kuma -p 3001:3001 -v uptime-kuma:/app/data louislam/uptime-kuma
Netdata (Free, Self-Hosted)
Real-time system monitoring with beautiful dashboards:
- Per-second metrics for CPU, memory, disk, network
- Automatic anomaly detection
- Zero configuration for basic monitoring
bash <(curl -Ss https://my-netdata.io/kickstart.sh)
Grafana + Prometheus (Free, Self-Hosted)
For advanced monitoring:
- Prometheus collects and stores metrics
- Grafana visualizes them in custom dashboards
- node_exporter provides system metrics
- Custom exporters for application-specific metrics
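Before building Grafana dashboards, it helps to confirm that node_exporter is actually serving data. The sketch below assumes node_exporter is listening on its default port 9100 on the local machine. `metric_value` is a hypothetical helper name, and it only handles samples without labels:

```shell
# metric_value NAME: read Prometheus text-format metrics on stdin and
# print the value of the first sample named exactly NAME (no labels).
# Comment lines (# HELP, # TYPE) never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage (assumes node_exporter on localhost:9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

If the command prints a number, Prometheus should be able to scrape the same endpoint.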
Alerting
Alert Channels
Set up multiple alert channels:
- Discord webhook - For team notifications
- Email - For business-critical alerts
- SMS/phone - For emergency alerts (server down)
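A Discord webhook accepts a JSON body with a `content` field, so a team notification can be sent with plain `curl`. This is a minimal sketch: `DISCORD_WEBHOOK_URL` is a hypothetical variable name that you would set to your own webhook URL, and the escaping only covers backslashes and double quotes:

```shell
# discord_payload MESSAGE: build the JSON body for a Discord webhook.
# Escapes backslashes and double quotes; newlines and other control
# characters are not handled in this sketch.
discord_payload() {
    printf '{"content": "%s"}\n' \
        "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# send_discord_alert MESSAGE: POST the payload to DISCORD_WEBHOOK_URL
# (assumed to be set in the environment).
send_discord_alert() {
    curl -fsS -H 'Content-Type: application/json' \
        -d "$(discord_payload "$1")" "$DISCORD_WEBHOOK_URL"
}
```

A monitoring script could then call `send_discord_alert 'node1: disk at 91%'` whenever a check crosses its threshold.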
Alert Thresholds
Configure sensible thresholds:
| Metric | Warning | Critical |
|--------|---------|----------|
| CPU usage | >80% for 5 min | >95% for 2 min |
| Memory | >85% used | >95% used |
| Disk space | >80% full | >90% full |
| Disk I/O | >80% utilization | >95% utilization |
| HTTP response | >2 seconds | >5 seconds or down |
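If you are not yet running a full alerting stack, the same thresholds can be applied from a small script. The sketch below uses the disk space limits of 80% and 90% as its example. It compares a single reading only, so duration conditions such as "for 5 min" are out of scope, and both function names are hypothetical:

```shell
# classify VALUE WARN CRIT: print ok, warning, or critical.
# A value must be strictly above a threshold to trigger it.
classify() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v > c)      print "critical"
        else if (v > w) print "warning"
        else            print "ok"
    }'
}

# disk_used_percent PATH: used capacity of the filesystem holding PATH,
# as a bare number, taken from the POSIX-format output of df.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: classify "$(disk_used_percent /)" 80 90
```

The output is a single word, which makes it easy to feed into whatever notification command you already use.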
Alert Fatigue
Too many alerts are worse than too few: once a channel gets noisy, you start ignoring all of it, including the alerts that matter. Only alert on actionable conditions that need immediate attention.
Automation
Auto-Restart
If a service crashes, restart it automatically:
# Systemd service with auto-restart
[Service]
Restart=always
RestartSec=5
Disk Cleanup
Automate log rotation and temporary file cleanup:
# Cron job: every day at 03:00, delete Pterodactyl backup files older than 7 days
0 3 * * * find /var/lib/pterodactyl/backups -type f -mtime +7 -delete
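Because `find ... -delete` removes files permanently, it is worth previewing what a cleanup job would match before scheduling it. A minimal sketch follows; `list_old_files` is a hypothetical helper that applies the same age filter to regular files but only prints the results:

```shell
# list_old_files DIR DAYS: print regular files under DIR that were last
# modified more than DAYS days ago. Nothing is deleted.
list_old_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Example: list_old_files /var/lib/pterodactyl/backups 7
```

Once the list looks right, the same filter can be used with `-delete` in the cron job.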
Certificate Renewal
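Use Certbot with auto-renewal. Most packaged Certbot installs already schedule renewal through a systemd timer or cron job. Add a deploy hook so Nginx reloads only when a certificate was actually renewed: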
certbot renew --deploy-hook "systemctl reload nginx"
Proactive Maintenance
Weekly Checks
- Review resource usage trends
- Check for pending OS updates
- Verify backup completion
- Review error logs for recurring issues
Monthly Tasks
- Apply security updates
- Test backup restoration
- Review capacity and plan for growth
- Analyze support ticket patterns
Quarterly Reviews
- Performance baseline comparison
- Infrastructure cost review
- Tool and software updates
- Full disaster recovery test
Status Page
A public status page builds customer trust:
- Shows real-time infrastructure status
- Displays planned maintenance windows
- Documents incident history and post-mortems
- Reduces "is it just me?" support tickets
Use Uptime Kuma's built-in status page or a dedicated solution like Instatus.
Monitoring is insurance for your hosting business. The setup takes a few hours, but it prevents the kind of problems that cost you customers and days of revenue.
