The worst kind of bot failure is the silent one. Your bot crashes at 3 AM, no errors in the logs, and you don't notice until someone messages you at noon asking why the bot is down.
Why Bots Die Silently
| Cause | Frequency | Detection |
|-------|-----------|-----------|
| Memory leak (OOM kill) | Common | System logs only |
| Discord gateway disconnect | Common | No error if not caught |
| Unhandled promise rejection | Very common | Process exits silently |
| Host restarts | Occasional | No notification |
| Rate limiting | Rare | Bot stops responding but stays "alive" |
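The most insidious is the memory leak. Your Node.js or Python bot slowly consumes more RAM until the kernel's OOM killer terminates it. The process just disappears: no crash log and no error output from the bot itself, only an entry in the system log (visible with `dmesg` or `journalctl -k` on most Linux hosts).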
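One way to spot a leak before the OS steps in is to compare periodic memory readings and flag steady growth. The sketch below is a rough heuristic, not a profiler: the function name `looks_like_leak`, the 50 MB threshold, and the sample readings are all assumptions made for illustration.

```python
def looks_like_leak(samples_mb, min_growth_mb=50.0):
    """Return True if memory never dropped between samples and grew by
    at least min_growth_mb from the first sample to the last."""
    if len(samples_mb) < 3:
        return False  # too few points to call it a trend
    never_dropped = all(
        later >= earlier
        for earlier, later in zip(samples_mb, samples_mb[1:])
    )
    total_growth = samples_mb[-1] - samples_mb[0]
    return never_dropped and total_growth >= min_growth_mb


# Hypothetical readings in MB, taken a few minutes apart.
print(looks_like_leak([110.0, 128.5, 149.2, 171.0]))  # True
print(looks_like_leak([110.0, 150.0, 95.0, 120.0]))   # False
```

A healthy bot usually shows memory rising and falling as garbage collection runs, which is why a single drop is enough to clear the flag here.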
Process Managers
PM2 (Node.js)
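npm install -g pm2
pm2 start bot.js --name "my-bot"
pm2 save       # save the current process list so it can be restored later
pm2 startup    # prints a command to run once (usually with sudo) to enable start on boot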
PM2 automatically:
- Restarts the bot if it crashes
- Starts the bot on system boot
- Logs all output (including crash reasons)
- Monitors memory usage
Set a memory limit to prevent OOM kills:
pm2 start bot.js --max-memory-restart 200M
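PM2 checks memory usage periodically, so once the bot exceeds 200 MB of RAM it is restarted cleanly on the next check instead of being killed abruptly by the OS. Pick a limit comfortably below the memory your host actually provides.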
Systemd (Any Language)
[Unit]
Description=My Discord Bot
After=network.target
[Service]
Type=simple
User=botuser
WorkingDirectory=/home/botuser/bot
ExecStart=/usr/bin/node bot.js
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
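Restart=always means systemd restarts the bot after any exit, whether it exited cleanly, crashed, or was OOM-killed. The one exception is a deliberate `systemctl stop`, which leaves the service stopped. RestartSec=10 makes systemd wait 10 seconds before each restart.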
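Both PM2 and systemd boil down to the same idea: run the process, wait for it to exit, pause, and run it again. The sketch below shows that loop in a few lines so the behavior is easier to reason about. It is an illustration only, not a replacement for a real process manager; `supervise` and its parameters are hypothetical names, and the restart cap exists only so the demo terminates.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, restart_delay=10.0):
    """Run cmd, restarting it after every exit, up to max_restarts times.

    Returns the exit code of each run, in order.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):  # first run plus the restarts
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        print(f"run {attempt + 1} exited with code {result.returncode}")
        if attempt < max_restarts:
            time.sleep(restart_delay)  # plays the role of RestartSec
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a crashing bot: a child process that exits with code 1.
    crashing_bot = [sys.executable, "-c", "import sys; sys.exit(1)"]
    supervise(crashing_bot, max_restarts=2, restart_delay=0.1)
```

The delay between restarts matters: without it, a bot that crashes on startup would be relaunched in a tight loop and could hit Discord's login limits.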
Health Checks
Internal Health Check
Your bot should report its own status:
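// Every 5 minutes, check whether the bot is still connected to the gateway
setInterval(() => {
  // isReady() reflects the connection state; the ping value alone can be
  // stale or a placeholder, so a bare !client.ws.ping test is unreliable
  if (!client.isReady()) {
    console.error('Gateway connection lost, restarting...');
    process.exit(1); // PM2 or systemd will restart the process
  }

  // rss is the memory the OS (and PM2's memory limit) sees; heapUsed alone understates it
  const memUsage = process.memoryUsage().rss / 1024 / 1024;
  console.log(`Health: OK | Memory: ${memUsage.toFixed(1)}MB | Ping: ${client.ws.ping}ms`);
}, 300000);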
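A latency reading is easy to misjudge with a plain truthiness test: zero is falsy, while placeholder values such as NaN or infinity are truthy. For a Python bot, a small helper can make the rule explicit. This is a sketch under assumptions: it expects latency in seconds as a float (discord.py's `Client.latency` is one such value, but check its documentation for the exact semantics), and `gateway_looks_healthy` and the 5 second ceiling are hypothetical choices.

```python
import math


def gateway_looks_healthy(latency_s, max_latency_s=5.0):
    """Judge a gateway latency reading given in seconds.

    None, NaN, infinity and negative values are treated as "no usable
    heartbeat"; a reading of exactly 0.0 still counts as healthy.
    """
    if latency_s is None or not math.isfinite(latency_s):
        return False
    return 0.0 <= latency_s <= max_latency_s


print(gateway_looks_healthy(0.042))         # True
print(gateway_looks_healthy(float("nan")))  # False
print(gateway_looks_healthy(0.0))           # True
print(gateway_looks_healthy(12.0))          # False
```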
External Health Check
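Run a separate monitoring script alongside the bot. The request below confirms that the bot token is valid and the Discord API is reachable; it does not prove that the bot process is running or connected to the gateway, so pair it with a process check:
import asyncio
import os

import aiohttp

BOT_TOKEN = os.environ['BOT_TOKEN']  # assumed to be supplied via the environment

async def check_bot():
    """Return True if Discord accepts the bot token."""
    headers = {'Authorization': f'Bot {BOT_TOKEN}'}
    timeout = aiohttp.ClientTimeout(total=10)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            # GET /users/@me returns the bot's own user object when the token is valid
            async with session.get('https://discord.com/api/v10/users/@me', headers=headers) as resp:
                return resp.status == 200
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return False

if __name__ == '__main__':
    print('API check passed' if asyncio.run(check_bot()) else 'API check failed')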
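An external check is most useful when it observes something only a working bot can produce. One common pattern is a heartbeat file: the bot writes a timestamp on a timer, and the monitor treats a stale or missing file as "down". The sketch below assumes both run on the same host; the function names, the file path, and the 10 minute threshold are hypothetical.

```python
import os
import tempfile
import time


def write_heartbeat(path):
    """Called by the bot on a timer: record the current time in path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(str(time.time()))


def heartbeat_is_fresh(path, max_age_s=600.0, now=None):
    """Called by the monitor: True if the heartbeat is at most max_age_s old."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as handle:
            last_beat = float(handle.read().strip())
    except (OSError, ValueError):
        return False  # a missing or corrupt file counts as "down"
    return (now - last_beat) <= max_age_s


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.gettempdir(), "bot_heartbeat_demo.txt")
    write_heartbeat(demo_path)
    print(heartbeat_is_fresh(demo_path))                          # True
    print(heartbeat_is_fresh(demo_path, now=time.time() + 3600))  # False
```

Writing the heartbeat from inside the bot's event loop means a hung loop stops the heartbeat too, which a simple process check would miss.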
Notification System
When the bot goes down, you need to know immediately:
Discord Webhook Alert
#!/bin/bash
# check_bot.sh - Run via cron every 2 minutes
BOT_PROCESS="bot.js"
WEBHOOK="https://discord.com/api/webhooks/YOUR_WEBHOOK"

# pgrep -f matches the pattern against the full command line of running processes
if ! pgrep -f "$BOT_PROCESS" > /dev/null; then
    curl -sS -H "Content-Type: application/json" \
        -d '{"content":"Bot is DOWN! Process not found. Auto-restart triggered."}' \
        "$WEBHOOK"
    # cron runs with a minimal PATH, so pm2 may need to be called by its full path
    cd /home/botuser/bot && pm2 restart my-bot
fi
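If the alert text needs to include dynamic details such as a reason or a hostname, building the JSON in shell gets fragile because quotes must be escaped by hand. A Python monitor can serialize the payload safely instead. This is a sketch using only the standard library: `build_alert_payload` and `send_alert` are hypothetical helpers, the User-Agent string is an arbitrary choice (set explicitly because some services reject the urllib default), and the webhook URL is whatever you created in your server settings.

```python
import json
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # maximum length of a message's content field


def build_alert_payload(bot_name, reason):
    """Return the JSON body (as bytes) for a webhook alert message."""
    content = f"{bot_name} is DOWN: {reason}"
    return json.dumps({"content": content[:DISCORD_CONTENT_LIMIT]}).encode("utf-8")


def send_alert(webhook_url, bot_name, reason, timeout=10):
    """POST the alert to a Discord webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=build_alert_payload(bot_name, reason),
        headers={
            "Content-Type": "application/json",
            "User-Agent": "bot-monitor/1.0",  # assumed value, any descriptive string
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only builds the payload; call send_alert() with a real webhook URL to post it.
    print(build_alert_payload("my-bot", 'process "bot.js" not found').decode("utf-8"))
```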
Uptime Monitoring
If your bot has a web dashboard or health endpoint:
- UptimeRobot (free) pings your endpoint every 5 minutes
- Sends notifications via email, SMS, or webhook
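A bot without a dashboard can still expose a tiny health endpoint for such a monitor to ping. The sketch below uses Python's standard library and serves `/health` on a background thread; the path, the port, and the names `HealthHandler` and `start_health_server` are assumptions for illustration. As written it only proves the process is alive, so in a real bot the handler would ideally also consult the gateway state.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep monitor pings out of the bot's log output


def start_health_server(port=8080):
    """Serve /health on a daemon thread so the bot's main loop is not blocked."""
    server = ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_health_server(port=0)  # port 0 lets the OS pick a free port
    url = f"http://127.0.0.1:{server.server_address[1]}/health"
    with urllib.request.urlopen(url, timeout=5) as response:
        print(response.status, response.read().decode("utf-8"))
    server.shutdown()
```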
Hosting Matters
Silent deaths are more common on:
- Free hosting (Replit, Glitch) that kills idle processes
- Shared hosting with aggressive resource limits
- Home computers that sleep or restart
Space-Node's Discord bot hosting starts free for small bots. The always-on infrastructure means your bot's process manager keeps running even when you're asleep.
The difference between a bot that's "usually online" and one that has 99.9% uptime comes down mostly to monitoring and automatic recovery. Set up a process manager such as PM2, health checks, and alerts, and your bot will recover from most crashes before anyone notices.
