Preventing Silent Bot Deaths: 24/7 Supervision Strategies

How to keep your Discord bot running without mysterious crashes. Covers process managers, health checks, automatic restarts, and monitoring for bots that die silently.

Written by Space-Node Team – Infrastructure Team – 15+ years combined experience in game server hosting, VPS infrastructure, and 24/7 streaming solutions.

The worst kind of bot failure is the silent one. Your bot crashes at 3 AM, no errors in the logs, and you don't notice until someone messages you at noon asking why the bot is down.

Why Bots Die Silently

| Cause | Frequency | Detection |
|-------|-----------|-----------|
| Memory leak (OOM kill) | Common | System logs only |
| Discord gateway disconnect | Common | No error if not caught |
| Unhandled promise rejection | Very common | Process exits silently |
| Host restarts | Occasional | No notification |
| Rate limiting | Rare | Bot stops responding but stays "alive" |

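The most insidious is the memory leak. Your Node.js or Python bot slowly consumes more RAM until the kernel's OOM killer terminates it. The process just disappears: no crash log from the bot, no error output. The only trace is an entry in the system log (for example in dmesg or journalctl -k on Linux), which is easy to miss if you only read the bot's own logs.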
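Unhandled rejections and uncaught exceptions are the easiest rows in the table above to make visible. The following is a minimal sketch, not the only way to do it: it logs the error with a timestamp and then exits with a non-zero code so a process manager can restart the bot. The name installCrashLogging and its options are hypothetical helpers invented for this example.

```javascript
// Sketch: make sure a crash leaves a trace before the process exits.
// installCrashLogging and its options are illustrative names, not a library API.
function installCrashLogging({
    target = process,                     // anything with .on(); injectable for testing
    log = console.error,
    exit = (code) => process.exit(code),
} = {}) {
    const handle = (kind) => (err) => {
        const detail = err instanceof Error ? err.stack : String(err);
        log(`[${new Date().toISOString()}] ${kind}: ${detail}`);
        exit(1); // non-zero exit so PM2 or systemd restarts the bot
    };
    target.on('unhandledRejection', handle('unhandledRejection'));
    target.on('uncaughtException', handle('uncaughtException'));
}

// In bot.js, before client.login():
// installCrashLogging();
```

Exiting after logging is a deliberate choice here: after an uncaught exception the process state may be inconsistent, so a clean restart is usually safer than carrying on.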

Process Managers

PM2 (Node.js)

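npm install -g pm2
pm2 start bot.js --name "my-bot"
pm2 startup   # as a non-root user this prints a sudo command; run it once to register PM2 at boot
pm2 save      # saves the current process list so it is restored after a reboot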

PM2 automatically:

  • Restarts the bot if it crashes
  • Starts the bot on system boot
  • Logs all output (including crash reasons)
  • Monitors memory usage

Set a memory limit to prevent OOM kills:

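pm2 start bot.js --name "my-bot" --max-memory-restart 200M

When the bot's memory use exceeds 200 MB, PM2 restarts it and records the restart in its logs, instead of leaving the kernel's OOM killer to terminate it without a trace. PM2 checks memory on an interval (roughly every 30 seconds by default), so set the limit comfortably below what the host can actually spare.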

Systemd (Any Language)

[Unit]
Description=My Discord Bot
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=botuser
WorkingDirectory=/home/botuser/bot
ExecStart=/usr/bin/node bot.js
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

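Restart=always means systemd restarts the bot after any exit: a clean exit, a crash, or an OOM kill. The one exception is a deliberate systemctl stop, after which the service stays stopped. RestartSec=10 waits 10 seconds between attempts so a crash loop does not reconnect to Discord in a tight loop. Save the unit under /etc/systemd/system/ (for example as discord-bot.service; the file name is your choice), then run systemctl daemon-reload followed by systemctl enable --now discord-bot.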

Health Checks

Internal Health Check

Your bot should report its own status:

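// Every 5 minutes, check that the gateway connection is actually up
setInterval(() => {
    // A ping value is not a liveness signal: it can be stale or -1 and still be truthy.
    // client.isReady() (discord.js v13+) reports whether the client is logged in and ready.
    if (!client.isReady()) {
        console.error('Gateway connection lost, exiting so the process manager restarts the bot...');
        process.exit(1); // PM2 or systemd restarts the process
    }

    // RSS is what the OS sees; heapUsed alone misses buffers and native memory
    const memUsage = process.memoryUsage().rss / 1024 / 1024;
    console.log(`Health: OK | Memory: ${memUsage.toFixed(1)}MB | Ping: ${client.ws.ping}ms`);
}, 300000);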
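The check above exits on the first failed probe. A single failure may only be a brief reconnect, so one possible refinement is to require several consecutive failures before giving up. The sketch below shows that idea; createWatchdog and its options are hypothetical names, and the threshold of 3 is an arbitrary example.

```javascript
// Sketch of a debounced watchdog. createWatchdog and its options are
// illustrative names, not part of discord.js or PM2.
function createWatchdog({ isHealthy, maxFailures = 3, onDead }) {
    let failures = 0;
    // Call the returned function on a timer; it returns the current failure count.
    return function check() {
        if (isHealthy()) {
            failures = 0;
        } else {
            failures += 1;
            if (failures >= maxFailures) onDead(failures);
        }
        return failures;
    };
}

// Usage (assumes a discord.js `client` in scope):
// const check = createWatchdog({
//     isHealthy: () => client.isReady(),
//     onDead: () => process.exit(1), // let PM2 or systemd restart the bot
// });
// setInterval(check, 60_000);
```

Checking more often with a failure threshold tends to detect a dead connection sooner than one check every 5 minutes, without restarting on every short blip.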

External Health Check

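Run a separate monitoring script from outside the bot process. The check below confirms only that the bot token is valid and the Discord API is reachable from the host; it does not prove the bot process is running, so pair it with a process check:

import asyncio
import os

import aiohttp

BOT_TOKEN = os.environ["BOT_TOKEN"]  # assumed environment variable holding the bot token

async def check_bot() -> bool:
    """Return True if Discord accepts the bot token (API reachable, token valid)."""
    try:
        timeout = aiohttp.ClientTimeout(total=10)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            headers = {'Authorization': f'Bot {BOT_TOKEN}'}
            async with session.get('https://discord.com/api/v10/users/@me', headers=headers) as resp:
                return resp.status == 200
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return False

if __name__ == '__main__':
    # Exit code 0 when healthy, 1 otherwise, so cron or a shell script can act on it
    raise SystemExit(0 if asyncio.run(check_bot()) else 1)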
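An API call like the one above cannot tell whether the bot process itself is alive or hung. One common pattern for that is a heartbeat file: the bot writes a timestamp while it is healthy, and an outside script treats a stale timestamp as a failure. The sketch below assumes that pattern; writeHeartbeat, isHeartbeatFresh, and the file location are hypothetical.

```javascript
const fs = require('node:fs');

// Bot side: record "I am still healthy" as a timestamp in a file.
function writeHeartbeat(file, now = Date.now()) {
    fs.writeFileSync(file, String(now));
}

// Monitor side: true only if the heartbeat exists and is recent enough.
function isHeartbeatFresh(file, maxAgeMs, now = Date.now()) {
    try {
        const last = Number(fs.readFileSync(file, 'utf8'));
        return Number.isFinite(last) && now - last <= maxAgeMs;
    } catch {
        return false; // a missing or unreadable file counts as stale
    }
}
```

The bot would call writeHeartbeat from its periodic health check only when that check passes, so a bot that is running but disconnected stops refreshing the file and shows up as stale.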

Notification System

When the bot goes down, you need to know immediately:

Discord Webhook Alert

#!/bin/bash
# check_bot.sh - run via cron every 2 minutes, e.g.:
#   */2 * * * * /home/botuser/check_bot.sh   (example path)

BOT_PROCESS="bot.js"
WEBHOOK="https://discord.com/api/webhooks/YOUR_WEBHOOK"

if ! pgrep -f "$BOT_PROCESS" > /dev/null; then
    curl -fsS --max-time 10 \
        -H "Content-Type: application/json" \
        -d '{"content":"Bot is DOWN! Process not found. Attempting restart."}' \
        "$WEBHOOK"

    # cron runs with a minimal PATH, so pm2 may need its full path here
    cd /home/botuser/bot && pm2 restart my-bot
fi
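A check that runs every 2 minutes will post the same alert on every run while the bot stays down. One way to limit that is a cooldown between alerts. The sketch below shows the decision logic and the webhook payload; shouldSendAlert, buildAlertPayload, and the WEBHOOK_URL environment variable are assumptions made for this example.

```javascript
// Sketch: avoid re-sending the same alert on every monitoring run.
// shouldSendAlert and buildAlertPayload are illustrative helpers.
function shouldSendAlert(lastSentAt, now, cooldownMs) {
    // lastSentAt is null when no alert has been sent yet
    return lastSentAt === null || now - lastSentAt >= cooldownMs;
}

function buildAlertPayload(botName, reason) {
    // Discord webhook messages carry their text in `content` (2000 characters max).
    const text = `${botName} is DOWN: ${reason}`;
    return { content: text.slice(0, 2000) };
}

// Sending (Node 18+ has a global fetch); WEBHOOK_URL is an assumed environment variable:
// await fetch(process.env.WEBHOOK_URL, {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify(buildAlertPayload('my-bot', 'process not found')),
// });
```

The caller is responsible for persisting the time of the last alert between runs, for example in a small file.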

Uptime Monitoring

If your bot has a web dashboard or health endpoint:

  • UptimeRobot (free) pings your endpoint every 5 minutes
  • Sends notifications via email, SMS, or webhook
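If the bot has no web endpoint yet, Node's built-in http module is enough to expose one. The following is a sketch under assumptions: the /health path, port 3000, and the shape of the status object are arbitrary choices, and getStatus is a function you supply.

```javascript
const http = require('node:http');

// Sketch: getStatus is a caller-supplied function returning { ready, pingMs }.
function createHealthHandler(getStatus) {
    return (req, res) => {
        if (req.url !== '/health') {
            res.writeHead(404);
            res.end();
            return;
        }
        const status = getStatus();
        // Respond 503 when not ready so a monitor can treat the bot as down.
        res.writeHead(status.ready ? 200 : 503, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(status));
    };
}

// Starts the endpoint; call this once from the bot's startup code.
function startHealthServer(getStatus, port = 3000) {
    return http.createServer(createHealthHandler(getStatus)).listen(port);
}

// Usage (assumes a discord.js `client` in scope):
// startHealthServer(() => ({ ready: client.isReady(), pingMs: client.ws.ping }));
```

Returning 503 instead of 200 with an error message matters because uptime monitors generally judge health by the HTTP status code, not the response body.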

Hosting Matters

Silent deaths are more common on:

  • Free hosting (Replit, Glitch) that kills idle processes
  • Shared hosting with aggressive resource limits
  • Home computers that sleep or restart

Space-Node's Discord bot hosting starts FREE for small bots. The always-on infrastructure means your bot's process manager keeps running even when you're asleep.

The difference between a bot that's "usually online" and one that has 99.9% uptime comes down to monitoring and automatic recovery. Set up PM2, health checks, and alerts, and your bot will recover from most crashes before anyone notices.
