VPS for Web Scraping: Ethical Setup, Rate Limiting, and Legal Considerations 2026
Web scraping is legitimate, widely used, and legally complex. Doing it well from a VPS means understanding the technical, ethical, and legal constraints together.
robots.txt: The Social Contract
robots.txt is generally not legally binding on its own, but honoring it is widely considered the ethical baseline. Check it before fetching any page:
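const robotsParser = require('robots-parser');
// Uses the global fetch built into Node 18+ (node-fetch v3 is ESM-only and cannot be require()d).
async function isAllowedByRobots(pageUrl, userAgent) {
  const robotsUrl = new URL('/robots.txt', pageUrl).href;
  const res = await fetch(robotsUrl);
  if (res.status >= 500) return false; // RFC 9309: server error means assume everything is disallowed
  const robotsTxt = res.ok ? await res.text() : ''; // 4xx (e.g. 404): no rules, so everything is allowed
  const robots = robotsParser(robotsUrl, robotsTxt);
  return robots.isAllowed(pageUrl, userAgent) === true;
}
async function main() {
  const page = 'https://target-site.com/page-to-scrape';
  if (!(await isAllowedByRobots(page, 'MyScraperBot'))) {
    console.log('robots.txt disallows this path, skipping');
    return;
  }
  // ...fetch and parse `page` here...
}
main();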
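robots-parser does the matching for you, but it helps to know what `isAllowed` is actually deciding. Under RFC 9309 (the Robots Exclusion Protocol), the most specific, i.e. longest, matching rule wins, and Allow wins a tie between equally long rules. The following is a simplified sketch that ignores `*`/`$` wildcards and user-agent group selection; `isPathAllowed` and the rule objects are hypothetical illustrations, not part of robots-parser:

```javascript
// Simplified RFC 9309 rule matching for ONE user-agent group.
// Assumptions: rules are plain path prefixes (no `*` or `$` wildcards) and the
// group that applies to your bot has already been selected.
function isPathAllowed(path, rules) {
  let best = null; // most specific matching rule seen so far
  for (const rule of rules) {
    // An empty path (e.g. "Disallow:") matches nothing; matching is case-sensitive.
    if (rule.path === '' || !path.startsWith(rule.path)) continue;
    const longer = best === null || rule.path.length > best.path.length;
    const tieGoesToAllow =
      best !== null && rule.path.length === best.path.length && rule.type === 'allow';
    if (longer || tieGoesToAllow) best = rule;
  }
  // If no rule matches, the path is allowed.
  return best === null || best.type === 'allow';
}

const rules = [
  { type: 'disallow', path: '/private/' },
  { type: 'allow', path: '/private/press/' },
];
console.log(isPathAllowed('/private/press/2026.html', rules)); // the longer Allow rule wins
console.log(isPathAllowed('/private/admin', rules));          // only the Disallow rule matches
console.log(isPathAllowed('/blog/post', rules));              // no rule matches
```

This is why a site can disallow a whole directory and still open up one subfolder inside it.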
Rate Limiting: Be a Polite Scraper
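Aggressive scraping that saturates a target server degrades the site for its real users and is a quick way to get your VPS IP banned. Space requests out, identify your bot, and back off when the server asks you to: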
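// Uses the global fetch built into Node 18+.
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
// Retry-After is either a number of seconds or an HTTP date; default to 60 s if missing or unparseable
function retryAfterMs(header) {
  if (!header) return 60_000;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? 60_000 : Math.max(0, date - Date.now());
}
async function scrapePage(url, retriesLeft = 3) {
  await delay(1000 + Math.random() * 1000); // 1–2 second random delay between requests
  const response = await fetch(url, {
    headers: {
      'User-Agent': 'YourBot/1.0 (contact@yourdomain.com)', // Identify yourself
      'Accept': 'text/html'
    }
  });
  // Respect Retry-After on 429 Too Many Requests, with a cap so repeated 429s can't loop forever
  if (response.status === 429 && retriesLeft > 0) {
    await delay(retryAfterMs(response.headers.get('Retry-After')));
    return scrapePage(url, retriesLeft - 1);
  }
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}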
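The random delay inside `scrapePage` only spaces out requests made one after another. If you start several calls at once (for example with `Promise.all`), they all wait 1 to 2 seconds and then hit the server together. One way to enforce a hard floor per host is a small scheduler that reserves a start time for each request before awaiting anything. A minimal sketch; `createHostLimiter` is a hypothetical helper, not a library API:

```javascript
// Hypothetical per-host limiter: guarantees at least `minIntervalMs` between the
// starts of consecutive requests to the same host, even when callers run concurrently.
function createHostLimiter(minIntervalMs) {
  const nextSlot = new Map(); // host -> earliest timestamp the next request may start
  return async function schedule(url, task) {
    const host = new URL(url).host;
    const now = Date.now();
    const start = Math.max(now, nextSlot.get(host) ?? now);
    // Reserve the slot synchronously, before awaiting, so concurrent calls queue up.
    nextSlot.set(host, start + minIntervalMs);
    if (start > now) await new Promise(resolve => setTimeout(resolve, start - now));
    return task();
  };
}

// Usage sketch: concurrent calls to one host now start about 1.5 s apart.
const politeFetch = createHostLimiter(1500);
// e.g. await Promise.all(urls.map(u => politeFetch(u, () => scrapePage(u))));
```

Keying on the host means a crawl spread across several sites is not slowed down by the limit on any single one.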
Caching to Reduce Load
Cache results aggressively to avoid hitting the same URL repeatedly:
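// In-memory cache keyed by URL; it is lost on restart (use Redis or similar for a persistent cache).
const cache = new Map();
async function cachedFetch(url) {
  // Cache the promise, not just the result, so concurrent calls for one URL share a single request
  if (cache.has(url)) return cache.get(url);
  const pending = scrapePage(url);
  cache.set(url, pending);
  // Don't cache failures: evict the entry so a later call can retry
  pending.catch(() => { if (cache.get(url) === pending) cache.delete(url); });
  return pending;
}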
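A `Map` cache that never expires is fine for a one-off run, but a long-running scraper on a VPS will eventually serve stale pages. A minimal sketch of adding a time-to-live, assuming a hypothetical `createTtlCache` helper that wraps any async fetch function such as `scrapePage` (Redis offers the same idea across restarts via key expiry, e.g. `SET` with `EX`):

```javascript
// Hypothetical in-memory cache with a time-to-live: entries older than ttlMs are refetched.
function createTtlCache(ttlMs, fetcher) {
  const entries = new Map(); // url -> { value, expiresAt }
  return async function get(url) {
    const hit = entries.get(url);
    if (hit && hit.expiresAt > Date.now()) return hit.value;
    entries.delete(url); // drop the expired entry, if any
    const value = await fetcher(url);
    entries.set(url, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}

// e.g. const cachedScrape = createTtlCache(6 * 60 * 60 * 1000, scrapePage); // 6-hour TTL
```

This version stores resolved values only, so two simultaneous misses for the same URL will both fetch; pick the TTL based on how often the target data actually changes.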
Legal Awareness (2026)
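- Public data: scraping publicly accessible pages is generally permissible (search engines crawl the public web routinely), but "public" does not remove copyright, database-right, or data-protection obligations
- Terms of Service: many sites prohibit scraping in their ToS. In most EU jurisdictions a ToS breach is not automatically illegal, but it can still lead to IP bans, account termination, and legal letters such as cease-and-desist demands
- Personal data (GDPR): collecting personal data of people in the EU requires a lawful basis under GDPR Art. 6, even when the data is publicly accessible. Product prices and aggregate ratings are usually unproblematic; names, emails, user profiles, and review text tied to identifiable people require more care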
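On the technical side, one practical habit is data minimisation (GDPR Art. 5(1)(c)): drop personal fields at scrape time rather than storing them in case they become useful later. A minimal sketch; the field names and the `minimise` helper are hypothetical, and regex redaction reduces, but does not eliminate, personal data that slips through in free text:

```javascript
// Hypothetical data-minimisation step: keep only the non-personal fields you need
// and redact e-mail addresses in free text before anything is stored.
const KEEP_FIELDS = ['productId', 'price', 'currency', 'rating', 'reviewText']; // assumed schema

// Simple e-mail pattern; it will not catch every valid address.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function minimise(record) {
  const out = {};
  for (const field of KEEP_FIELDS) {
    if (field in record) out[field] = record[field];
  }
  if (typeof out.reviewText === 'string') {
    out.reviewText = out.reviewText.replace(EMAIL_RE, '[email removed]');
  }
  return out;
}

const scraped = {
  productId: 'B-123', price: 19.99, currency: 'EUR', rating: 4,
  reviewerName: 'Jane Doe', reviewerProfile: 'https://target-site.com/u/jane',
  reviewText: 'Great value, email me at jane@example.com',
};
console.log(minimise(scraped)); // reviewerName and reviewerProfile are never stored
```

An allowlist of fields is safer than a blocklist here: a new personal field added to the target page is dropped by default instead of silently stored.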