Bot Blocklist
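ScraperBlock maintains a built-in list of 50+ AI and scraper user-agent signatures. Matching is a case-insensitive substring test: a request matches when its User-Agent header contains any signature anywhere in the string, regardless of letter case.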
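
The matching rule described above can be sketched in a few lines of PHP. This is an illustration of case-insensitive substring matching only, not ScraperBlock's actual code: the function name `example_match_signature` and the short `$example_signatures` array are hypothetical, and the signatures are copied from the tables on this page.

```php
<?php
// Illustrative subset of signatures taken from the tables on this page.
// The real list comes from ScraperBlock_Rules::get_bot_blocklist().
$example_signatures = array( 'GPTBot', 'ClaudeBot', 'python-requests', 'curl/' );

/**
 * Hypothetical helper (not part of ScraperBlock): returns the first signature
 * contained in the user-agent string, or null when nothing matches.
 */
function example_match_signature( string $user_agent, array $signatures ): ?string {
    foreach ( $signatures as $signature ) {
        // Skip empty signatures, which would otherwise match every request.
        if ( '' === $signature ) {
            continue;
        }
        // stripos() is PHP's case-insensitive substring search.
        if ( false !== stripos( $user_agent, $signature ) ) {
            return $signature;
        }
    }
    return null;
}

// A lower-cased bot name still matches the mixed-case signature.
$ua = 'Mozilla/5.0 (compatible; gptbot/1.1)';
echo example_match_signature( $ua, $example_signatures ) ?? 'no match', PHP_EOL; // prints "GPTBot"
```

Because the test is a substring match, short signatures can match more than intended. The trailing slash on entries such as curl/ and wget/ presumably narrows the match to the product token that precedes a version number.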

AI Training Bots

| Bot | Organisation |
|---|---|
| GPTBot | OpenAI |
| ChatGPT-User | OpenAI |
| OAI-SearchBot | OpenAI |
| Google-Extended | Google (AI training opt-out) |
| CCBot | Common Crawl |
| ClaudeBot | Anthropic |
| Claude-Web | Anthropic |
| anthropic-ai | Anthropic |
| PerplexityBot | Perplexity AI |
| Bytespider | ByteDance/TikTok |
| FacebookBot | Meta |
| Amazonbot | Amazon |
| DataForSeoBot | DataForSEO |
| ImagesiftBot | Imagesift |
| YouBot | You.com |

SEO & Commercial Scrapers

| Bot | Organisation |
|---|---|
| SemrushBot | Semrush |
| AhrefsBot | Ahrefs |
| MJ12bot | Majestic |
| DotBot | Moz |
| BLEXBot | WebMeUp |
| PetalBot | Huawei |

Automation & Headless Browsers

| Bot | Description |
|---|---|
| Scrapy | Python scraping framework |
| python-requests | Python HTTP library |
| HeadlessChrome | Headless Chrome browser |
| PhantomJS | Headless browser (legacy) |
| Puppeteer | Chrome automation library |
| Playwright | Cross-browser automation |
| selenium | Browser automation |
| scraperapi | ScraperAPI service |
| curl/ | cURL command line |
| wget/ | Wget command line |

The full list is returned by ScraperBlock_Rules::get_bot_blocklist(). You can extend it via the scraperblock_bot_blocklist filter; see the Developer Reference page for details.
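
As a rough sketch of extending the list through the scraperblock_bot_blocklist filter, the snippet below registers a callback with WordPress's `add_filter()`. It assumes the filter passes a flat array of signature strings, which this page does not confirm; check the Developer Reference for the real shape. The callback name `mysite_extend_bot_blocklist` and both added signatures are made up for illustration.

```php
<?php
/**
 * Hypothetical callback: appends two made-up signatures to the blocklist.
 * ASSUMPTION: the filtered value is a flat array of signature strings.
 */
function mysite_extend_bot_blocklist( $blocklist ) {
    if ( ! is_array( $blocklist ) ) {
        return $blocklist; // Leave unexpected values untouched.
    }

    $extra = array( 'ExampleCrawler', 'example-scraper/' );

    // Merge, drop exact duplicates, and reindex the array.
    return array_values( array_unique( array_merge( $blocklist, $extra ) ) );
}

// add_filter() is WordPress core; the guard lets this file load outside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'scraperblock_bot_blocklist', 'mysite_extend_bot_blocklist' );
}
```

A snippet like this would typically live in a small site-specific plugin or a theme's functions.php. Since matching is by substring, choose additions specific enough that they do not also match ordinary browsers or search crawlers.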
Blocking Google-Extended does not affect Google Search rankings. Google-Extended is Google's AI training opt-out signal and is separate from Googlebot. Common search crawlers (Googlebot, Bingbot, Slurp, DuckDuckBot) are NOT on the blocklist.