Bot Blocklist

ScraperBlock maintains a built-in list of 50+ AI and scraper user-agent signatures. Matching is a case-insensitive substring match: a request matches when its User-Agent header contains any signature on the list, regardless of letter case.
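
The matching rule described above can be sketched as follows. This is a minimal illustration, not ScraperBlock's actual implementation: the function name `example_matches_blocklist` and the three-entry sample list are hypothetical, and the only behaviour taken from this page is that a signature matches when it appears anywhere in the User-Agent string, ignoring case.

```php
<?php
/**
 * Hypothetical helper (not part of ScraperBlock): returns the first
 * signature found in the User-Agent string, or null when none matches.
 */
function example_matches_blocklist(string $user_agent, array $signatures): ?string
{
    foreach ($signatures as $signature) {
        // stripos() is a case-insensitive substring search. It returns
        // false when the needle is absent, so compare strictly.
        if ($signature !== '' && stripos($user_agent, $signature) !== false) {
            return $signature;
        }
    }
    return null;
}

// Sample subset of the signatures listed on this page.
$sample = ['GPTBot', 'python-requests', 'curl/'];

// Lower-case "gptbot" still matches the "GPTBot" signature.
var_dump(example_matches_blocklist('Mozilla/5.0 (compatible; gptbot/1.0)', $sample));

// An ordinary browser string contains none of the sample signatures.
var_dump(example_matches_blocklist('Mozilla/5.0 (Windows NT 10.0) Firefox/125.0', $sample));
```

Because the match is by substring, a short signature such as `curl/` matches any version of the client (for example `curl/8.4.0`) without listing each version separately.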

AI Training Bots

Bot              Organisation
GPTBot           OpenAI
ChatGPT-User     OpenAI
OAI-SearchBot    OpenAI
Google-Extended  Google (AI training opt-out)
CCBot            Common Crawl
ClaudeBot        Anthropic
Claude-Web       Anthropic
anthropic-ai     Anthropic
PerplexityBot    Perplexity AI
Bytespider       ByteDance/TikTok
FacebookBot      Meta
Amazonbot        Amazon
DataForSeoBot    DataForSEO
ImagesiftBot     Imagesift
YouBot           You.com

SEO & Commercial Scrapers

Bot         Organisation
SemrushBot  Semrush
AhrefsBot   Ahrefs
MJ12bot     Majestic
DotBot      Moz
BLEXBot     BLEXBot
PetalBot    Huawei

Automation & Headless Browsers

Bot              Description
Scrapy           Python scraping framework
python-requests  Python HTTP library
HeadlessChrome   Headless Chrome browser
PhantomJS        Headless browser (legacy)
Puppeteer        Chrome automation library
Playwright       Cross-browser automation
selenium         Browser automation
scraperapi       ScraperAPI service
curl/            cURL command line
wget/            Wget command line

The tables above are not exhaustive. The full list is returned by ScraperBlock_Rules::get_bot_blocklist(). You can extend it via the scraperblock_bot_blocklist filter; see the Developer Reference page.
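
As a rough sketch of extending the list, the snippet below registers a callback on the `scraperblock_bot_blocklist` filter using WordPress's standard `add_filter()` function. It assumes the filter passes the blocklist as a flat array of signature strings and expects the modified array back. That shape is an assumption, not something confirmed on this page, so check the Developer Reference before relying on it. The callback name `example_extend_bot_blocklist` and the signature `ExampleBot` are placeholders.

```php
<?php
/**
 * Hypothetical callback: appends custom signatures to the blocklist.
 * Assumes $blocklist is a flat array of user-agent substrings.
 */
function example_extend_bot_blocklist($blocklist)
{
    // Be defensive in case the filter passes something unexpected.
    if (!is_array($blocklist)) {
        return $blocklist;
    }

    $custom = ['ExampleBot']; // placeholder signature, replace with your own

    // Merge and de-duplicate, keeping the original entries first.
    return array_values(array_unique(array_merge($blocklist, $custom)));
}

// add_filter() only exists inside WordPress; the guard lets this file
// load without errors elsewhere (for example in a standalone test).
if (function_exists('add_filter')) {
    add_filter('scraperblock_bot_blocklist', 'example_extend_bot_blocklist');
}
```

A snippet like this would typically live in a small custom plugin or a theme's functions.php, so that it is loaded before the blocklist is evaluated.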

Blocking Google-Extended does not affect Google Search rankings. Google-Extended is Google's AI training opt-out signal and is separate from Googlebot. Common search crawlers (Googlebot, Bingbot, Slurp, DuckDuckBot) are NOT on the blocklist.