Blocked by robots.txt - AI Can't Crawl You | AiVIS Cite Ledger
One line in your robots.txt can make your entire site invisible to AI models. Many security plugins and default configurations block AI crawlers silently.
AI-Specific User-Agents to Know
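GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot (Perplexity) each follow the robots.txt rules written for their user-agent. Google is a special case: Googlebot is its general search crawler, while Google-Extended is a separate robots.txt token that controls whether crawled content may be used for Google's Gemini models.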
A blanket 'User-agent: * / Disallow: /' blocks everything, including AI crawlers. A targeted block such as 'User-agent: GPTBot / Disallow: /' affects only the crawler it names.
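The difference between a blanket block and a targeted one can be checked locally. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `is_allowed`, the two sample files, and the example.com URL are illustrative assumptions, not part of any crawler's tooling.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blanket rule: applies to every crawler without a group of its own.
BLANKET = """\
User-agent: *
Disallow: /
"""

# Targeted rule: applies only to the named crawler.
TARGETED = """\
User-agent: GPTBot
Disallow: /
"""

URL = "https://example.com/pricing"  # hypothetical page

for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(
        agent,
        "| blanket file allows:", is_allowed(BLANKET, agent, URL),
        "| targeted file allows:", is_allowed(TARGETED, agent, URL),
    )
```

Real crawlers use their own parsers, so treat the result as a sanity check rather than a guarantee of how a given bot will behave.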
How to Check Your robots.txt
Visit yourdomain.com/robots.txt and look for Disallow rules. Check for both wildcard rules (User-agent: *) and AI-specific blocks (User-agent: GPTBot).
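For a long robots.txt, scanning by eye is error-prone. The sketch below lists the Disallow paths that apply to the wildcard group and to a few AI user-agents. The function name `find_ai_rules`, the default agent list, and the sample file are assumptions for illustration; the grouping logic is a simplified reading of the format, not a full parser.

```python
def find_ai_rules(robots_txt, ai_agents=("GPTBot", "ClaudeBot", "PerplexityBot")):
    """Return {user_agent_token: [disallowed paths]} for '*' and the given agents."""
    watched = {"*"} | {agent.lower() for agent in ai_agents}
    findings = {}
    current = []      # user-agent tokens of the group being read
    in_rules = False  # True once the current group has at least one rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # an empty Disallow blocks nothing
                for agent in current:
                    if agent in watched:
                        findings.setdefault(agent, []).append(value)
    return findings


SAMPLE = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /
"""

for agent, paths in find_ai_rules(SAMPLE).items():
    print(agent, "->", paths)
```

A `Disallow: /` under `*` or under an AI user-agent is the pattern to look for first, since it blocks the whole site for that crawler.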
Also check for noindex meta tags in your HTML. These let crawlers fetch the page but ask that it be left out of the index, which can keep the content from being surfaced or cited by AI tools that rely on a search index.
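A noindex check can also be scripted. This is a minimal sketch built on Python's standard-library `html.parser`; the class and function names are hypothetical. It only inspects `<meta name="robots">` tags, so bot-specific meta names and the `X-Robots-Tag` HTTP header are outside its scope.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))
```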
Fixing robots.txt for AI
If you want AI visibility, explicitly allow AI crawlers. Add specific Allow rules for GPTBot, ClaudeBot, and PerplexityBot.
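Keep blocking admin paths, login pages, and private content. AI visibility doesn't mean opening everything. A crawler that finds a group for its own user-agent follows that group instead of the wildcard rules, so repeat those Disallow lines inside each AI-specific group.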
After updating robots.txt, run an AiVIS Cite Ledger audit to confirm AI crawlers can now reach your key content pages.
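Before publishing, the new rules can be sanity-checked locally. The sketch below assumes hypothetical paths (`/admin/`, `/login`, `/blog/post`) and a hypothetical example.com host, and uses Python's standard-library `urllib.robotparser`. That parser applies the first matching rule in a group, so the Disallow lines are placed ahead of `Allow: /`; this ordering also gives the same result under longest-match evaluation.

```python
from urllib.robotparser import RobotFileParser

# Proposed file: AI crawlers get their own group, with private paths
# repeated there because the wildcard group no longer applies to them.
PROPOSED_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /admin/
Disallow: /login
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(PROPOSED_ROBOTS_TXT.splitlines())


def can_crawl(agent: str, path: str) -> bool:
    """Check one user-agent against one path on the hypothetical host."""
    return parser.can_fetch(agent, "https://example.com" + path)


for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    for path in ("/blog/post", "/admin/settings", "/login"):
        print(f"{agent:14} {path:16} allowed={can_crawl(agent, path)}")
```

If a key content page shows `allowed=False`, or a private path shows `allowed=True`, the file needs another pass before it goes live.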
Frequently Asked Questions
- Should I allow all AI crawlers in robots.txt?
- Allow the ones you want citations from. For most businesses, allowing GPTBot, ClaudeBot, and PerplexityBot while blocking others based on your data policy works well.
- Do AI crawlers respect robots.txt?
- The major AI crawlers (GPTBot, ClaudeBot, PerplexityBot) state that they respect robots.txt. Compliance is voluntary, though, and some smaller or training-focused crawlers may ignore it.
- Can I block training but allow citations?
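- This is tricky. Currently, there's no standard robots.txt directive that distinguishes training from citation. Some providers publish separate user-agents for different purposes. OpenAI, for example, documents GPTBot for training and OAI-SearchBot for search, so you can block one and allow the other.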
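Where a provider does publish separate user-agents, the split can be expressed per agent. The sketch below assumes OpenAI's documented pairing of GPTBot (training) and OAI-SearchBot (search); confirm the current names in the provider's own documentation before relying on them. The example.com URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy: opt out of training crawls, stay visible to search crawls.
SPLIT_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

split_parser = RobotFileParser()
split_parser.parse(SPLIT_POLICY.splitlines())

page_url = "https://example.com/guides/robots"  # hypothetical page
training_allowed = split_parser.can_fetch("GPTBot", page_url)
search_allowed = split_parser.can_fetch("OAI-SearchBot", page_url)

print("training crawler allowed:", training_allowed)
print("search crawler allowed:", search_allowed)
```

This only controls crawling by the named agents. It does not affect content a provider has already collected or obtains from other sources.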