robots.txt Configuration for AI Crawlers | AiVIS Cite Ledger

Your robots.txt file tells AI crawlers which parts of your site they may fetch. Block too much and your content drops out of AI-generated answers; configure it deliberately and you decide which pages are available to be cited.

AI Crawler User-Agents

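The major AI user-agent tokens are GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI), and Bytespider (ByteDance). Google-Extended is a control token rather than a separate crawler: Google's existing crawlers fetch the pages, and the token governs whether that content may be used for Google's AI models.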

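A compliant crawler follows the rules in the group that names its user-agent, so you can allow some crawlers and block others based on your data policy. Compliance is voluntary: robots.txt is a request to crawlers, not an access control.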

For maximum AI visibility: allow GPTBot, ClaudeBot, and PerplexityBot on all public content pages.

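Block admin paths, login pages, user account pages, and checkout flows. AI crawlers don't need access to these. List the Disallow rules in each crawler's own group: a crawler that matches a named group follows only that group and ignores the rules under User-agent: *.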

Add a Sitemap directive with the absolute URL of your XML sitemap so crawlers know where to find your content.
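As a rough illustration of the recommendations above, the Python sketch below embeds one possible robots.txt and checks it with the standard library's urllib.robotparser module. The domain www.example.com, the specific paths, and the decision to block Bytespider are assumptions made for the example, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths, the sitemap URL, and the decision to
# block Bytespider are placeholders for illustration only.
ROBOTS_TXT = """\
# AI crawlers allowed on public pages but kept out of private areas
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /admin/
Disallow: /login
Disallow: /account/
Disallow: /checkout/
Allow: /

# A crawler blocked entirely under this example data policy
User-agent: Bytespider
Disallow: /

# All other crawlers: same private-area rules
User-agent: *
Disallow: /admin/
Disallow: /login
Disallow: /account/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
base = "https://www.example.com"

print(rules.can_fetch("GPTBot", base + "/blog/ai-visibility"))      # public page
print(rules.can_fetch("GPTBot", base + "/admin/settings"))          # private area
print(rules.can_fetch("Bytespider", base + "/blog/ai-visibility"))  # blocked crawler
print(rules.site_maps())                                            # declared sitemaps
```

With these rules, the public page should be reported as fetchable for GPTBot, while the admin path and every Bytespider request should be refused. The private-area rules are repeated in the named group on purpose, since crawlers matched by that group do not also read the wildcard group.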

Testing Your robots.txt

Visit yourdomain.com/robots.txt to verify the file is accessible and contains the rules you expect.
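The manual check can also be scripted. The following is a minimal sketch, assuming the file is served over HTTPS at the site root; the function names fetch_robots_txt and access_report, the robots-check user-agent string, and the default agent list are hypothetical choices for this example. Python's parser may evaluate overlapping Allow and Disallow rules differently from an individual crawler, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

# Assumed default list of crawlers to check; adjust to your own data policy.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download robots.txt from the site root.

    urlopen raises an HTTPError for 4xx/5xx responses, which signals that
    the file is not accessible.
    """
    request = Request(
        site.rstrip("/") + "/robots.txt",
        headers={"User-Agent": "robots-check/0.1"},  # hypothetical client name
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def access_report(robots_txt, urls, agents=AI_AGENTS):
    """Return {url: {agent: allowed}} for every URL and crawler pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        url: {agent: parser.can_fetch(agent, url) for agent in agents}
        for url in urls
    }
```

A typical use would be to pass the text returned by fetch_robots_txt("https://yourdomain.com") to access_report together with a handful of important content URLs, then look for any False entries on pages you expect to be cited.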

Run an AiVIS Cite Ledger audit to validate that your robots.txt rules actually allow AI access to your important content pages.

Frequently Asked Questions

Should I block AI crawlers?
Only if you have a specific reason to, such as data licensing or privacy. Blocking a provider's crawlers generally keeps your pages out of that provider's AI-generated answers and citations.
Can I allow citation but block training?
Sometimes. Some providers publish separate user-agents for model training and for search or citation, which lets you block one and allow the other. Check each provider's documentation for the latest guidance.
What if my hosting platform controls robots.txt?
Some platforms (Shopify, Squarespace) limit robots.txt editing. Check your platform's documentation or use a CDN/proxy layer to serve a custom file.
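To make the citation versus training question concrete, here is a sketch of a split policy. It assumes the user-agent roles as documented by the providers at the time of writing: OpenAI lists GPTBot for training and OAI-SearchBot for search results, and Google describes Google-Extended as a token that does not affect Google Search. Confirm the current names in each provider's documentation before relying on them; the example domain and page path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy for illustration: opt out of model training while staying
# reachable for AI search and citation.
SPLIT_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Google-Extended
Disallow: /
"""

policy = RobotFileParser()
policy.parse(SPLIT_POLICY.splitlines())

page = "https://www.example.com/guides/robots-txt"  # placeholder URL
for agent in ("GPTBot", "OAI-SearchBot", "Google-Extended", "Googlebot"):
    # Agents with no matching group (Googlebot here) default to allowed.
    print(agent, policy.can_fetch(agent, page))
```

Under this policy the training tokens are refused while the search crawler and ordinary Googlebot remain allowed, which is the separation the answer describes.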