How ChatGPT Decides What Websites to Use | AiVIS.biz
ChatGPT does not have a whitelist of preferred sites. It uses whatever it can extract. What it can extract depends entirely on how your site is structured.
The four-stage source selection process
Stage 1 — Crawl: GPTBot (and the browsing tool) crawls candidate pages. Pages blocked by robots.txt, CDN-level bot filtering, or client-side rendering failures are eliminated here.
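A common Stage 1 failure is a blanket robots.txt rule that catches AI crawlers along with everything else. A minimal sketch of a configuration that explicitly admits OpenAI's documented crawler user-agent (GPTBot) while keeping other restrictions, with the Disallow path shown purely as an illustration:

```
# robots.txt — illustrative sketch, not a universal recommendation
# Explicitly allow OpenAI's crawler
User-agent: GPTBot
Allow: /

# Rules for all other crawlers remain separate
User-agent: *
Disallow: /admin/
```

Note that a CDN or WAF can still block GPTBot at the network level even when robots.txt permits it, so both layers need checking.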
Stage 2 — Extraction: The crawler parses the HTML. Pages with clear structure, semantic headings, and JSON-LD produce high-quality extraction. Pages without them produce noise or nothing.
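To illustrate the structural contrast, here is a hedged sketch of an extraction-friendly page skeleton — the headings and tags are illustrative placeholders, not a guaranteed recipe:

```html
<!-- Illustrative only: semantic elements let a parser segment content cleanly -->
<article>
  <h1>How to Configure the Widget API</h1>
  <h2>Prerequisites</h2>
  <p>Concrete, self-contained statements the parser can lift verbatim.</p>
  <h2>Step-by-step setup</h2>
  <p>One idea per paragraph, under a heading that names the idea.</p>
</article>
```

The same content buried in nested div soup carries identical words but gives the parser no boundaries to extract against.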
Stage 3 — Attribution: The model checks for entity signals: Organization, author, and datePublished markup. Sources without these are used anonymously or not at all.
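These entity signals are typically expressed as JSON-LD. A minimal sketch, assuming an Article page — all names, dates, and URLs below are placeholders, and this is not a verified list of required fields:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Configure the Widget API",
  "datePublished": "2024-05-01",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com"
  }
}
```

Embedding this in a script tag of type application/ld+json gives the model a machine-readable answer to "who published this, and when" without relying on inference from body text.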
Stage 4 — Selection: Among all candidate extractions for a given query, the model weighs content specificity, structural quality, and temporal recency. The most attributable, specific, extractable source wins.
What does NOT affect ChatGPT source selection
Backlinks: ChatGPT does not process link graphs the way Google does.
Domain authority: Moz DA and similar metrics have no bearing on AI extraction.
Keyword density: AI models extract semantic meaning, not keyword patterns.
Social signals: Follower counts and engagement metrics are not extraction inputs.
That means these factors also do not protect you — a high-authority site with poor extraction readiness is still skipped.
Frequently Asked Questions
- Can I pay OpenAI to be included in ChatGPT answers?
- No. ChatGPT source selection is not a paid placement. It is a function of structural extraction readiness. The path to inclusion is technical, not commercial.
- Does ChatGPT use different criteria for different types of queries?
- Yes. Informational queries prefer content with FAQ schema and clear headings. Commercial queries may favor sites with Product or Service schema. Brand queries depend heavily on Organization schema and sameAs links.
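The schema types named in the answer above can coexist on one page via an @graph array. A hedged sketch combining FAQ markup with Organization sameAs links — every name and URL is a placeholder:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "Can I pay to be included?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "No. Inclusion is structural, not paid."
        }
      }]
    },
    {
      "@type": "Organization",
      "name": "Example Co",
      "url": "https://example.com",
      "sameAs": [
        "https://www.linkedin.com/company/example",
        "https://x.com/example"
      ]
    }
  ]
}
```

The sameAs entries tie the site to external profiles, which is the kind of cross-reference the brand-query criterion above depends on.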