How ChatGPT Decides What Websites to Use | AiVIS.biz

ChatGPT does not have a whitelist of preferred sites. It uses whatever it can extract. What it can extract depends entirely on how your site is structured.

The four-stage source selection process

Stage 1 — Crawl: GPTBot (and ChatGPT's browsing agent) crawls candidate pages. Pages blocked by robots.txt, filtered at the CDN, or failing to render are eliminated here.
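Stage 1 is binary: if the crawler cannot fetch the page, nothing downstream matters. As a minimal sketch, a robots.txt that admits OpenAI's crawler (GPTBot is its documented user-agent token) while still fencing off a section of the site might look like this — the `/private/` path is a placeholder:

```
# Admit OpenAI's crawler site-wide, except one placeholder path
User-agent: GPTBot
Allow: /
Disallow: /private/

# Rules for all other crawlers
User-agent: *
Allow: /
```

Note that a CDN or bot-management layer can still block GPTBot even when robots.txt allows it, so check both.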

Stage 2 — Extraction: The crawler parses the HTML. Pages with clear structure, semantic headings, and JSON-LD produce high-quality extraction. Pages without them produce noise, or nothing at all.
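As a rough sketch of what extraction-friendly structure means in practice (the headline, date, and body text are placeholders, not a prescription), semantic headings paired with a JSON-LD block:

```html
<article>
  <!-- Semantic heading hierarchy: one h1, h2s for subtopics -->
  <h1>How to configure X</h1>
  <h2>Step 1: Install</h2>
  <p>Concrete, specific instructions go here.</p>
</article>

<!-- Machine-readable description of the same page -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to configure X",
  "datePublished": "2024-06-01"
}
</script>
```

The same content rendered as anonymous `<div>` soup, or injected entirely by client-side JavaScript, is what produces the "noise or nothing" outcome.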

Stage 3 — Attribution: The model checks for entity signals. Organization, Author, datePublished. Sources without these are used anonymously or not at all.
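A minimal sketch of an attribution block carrying all three signals at once — the names, URL, and date below are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to configure X",
  "datePublished": "2024-06-01",
  "author": {
    "@type": "Person",
    "name": "Jane Doe"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com"
  }
}
</script>
```

Without this, the best outcome is that your content informs an answer anonymously; with it, the model has something to attribute.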

Stage 4 — Selection: Among all candidate extractions for a given query, the model weighs content specificity, structural quality, and temporal recency. The most attributable, specific, extractable source wins.

What does NOT affect ChatGPT source selection

Backlinks: ChatGPT does not process link graphs the way Google does.

Domain authority: Moz DA and similar metrics have no bearing on AI extraction.

Keyword density: AI models extract semantic meaning, not keyword patterns.

Social signals: Follower counts and engagement metrics are not extraction inputs.

That means these factors also do not protect you — a high-authority site with poor extraction readiness is still skipped.

Frequently Asked Questions

Can I pay OpenAI to be included in ChatGPT answers?
No. ChatGPT source selection is not a paid placement. It is a function of structural extraction readiness. The path to inclusion is technical, not commercial.
Does ChatGPT use different criteria for different types of queries?
Yes. Informational queries prefer content with FAQ schema and clear headings. Commercial queries may favor sites with Product or Service schema. Brand queries depend heavily on Organization schema and sameAs links.
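For informational queries, the FAQ schema mentioned above is expressed as FAQPage JSON-LD. A minimal sketch, using one of this page's own questions as placeholder content:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Can I pay OpenAI to be included in ChatGPT answers?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "No. Inclusion is a function of structural extraction readiness, not paid placement."
    }
  }]
}
</script>
```

Each question/answer pair becomes one `Question` entry in the `mainEntity` array, mirroring the visible FAQ text on the page.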