After 968 Website Audits: Why AI Doesn't Cite Most Sites (And What Actually Fixes It) | AiVIS Cite Ledger Blogs

9 min read · AEO

Good rankings, low inclusion. This dataset shows where citation systems break and which structural repairs move inclusion fastest.

Key Takeaways

  • Technical SEO passes on most audited sites, yet those same sites score lowest on structured data and content structure.
  • Schema markup is the weakest category across all 968 audits: missing, incomplete, or generically present with no entity clarity.
  • AI does not rank pages; it extracts them. Extraction fails when the heading hierarchy is broken or the schema declares no meaningful relationships.
  • Backlinks, domain authority, and keyword density showed no direct correlation with AI citation rates in the audited dataset.
  • Pages scoring 65+ share four consistent traits: direct H2/H3 answers, valid FAQPage schema, complete Organization JSON-LD, and consistent entity naming.
  • The path from SEO to AEO (answer engine optimization) to GEO (generative engine optimization) is structural, not technical, and most sites are still at step one.

Article

Technical SEO Is Not the Problem

Across the 968 audits, most sites passed core technical requirements: page speed, mobile responsiveness, indexability, and meta tag completeness. These signals help search engines rank pages. They do not guarantee that AI systems can understand, extract, or reuse the content.

> The gap between strong SEO performance and strong AI visibility is structural, not technical.

Schema Markup Is the Weakest Layer

The lowest-scoring category across all audits was structured data. This single gap accounts for the majority of AI citation failures in sites that otherwise have solid SEO fundamentals.

Common issues found across audits:

  • Schema entirely absent
  • Incomplete JSON-LD blocks with required fields missing
  • Generic schema that provides no entity clarity
  • FAQPage schema containing vague or empty answers
  • No declared relationships between organisation, product, and article entities

In many cases, schema existed on the page but contributed no usable meaning to AI extraction. For AI systems, weak schema produces low entity confidence, incomplete extraction, and no attribution.
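The issues listed above can be checked mechanically. The following is a minimal sketch, not the audit method behind this dataset: the `EXPECTED_FIELDS` sets, the `MIN_ANSWER_CHARS` threshold, the `audit_jsonld` function, and the example.com data are all hypothetical choices made for illustration. Schema.org itself does not mandate required fields, so the field lists would need adjusting to whatever a given audit treats as complete.

```python
import json

# Hypothetical minimum field sets, chosen for illustration only.
EXPECTED_FIELDS = {
    "Organization": {"@id", "name", "url", "logo", "sameAs"},
    "Article": {"headline", "author", "publisher", "datePublished"},
    "FAQPage": {"mainEntity"},
}
MIN_ANSWER_CHARS = 40  # assumed cutoff for a "vague or empty" FAQ answer


def audit_jsonld(raw: str) -> list[str]:
    """Return human-readable problems found in one JSON-LD block."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    # Accept a single node, a list of nodes, or an @graph wrapper.
    if isinstance(data, dict) and "@graph" in data:
        nodes = data["@graph"]
    elif isinstance(data, list):
        nodes = data
    else:
        nodes = [data]
    nodes = [n for n in nodes if isinstance(n, dict)]

    known_ids = {n["@id"] for n in nodes if "@id" in n}
    problems = []
    for node in nodes:
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]

        # Incomplete blocks: expected fields that are absent.
        for t in types:
            missing = EXPECTED_FIELDS.get(t, set()) - node.keys()
            if missing:
                problems.append(f"{t}: missing {', '.join(sorted(missing))}")

        # FAQPage entries with vague or empty answers.
        if "FAQPage" in types:
            questions = node.get("mainEntity", [])
            if isinstance(questions, dict):
                questions = [questions]
            for q in questions:
                text = q.get("acceptedAnswer", {}).get("text", "")
                if len(text.strip()) < MIN_ANSWER_CHARS:
                    problems.append(f"FAQPage: thin answer for {q.get('name', '?')!r}")

        # Undeclared relationships: publisher must point at a known @id.
        if "Article" in types and "publisher" in node:
            publisher = node["publisher"]
            ref = publisher.get("@id") if isinstance(publisher, dict) else None
            if ref not in known_ids:
                problems.append("Article: publisher does not reference a declared @id")
    return problems


# Illustrative input: a complete Organization node and an FAQ with an empty answer.
example = json.dumps({
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": "https://example.com/#org",
         "name": "Example Co", "url": "https://example.com/",
         "logo": "https://example.com/logo.png",
         "sameAs": ["https://www.linkedin.com/company/example"]},
        {"@type": "FAQPage", "mainEntity": [
            {"@type": "Question", "name": "What does Example Co do?",
             "acceptedAnswer": {"@type": "Answer", "text": ""}}]},
    ],
})
for problem in audit_jsonld(example):
    print(problem)
```

In this sketch the Organization node passes because every assumed field is present, while the FAQPage node is flagged for its empty answer. Linking entities through a shared `@id` is one common way to declare the organisation, product, and article relationships the list above refers to.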

AI Does Not Rank Pages, It Extracts Them

AI systems like ChatGPT, Perplexity, and Google AI Overviews follow the same structured extraction sequence on every page they read:

  • Parse page structure and heading hierarchy
  • Identify named entities and their declared relationships
  • Extract segments that answer specific query patterns
  • Compress extracted segments into a coherent answer
  • Decide whether the source is trustworthy enough to cite

If the structure breaks at any point in that chain, the page is skipped. AI systems do not partially cite broken pages; they move to the next available source.
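The first link in that chain, heading hierarchy, is the easiest to test locally. Here is a minimal sketch using only the Python standard library. It assumes that "broken" means a page without exactly one h1, a heading with no text, or a skipped level (for example an h2 followed directly by an h4). Those rules, and the names `HeadingCollector` and `hierarchy_problems`, are illustrative assumptions rather than how any particular AI system parses pages.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._buffer = []    # text fragments inside the open heading

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


def hierarchy_problems(html: str) -> list[str]:
    """Return a list of heading-structure problems under the assumed rules."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()

    problems = []
    h1_count = sum(1 for level, _ in parser.headings if level == 1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")

    previous = 0
    for level, text in parser.headings:
        if not text:
            problems.append(f"empty h{level}")
        # A jump of more than one level down the outline skips a tier.
        if previous and level > previous + 1:
            problems.append(f"h{previous} jumps to h{level} at {text!r}")
        previous = level
    return problems


page = "<h1>Guide</h1><h2>What is AEO?</h2><h4>Detail</h4>"
for problem in hierarchy_problems(page):
    print(problem)
```

A check like this covers only the parsing step. It says nothing about whether a heading is meaning-mapped to a question or claim, which still needs editorial review.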

Structural Failures That Cause Invisibility

Across the 968-audit dataset, the same failure patterns appear regardless of industry, domain authority, or SEO investment:

  • Headings that are generic rather than meaning-mapped to a specific question or claim
  • Thin or empty c
