Structured Data & AI Crawlers 2026: Schema Hype Debunked

If there is one concept in the modern digital ecosystem that is misunderstood again and again, it is the role of structured data. The inability of people to see outside the box, past the fog and the pre-sold, broken programming that spawned the industry we know today as SEO, is staggering. The current hype? That JSON-LD schema is the absolute 'dog's bollocks' for getting your content noticed by AI Overviews and agentic systems.

Across studies from late 2025 and early 2026, one pattern emerges clearly. Most AI crawlers (your GPTBots, PerplexityBots, and ClaudeBots) do not parse JSON-LD or other structured data semantically [1]. They tokenise the entire page, including that precious <script type="application/ld+json"> block, as ordinary, everyday text.

Search engines still use schema during indexing and entity building. But when their LLMs generate answers, they draw from algorithmic summaries and grounding snippets, not raw structured data feeds. The result? Visible, semantically clear HTML structure beats hidden schema for AI extraction in almost every case. Schema remains useful, but not for the reasons the unenlightened 'GEO experts' of 2026 claim.

The Evidence: What Controlled Experiments Actually Show

Let us look at the reality, rather than the culture soup of marketing buzzwords. Mark Williams-Cook ran a brilliant 'Duck Test' in February 2026 [3]. He placed a fake company address exclusively in invalid, made-up JSON-LD schema. Both ChatGPT and Perplexity extracted the address anyway. Why? Because they tokenised the <script> block as plain text. They did not parse the structure.

As Williams-Cook concluded: "Schema is not being used in the explicit sense it was designed for within LLMs" [4]. The takeaway is clear—schema markup is good practice, but repackaging it as some magical new 'GEO formula' is entirely misguided.
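To make the distinction concrete, here is a minimal Python sketch of the two ways a system could read such a page. It is an illustration only, not how any particular crawler is built: the page, the address, and the function names plain_text_view and semantic_view are all hypothetical. The JSON-LD block is deliberately invalid (note the trailing comma), in the spirit of the test described above.

```python
import json
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every text node, including the contents of <script> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: the address exists only inside a malformed JSON-LD block.
PAGE = """<html><head>
<script type="application/ld+json">
{"@type": "MadeUpThing", "fakeAddress": "12 Duck Lane, Norwich",}
</script></head>
<body><h1>Acme Widgets</h1><p>We sell widgets.</p></body></html>"""


def plain_text_view(html):
    """What a text-only pipeline sees: all characters, no schema awareness."""
    collector = TextCollector()
    collector.feed(html)
    return " ".join(collector.chunks)


def semantic_view(html):
    """What a strict structured-data parser sees: valid JSON-LD or nothing."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.S
    )
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


print("12 Duck Lane" in plain_text_view(PAGE))  # True: the text is still there
print(semantic_view(PAGE))                      # None: the JSON-LD is invalid
```

The strict parser gets nothing from the broken block, while the text-only view still contains the address. That is consistent with the behaviour reported in the experiment: the address was recoverable as text even though it could not be parsed as schema.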

A SearchVIU study from October 2025 tested eight scenarios across ChatGPT, Claude, Perplexity, Gemini, and Google AI Mode [2]. Pages with data only in JSON-LD, Microdata, or RDFa saw near-zero extraction. Visible, well-structured HTML, however, achieved consistent success. Schema-only content failed across the board. AI crawlers often strip or ignore <head> elements entirely, and tokenisation simply breaks semantic meaning.

The results were damning: Claude extracted zero prices from any schema format. ChatGPT managed 37.5% success, primarily from visible HTML. Gemini performed best at 50%, but still failed on all pure schema-only tests [2]. The pattern is unmistakable.

There is a nuance, of course. Microsoft Copilot shows slightly better schema awareness due to its deeper integration with Bing's search indexing. But the overarching truth remains: AI agents read what is on the page, not what is hidden in the code.

How Search Engines Actually Handle Schema Today

Do not misunderstand me. I am not saying schema is dead. Google and Bing still parse JSON-LD during crawling and indexing for entity understanding, Knowledge Graph signals, and rich results. John Mueller and Danny Sullivan have repeatedly confirmed that schema helps search engines understand content [5].

However, when generating AI Overviews or answers, LLMs work from processed index summaries and grounding passages. They do not use a direct feed of raw JSON-LD. Danny Sullivan's position is clear: 'SEO for AI is still SEO.' Schema supports the same foundational signals it always has. There is no special 'AI schema pipeline.'

The indirect benefit is that good schema can improve how pages are indexed and surfaced as sources, increasing the chance they reach the LLM layer. But it is a supporting actor, not the star of the show.


Priorities for Real-World Optimisation in 2026

If you want to survive the shift to AI-driven search, you need to rethink your priorities. Here is what actually matters, based on how the ecosystem behaves at ground level.

Tier 1 (Highest impact for AI extraction): Visible semantic HTML. This means clear headings, definition lists, tables, bullet points, explicit entity mentions, and primary-source citations. Make it easy for the machine to read the human-facing content.

Tier 2: Strong internal linking, entity disambiguation, and E-E-A-T signals. Focus on Content Depth, Entity Search & Semantics, and establishing genuine authority. This is where proper SEO strategy truly pays off.

Tier 3: Schema markup. It is still worth implementing Article, Organization, FAQPage, HowTo, and BreadcrumbList for traditional search benefits and indirect AI help. But treat it as a supporting signal, not a primary lever.

The most common mistake I see is business owners over-investing in complex or hidden schema while completely neglecting readable, human-first structure. It is like putting a bespoke suit on a mannequin and expecting it to win a marathon.
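One way to catch that mistake is to check whether every value in your JSON-LD also appears in the visible page text. The sketch below is a rough audit under simple assumptions: it uses only the Python standard library, matches values by case-insensitive substring, and expects the script tag to be written exactly as type="application/ld+json". The names VisibleText, leaf_strings, and schema_only_values are hypothetical, as is the example product.

```python
import json
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text a reader would see, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def leaf_strings(node):
    """Yield every string value in a JSON-LD structure, skipping @-keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("@"):
                yield from leaf_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_strings(item)
    elif isinstance(node, str):
        yield node


def schema_only_values(html):
    """Return JSON-LD string values that never appear in the visible text."""
    parser = VisibleText()
    parser.feed(html)
    visible = " ".join(" ".join(parser.chunks).split()).lower()
    pattern = r'<script type="application/ld\+json">(.*?)</script>'
    missing = []
    for block in re.findall(pattern, html, re.S):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # invalid blocks are ignored in this sketch
        for value in leaf_strings(data):
            if value.lower() not in visible:
                missing.append(value)
    return missing


# Hypothetical product page: the price lives only in the schema.
EXAMPLE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Blue Widget", "offers": {"@type": "Offer", "price": "19.99"}}
</script></head><body><h1>Blue Widget</h1><p>A sturdy widget.</p></body></html>"""

print(schema_only_values(EXAMPLE))  # prints ['19.99']
```

Anything this returns is data that exists only in the markup. Going by the test results above, that is exactly the data an AI system is least likely to extract, so the fix is to put it on the page where a human can read it too.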

Better Alternatives Emerging in the Agentic Era

We are moving beyond simple crawling. Content negotiation and Markdown for Agents are becoming crucial. When an AI agent sends an Accept: text/markdown request header, serving a cleaner, lower-token Markdown version of your content is far more effective than relying on schema parsing.
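As a rough sketch of what that negotiation might look like on the server side, the Python below picks a representation from the Accept header. It assumes you already have both an HTML and a Markdown version of the page to hand, and the function names prefers_markdown and choose_representation are hypothetical. The q-value handling is simplified and ignores wildcards such as */*.

```python
def prefers_markdown(accept_header):
    """Return True when text/markdown outranks text/html in the Accept header."""
    weights = {}
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        weights[media_type] = quality
    return weights.get("text/markdown", 0.0) > weights.get("text/html", 0.0)


def choose_representation(accept_header, html_body, markdown_body):
    """Pick the body and response headers for a single request."""
    if prefers_markdown(accept_header):
        headers = {"Content-Type": "text/markdown; charset=utf-8", "Vary": "Accept"}
        return markdown_body, headers
    headers = {"Content-Type": "text/html; charset=utf-8", "Vary": "Accept"}
    return html_body, headers


body, headers = choose_representation("text/markdown", "<h1>Guide</h1>", "# Guide")
print(headers["Content-Type"])  # text/markdown; charset=utf-8
```

The Vary: Accept header matters here because it tells caches that the same URL can return different bodies depending on the request. Browsers, which do not ask for text/markdown, carry on receiving HTML as before.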

Content-Signal directives in robots.txt and Link response headers for agent discovery are also powerful, emerging tactics. Clean, chunkable content architecture aligns perfectly with the need for depth and scannability. These methods beat 'schema-only' tactics for pure AI and agent visibility every single time.
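For illustration, the Python below generates both artefacts as plain strings. Treat it as a sketch resting on assumptions: the directive follows the Content-Signal form as I understand it (search, ai-input, and ai-train set to yes or no), the Link value uses the standard rel="alternate" pattern with a text/markdown type, and the path /guide.md plus both function names are hypothetical. Check the current specifications before deploying anything like this.

```python
def content_signal_rules(search, ai_input, ai_train, user_agent="*"):
    """Build a robots.txt group carrying a Content-Signal line (assumed syntax)."""

    def yes_no(flag):
        return "yes" if flag else "no"

    signal = (
        f"search={yes_no(search)}, "
        f"ai-input={yes_no(ai_input)}, "
        f"ai-train={yes_no(ai_train)}"
    )
    return "\n".join(
        [
            f"User-Agent: {user_agent}",
            f"Content-Signal: {signal}",
            "Allow: /",
        ]
    )


def markdown_link_header(markdown_path):
    """Build a Link header value pointing agents at a Markdown alternate."""
    return f'<{markdown_path}>; rel="alternate"; type="text/markdown"'


print(content_signal_rules(search=True, ai_input=True, ai_train=False))
print("Link: " + markdown_link_header("/guide.md"))
```

Both are declarations rather than guarantees. Whether a given agent honours them is up to that agent, which is one more reason to keep the visible content itself clean and chunkable.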

The Bottom Line

Build for humans with machine-readable clarity as a natural byproduct. Schema is not dead, but over-reliance on it as an 'AI hack' is entirely misguided.

Focus on the timeless principles: depth, accuracy, trust, and clear structure. Embrace emerging agent-friendly delivery methods. The sites that win in 2026 and beyond will be those that remain genuinely useful, not those endlessly chasing the latest markup trend. Do not be a 'suckhole' to the algorithm; be a conduit for actual value.

References

[1] SearchVIU (2025). Schema Markup and AI in 2025: What ChatGPT, Claude, Perplexity & Gemini Really See. Retrieved from https://www.searchviu.com/en/schema-markup-and-ai-in-2025-what-chatgpt-claude-perplexity-gemini-really-see/

[2] SearchVIU (2025). Test Results Summary: 8 scenarios across 5 AI systems showing JSON-LD extraction rates of 0-50%, with visible HTML consistently outperforming all schema formats.

[3] Schwartz, B. (2026, February 6). ChatGPT & Perplexity Treat Structured Data As Text On A Page. Search Engine Roundtable. Retrieved from https://www.seroundtable.com/chatgpt-perplexity-structured-data-text-40862.html

[4] Williams-Cook, M. (2026). Schema markup experiment demonstrating AI crawlers tokenize JSON-LD as plain text rather than parsing semantic structure. LinkedIn discussion.

[5] Mueller, J. & Sullivan, D. (Google). Statements confirming schema markup helps search engines understand content during crawling and indexing phases, though not directly used in LLM answer generation.
