SEO Troubleshooter
Diagnose common SEO problems. No marketing language — just the issue, the cause, and the fix.
AI Overview Not Citing My Content
Your content may rank in ordinary organic results, but Google’s AI Overview or AI Mode does not cite it, because the page is not the best retrievable, eligible, useful, or trusted source for the specific answer being generated.
Reality Check
Google says AI Overviews and AI Mode are rooted in core Search ranking and quality systems. There is no special `llms.txt`, Markdown file, AI schema, tiny chunking format, or machine-readable hack required for Google generative AI Search. Diagnose the fundamentals first: crawlability, indexation, snippet eligibility, usefulness, originality, and trust.
Symptoms
- You hold a strong blue-link position for a query, but the AI Overview cites a lower-ranking competitor or a third-party source.
- Traffic drops for a query where an AI Overview appears, even though your ordinary organic ranking has not moved.
- Google mentions your brand, product, service, or topic, but links to a review site, forum, publisher, marketplace, or aggregator instead of your own page.
- Your page is indexed, but the specific answer cited in the AI Overview is clearer, more current, or more directly evidenced on another site.
Likely Causes
Ranked by probability. Highest probability cause first.
- High **Search Eligibility Problem:** Google’s guide states that a page must be indexed and eligible to appear in Google Search with a snippet to be eligible for generative AI features. If the page is noindexed, blocked, canonicalised away, not indexed, or not eligible for snippets, AI Overview citation is unlikely.
- High **Commodity Content Problem:** Google says unique, compelling, useful, non-commodity content will likely influence generative AI Search presence more than any other suggestion in its guide. If the page mostly summarises common knowledge, Google may choose a source with clearer first-hand experience, evidence, or authority.
- Medium **Technical Accessibility Problem:** Google’s AI features use content from the Search index. If the main content is hard to crawl, hidden behind problematic JavaScript, blocked resources, or duplicate URL patterns, Google may not rely on it for a grounded answer.
- Medium **Answer Clarity Problem:** Google does not require artificial chunking, but the relevant answer still needs to be understandable. If the answer is buried under vague introductions, sales copy, or unclear headings, a competitor may be a better source for that specific information need.
- Medium **Authority or Source Fit Problem:** Google’s AI features may draw on what is said across the web about products, services, and topics. For some facts, an official source, regulator, manufacturer, primary dataset, or highly trusted publisher may be a stronger fit than your page.
- Low **Unsupported Hack Dependence:** Adding `llms.txt`, special Markdown, AI-only markup, or schema created solely for AI Search will not fix a Google AI Overview citation problem. Google explicitly says these are not required for generative AI Search.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the page indexed and eligible to appear with a snippet in Google Search?
- Does the page provide original, non-commodity value for the exact topic the AI Overview is answering?
- Can Google access the main answer in crawlable HTML without depending on fragile client-side rendering?
- Is the specific answer clear enough for a human to understand quickly without artificial AI formatting?
- Is another source simply a better authority for the fact being cited?
Fixes
Confirm that the page is crawlable, indexable, canonicalised correctly, internally linked, and eligible to appear with a snippet. Use Google Search Console to inspect the URL and diagnose indexing or serving problems.
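As a quick pre-check before opening Search Console, the sketch below scans a page's raw HTML (and, optionally, its `X-Robots-Tag` header value) for the most common eligibility blockers: `noindex`, `nosnippet`, `max-snippet:0`, and a canonical pointing at a different URL. The function and class names are hypothetical, and the check is deliberately simplified: it ignores user-agent-scoped header values and robots.txt rules. Treat Search Console's URL inspection as the authoritative answer.

```python
from html.parser import HTMLParser


def _split_directives(value):
    """Turn 'noindex, max-snippet: 0' into {'noindex', 'max-snippet:0'}."""
    return {
        part.strip().lower().replace(" ", "")
        for part in value.split(",")
        if part.strip()
    }


class RobotsMetaParser(HTMLParser):
    """Collects robots meta directives and the canonical URL from raw HTML."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives |= _split_directives(attrs.get("content") or "")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def eligibility_issues(html, page_url, x_robots_tag=""):
    """Return a list of human-readable problems found for page_url."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives | _split_directives(x_robots_tag)

    issues = []
    if "noindex" in directives or "none" in directives:
        issues.append("noindex: page cannot be indexed")
    if "nosnippet" in directives or "max-snippet:0" in directives:
        issues.append("nosnippet: page is not eligible for a snippet")
    # Trailing slashes are ignored here; that is a simplification.
    if parser.canonical and parser.canonical.rstrip("/") != page_url.rstrip("/"):
        issues.append(f"canonical points elsewhere: {parser.canonical}")
    return issues
```

An empty list only means none of these specific blockers were found in the markup; it does not prove the page is indexed.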
Replace generic summaries with first-hand experience, original research, unique product or service detail, clear expert judgement, practical examples, and evidence that could not be produced by simply rewriting existing search results.
Ensure the main content is present in accessible HTML or reliably rendered for Google. Follow JavaScript SEO best practices, reduce duplicate URL waste, maintain crawl budget for large sites, and avoid burying important information in inaccessible components.
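One rough way to test whether the main answer depends on client-side rendering is to look for a key phrase in the raw HTML response, ignoring anything inside `<script>`, `<style>`, `<noscript>`, or `<template>`. The sketch below assumes you have already fetched the unrendered HTML as a string; `answer_in_raw_html` is a hypothetical helper, not part of any SEO tool, and it says nothing about what Google sees after rendering.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Extracts text from raw HTML, skipping non-visible containers."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def answer_in_raw_html(raw_html, key_phrase):
    """True if key_phrase appears in the text of the unrendered HTML."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace so phrases split across inline tags still match.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return " ".join(key_phrase.split()).lower() in text
```

If the phrase is missing from the raw HTML but visible in the browser, the answer is being injected by JavaScript and is worth checking against Google's rendered view.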
Use descriptive headings, coherent sections, concise explanatory paragraphs, useful tables, and media that genuinely help the user. Do not split content into tiny artificial chunks solely for AI systems; Google says there is no such requirement.
Strengthen the page’s role as the best source. Cite primary sources, maintain accurate business or product data, earn genuine mentions, publish original data, and make authorship, expertise, and trust signals visible where relevant.
Do not treat `llms.txt`, special Markdown, AI-only files, inauthentic mentions, or special AI schema as Google AI Overview fixes. They may serve other documentation or agent workflows, but they are not Google AI Search requirements.
AI Context
Google (Googlebot / Search Console)
Google’s AI Overviews and AI Mode are grounded in core Search ranking and quality systems. Its guide describes retrieval-augmented generation as relying on Search ranking systems to retrieve relevant, up-to-date pages from the Search index, and query fan-out as generating related searches around the user’s original task.
Preparation: Prepare the page for Search first. Make it crawlable, indexable, useful, technically clear, satisfying for users, and eligible for snippets. Then improve the substance so it is a better source than the alternatives.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Do not infer that Google requires `llms.txt`, AI-specific Markdown, special schema, artificial chunking, or long-tail query pages for every fan-out variation. Google’s guidance explicitly rejects those as requirements for generative AI Search.
Case Patterns
Real-world scenarios seen in practice.
- **Seen in the wild:** A clinic service page ranked organically but was not cited because it offered a generic description of the treatment while a competitor included clinician-authored aftercare guidance, patient suitability criteria, contraindications, and first-hand procedural detail.
- **Seen in the wild:** An ecommerce category page was ignored for AI-style product answers because Merchant Center data was incomplete and product information on the page was duplicated from manufacturer feeds.
- **Seen in the wild:** A publisher tried adding `llms.txt` and extra schema after losing AI Overview visibility, but the real issue was that the cited competitor had original test data and clearer evidence for the specific claim.
Backlink Profile Not Growing
Your site's domain authority is stagnant because you aren't naturally earning new, high-quality links from other websites.
Reality Check
"Link building" is a broken mental model. If you have to ask or pay for a link, it's probably not worth having. The sites that win in 2026 are the ones that create "linkable assets" — content so useful, original, or controversial that other people naturally cite it as a source.
Symptoms
- Your Ahrefs/Semrush Domain Rating (DR) has been flat for months or years.
- You are stuck on page 2 for competitive head terms despite having excellent on-page content.
- The only links you get are from scraper sites, directories, or low-quality guest posts.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Lack of Original Data/Research: Your content is purely derivative. You summarise what others have said, so nobody has a reason to cite you as the primary source.
- Medium Poor Outreach Strategy: You are sending generic "I saw you linked to X, please link to my better article Y" emails that everyone ignores.
- Medium Unlinkable Content Formats: You are trying to build links to product pages or sales landing pages. People link to resources, not cash registers.
- Low Toxic Link Profile: You previously engaged in manipulative link building and Google is ignoring your entire link graph.
Diagnostic Steps
Work through each question to identify the root cause.
- Look at your top 5 most linked-to pages (excluding the homepage). Are they informational or transactional?
- Do those informational pages contain any proprietary data, original surveys, or unique tools?
Fixes
Conduct a survey, analyse your own customer data, or build a free tool or calculator. Become the primary source that others must cite when discussing the topic in your industry.
Stop sending cold emails. Build genuine relationships with journalists, bloggers, and industry peers on social media. Share your original data with them before you publish — give them the exclusive.
Create a dedicated Resource Centre or Industry Statistics page. Build links to that page, then use internal linking to pass the authority to your product and sales pages.
If you have a manual action, use the Disavow Tool. Otherwise, focus entirely on earning high-quality, editorially given links to dilute the toxic ones over time.
AI Context
Google (Googlebot / Search Console)
Google treats backlinks as a measure of trust and consensus. If nobody is citing your work, Google has little external evidence that you are an authority in your field, regardless of how good your on-page content is.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs do not read the link graph directly, but retrieval-augmented systems usually pull their sources from search indexes whose rankings are shaped by links. A strong link profile therefore increases the likelihood of being retrieved, and then cited, when an AI agent needs to ground a claim.
Case Patterns
Real-world scenarios seen in practice.
- A SaaS company couldn't get links to their pricing page. They created a "State of the Industry Salary Report" using their own anonymised user data. It earned 500+ referring domains in a month, and they internally linked that authority to the pricing page.
- An SEO agency sent 1,000 automated outreach emails begging for links to their "Ultimate Guide to SEO." They got zero links and had their domain blacklisted by several major publishers.
Broken Internal Links
Internal links pointing to pages that return 4xx errors damage user experience, waste crawl budget, and break the internal PageRank flow of your site.
Reality Check
Broken internal links are like potholes in your website's roadways — annoying, avoidable, and damaging in ways that compound over time. Every broken internal link is a dead end for both users and Googlebot. Beyond the user experience damage, broken links stop the flow of internal PageRank to the pages that need it. A site with hundreds of broken internal links is one where the link equity that should be distributed throughout the site is simply evaporating.
Symptoms
- 404 errors in Google Search Console Crawl Stats or the Page Indexing report showing "Not found (404)" as an exclusion reason.
- Users reporting dead links or encountering 404 pages during normal site navigation.
- Crawl tools (Screaming Frog, Sitebulb, Ahrefs) reporting a high number of internal URLs returning 4xx status codes.
- Decreased crawl efficiency — Googlebot spending time on dead URLs that could be used to discover live content.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Deleted or moved pages without redirect implementation: content is removed or restructured but the old URLs are not redirected to the new location.
- Medium Typos or incorrect URL formatting in link href attributes: manually authored links with misspellings, missing slashes, or wrong subdirectories.
- Low CMS or plugin generating faulty URLs: dynamic link generation broken by a plugin update, template change, or database issue.
Diagnostic Steps
Work through each question to identify the root cause.
- Run a site crawl using Screaming Frog or Ahrefs Site Audit. Filter results to show internal URLs returning 4xx status codes. How many broken internal links does the crawl report?
- For broken URLs that previously had content: does the content still exist at a different URL, or has it been permanently removed?
Fixes
Implement 301 permanent redirects for all URLs that have moved to a new location. Maintain a redirect map as a living document: every time a URL is changed or content is removed, add a redirect entry. Then update the internal links themselves to point to the final destination rather than relying on the redirect, so that users and Googlebot never pass through an extra hop or a redirect chain.
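A redirect map kept as a simple old-URL to new-URL mapping can also drive the internal link clean-up. The sketch below flattens chains so each old URL resolves to its final destination, then rewrites a list of hrefs. The function names and the dictionary format are assumptions for illustration; a real map would more likely live in a spreadsheet or server config.

```python
def resolve_redirects(redirect_map):
    """Flatten chains so every old URL maps straight to its final destination."""
    resolved = {}
    for source in redirect_map:
        seen = {source}
        target = redirect_map[source]
        # Follow the chain until the target is no longer redirected itself.
        while target in redirect_map:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirect_map[target]
        resolved[source] = target
    return resolved


def update_internal_links(links, redirect_map):
    """Replace any href that has a redirect entry with its final destination."""
    final = resolve_redirects(redirect_map)
    return [final.get(href, href) for href in links]
```

For example, with `{"/old-shoes": "/shoes-2023", "/shoes-2023": "/shoes"}`, a link to `/old-shoes` is rewritten directly to `/shoes`, and a loop in the map raises an error instead of silently shipping.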
Export all internal links from your crawl tool and filter for links containing common error patterns: double slashes, missing slashes before query strings, incorrect subdomain references, or obvious typos in path segments. Correct each link in the source CMS or template file.
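The pattern filter can be approximated in a few lines once the links are exported. This sketch flags double slashes in the path, a query string attached directly to the host with no slash, stray whitespace, and hosts outside an allow-list. The helper name and the `allowed_hosts` parameter are hypothetical, and reading "missing slashes before query strings" as a host followed immediately by `?` is an assumption.

```python
from urllib.parse import urlsplit


def link_problems(href, allowed_hosts):
    """Return a list of suspicious patterns found in a single href."""
    problems = []
    parts = urlsplit(href)
    if href != href.strip() or " " in href.strip():
        problems.append("whitespace in URL")
    if "//" in parts.path:
        problems.append("double slash in path")
    # Assumed meaning of "missing slash before query string".
    if parts.netloc and parts.query and parts.path == "":
        problems.append("no slash before query string")
    if parts.netloc and parts.hostname not in allowed_hosts:
        problems.append(f"unexpected host: {parts.hostname}")
    return problems
```

Typos inside path segments cannot be caught by pattern alone; those still need the crawl's 4xx report.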
Identify which CMS component or plugin is responsible for the broken link generation. Test in a staging environment after plugin updates before deploying to production. Implement a post-deployment crawl check as part of your release process to catch new broken links before they accumulate.
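A post-deployment check does not need a full crawler to be useful. The sketch below assumes you already have a mapping of page URL to the internal links found on it (for example, from a crawl export) and reports every link whose target returns a 4xx status. `http_status` and `broken_internal_links` are hypothetical names; the status lookup is injectable so the logic can be tested without network access. Some servers reject `HEAD` requests, in which case a `GET` would be needed instead.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def broken_internal_links(link_graph, status_of=http_status):
    """Return (source, target, status) for every link to a 4xx URL."""
    cache = {}  # each target is checked only once
    broken = []
    for source, targets in link_graph.items():
        for target in targets:
            if target not in cache:
                cache[target] = status_of(target)
            if 400 <= cache[target] < 500:
                broken.append((source, target, cache[target]))
    return broken
```

Wired into a release pipeline, a non-empty result can fail the deployment before the broken links reach Googlebot.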
AI Context
Google (Googlebot / Search Console)
Google follows internal links to distribute PageRank throughout a site and to discover content. A broken internal link is a signal that the site's maintenance is poor — it does not cause a direct penalty but contributes to overall quality signals. Broken links also prevent PageRank from flowing to linked pages. Sites with large numbers of broken internal links tend to have lower crawl efficiency, meaning less of the site is discovered and updated in Google's index.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI models do not directly evaluate internal link structure. However, broken internal links degrade the user experience and weaken crawling and internal PageRank flow, which can reduce a page's visibility in the search results that AI features draw on. For retrieval-augmented systems that crawl site structure, broken links create navigation dead ends that prevent the full discovery of a site's content.
Case Patterns
Real-world scenarios seen in practice.
- A large retailer restructured its product category taxonomy, changing all category URL slugs. The internal links throughout 2,000 blog posts and 500 category pages were not updated. Googlebot encountered over 15,000 broken internal links in the subsequent crawl. Implementing redirects and updating internal links recovered crawl efficiency within eight weeks.
- A media company's CMS plugin update changed the URL structure for author archive pages. Every author byline link across 8,000 articles became a broken link overnight. The issue was discovered via a spike in 404s in the Search Console Crawl Stats report three days after the update.
Internal Link Dilution
Too many internal links on a single page spread link equity so thinly that none of them carry meaningful ranking power to their destinations.
Reality Check
Google's webmaster guidelines once recommended keeping a page to roughly 100 links, but that guidance was dropped years ago and there is no fixed limit today. What still holds is that the more links a page carries, the smaller the share of its signals each one passes. The more important issue is relevance and placement: a link in the body of an article next to topically related text is worth vastly more than the same URL listed in a massive footer alongside 200 other links. Most sites' internal linking problems are not about counting links; they are about concentrating equity on pages that matter.
Symptoms
- Important pages do not rank despite receiving many internal links, because those links are buried in navigation or footers alongside hundreds of others.
- Crawl analysis shows over 200 internal links per page across most of your site templates.
- Link equity mapping (via tools like Sitebulb) shows an uneven distribution where the homepage receives almost everything and mid-tier pages receive almost nothing.
- Pages targeted as priority content have weak PageRank flow despite being "linked everywhere".
Likely Causes
Ranked by probability. Highest probability cause first.
- High Overcrowded navigation or footer: mega-menus and footer sitemaps that expose every URL on every page, diluting signals across the entire site.
- Medium Automated or plugin-driven internal linking: related posts plugins or automatic cross-links adding dozens of links per page without prioritisation.
- Low No internal linking strategy: links added ad hoc with no consideration of which pages need equity, resulting in uneven distribution.
Diagnostic Steps
Work through each question to identify the root cause.
- Using Screaming Frog or Sitebulb, check the "Unique Inlinks" for your most important pages. Are those pages receiving fewer unique inbound internal links than less important pages?
- Is the majority of the link count coming from global templates (header navigation, footer, sidebar) rather than from body content in topically relevant posts?
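To put a number on the template-versus-body question, the sketch below counts links by the semantic region they sit in. It assumes the templates use HTML5 landmarks (`header`, `nav`, `footer`, `aside`) and treats everything else as body content; sites built from generic `div` wrappers would need selectors matched to their own markup. It counts every `href`, not only internal ones, and the class and function names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkRegionCounter(HTMLParser):
    """Counts <a href> links per enclosing template region."""

    TEMPLATE_TAGS = {"header", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._open_regions = []  # stack of currently open template tags
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in self.TEMPLATE_TAGS:
            self._open_regions.append(tag)
        elif tag == "a" and dict(attrs).get("href"):
            region = self._open_regions[-1] if self._open_regions else "body"
            self.counts[region] += 1

    def handle_endtag(self, tag):
        if self._open_regions and self._open_regions[-1] == tag:
            self._open_regions.pop()


def link_counts_by_region(html):
    """Return a dict such as {'nav': 40, 'footer': 60, 'body': 5}."""
    parser = LinkRegionCounter()
    parser.feed(html)
    return dict(parser.counts)
```

A page where `body` is a small fraction of the total is a candidate for the navigation and footer clean-up described under Fixes.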
Fixes
Audit and simplify site-wide navigation. Footer links should be limited to your most important pages (key categories, contact, legal). Mega-menus should expose tier-2 categories only — not individual leaf pages. Reducing footer links from 150 to 20 can significantly improve link equity concentration.
Audit any plugins that automatically generate internal links. Disable or heavily configure them. Replace automatic related posts with manually curated contextual links in post body content. Each link should be chosen because it is genuinely the best next resource for the reader.
Create an internal linking map: list your 20 most commercially important pages and set a target of earning 5–10 contextual body-content inbound internal links each. Over 3 months, systematically add those links when publishing or updating existing content. Track equity flow monthly in Sitebulb.
AI Context
Google (Googlebot / Search Console)
Internal links function as a PageRank distribution system within your domain. Each link on a page distributes a fraction of that page's PageRank to its destinations. Extremely link-dense pages (200+ links) dilute each individual link's contribution to near-zero. Contextual links in topically relevant body text also carry anchor text signals that help Google understand what the destination page is about.
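The dilution arithmetic can be illustrated with the simplified model from the original PageRank paper, in which a page's score, scaled by a damping factor, is split equally across its outgoing links. This is a teaching model only: Google's production systems are not public, and links are not weighted equally in practice. The function name and the 0.85 damping default are assumptions taken from the classic formulation.

```python
def equity_per_link(page_score, outgoing_links, damping=0.85):
    """Share of a page's score passed through each link (equal-split model)."""
    if outgoing_links <= 0:
        raise ValueError("outgoing_links must be positive")
    return damping * page_score / outgoing_links


# Same page score, different link counts.
lean = equity_per_link(1.0, 20)      # 20 links on the page
bloated = equity_per_link(1.0, 200)  # 200 links on the page
```

Under this model, going from 20 to 200 links cuts each link's share by a factor of ten, which is the intuition behind trimming template links.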
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Internal link structure influences how AI crawlers perceive the relative importance of pages on a domain. Pages that receive many high-quality internal links with relevant anchor text are more likely to be retrieved and cited. Sites where internal link equity is fragmented across hundreds of low-value links may have important content ranked below its true authority by retrieval systems.
Case Patterns
Real-world scenarios seen in practice.
- A news publisher's article templates included a "more from this author" module (12 links), a "trending topics" module (25 links), and a 60-link footer. Core pillar pages were receiving 200 inbound internal links but over 85% were from these template modules. After simplifying templates and adding 8 contextual links per pillar from related articles, average positions improved by 6 places.
- An e-commerce site's "related products" widget was injecting 24 links per product page automatically. Reducing to a curated 4 manually selected related products improved crawl efficiency and lifted category page rankings.
Negative SEO Attack
A competitor or malicious actor is deliberately building low-quality, spammy, or manipulative links to your site in an attempt to trigger a Google penalty.
Reality Check
Negative SEO attacks are real but their effectiveness is limited. Google's systems are designed to ignore low-quality links rather than penalise the target site. The risk is not zero, but it is lower than most practitioners fear. Panic disavowal of all new links is more dangerous than the attack itself.
Symptoms
- Links from irrelevant, foreign-language, or clearly spammy domains appearing in your profile.
- A manual action notification in Search Console citing "unnatural links to your site."
- Organic traffic drop coinciding with the appearance of suspicious links.
Likely Causes
Ranked by probability. Highest probability cause first.
- Medium Competitor-Initiated Link Spam: A competitor is using automated tools to build spammy links to your site.
- Medium Expired Domain Redirect: A spammy expired domain has been redirected to your site.
- Low Hacked Site Linking: Compromised websites are being used to build links to your site as part of a broader spam network.
- Low Misidentified Natural Links: New links from unfamiliar sources that appear suspicious but are actually legitimate.
Diagnostic Steps
Work through each question to identify the root cause.
- Is there a manual action in Search Console citing unnatural links?
- Do the suspicious links follow a pattern (same anchor text, same domain pattern, same time period)?
- Has organic traffic dropped?
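The pattern question above (same anchor text, same domain pattern, same time period) can be answered quickly from a backlink export. The sketch below assumes each link is a tuple of referring domain, anchor text, and first-seen date, which is a hypothetical format rather than any tool's actual export schema. It reports the most common anchors, top-level domains, and ISO weeks as shares of the total.

```python
from collections import Counter
from datetime import date


def link_patterns(links, top=3):
    """links: iterable of (referring_domain, anchor_text, first_seen date)."""
    links = list(links)
    total = len(links) or 1

    anchors = Counter(anchor.strip().lower() for _, anchor, _ in links)
    tlds = Counter(domain.rsplit(".", 1)[-1].lower() for domain, _, _ in links)
    # (ISO year, ISO week) groups links that appeared in the same week.
    weeks = Counter(tuple(seen.isocalendar()[:2]) for _, _, seen in links)

    def shares(counter):
        return [(key, round(n / total, 2)) for key, n in counter.most_common(top)]

    return {"anchors": shares(anchors), "tlds": shares(tlds), "weeks": shares(weeks)}
```

A single anchor, TLD, or week accounting for most of the new links points to automation; a scattered profile is more consistent with misidentified natural links.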
AI Context
Google (Googlebot / Search Console)
Google's systems are designed to ignore low-quality links rather than penalise the target site. The threshold for a manual action based on inbound links is high. Google's documentation states that it "tries to ignore" spammy links.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Link profile quality is not directly visible to LLMs, but a site that has received a manual action for unnatural links will have reduced search visibility, which reduces the probability of AI citation.
Over-Optimised Anchor Text
Excessive use of exact-match keyword anchor text in your backlink profile triggers Google's spam filters, resulting in ranking penalties.
Reality Check
A backlink profile where every link uses your target keyword as anchor text screams manipulation to search engines. Natural links use varied anchor text: branded names, generic phrases like "click here", naked URLs, and partial matches. When your anchor text distribution is overwhelmingly exact-match, Google's algorithms flag it as an artificial link scheme — because natural editorial links almost never all use the same phrasing.
Symptoms
- Sudden ranking drop for targeted keywords after a link building campaign.
- Google Search Console showing a manual action for "unnatural links to your site".
- Backlink analysis tools (Ahrefs, Semrush) flagging an unnaturally high percentage of exact-match anchors in your profile.
- Rankings for over-optimised keywords are volatile week to week despite no content changes.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Repeated use of exact-match keyword anchors: a deliberate link building strategy using the target keyword as anchor text on every acquired link.
- Medium Lack of diversity in anchor text distribution: missing the natural mix of branded, generic, naked URL, and partial-match anchors that real editorial links produce.
- Low Paid or directory links with keyword-rich anchors: low-quality link schemes often use exact-match anchors by default.
Diagnostic Steps
Work through each question to identify the root cause.
- Export your backlink profile from Ahrefs or Semrush and analyse anchor text distribution. Do exact-match keyword anchors represent more than 20% of your total anchor text across external links?
- Are the exact-match anchor links concentrated on low-authority or spammy domains (Ahrefs DR below 10, or clearly irrelevant sites)?
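The anchor text distribution can be computed from the exported anchors with a simple classifier. The sketch below is a rough heuristic under stated assumptions: the category names, the `GENERIC` phrase list, and the matching rules are illustrative, not how Ahrefs, Semrush, or Google classify anchors.

```python
import re
from collections import Counter

# Illustrative list only; extend it for your own profile.
GENERIC = {"click here", "here", "read more", "learn more", "this site", "website"}


def classify_anchor(anchor, brand, keyword):
    """Assign one anchor to a coarse category."""
    text = " ".join(anchor.lower().split())
    if re.match(r"^(https?://|www\.)", text):
        return "naked_url"
    if brand.lower() in text:
        return "branded"
    if text == keyword.lower():
        return "exact_match"
    if text in GENERIC:
        return "generic"
    if any(word in text.split() for word in keyword.lower().split()):
        return "partial_match"
    return "other"


def anchor_distribution(anchors, brand, keyword):
    """Return each category's share of the total, as fractions of 1."""
    counts = Counter(classify_anchor(a, brand, keyword) for a in anchors)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Running `anchor_distribution` per target keyword gives the exact-match share to compare against the threshold in the first diagnostic question.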
Fixes
Immediately stop any link building activity that specifies anchor text. For future outreach, use branded anchors, partial matches, or leave anchor text to the discretion of the linking site. As a rule of thumb (not a figure published by Google), a profile of roughly 70% branded/generic, 20% partial match, and 10% exact match looks natural.
Actively build links with varied anchor text to dilute the over-optimised ratio. Earn links through content marketing, digital PR, and brand mentions. Do not specify anchor text in outreach — let journalists and editors choose naturally.
Identify paid links in your backlink profile. Request removal from the webmaster. For links that cannot be removed, compile a disavow file and submit it via Google Search Console. Regularly audit for new link spam acquisitions.
AI Context
Google (Googlebot / Search Console)
Google's Penguin algorithm has been part of the core algorithm, running in real time, since Penguin 4.0 in 2016. It evaluates backlink profiles for manipulation signals, and unnatural anchor text concentration is one of its primary signals. In its real-time form it mostly devalues the manipulative links rather than demoting the whole site. A manual action for unnatural links requires a reconsideration request after remediation. Algorithmic effects need no request: they ease as Google recrawls and reprocesses the affected links, which can still take weeks or months.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs do not evaluate backlinks directly. However, a site hit by Penguin or a manual action loses rankings and visibility, which reduces the probability of its content being retrieved by retrieval-augmented AI systems. Over-optimised anchor text from low-quality sites also suggests the site is engaged in manipulative practices, which may affect how curation systems weight the domain.
Case Patterns
Real-world scenarios seen in practice.
- An SEO agency ran a large-scale guest post campaign for a client, using the target keyword as the anchor text on every single placed link. Three months and 200 guest posts later, the client's rankings for their primary keywords dropped by 60% in a Penguin-driven decline. Recovery, through disavowing the lowest-quality links and diversifying the anchor text profile, took eight months.
- An e-commerce retailer's affiliate programme instructed affiliates to link using a specific product keyword as anchor text. After two years, 80% of their backlink profile was exact-match. A routine Google core update triggered a significant ranking drop. Updating the affiliate guidelines and submitting a disavow file brought rankings back over six months.
Toxic Backlinks
Harmful inbound links from spammy, irrelevant, or manipulative sites that damage your domain's trustworthiness in Google's eyes.
Reality Check
The disavow tool is not a magic wand — and using it incorrectly on legitimate links can hurt you more than the toxic links ever would. Most sites with a healthy history can absorb a moderate volume of spammy links without penalty. The disavow tool is for situations where a manual penalty has been issued or where the volume of clearly manipulative links is overwhelming enough to correlate with ranking drops.
Symptoms
- Google Search Console shows a manual action for "Unnatural links to your site".
- A surge of new referring domains from irrelevant, low-DR sites appearing in Ahrefs or Semrush.
- Rankings drop sharply for previously stable keywords correlating with a new backlink influx.
- Backlink audit tools flag a high percentage of your link profile as "toxic" or "spammy".
Likely Causes
Ranked by probability. Highest probability cause first.
- High Poor historical link building: the site previously used link farms, PBNs, paid link schemes, or automated directories.
- Medium Negative SEO attack: a competitor has deliberately pointed thousands of spammy links at your domain to trigger a penalty.
- Low Unmonitored user-generated content: comment spam, forum profiles, or guestbook links created by bots.
Diagnostic Steps
Work through each question to identify the root cause.
- Check Search Console under Manual Actions. Is there an active "Unnatural links to your site" penalty?
- Were the toxic links built by your own team (or a previous agency) as part of a link building campaign?
Fixes
Export all backlinks and categorise by link type. Build a disavow file using Google's disavow syntax (`domain:spammydomain.com`). Submit via Search Console. Allow 6–12 weeks for recrawl and reassessment.
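Google's disavow file is a plain text file with one entry per line: `domain:` followed by a domain to disavow everything from it, or a full URL to disavow a single page, with `#` starting a comment line. The sketch below builds that text from a reviewed list; the function name and the example domains are hypothetical, and the review itself (deciding what belongs in the list) is the part that matters.

```python
def build_disavow_file(domains, urls=(), note=""):
    """Return disavow file text: comment, domain entries, then URL entries."""
    lines = []
    if note:
        lines.append(f"# {note}")
    # Deduplicate and normalise domains so each appears once.
    for domain in sorted({d.strip().lower() for d in domains if d.strip()}):
        lines.append(f"domain:{domain}")
    for url in sorted({u.strip() for u in urls if u.strip()}):
        lines.append(url)
    return "\n".join(lines) + "\n"


disavow_text = build_disavow_file(
    ["spammydomain.com", "linkfarm.example"],
    urls=["https://blog.example/spam-post.html"],
    note="PBN links from previous agency",
)
```

Save the result as a UTF-8 `.txt` file before uploading. Uploading a new file replaces the previous one for that property, so keep the full list in version control rather than submitting increments.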
Set up weekly backlink monitoring alerts in Ahrefs or Semrush. When a spike is detected, evaluate the links before disavowing — not all sudden link influxes are negative. Disavow at domain level for obvious spam. Do not obsess over small volumes.
Enable nofollow on all user-submitted links by default. Use CAPTCHA and spam filtering on comment forms. Regularly purge spam comment profiles. Add rel="ugc nofollow" to any links you cannot control.
AI Context
Google (Googlebot / Search Console)
Google's link spam detection systems (including SpamBrain) assess the quality and intent of backlink profiles algorithmically. Manipulative links are discounted algorithmically before a manual review is triggered. Only when the volume or pattern is extreme does a manual action follow. Legitimate sites rarely need to disavow unless they have prior toxic link building history.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs and AI systems that factor web authority into their retrieval confidence use signals similar to PageRank. A domain with a spammy or artificially inflated link profile may be deprioritised in curated training datasets. AI-powered search systems (Perplexity, SearchGPT) increasingly weigh domain trustworthiness when selecting sources for citation.
Case Patterns
Real-world scenarios seen in practice.
- A financial services firm suffered a 40% traffic drop after an algorithmic update. Backlink audit revealed 3,000 referring domains from a PBN used by their SEO agency 18 months prior. A disavow file reduced the toxic link ratio from 62% to 8% over 4 months. Rankings partially recovered after the next core update.
- A small e-commerce brand was targeted with a negative SEO attack: 15,000 links from foreign-language casino sites appeared within 72 hours. Because they had no prior manual action history, Google largely ignored the links algorithmically. Proactive disavow at domain level was submitted as a precaution.
Content Not Ranking Despite Length
You wrote a 3,000-word "ultimate guide," but it's being outranked by 500-word pages or different formats entirely.
Reality Check
Word count is not a ranking factor. Never has been. "Comprehensive" does not mean "long." If the user just wants a quick answer, a calculator, or a video, your 3,000 words are actively harming their experience.
Symptoms
- A very long piece of content is stuck on page 2 or lower for its primary keyword.
- The content is factually accurate and well-written, but the pages outranking it are significantly shorter or differently formatted.
- High bounce rate or low time-on-page for the specific URL despite strong keyword targeting.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Search Intent Mismatch: The user wants a quick answer, a tool, a product page, or a video — not a long-form article.
- Medium Lack of Information Gain: Your 3,000 words just repeat what the top 10 results already say. No new data, original research, or unique expert perspective.
- Medium Poor Content Structure: The page is a wall of text without clear headings, tables, bullet points, or visual aids, making it impossible to skim.
- Low Low Topical Authority: Your domain doesn't have the established expertise in this niche to rank for a highly competitive head term, regardless of content length.
Diagnostic Steps
Work through each question to identify the root cause.
- Look at the top 3 results for your target keyword. Are they all long-form articles?
- Does your article contain any data, quotes, or insights not found in those top 3 results?
Fixes
Rewrite the page to match the dominant intent. If they want a calculator, build one. If they want a product comparison, give them a table. Stop writing ultimate guides for transactional queries.
Interview an expert, conduct original research, run a survey, or share proprietary data. Give Google a concrete reason to rank your page over the established winners.
Break up the text. Add jump links, summary boxes, tables, and custom graphics. Make the answer immediately visible above the fold — not buried at paragraph seven.
Build out supporting cluster content around the main topic and interlink them. Establish your site's expertise in the area before targeting the head term.
AI Context
Google (Googlebot / Search Console)
Google treats this as a failure to satisfy the user's immediate need as efficiently as possible, or as a lack of unique value compared to the existing index. Google rewards utility, not effort.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs don't care about your word count. They care about information density and structure. A well-structured table is far easier for an LLM to extract facts from than five paragraphs of prose explaining the same data.
Case Patterns
Real-world scenarios seen in practice.
- A software company wrote a 4,000-word guide on "how to calculate ROI." It was outranked by a simple, free ROI calculator tool on a competitor's site.
- A travel blogger's "Ultimate Guide to Paris" was outranked by a 500-word Reddit thread because the thread contained real, unfiltered human experiences rather than generic tourist information.
E-E-A-T Signals Insufficient
A site or page is underperforming in rankings because Google's quality evaluation systems cannot identify sufficient signals of Experience, Expertise, Authoritativeness, and Trustworthiness.
Reality Check
E-E-A-T is not a metric you can check in a dashboard. It is a framework that Google's quality raters use to evaluate content. You cannot "optimise" for E-E-A-T directly — you build the underlying signals that E-E-A-T assesses: credentials, citations, reviews, transparency, and genuine expertise demonstrated in the content itself.
Symptoms
- Particularly poor performance on YMYL (Your Money or Your Life) queries — health, finance, legal.
- Content is technically well-written but lacks author attribution or credentials.
- Competitors with similar content but more visible author credentials consistently outrank you.
Likely Causes
Ranked by probability. Highest probability cause first.
- High No Author Attribution: Content has no named author, or author names appear without credentials or biography.
- High No About Page or Transparency Signals: The site has no clear information about who owns it, who writes for it, or what its editorial standards are.
- Medium No External Citations or References: Content makes claims without citing primary sources.
- Medium No Reviews or Third-Party Validation: For businesses, no customer reviews, industry recognition, or press coverage.
- Low Thin Author Profiles: Author bio pages exist but contain no verifiable credentials or external links.
Diagnostic Steps
Work through each question to identify the root cause.
- Does every piece of content have a named author with a linked biography page?
- Does the author biography page include verifiable credentials (qualifications, experience, publications)?
- Does the site have a clear About page that identifies the organisation, its mission, and its editorial standards?
- Does the content cite primary sources for factual claims?
AI Context
Google (Googlebot / Search Console)
E-E-A-T signals are evaluated both algorithmically and by human quality raters. For YMYL topics, the bar is significantly higher — Google needs to be confident that the information comes from a genuinely qualified source.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems that generate responses cite sources they can identify as authoritative. A page with clear author credentials and institutional affiliation is more likely to be cited than an anonymous page with equivalent content.
Entity Ambiguity
Google cannot confidently determine what entity (person, organisation, place, concept) your page is primarily about, causing weak or incorrect SERP appearances.
Reality Check
Google's Knowledge Graph is built on entities, not keywords. If your content references the same name or term used for multiple distinct entities without disambiguation, Google must guess which one you mean. Structured data alone does not solve ambiguity — the surrounding prose and the consistency of references across your entire site are what allow Google to resolve entity identity with confidence.
Symptoms
- Your brand appears in SERPs associated with an incorrect Knowledge Graph panel or the wrong "People also search for" entities.
- Content targeting a specific named entity (person, product, place) ranks for unrelated variations of the same name.
- Click-through rates are low because SERP snippets describe a different entity than what users are looking for.
- Google shows mixed or inconsistent rich results (sometimes showing your schema markup, sometimes not).
Likely Causes
Ranked by probability. Highest probability cause first.
- High Unclear or ambiguous terminology: the primary entity name on your site is shared with another entity in Google's Knowledge Graph without disambiguation.
- Medium Missing or incomplete structured data: no schema markup to explicitly declare the entity type, name, identifier (Wikidata QID, ISNI, etc.), or relationships.
- Low Inconsistent entity references across the site: different pages refer to the same entity using different names, abbreviations, or aliases, creating conflicting signals.
Diagnostic Steps
Work through each question to identify the root cause.
- Search Google for the primary entity name your site is about (e.g. your brand name, the person's name, the product name). Does the Knowledge Panel or top results reflect your entity correctly, or does it show a different entity with the same or similar name?
- Does your site use schema markup (Organization, Person, Product, LocalBusiness, etc.) with explicit identifiers such as sameAs properties linking to Wikidata, LinkedIn, Companies House, or other authoritative sources?
Fixes
Add disambiguation context explicitly in the first paragraph of key pages. If your brand name or subject is shared with another well-known entity, name the differentiator: "X (the UK-based software company)" or "X (the economist, not the basketball player)". Consistency across all pages reinforces the signal.
Implement the most specific applicable schema type (Person, Organization, Product, LocalBusiness). Add sameAs properties linking to at least two authoritative external profiles (Wikidata, Wikipedia, LinkedIn, Companies House, ISNI). Validate with Google's Rich Results Test.
Audit every page on your site for how the primary entity is named. Standardise to a single primary name. Create a redirecting page for major aliases. Update all internal anchor text to use the canonical entity name consistently.
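The schema fix above can be sketched as a small generator for an Organization JSON-LD block with `sameAs` links. This is a minimal illustration, not a complete markup strategy: the function name, the organisation details, and the profile URLs (including the Wikidata ID) are placeholders, and the two-profile minimum simply mirrors the guidance in the fix.

```python
import json


def organization_jsonld(name, url, same_as, description=""):
    """Return a <script> tag holding Organization JSON-LD with sameAs links."""
    if len(same_as) < 2:
        # Mirrors the advice to link at least two authoritative profiles.
        raise ValueError("provide at least two sameAs profile URLs")
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }
    if description:
        data["description"] = description
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values for illustration only.
snippet = organization_jsonld(
    name="Example Software Ltd",
    url="https://www.example.com/",
    same_as=[
        "https://www.wikidata.org/wiki/Q00000000",
        "https://www.linkedin.com/company/example-software",
    ],
    description="Example Software Ltd is a UK-based software company.",
)
print(snippet)
```

Paste the output into the Rich Results Test before deploying it, and keep the `name` value identical to the canonical entity name used in the page copy.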
AI Context
Google (Googlebot / Search Console)
Google's entity understanding is central to the Knowledge Graph and to how the system assigns E-E-A-T signals to content about specific entities. An entity that cannot be resolved to a unique Knowledge Graph node receives weaker trust signals. Structured data with authoritative sameAs identifiers is one of the strongest disambiguation tools available to site owners.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs are particularly susceptible to entity confusion because they learn entity relationships from statistical co-occurrence in training text. A brand name shared with another well-known entity will cause the LLM to conflate the two unless disambiguating context is consistently present. Adding structured data and explicit disambiguation language increases the probability of correct attribution in AI-generated answers.
Case Patterns
Real-world scenarios seen in practice.
- A UK tech startup called "Mercury" — the same name as the planet, the Roman god, and Freddie Mercury — struggled to appear in Knowledge Graph results. After adding Organization schema with sameAs links to their Wikidata entry, LinkedIn, and Companies House profile, their brand panel appeared within 6 weeks.
- A financial advisor named "James Williams" (an extremely common name) was invisible in branded search due to entity confusion. Adding Person schema with ISNI identifier, a Wikipedia article, and consistent cross-site name standardisation resolved the ambiguity.
Keyword Cannibalisation
Multiple pages on your site compete for the same keyword, splitting ranking signals and confusing Google about which page to serve.
Reality Check
More pages targeting the same phrase does not multiply your chances — it divides them. Google must pick one page to rank for a given query. When two of your pages compete, Google's job is harder, internal link equity is split, and both pages rank worse than one consolidated page would have.
Symptoms
- Two or more of your URLs appear in Search Console impressions data for the same primary keyword.
- Rankings for an important keyword fluctuate between two of your own URLs from week to week.
- Neither page ranks in the top 10 despite both being well-optimised and having decent backlinks.
- Ahrefs or Semrush keyword ranking reports show the same keyword mapped to different landing URLs over time.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Overlapping content themes: similar topics covered across multiple URLs without distinct angle differentiation.
- Medium Broad keyword over-targeting: multiple pages optimised for the same generic head term rather than differentiated long-tail variations.
- Low No content consolidation strategy: the site has grown organically over years without auditing for topic overlap.
Diagnostic Steps
Work through each question to identify the root cause.
- Search Google for site:yourdomain.com "your target keyword" — do two or more of your own pages appear in the results?
- Do the two competing pages target meaningfully different user intents (e.g. one is informational, one is transactional)?
Fixes
Merge the two overlapping pages into one comprehensive resource. 301-redirect the lower-authority URL to the consolidated page. Update all internal links to point to the surviving URL.
Assign one primary keyword per page. Differentiate remaining pages with long-tail modifiers (e.g. "for beginners", "for enterprise", "free tools for"). Update meta titles and H1s accordingly.
Run a quarterly content audit using a keyword-to-URL mapping spreadsheet. Every important keyword should have one and only one designated target page. Flag overlaps for consolidation or differentiation.
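One way to run the keyword-to-URL audit described above is to group a Search Console performance export by query and flag queries where more than one URL takes a meaningful share of impressions. The sketch below assumes rows of (query, page, impressions); the function name, the 20% share threshold, and the sample data are illustrative assumptions, not a standard.

```python
from collections import defaultdict


def find_cannibalised_queries(rows, min_share=0.2):
    """Map each query to the URLs that each hold at least min_share of its impressions.

    Only queries with two or more such URLs are returned.
    """
    by_query = defaultdict(lambda: defaultdict(int))
    for query, page, impressions in rows:
        by_query[query][page] += impressions

    flagged = {}
    for query, pages in by_query.items():
        total = sum(pages.values())
        if total == 0:
            continue
        significant = [p for p, imp in pages.items() if imp / total >= min_share]
        significant.sort(key=lambda p: -pages[p])  # highest impressions first
        if len(significant) > 1:
            flagged[query] = significant
    return flagged


# Hypothetical export rows: (query, page, impressions).
rows = [
    ("project management software", "/blog/pm-tools", 600),
    ("project management software", "/blog/remote-pm", 400),
    ("project management software", "/pricing", 10),
    ("crm pricing", "/pricing", 900),
]
flagged = find_cannibalised_queries(rows)
print(flagged)
```

A flagged query is only a candidate. Check whether the competing pages serve different intents before merging or redirecting anything.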
AI Context
Google (Googlebot / Search Console)
Google's site diversity system generally shows no more than two listings from the same domain in the top results for a query. Cannibalising pages split link equity, anchor text signals, and click data across two URLs, making both weaker. Google can end up ranking neither page well, leaving the top positions to a competitor with a single authoritative resource.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
When multiple pages from the same domain make similar claims about the same topic, LLMs have difficulty determining which is authoritative. Consolidation into a single definitive resource increases the probability of that resource being cited in AI-generated answers.
Case Patterns
Real-world scenarios seen in practice.
- A SaaS company had four blog posts all targeting "project management software for remote teams." None ranked above position 18. After merging them into one 3,000-word guide, the consolidated page reached position 5 within 8 weeks.
- An e-commerce retailer had a category page and a blog post competing for "best running shoes for flat feet." The blog post had more links but the category page had more commercial intent signals. Redirecting the blog post to the category page and adding its content as an editorial section lifted the category page from position 14 to position 3.
Missing Structured Data Markup
Absence of schema.org structured data prevents Google from enabling enhanced search features, reducing click-through rates even when rankings are strong.
Reality Check
Content without structured data is like a book without a table of contents — harder for engines to understand and display properly. Structured data does not directly improve rankings, but it unlocks rich results: star ratings, FAQs, recipes, event dates, product prices. These rich results increase click-through rates materially. Sites that implement structured data correctly earn more clicks from the same ranking position than sites that do not.
Symptoms
- No rich snippets (stars, FAQs, prices, events) appearing in search results despite content that qualifies.
- Lower click-through rates than expected given your average ranking position.
- Poor visibility in knowledge panels, carousels, or other enhanced search features that draw on structured data.
- Google Search Console Enhancements section is empty — no rich result types are tracked.
Likely Causes
Ranked by probability. Highest probability cause first.
- High No schema markup implemented: the site has never added structured data of any kind.
- Medium Incorrect or incomplete schema: markup is present but fails validation due to missing required properties or malformed JSON-LD.
- Low Schema not matching visible content: markup is outdated, duplicated from another page, or contradicts what users can actually see on the page.
Diagnostic Steps
Work through each question to identify the root cause.
- Run your key pages through Google's Rich Results Test (search.google.com/test/rich-results). Does it detect any valid structured data?
- Does the structured data on the page accurately reflect the content a user can see when they visit the page?
Fixes
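Implement JSON-LD structured data on all key page types. Start with the most impactful types for your site: Article for editorial content, Product for e-commerce, FAQPage for support pages, LocalBusiness for location-based sites. Since August 2023 Google has shown FAQ rich results mainly for well-known, authoritative government and health sites, so FAQPage markup may not produce a visible rich result. Use Google's Structured Data Markup Helper to generate initial markup and validate before deployment.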
Validate all existing markup using the Rich Results Test and Schema Markup Validator (validator.schema.org). Fix errors in required fields first (name, description, image for most types). Add recommended fields to maximise eligibility for rich results.
Audit structured data against live page content on a quarterly basis. Automate extraction where possible — CMS fields should populate JSON-LD dynamically rather than via hardcoded templates that become stale. Never mark up content that is not visible to users.
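To illustrate populating JSON-LD from CMS fields rather than a hardcoded template, the sketch below builds Article markup from a record and refuses to emit anything when a field is empty. The record keys (`headline`, `author_name`, `date_published`, and so on) and the choice of which fields to treat as mandatory are assumptions made for this example; check the current requirements for each rich result type in Google's documentation.

```python
import json


def article_jsonld(record, required=("headline", "author_name", "date_published")):
    """Build Article JSON-LD from a CMS record, failing loudly on empty fields."""
    missing = [field for field in required if not record.get(field)]
    if missing:
        raise ValueError("CMS record is missing: " + ", ".join(missing))
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": record["headline"],
        "author": {"@type": "Person", "name": record["author_name"]},
        "datePublished": record["date_published"],
    }
    # Optional fields are added only when the CMS actually holds a value.
    if record.get("date_modified"):
        data["dateModified"] = record["date_modified"]
    if record.get("image_url"):
        data["image"] = [record["image_url"]]
    return json.dumps(data, indent=2)


# Hypothetical CMS record.
cms_record = {
    "headline": "How to Audit Structured Data",
    "author_name": "Jane Doe",
    "date_published": "2025-01-15",
    "image_url": "https://www.example.com/images/audit.jpg",
}
markup = article_jsonld(cms_record)
print(markup)
```

Because every value comes from the same record that renders the visible page, the markup cannot drift away from what users see unless the template itself diverges.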
AI Context
Google (Googlebot / Search Console)
Google uses structured data to understand entity relationships, content type, and key facts about a page. Correctly implemented schema increases eligibility for rich results, which Google's own research indicates materially improve click-through rates. Structured data also informs Google's Knowledge Graph — entities with correct markup are more likely to appear in entity-based features.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Structured data does not directly affect LLM training, but pages that earn rich results generate higher click-through rates and more user engagement signals. For AI Overview citations, structured data may help Google interpret the entities and key facts on a page, but Google states that no special schema is required for its generative AI Search features.
Case Patterns
Real-world scenarios seen in practice.
- A recipe blog with 500 posts added Recipe schema to all articles over a two-week period. Impressions for recipe-related queries increased 34% as rich results became eligible across the site. Click-through rate improved from 2.1% to 5.8% for queries where rich snippets appeared.
- An events site had Event schema implemented but with dates hardcoded in a template that was never updated. Google flagged the structured data as spam — past event dates being served as future events. After automating date population from the CMS, rich results returned and the manual action was lifted.
Search Intent Mismatch
You created a piece of content that answers a question the user isn't actually asking, or provides a format they don't want.
Reality Check
If the top 10 results for a query are all e-commerce category pages, and you wrote a 2,000-word informational blog post about the topic, you will never rank. Google has already decided what the user wants to see, and it isn't your blog post.
Symptoms
- A page ranks well for a few days or weeks after publication, then drops off a cliff.
- High impressions but extremely low CTR (Click-Through Rate) in Search Console.
- High bounce rate and low time-on-page despite good content quality.
- You are the only informational result in a sea of transactional pages, or vice versa.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Wrong Content Format: The user wants a listicle, a tool, a video, or a product page — and you gave them a long-form article.
- Medium Wrong Funnel Stage: You are trying to sell to someone who is researching, or explaining basics to an expert looking for a specific solution.
- Medium Ambiguous Query: The keyword has multiple meanings and Google favours the dominant one you didn't target.
- Low Shifting Intent: What users wanted last year is different from what they want today — the SERP composition has changed.
Diagnostic Steps
Work through each question to identify the root cause.
- Google the exact keyword you want to rank for in an incognito window. What format dominates the top 3 results?
- Does your page directly answer the primary question faster or better than the top 3 results?
Fixes
Restructure your content to match the dominant SERP format. If the SERP is full of lists, make your content a list. If it's tools, build a tool. Stop forcing blog posts onto transactional queries.
Adjust the copy to speak directly to the user's current mindset. Informational queries need education; transactional queries need reassurance and conversion paths.
Accept that you won't rank for the dominant meaning, or target a longer-tail, more specific variation of the keyword where the intent clearly matches your content.
Monitor the SERPs for your target queries every quarter. If the dominant format changes (e.g., Google starts showing AI Overviews instead of blog posts), update your content strategy to match.
AI Context
Google (Googlebot / Search Console)
Google treats this as a fundamental failure to satisfy the user's underlying need, which leads to poor user engagement signals such as pogo-sticking back to the results page.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs are excellent at parsing intent from natural language. If your content doesn't directly address the specific nuance of a query, an LLM like ChatGPT or Perplexity will bypass your page for one that does — regardless of your organic ranking.
Case Patterns
Real-world scenarios seen in practice.
- A SaaS company wrote an "Ultimate Guide to CRM Software" targeting "best CRM software." They were outranked by G2, Capterra, and PCMag because the user intent was comparison, not education.
- A local bakery tried to rank their "History of Sourdough" blog post for the keyword "sourdough bread." The intent was transactional, so Google ranked local bakeries' product pages instead.
Sudden Drop in Discover Traffic
A sharp, sustained decline in traffic originating from Google Discover, often coinciding with a Discover-specific core update.
Reality Check
Discover traffic is inherently volatile, but a permanent drop following the February 2026 Discover Core Update indicates a fundamental misalignment with Google's current priorities. You cannot recover through technical tweaks. The fix requires addressing local relevance, topic-specific expertise, and content originality at an editorial level.
Symptoms
- Google Search Console shows a steep, sustained cliff in Discover clicks and impressions.
- Organic search traffic remains relatively stable while Discover traffic flatlines.
- The drop coincides with a confirmed Google Discover Core Update (e.g., February 2026).
- New content fails to surface in Discover, whereas it previously did so reliably.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Lack of Local Relevance: The content targets a global audience, but Google's systems now prioritise publishers based in the user's specific country.
- High Diluted Topic Expertise: The site covers too many disparate topics. Google evaluates expertise on a granular, topic-by-topic basis, not domain-wide.
- High Sensationalism or Clickbait: Titles or content rely on outrage, exaggeration, or misleading framing, which the algorithm actively demotes.
- High Lack of Originality: The content aggregates or repurposes existing news without adding substantial original reporting, analysis, or value.
Diagnostic Steps
Work through each question to identify the root cause.
- Verify the Timeline: Cross-reference the traffic drop in Google Search Console with the dates of known Discover core updates.
- Analyse the Fallen Content: Identify the specific articles that lost Discover visibility. Are they outside your core niche? Are the titles sensational or misleading?
- Assess Topic Focus: Does your site have a clear, dominant topic? If you are a general news site publishing occasional tech reviews, your tech reviews may no longer surface because you lack dedicated topic expertise in that area.
- Evaluate Originality: Assess honestly whether the content is truly original or a rewrite of a trending story.
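The timeline check in the first step can be approximated by comparing average daily Discover clicks before and after a suspected update date. The sketch below assumes a dictionary of daily clicks keyed by date, such as one built from a Search Console Discover export; the function name, the 14-day window, and the date used in the example are placeholders, not real rollout data.

```python
from datetime import date, timedelta
from statistics import mean


def change_around(daily_clicks, event_day, window=14):
    """Return the fractional change in mean daily clicks after vs before event_day.

    Returns None when either side of the window has no data or the
    pre-event mean is zero.
    """
    before = [
        daily_clicks[event_day - timedelta(days=i)]
        for i in range(1, window + 1)
        if event_day - timedelta(days=i) in daily_clicks
    ]
    after = [
        daily_clicks[event_day + timedelta(days=i)]
        for i in range(window)
        if event_day + timedelta(days=i) in daily_clicks
    ]
    if not before or not after or mean(before) == 0:
        return None
    return (mean(after) - mean(before)) / mean(before)


# Placeholder date and synthetic data: 1000 clicks/day before, 400 after.
update_day = date(2025, 6, 1)
clicks = {}
for i in range(1, 15):
    clicks[update_day - timedelta(days=i)] = 1000
for i in range(14):
    clicks[update_day + timedelta(days=i)] = 400

change = change_around(clicks, update_day)
print(f"Change in mean daily clicks: {change:.0%}")
```

A large negative value that lines up with a confirmed update date supports the update as the cause; a drop that starts well before or after the date points elsewhere.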
Fixes
Stop publishing outside your core area of expertise. Establish deep, topic-specific authority by creating comprehensive content hubs around your primary subjects.
Ensure your content reflects local expertise. Highlight local context, authors, and relevance where applicable.
Review your editorial guidelines. Eliminate clickbait, exaggerated claims, and sensationalism. Ensure the title accurately reflects the content.
Shift resources from aggregating news to producing original research, interviews, or in-depth analysis that cannot be found elsewhere.
AI Context
Google (Googlebot / Search Console)
AI systems evaluating content for Discover look for strong E-E-A-T signals at the topic level. They are trained to identify and filter out derivative content and clickbait.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
To succeed in Discover and AI-driven content surfaces, the content must provide unique value that an AI cannot easily synthesise from other sources.
Case Patterns
Real-world scenarios seen in practice.
- A lifestyle blog that occasionally writes about finance loses all Discover traffic for its finance articles because it lacks topic-specific expertise in that area.
- A news aggregator sees a permanent drop in Discover traffic as Google prioritises the original publishers of the stories it was summarising.
Thin Content Penalty
Google considers your pages to offer little or no added value compared to what already exists on the web.
Reality Check
"Thin" does not mean "short." A 50-word answer that perfectly solves a user's problem is high-quality. A 2,000-word article generated by AI that just summarises the top 5 search results is thin content. Thinness is a measure of information gain, not word count.
Symptoms
- Manual action notification in Google Search Console for "Thin content with little or no added value."
- Algorithmic suppression (a Helpful Content Update, or HCU, hit) where your entire domain loses visibility across all queries.
- Pages are consistently crawled but not indexed — Google reads them and rejects them.
- High bounce rates and low time-on-page for informational queries where users expect depth.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Scaled AI/Programmatic Content: Generating thousands of location pages or article summaries without human editing or unique value.
- High Affiliate Content Without Original Testing: Reviewing products by rewriting Amazon descriptions and specs without ever handling the product.
- Medium Doorway Pages: Pages created purely to rank for specific keywords that funnel users to a single destination.
- Medium Scraped/Syndicated Content: Copying content from other sites without adding substantial original commentary or curation.
Diagnostic Steps
Work through each question to identify the root cause.
- Do you have a Manual Action notification in Google Search Console?
- Does your content contain original research, unique expert quotes, or proprietary data not available elsewhere?
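For scaled or templated pages, a quick supplementary check is to measure how much of the text is shared between pages. The sketch below compares pages word by word with Python's `difflib.SequenceMatcher` and reports pairs above a similarity threshold. The function name, the 0.8 threshold, and the sample pages are illustrative assumptions; pairwise comparison is also slow on large sites, so run it on a sample.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_pairs(pages, threshold=0.8):
    """Return (url_a, url_b, ratio) for page pairs whose word-level similarity
    is at or above threshold. pages maps URL to visible text."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(sorted(pages.items()), 2):
        # Compare word sequences so a swapped city name counts as one change.
        ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
        if ratio >= threshold:
            pairs.append((url_a, url_b, ratio))
    return pairs


# Hypothetical pages: two location pages from one template and one original article.
pages = {
    "/plumber-leeds": "We offer fast, reliable plumbing in Leeds. "
                      "Call our Leeds team today for a free quote.",
    "/plumber-york": "We offer fast, reliable plumbing in York. "
                     "Call our York team today for a free quote.",
    "/boiler-guide": "Our engineer tested six combi boilers over three months "
                     "and recorded the running costs.",
}
duplicates = near_duplicate_pairs(pages)
for url_a, url_b, ratio in duplicates:
    print(f"{url_a} vs {url_b}: {ratio:.3f}")
```

High similarity does not prove a page is thin, and low similarity does not prove it adds value. Use the result to decide which pages to review by hand.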
Fixes
Delete or noindex the low-value pages. Consolidate remaining pages into fewer, higher-quality resources. Stop publishing unedited AI summaries — AI is a drafting tool, not a publishing pipeline.
Buy and test the products. Include original photos, specific pros and cons not found on the manufacturer's site, and a clear methodology for how you tested them.
Remove them. Create a single, comprehensive page that serves the user's intent better than multiple fragmented pages ever could.
Stop scraping. If syndicating, ensure you use the rel="canonical" tag pointing to the original source, or add significant original commentary that transforms the piece.
AI Context
Google (Googlebot / Search Console)
Google treats thin pages as a waste of index space and a poor user experience. It wants to reward original creators, not aggregators who add no new information to the web.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs are trained on the entire web and already know what the consensus is. If your page just repeats the consensus, an LLM has no reason to cite you over the original source. Only original data, proprietary insights, or unique framing earns LLM citations.
Case Patterns
Real-world scenarios seen in practice.
- A local service business created 500 near-identical pages, one for every town in its state, changing only the city name in the text. Google issued a manual action for doorway pages, reported in Search Console.
- A tech blog lost 90% of its traffic after the Helpful Content Update because every article was a rewritten version of press releases from major tech companies — no original commentary, no testing.
Video Content Not Indexed
Videos published on the site are not appearing in Google Video search results or as video rich results in web search.
Reality Check
Google requires a dedicated "watch page" for each video — a page where the video is the primary content. Videos embedded on pages where they are secondary to text content are less likely to be indexed as video results.
Symptoms
- No video rich results (thumbnail + duration) appearing in web search for target queries.
- Search Console's Video Indexing report shows videos as "not indexed."
- YouTube videos embedded on the site are indexed on YouTube but not associated with the site.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability No Dedicated Watch Page: Videos are embedded on content pages where they are secondary to text, rather than having their own dedicated page.
- high probability Missing Video Structured Data: No VideoObject schema markup on the video page.
- medium probability Video Not Crawlable: The video file or embed is blocked by robots.txt or loaded via JavaScript that Googlebot cannot render.
- medium probability Missing Video Sitemap: No video sitemap submitted to Search Console.
- low probability Low-Quality Video Content: Google does not consider the video valuable enough to index as a video result.
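For the missing structured data cause, here is a minimal sketch of generating `VideoObject` JSON-LD for a watch page. The helper name `video_jsonld` and every URL and title are hypothetical placeholders. The property names come from schema.org; Google's video structured data guide lists `name`, `thumbnailUrl`, and `uploadDate` as required, so check the current guide before relying on this.

```python
import json

def video_jsonld(name, description, thumbnail_url, upload_date, content_url, duration=None):
    """Build a VideoObject JSON-LD <script> tag for a watch page (illustrative helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": [thumbnail_url],
        "uploadDate": upload_date,  # ISO 8601 date or date-time
        "contentUrl": content_url,  # direct URL of the video file
    }
    if duration:
        data["duration"] = duration  # ISO 8601 duration, e.g. "PT2M30S"
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'

# Placeholder values for illustration only.
snippet = video_jsonld(
    name="How to descale a kettle",
    description="A two-minute walkthrough of descaling a kettle.",
    thumbnail_url="https://example.com/thumbs/kettle.jpg",
    upload_date="2024-01-15",
    content_url="https://example.com/media/kettle.mp4",
    duration="PT2M30S",
)
print(snippet)
```

The output goes into the HTML of the watch page itself, not the page where the video is merely embedded alongside an article.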
Diagnostic Steps
Work through each question to identify the root cause.
- Does each video have its own dedicated page where it is the primary content?
- Is VideoObject structured data implemented on each video page?
- Is the video file or embed accessible to Googlebot?
- Is a video sitemap submitted to Search Console?
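The crawlability question can be checked offline against a copy of your robots.txt. The sketch below uses Python's standard `urllib.robotparser`; the rules and URLs are made-up examples, and `blocked_for_googlebot` is a hypothetical helper. Python's parser does not replicate every detail of Google's matching (wildcards in particular), so treat the result as a first pass, not a verdict.

```python
from urllib.robotparser import RobotFileParser

def blocked_for_googlebot(robots_txt: str, urls: list[str]) -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch("Googlebot", url)]

# Example rules and URLs for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /media/
"""
urls = [
    "https://example.com/videos/intro",     # the watch page
    "https://example.com/media/intro.mp4",  # the video file it references
]
print(blocked_for_googlebot(SAMPLE_ROBOTS, urls))
```

In this example the watch page is crawlable but the video file is not, which is enough to stop the video being fetched.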
AI Context
Google (Googlebot / Search Console)
Google indexes videos as a distinct content type. The dedicated watch page requirement reflects Google's need to associate a video with a specific URL for indexing purposes.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems increasingly surface video content in responses. A well-indexed video with structured data and a transcript is more likely to be cited in AI-generated responses that include video recommendations.
HTTP 100 Continue Confusion
The server acknowledges the headers and tells the client it can stop faffing and send the body now.
Reality Check
Most SEOs treat HTTP 100 Continue like some mystical code from the dark web when in fact it is just a polite nod from the server saying 'Carry on, nothing to see here'. If you obsess over it, you’ve already lost.
Symptoms
- Browser or client hangs waiting to send the full request payload
- Unusual delays in form submissions or file uploads
- Server logs show status 100 before final response but workflow stalls
Likely Causes
Ranked by probability. Highest probability cause first.
- High Client misinterpretation of 100 Continue: Many HTTP clients or scripts do not handle this provisional response properly and wait indefinitely or timeout.
- Medium Premature request body sending: Clients send the request body without waiting for the 100 Continue signal, causing confusion in some server setups.
- Low Proxy or firewall interference: Middlemen that don’t understand this interim status may block or drop the 'Continue' message causing client-server deadlock.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the client wait for the 100 Continue response before sending the body?
- Are there any proxies or firewalls between client and server that might strip or mishandle 100 Continue?
Fixes
Ensure your HTTP client library or script supports and waits for HTTP 100 Continue before sending the body. Upgrade or patch as necessary.
Adjust client behaviour or HTTP settings to respect the 100 Continue handshake. Disable 'Expect: 100-continue' header if your client can’t handle it gracefully.
Configure network intermediaries to allow provisional HTTP statuses. If impossible, route around or disable problematic devices.
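The client-side behaviour these fixes describe can be sketched as a small decision function. This is not any particular library's API: `parse_status`, `next_action`, and the one-second default timeout are assumptions for illustration. The key point, in line with the HTTP specification, is that a client which sent `Expect: 100-continue` should not wait forever; it may send the body anyway after a short wait.

```python
from typing import Optional

def parse_status(status_line: bytes) -> int:
    """Extract the status code from a line such as b'HTTP/1.1 100 Continue'."""
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])

def next_action(status_line: Optional[bytes], waited: float, timeout: float = 1.0) -> str:
    """Decide what a client that sent 'Expect: 100-continue' should do next."""
    status = parse_status(status_line) if status_line is not None else None
    if status == 100:
        return "send_body"  # the server is ready for the payload
    if status is None or 100 < status < 200:
        # Nothing yet, or some other interim response: wait, but never indefinitely.
        return "send_body" if waited >= timeout else "wait"
    return "skip_body"  # a final response (e.g. 401 or 417) arrived first

print(next_action(b"HTTP/1.1 100 Continue", waited=0.1))
print(next_action(None, waited=0.2))
print(next_action(None, waited=1.5))
print(next_action(b"HTTP/1.1 417 Expectation Failed", waited=0.1))
```

A 417 in particular means the request should be retried without the `Expect` header.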
AI Context
Google (Googlebot / Search Console)
Googlebot typically does not send large request payloads requiring 100 Continue negotiation; this is mostly irrelevant for crawling but can affect API endpoints the bot interacts with.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models gloss over 100 Continue as ‘server is ready’ and rarely factor it into response generation or RAG workflows since it does not affect final HTTP content.
Case Patterns
Real-world scenarios seen in practice.
- A bespoke API client used by an agency failed to handle 100 Continue resulting in endless stalls during big file uploads, fixed by switching HTTP libraries.
- A corporate proxy dropped 100 Continue responses, causing internal tools to time out; bypassing the proxy restored sanity.
HTTP 101 Switching Protocols Confusion
The server agrees to change communication protocols mid-connection, usually at the client’s request, and promptly switches without fuss.
Symptoms
- Server responds with status code 101 instead of a usual 200 OK.
- Client connection switches protocols, most often to WebSocket.
- Confusion or misinterpretation of 101 as a failure or red flag in logs.
Likely Causes
Ranked by probability. Highest probability cause first.
- High WebSocket Upgrade Request: The client is asking to upgrade from HTTP to WebSocket, so the server obliges by sending a 101.
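- Medium Cleartext HTTP/2 (h2c) Upgrade: A client sends `Upgrade: h2c` over plain HTTP and the server answers with a 101. This mechanism is deprecated and rare. HTTP/2 over TLS is negotiated via ALPN during the handshake and HTTP/3 is advertised via Alt-Svc, so neither produces a 101.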
- Low Misconfigured Server or Proxy: Occasionally, a server or intermediary might trigger 101 unnecessarily, confusing non-technical tools or SEOs.
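To tell a legitimate WebSocket upgrade from a stray 101, you can verify the handshake. The sketch below follows the `Sec-WebSocket-Accept` calculation defined in RFC 6455 (SHA-1 of the client key plus a fixed GUID, base64-encoded). The function names are hypothetical; the key and expected value in the usage example are the sample values from the RFC.

```python
import base64
import hashlib

WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed value from RFC 6455

def expected_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server should return."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def is_valid_websocket_101(status: int, headers: dict[str, str], sent_key: str) -> bool:
    """Check that a 101 response is a well-formed WebSocket upgrade."""
    h = {k.lower(): v for k, v in headers.items()}
    connection_tokens = [t.strip().lower() for t in h.get("connection", "").split(",")]
    return (
        status == 101
        and h.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
        and h.get("sec-websocket-accept") == expected_accept(sent_key)
    )

key = "dGhlIHNhbXBsZSBub25jZQ=="  # sample key from RFC 6455
response_headers = {
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Accept": "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=",
}
print(is_valid_websocket_101(101, response_headers, key))
```

A 101 that fails this check on an endpoint you expected to serve WebSockets points towards the misconfigured server or proxy cause.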
Diagnostic Steps
Work through each question to identify the root cause.
- Has the issue been reproduced consistently?
Fixes
Review server configuration and logs. Consult your hosting provider if the issue persists.
HTTP 103 Early Hints Misconfiguration
HTTP 103 Early Hints is a subtle nudge from your server to the browser, hinting what to preload before the main show arrives - when it works properly.
Reality Check
Most SEOs treat 103 Early Hints like a shiny toy - enable it once, then promptly ignore it. The truth is, if you don’t configure your preload headers precisely, you’re just wasting bandwidth and confusing browsers. It’s not magic, it’s meticulous.
Symptoms
- Resources (CSS, JS, images) load slower despite 103 Early Hints being enabled.
- Browser console logs preload warnings or errors.
- No noticeable improvement in page load times after implementing Early Hints.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect or missing Link headers in 103 response: If your server sends 103 but without properly formatted Link headers, browsers ignore the hint altogether.
- Medium Server or CDN does not support or strip 103 responses: Some infrastructure layers silently drop 103 responses, negating any benefit.
- Low Overusing Early Hints with non-critical resources: Hinting every asset makes non-critical downloads compete for bandwidth with the resources that actually matter, which slows the priority assets.
Diagnostic Steps
Work through each question to identify the root cause.
- Are your 103 Early Hints responses including correctly formatted Link headers for critical resources?
- Does your server or CDN explicitly support 103 Early Hints, or does it silently remove it?
Fixes
Ensure the 103 response includes Link headers with `rel=preload` and correct `as` attributes. Syntax must be perfect or browsers ignore it.
Confirm your reverse proxies, load balancers, and CDNs pass through 103 status codes. Upgrade or configure them accordingly.
Limit 103 Link headers to truly critical resources - fonts, main CSS, hero images - not every last sprite or script file.
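To make "syntax must be perfect" concrete, here is a minimal sketch that formats preload `Link` values and renders what a 103 block looks like on the wire over HTTP/1.1. The function names, paths, and the short `as` whitelist are assumptions for illustration; real servers and CDNs expose their own configuration for this.

```python
# Subset of valid `as` values for preload; not an exhaustive list.
PRELOAD_AS_VALUES = {"style", "script", "font", "image", "fetch"}

def preload_link(path: str, as_type: str) -> str:
    """Format one Link header value for a preload hint."""
    if as_type not in PRELOAD_AS_VALUES:
        raise ValueError(f"unsupported as value: {as_type!r}")
    value = f"<{path}>; rel=preload; as={as_type}"
    if as_type == "font":
        value += "; crossorigin"  # fonts are fetched in CORS mode, so the hint must match
    return value

def early_hints_response(links: list[str]) -> str:
    """Render the raw HTTP/1.1 103 block that precedes the final response."""
    lines = ["HTTP/1.1 103 Early Hints"] + [f"Link: {link}" for link in links]
    return "\r\n".join(lines) + "\r\n\r\n"

# Hypothetical critical resources only; everything else is left out on purpose.
hints = [
    preload_link("/css/main.css", "style"),
    preload_link("/fonts/body.woff2", "font"),
]
print(early_hints_response(hints))
```

Note the angle brackets around the path and the semicolon-separated parameters; a missing bracket or `as` value is the kind of error that makes browsers drop the hint.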
AI Context
Google (Googlebot / Search Console)
Googlebot can process 103 Early Hints, but only if the resource hints are precise and the server infrastructure honours the 103 status. It’s a performance optimisation, not a ranking signal.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models simply digest the final HTML. They do not parse or simulate HTTP 103 responses, so from their perspective, 103 Early Hints are invisible fluff.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site enabled 103 Early Hints but forgot to add proper Link headers. Result? No speed gain and a bloated initial response.
- A news site’s CDN stripped out 103 responses entirely, leading devs to waste weeks chasing phantom performance improvements.
HTTP 200 OK – When Success Isn’t Always Success
The server says ‘All good, here’s your content’, but that doesn’t mean all is truly well.
Reality Check
The vast majority of SEO ‘experts’ treat a 200 status like a golden ticket – it’s not. A 200 OK is just the server’s way of saying ‘I delivered something’. It could be error-ridden HTML, a thin page, or even a cloaked doorway. Stop worshipping the status code and start auditing the content it serves.
Symptoms
- Pages return HTTP 200 status but show ‘Page Not Found’ or error messages in content
- Search engines index low-quality, thin, or duplicate pages despite 200 OK status
- Unexpected content variations or cloaked content served with 200 OK
Likely Causes
Ranked by probability. Highest probability cause first.
- High Soft 404s masquerading as 200 OK: The server returns a 200 status but the page content clearly indicates an error or ‘not found’ situation, fooling both users and bots.
- Medium Thin or duplicate content served with 200 OK: Legitimate page response code but the content lacks substance or unique value, undermining SEO.
- Low Cloaking or hidden content behind a 200 OK: Server delivers different content to users and search engines under the guise of a successful 200 status, risking penalties.
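The soft 404 cause lends itself to a quick automated check. The following is a rough heuristic sketch, not how Google detects soft 404s: the phrase list, the 50-word threshold, and the function name are all assumptions you would tune for your own templates.

```python
import re

# Phrases that commonly appear on error templates; extend for your own site.
ERROR_PHRASES = ("page not found", "not found", "no longer available", "error 404")

def looks_like_soft_404(status: int, html: str, min_words: int = 50) -> bool:
    """Heuristic: a 200 response whose content reads like an error page."""
    if status != 200:
        return False  # a real 404 or 410 is not a *soft* 404
    title_match = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    title = title_match.group(1).lower() if title_match else ""
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    phrase_in_title = any(p in title for p in ERROR_PHRASES)
    phrase_in_thin_body = len(text.split()) < min_words and any(p in text for p in ERROR_PHRASES)
    return phrase_in_title or phrase_in_thin_body

error_page = "<html><head><title>Page Not Found</title></head><body><h1>Oops</h1></body></html>"
real_page = "<html><head><title>Blue widgets</title></head><body>" + "widget " * 80 + "</body></html>"
print(looks_like_soft_404(200, error_page), looks_like_soft_404(200, real_page))
```

Any URL this flags should return a real 404 or 410 rather than a 200.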
Diagnostic Steps
Work through each question to identify the root cause.
- Has the issue been reproduced consistently?
Fixes
Review server configuration and logs. Consult your hosting provider if the issue persists.
HTTP 201 Created
The server has successfully created a new resource, typically signalling success in API requests.
Reality Check
Most SEOs treat 201 like a congratulatory badge and then move on, blissfully ignoring whether the created resource is actually accessible or indexed. Spoiler: success means nothing if Google can’t see it.
Symptoms
- API responses consistently return HTTP 201 status
- Newly created resources fail to appear in search results
- Confusion over whether 201 means ‘job done’ SEO-wise
Likely Causes
Ranked by probability. Highest probability cause first.
- High Resource not publicly accessible: You get a shiny 201 response but the URL is blocked by robots.txt or requires authentication. Congratulations, you created a ghost.
- Medium No indexable content at the new URL: The server confirms creation but serves empty, placeholder, or irrelevant content, so Google sees nothing worth ranking.
- Low Misconfigured Location header: The response either omits or misuses the Location header, making it unclear where the new resource lives.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the newly created resource accessible by a standard browser or crawler without authentication?
- Does the resource contain meaningful, indexable content?
Fixes
Ensure the resource’s URL is crawlable, remove robots.txt exclusions, and disable authentication barriers for public endpoints.
Populate the created resource with SEO-friendly, crawlable content rather than empty shells or placeholders.
Add or correct the Location header in the 201 response to point precisely to the new resource’s URL.
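As a sketch of the Location check, the function below works out where a 201 response says the new resource lives. The function name and URLs are hypothetical. Per the HTTP specification, when no `Location` header is present the created resource is identified by the request's target URI, which is why the function falls back to it.

```python
from typing import Optional
from urllib.parse import urljoin

def created_resource_url(request_url: str, status: int, headers: dict[str, str]) -> Optional[str]:
    """Return the absolute URL of the resource a 201 response claims to have created."""
    if status != 201:
        return None
    h = {k.lower(): v for k, v in headers.items()}
    location = h.get("location", "").strip()
    # Location may be relative, so resolve it against the request URL.
    return urljoin(request_url, location) if location else request_url

# Hypothetical API endpoint for illustration.
url = created_resource_url("https://api.example.com/posts", 201, {"Location": "/posts/42"})
print(url)
```

The returned URL is the one to test for crawlability and content; the 201 itself proves nothing about either.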
AI Context
Google (Googlebot / Search Console)
Google treats 201 as a standard success code indicating resource creation, but indexes only what it can crawl. A 201 without accessible content is invisible in practice.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models see 201 as a technical confirmation. When generating or verifying content, they expect a meaningful resource behind that status, not just a success flag.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce API returns 201 after order creation but the order detail page is behind login, leaving nothing for search engines.
- A blog platform’s API returns 201 but the new post URL serves a skeleton page without text, frustrating indexing efforts.
HTTP 202 Accepted
The server acknowledges your request but hasn’t bothered finishing it yet.
Reality Check
Most SEOs either ignore the 202 status or confuse it with success. Spoiler alert: it’s an 'in progress' signal, not a green light. Treating it like a done deal is a rookie mistake.
Symptoms
- Page returns a 202 status code instead of 200 or 3xx.
- Content expected immediately is missing or incomplete.
- Crawlers appear to drop the page or delay indexing.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Asynchronous processing configured: The server accepts tasks like form submissions or API calls but queues them for later processing, signalling ‘not done yet’.
- Medium Misconfigured response headers: Server or application sends 202 when it should return 200 or a redirect, confusing bots and users alike.
- Low Third-party services delay: External systems or microservices handling parts of the request lag behind, leaving the initial response as a placeholder.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the server explicitly designed to handle requests asynchronously?
- Is the asynchronous processing completing and updating the content within a reasonable timeframe?
Fixes
Ensure your system updates the resource with a final status 200 or relevant redirect once complete. Use webhooks or a pollable status URL to inform API clients; crawlers use neither, so public pages need to serve their final content directly.
Audit your HTTP responses and correct status codes to reflect the actual state: 202 only for accepted-but-pending requests, not for complete content delivery.
Optimise your service architecture to minimise lag or provide fallback content until processing finishes.
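For API clients, the polling side of an asynchronous 202 workflow can be sketched as below. The `fetch_status` callable stands in for whatever HTTP call your client makes against a status URL; it, the attempt count, and the delay are assumptions for illustration.

```python
import time
from typing import Callable, Optional, Tuple

def poll_until_done(
    fetch_status: Callable[[], Tuple[int, str]],
    attempts: int = 5,
    delay: float = 0.0,
) -> Optional[str]:
    """Poll until the job returns 200; return its body, or None if still pending."""
    for _ in range(attempts):
        status, body = fetch_status()
        if status == 200:
            return body  # processing finished, final content available
        if status != 202:
            raise RuntimeError(f"unexpected status {status} while polling")
        time.sleep(delay)  # still pending: back off before the next attempt
    return None

# Simulated server: two pending responses, then the finished result.
responses = iter([(202, ""), (202, ""), (200, "order confirmed")])
print(poll_until_done(lambda: next(responses)))
```

A crawler will not run a loop like this, which is why any URL meant for indexing should not sit behind a 202.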
AI Context
Google (Googlebot / Search Console)
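Google's documentation on HTTP status codes says that, for a 202, Googlebot waits a limited time for the content and then passes on whatever it received to the next processing step. If the body is still a placeholder at that point, the placeholder is what gets evaluated, so pages that keep answering 202 risk being indexed with incomplete content or not at all.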
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models relying on RAG or API data see 202 as a 'processing' placeholder. They expect follow-ups or finalised data; otherwise, the answer is incomplete or uncertain.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce API returns 202 on order submission but never updates the order status page promptly, causing Google to index a blank or incomplete page.
- A CMS using asynchronous publishing returns 202 on initial requests; bots crawl during processing, missing the final content state and dropping rankings.
HTTP 203 Non-Authoritative Information
The server successfully fulfilled the request but the returned meta-information isn't exactly from the origin - it's been altered or supplied by a third party.
Reality Check
Most SEOs treat HTTP 203 like some obscure curiosity or ignore it altogether. In truth, it's a polite warning flag that your content's provenance is questionable, and Google’s no fool - it notices these slip-ups and shrugs them off or worse.
Symptoms
- Pages return a 203 status code instead of the reliable 200 OK.
- Meta-information or headers differ from the origin server’s original content.
- Search engines show inconsistent indexing or snippet data for affected pages.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Proxy or Gateway Interference: An intermediary server modifies the metadata, often for caching or compliance reasons, resulting in non-authoritative info.
- Medium Content Delivery Network (CDN) Alterations: CDNs sometimes inject headers or tweak responses, causing a mismatch from the origin.
- Low Misconfigured Server or Application: The origin server or app erroneously sends a 203 instead of a 200, signalling altered meta-info when there isn’t any.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the response passing through a proxy, CDN, or intermediary?
- Does the intermediary intentionally modify headers or meta-information?
Fixes
Configure the proxy to preserve origin meta-information or return a 200 status if changes are negligible. Avoid unnecessary metadata tampering.
Disable header or metadata injection features that cause 203 responses or coordinate with CDN support to ensure faithful origin replication.
Correct server or application settings to return 200 OK when content and metadata are authoritative.
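To see what an intermediary actually changes, compare the headers from a direct origin request with those from the edge. The sketch below is a plain dictionary diff; the header values are invented, and the default ignore list (`date`, `age`, `via`) is an assumption about which differences are expected noise.

```python
def header_diff(origin: dict, edge: dict, ignore=("date", "age", "via")) -> dict:
    """Return {header: (origin_value, edge_value)} for headers that differ."""
    o = {k.lower(): v for k, v in origin.items()}
    e = {k.lower(): v for k, v in edge.items()}
    return {
        name: (o.get(name), e.get(name))
        for name in sorted(set(o) | set(e))
        if name not in ignore and o.get(name) != e.get(name)
    }

# Invented example headers for illustration.
origin_headers = {"Content-Type": "text/html; charset=utf-8", "Cache-Control": "max-age=60"}
edge_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Cache-Control": "max-age=3600",
    "X-Tracking-Id": "abc123",
    "Via": "1.1 edge",
}
print(header_diff(origin_headers, edge_headers))
```

Anything left in the diff is a candidate for the injection or rewriting feature to switch off.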
AI Context
Google (Googlebot / Search Console)
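Google's public documentation does not describe special handling for 203; like other 2xx responses, the content is considered for indexing. The practical risk is indirect: if an intermediary has altered titles, canonical tags, or caching headers, Google evaluates the modified version rather than what the origin intended. It’s not a deal-breaker, but it is worth knowing which version Google sees.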
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models using retrieval-augmented generation or similar rely on freshness and fidelity. A 203 status signals potential metadata discrepancy, which may reduce confidence in snippet accuracy or source reliability.
Case Patterns
Real-world scenarios seen in practice.
- A news aggregator proxy returns 203 statuses when injecting sponsored tags in metadata, leading to inconsistent Google snippet updates.
- A CDN adds tracking headers causing origin and edge metadata to diverge, confusing search engines and lowering snippet quality.
HTTP 204 No Content
The server did its job but deliberately chose not to send anything back, leaving you staring into the void.
Reality Check
Most SEOs treat 204 like a polite nod instead of a firm handshake - they miss that it means ‘Job done, nothing to see here,’ which sometimes confuses crawlers and browsers alike.
Symptoms
- Browser or client receives a successful response but no content displays
- Crawlers skip indexing the page because there is no document body to parse
- Client-side scripts expecting data get nothing and may error or hang
Likely Causes
Ranked by probability. Highest probability cause first.
- High Intentional empty response: The server is designed to confirm an action succeeded but has no new content to deliver, such as after form submissions or AJAX calls.
- Medium Misconfigured endpoint: A handler returns 204 when it should return content, often due to poor API or CMS setup.
- Low Caching or proxy interference: Intermediate caches or proxies strip the content mistakenly and downgrade the response to 204.
Diagnostic Steps
Work through each question to identify the root cause.
- Is your endpoint supposed to return data or just acknowledge an action?
- Is your client or crawler expecting HTML or JSON content?
Fixes
Confirm 204 is the right status. If you want to deliver minimal confirmation, consider 200 with a small JSON payload for clarity.
Adjust server or CMS settings to return proper content or a 200 status with valid body. Review API response specifications.
Investigate caching rules and proxy configurations to ensure they preserve the original status and content.
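On the client side, the usual failure is code that tries to parse a body that was never sent. A minimal sketch of handling 204 explicitly follows; `decode_api_response` and the payload are hypothetical.

```python
import json
from typing import Any, Optional

def decode_api_response(status: int, body: bytes) -> Optional[Any]:
    """Return parsed JSON, or None when the server deliberately sent no content."""
    if status == 204:
        return None  # success with no body: nothing to parse
    if not body:
        raise ValueError(f"status {status} arrived with an empty body")
    return json.loads(body)

print(decode_api_response(204, b""))
print(decode_api_response(200, b'{"cart_items": 3}'))
```

Treating an empty body on any other status as an error makes a misconfigured endpoint show up immediately instead of as a silent hang.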
AI Context
Google (Googlebot / Search Console)
Googlebot treats 204 as a successful response but notes there is no content to index, effectively skipping the URL for content purposes. Search Console may report such URLs as soft 404s.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models relying on retrieval augmented generation (RAG) or external data sources get no textual payload from a 204 response, resulting in no fresh context to process.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site uses 204 after AJAX cart updates to speed up the user experience but forgets to update UI elements, leaving users confused.
- A CMS plugin misfires 204 responses on pages where content is expected, causing search engines to drop pages from the index.
HTTP 205 Reset Content
The server tells the client to reset the document view after fulfilling the request, because apparently clearing your screen is a thing.
Reality Check
Most SEOs wouldn’t know HTTP 205 from a polite cough. They expect it to be a magic fix for user experience, but in reality, it’s a niche status code that rarely impacts SEO directly - yet they still waste time chasing it.
Symptoms
- Browser or user agent resets the current document view unexpectedly
- Form fields or interactive elements clear after submission
- Rare or inconsistent use of the 205 status in server responses
Likely Causes
Ranked by probability. Highest probability cause first.
- High Intentional UI reset after data submission: The server deliberately instructs the client to clear a form or reset the interface following a POST or PUT request, often for UX reasons.
- Medium Misconfigured server response: Server erroneously sends 205 when another status code (like 200 or 204) would be more appropriate, confusing clients.
- Low Legacy or experimental application logic: Older or unusual web applications use 205 as part of bespoke client-server interactions that modern browsers barely support.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your server-side logic intentionally require the client to reset the current document view after submitting data?
- Is the user experience improved by automatically clearing or resetting the form/view on submission?
Fixes
Ensure your client-side code properly handles 205 responses and resets views only when appropriate. Test across browsers to avoid inconsistent behaviour.
Audit your server response logic. Replace inappropriate 205 with 200 or 204 as per the actual process requirements.
Modernise your application stack. Remove reliance on 205 where possible in favour of clearer status codes and explicit client-side scripting controls.
AI Context
Google (Googlebot / Search Console)
Google’s crawler treats 205 as a successful response with instructions to reset the document view but this has minimal SEO impact. It rarely affects indexing or ranking.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models regard 205 as a niche HTTP status with limited practical consequences, often glossing over its specific user-agent reset intent in favour of broader HTTP success codes.
Case Patterns
Real-world scenarios seen in practice.
- A web form submitting user preferences sends 205 to clear the form post-submission, preventing duplicate entries.
- An AJAX-heavy single-page application misuses 205, causing clients to reload parts of the view unnecessarily, confusing users.
HTTP 206 Partial Content
The server sends only a slice of a resource because the client asked for a specific byte range instead of the whole file.
Reality Check
Most SEOs treat 206 like some mysterious error needing a magic fix. Newsflash – it’s neither an error nor a bug. It’s a perfectly normal, even desirable, response when clients want a chunk, not the whole loaf. Stop freaking out.
Symptoms
- Browser or bot receives only part of an asset, like an image or video
- Crawlers report incomplete content or partial indexing of media files
- Range requests logged frequently in server access logs
Likely Causes
Ranked by probability. Highest probability cause first.
- High Legitimate range requests from clients: Browsers, media players or bots asking for a segment of a file to optimise loading time and bandwidth.
- Medium Misconfigured caching or proxy servers: Caches or CDNs improperly handling or forwarding range headers, causing unintended partial content delivery.
- Low Broken or buggy client implementations: Some crawlers or custom clients improperly sending range headers or expecting full content but only getting partial.
Diagnostic Steps
Work through each question to identify the root cause.
- Are the 206 responses associated with requests containing a ‘Range’ header?
- Is the partial content being served to legitimate clients like browsers or media players?
Fixes
Leave well alone. This response optimises user experience by reducing load times.
Audit cache and proxy rules to respect or properly forward range headers, ensuring consistent content delivery.
Identify offending clients and either update them or block malformed range requests.
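When auditing logs or cache behaviour, it helps to check that each 206 is internally consistent. The sketch below parses a single-range `Content-Range` header and compares it with the request and the body length. The function names are hypothetical, and multipart (multi-range) responses are deliberately out of scope.

```python
import re

def parse_content_range(value: str):
    """Parse 'bytes start-end/total' into (start, end, total); total is None for '*'."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised Content-Range: {value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return start, end, total

def check_206(request_headers: dict, response_headers: dict, body_length: int) -> list[str]:
    """Return the problems found with a single-range 206 response (empty means sane)."""
    problems = []
    req = {k.lower(): v for k, v in request_headers.items()}
    res = {k.lower(): v for k, v in response_headers.items()}
    if "range" not in req:
        problems.append("206 returned although the request had no Range header")
    if "content-range" not in res:
        problems.append("206 response is missing Content-Range")
    else:
        start, end, _total = parse_content_range(res["content-range"])
        if end - start + 1 != body_length:
            problems.append("body length does not match Content-Range")
    return problems

print(check_206({"Range": "bytes=0-1023"}, {"Content-Range": "bytes 0-1023/146515"}, 1024))
```

An empty list means the 206 is doing exactly what it should; a 206 with no `Range` in the request is the signature of a cache or proxy fault.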
AI Context
Google (Googlebot / Search Console)
Googlebot understands 206 perfectly. Partial content responses are normal during crawling or fetching large resources. It does not penalise sites for serving 206 when requested.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models have no direct concept of HTTP status codes but rely on data and context. They might misinterpret partial content as incomplete data unless explicitly told otherwise.
Case Patterns
Real-world scenarios seen in practice.
- Video streaming sites frequently serve HTTP 206 responses to enable scrubbing and partial buffering without downloading the entire file.
- Download managers and browsers resume interrupted downloads of large files with range requests, so the server answers 206 for the remaining bytes rather than resending the whole file.
HTTP 300 Multiple Choices
The server is throwing up its hands because it can’t pick a single resource from multiple valid options for your request.
Reality Check
Most 'SEO experts' treat 300 status codes like a minor curiosity or ignore them altogether - meanwhile, Google’s bots see indecision as a waste of crawl budget and a user experience fail. If your server can’t decide, neither can search engines, and you’re just making life harder for everyone.
Symptoms
- Search engines receive a 300 response instead of a clean 200 or a proper redirect.
- Users may be prompted with a choice page or get inconsistent content delivery.
- Crawlers may stop or slow indexing due to ambiguous resource resolution.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Server configuration with multiple valid representations: Your server is presenting multiple versions of the same resource - language variants, formats, or encodings - but hasn’t been configured to resolve which one to deliver.
- Medium Content negotiation mismanagement: Your content negotiation headers (Accept, Accept-Language, etc.) are either missing or poorly handled, confusing the server into offering choices rather than picking one.
- Low CMS or application ambiguity: Your content management system or web application is generating multiple URLs or content variants without canonical guidance or redirect logic, causing the server to default to 300.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your server respond with a 300 status code to typical user-agent requests?
- Are you intentionally using content negotiation or variants served under the same URI?
Fixes
Adjust your server settings to prioritise one representation or implement 301 redirects to canonical resources rather than serving a 300 response.
Ensure your server correctly interprets headers and returns 200 with the appropriate variant, or use URL parameters/paths to specify variants explicitly.
Implement canonical tags, or better yet, configure your CMS to serve a single version with proper redirects rather than multiple competing versions.
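The alternative to answering 300 is for the server to pick one variant itself. As a simplified sketch of language negotiation, the function below ranks the `Accept-Language` entries by their `q` value and returns the first available match, falling back to a default. The function name and language lists are assumptions, and production servers normally rely on their framework's negotiation instead.

```python
def pick_language(accept_language: str, available: list[str], default: str) -> str:
    """Choose one language variant instead of offering the client a choice."""
    prefs = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # no q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((q, tag.strip().lower()))
    prefs.sort(key=lambda p: p[0], reverse=True)  # stable: ties keep header order
    by_lower = {a.lower(): a for a in available}
    for q, tag in prefs:
        if q <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # e.g. "fr-ch" falls back to "fr"
        if primary in by_lower:
            return by_lower[primary]
    return default

print(pick_language("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7", ["en", "de"], default="en"))
```

If you serve variants this way under one URL, send `Vary: Accept-Language` so caches keep them apart; separate URLs per language remain the simpler option for search engines.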
AI Context
Google (Googlebot / Search Console)
Google hates ambiguity. A 300 Multiple Choices response means the crawler cannot determine which URL or content version to index, often resulting in delayed or incomplete indexing. Google prefers a direct 200 OK or a clear 301/302 redirect.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models that rely on retrieval augmented generation (RAG) or similar frameworks see a 300 as a sign of multiple possible sources for content. Without clear directives, they might pick inconsistent references or flag the content as unstable.
Case Patterns
Real-world scenarios seen in practice.
- A multinational ecommerce site serves the same product page in multiple languages under the same URL but fails to specify which language variant the server should prioritise, resulting in 300 responses that confuse both users and crawlers.
- A media company attempts to serve different image formats (webp, jpeg, png) depending on browser support but misconfigures content negotiation headers, causing 300 responses that stall indexing.
HTTP 301 Moved Permanently
The server tells browsers and search engines that the requested page has shifted to a new URL for good, passing on its ranking mojo.
Reality Check
Most SEOs waffle on about 'canonical power' but fail to grasp that a 301 is a blunt instrument - it either redirects properly or it doesn’t, and half the time they botch the setup with redirect chains that kill your rankings faster than a Google update.
Symptoms
- Search rankings drop after URL changes
- Traffic to old URLs doesn’t redirect properly
- Crawl errors or redirect chains found in logs or tools
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect redirect target: Pointing the 301 to the wrong or broken URL sabotages your entire effort.
- Medium Redirect chains or loops: Multiple 301s stacked like a house of cards, causing crawling delays and lost link equity.
- Low Temporary redirect confusion: Using 302 or other codes instead of 301 when the move is permanent, confusing search engines.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the old URL redirect cleanly and directly to the new URL?
- Is the redirect permanent (HTTP 301) and not temporary (302, 307)?
Fixes
Update the server or CMS redirect to point precisely to the new, canonical URL without trailing errors or typos.
Flatten redirects so old URLs point directly to the final destination in a single 301 response.
Replace any 302 or 307 status codes with 301 to signal permanence to search engines.
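As a rough illustration of flattening, the sketch below takes a redirect map (old path to target path) and rewrites every entry so it points straight at its final destination, raising an error if it finds a loop. The function name `flatten_redirects` and the example paths are hypothetical; a real audit would build the map from server config or crawl data.

```python
def flatten_redirects(redirects):
    """Return a map where each source points straight at its final target.

    `redirects` is a hypothetical dict of {source_path: target_path}.
    Raises ValueError if a redirect loop is found.
    """
    flattened = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that does not redirect.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened


chain = {"/old": "/interim", "/interim": "/new"}
print(flatten_redirects(chain))  # both sources now point at "/new"
```

Each flattened entry can then be deployed as a single 301, so neither users nor crawlers pass through the interim hop.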
AI Context
Google (Googlebot / Search Console)
Google treats a 301 as a signal to transfer ranking signals from the old URL to the new one, typically indexing the new URL exclusively after re-crawling.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don't parse HTTP codes directly but rely on URL consistency and content signals; flawed redirects can confuse content association in retrieval-augmented generation (RAG) setups.
Case Patterns
Real-world scenarios seen in practice.
- A client moved site sections but used multiple 301 hops, resulting in severe ranking drops and crawl budget wastage.
- A rebrand where 302 redirects were mistakenly deployed en masse, leaving the old domain ranking and the new domain invisible.
HTTP 302 Found
A temporary redirect telling browsers and search engines 'this content has moved for now, but don’t get too comfortable'.
Reality Check
Most SEOs treat 302 redirects like a polite shrug, not realising search engines treat them as temporary and often ignore link equity. If you want to move something properly, stop faffing about and use a 301.
Symptoms
- Your page rankings drop unexpectedly after implementing a redirect.
- Google Search Console flags crawl anomalies related to redirects.
- Link equity does not transfer as expected, causing traffic loss.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Misuse of 302 for permanent moves: The classic blunder - using 302 when the move is permanent, confusing search engines and losing SEO juice.
- Medium Temporary redirects for A/B testing or campaigns: Legitimate short-term redirects, but often left in place too long or treated like permanent fixes.
- Low Server or CMS misconfiguration: Systems defaulting to 302 instead of 301, usually due to lazy setups or flaky plugins.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the content permanently moved to a new URL?
- Is the redirect intended to be temporary, lasting no more than a few weeks?
Fixes
Swap the 302 redirect to a 301 in your server or CMS configuration without delay.
Keep 302s strictly short-term and remove them once the temporary need expires.
Audit your redirect rules, update plugins, or patch your server settings to correctly signal permanent or temporary status.
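A redirect audit can be as simple as comparing each rule's status code with what the move is meant to be. The sketch below assumes a hypothetical `RedirectRule` record in which someone has noted whether the move is intended to be permanent; the rule list is invented for illustration.

```python
from dataclasses import dataclass

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


@dataclass
class RedirectRule:
    source: str
    status: int
    intended_permanent: bool  # hypothetical flag set by whoever owns the move


def mismatched_rules(rules):
    """Return rules whose status code contradicts the stated intent."""
    bad = []
    for rule in rules:
        if rule.intended_permanent and rule.status in TEMPORARY:
            bad.append(rule)  # permanent move served as temporary
        elif not rule.intended_permanent and rule.status in PERMANENT:
            bad.append(rule)  # temporary move served as permanent
    return bad


rules = [
    RedirectRule("/old-category", 302, intended_permanent=True),
    RedirectRule("/spring-sale", 302, intended_permanent=False),
    RedirectRule("/ab-test", 301, intended_permanent=False),
]
for rule in mismatched_rules(rules):
    print(rule.source, rule.status)
```

Here `/old-category` and `/ab-test` are flagged, while the genuinely temporary `/spring-sale` 302 is left alone.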
AI Context
Google (Googlebot / Search Console)
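Google follows a 302 but treats it as only a weak signal that the target should be canonical, so the original URL usually stays indexed and ranking signals are not consolidated on the target. A 302 left in place for a long time may eventually be treated as permanent.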
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models note the 302 as a ‘temporary move’, often suggesting it’s a less severe issue than a 404, but may recommend a 301 for SEO best practice.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site used 302 redirects for category restructures, causing their product pages to drop from search results as Google refused to transfer ranking signals.
- A news publisher ran A/B tests with 302s but forgot to revert them, resulting in confused crawling and indexing behaviour.
HTTP 303 See Other Confusion
HTTP 303 signals the client to fetch the requested resource from another URI using a GET method, not the original one.
Reality Check
Most SEOs treat 303 redirects like 302s or 301s, blissfully ignoring that it’s specifically designed to convert POSTs to GETs - a nuance lost on all but those who’ve actually debugged broken forms and APIs.
Symptoms
- Redirects after form submissions or POST requests lead to a different URL.
- Browser changes method from POST to GET on redirect.
- SEO tools flag unexpected redirect chains or method switches.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Misunderstood redirect purpose: Developers or SEOs misapply 303 redirects as generic temporary redirects, ignoring their POST-to-GET intent.
- Medium Improper server configuration: Server sends 303 when a 301 or 302 would be more appropriate, causing unintended crawl behaviour.
- Medium API or form handling quirks: 303 used correctly in RESTful APIs or form processing but misunderstood by SEO tools, triggering false alarms.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the redirect happening immediately after a POST request, such as form submission?
- Is the redirect switching the request method from POST to GET?
Fixes
Educate dev teams on correct HTTP status usage: use 303 only when explicitly redirecting POSTs to GETs. Otherwise, default to 301 or 302.
Review and correct server or application settings to ensure 303 is only served after POST requests needing method change.
Document API behaviours clearly; update SEO tool filters or rules to avoid misclassifying legitimate 303 redirects as errors.
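The method switch is easier to reason about when written down. This sketch models which method a typical browser uses for the follow-up request after each redirect code. It is a simplification of the HTTP semantics and common browser behaviour, not a full client, and the function name `method_after_redirect` is made up for this example.

```python
def method_after_redirect(status, method):
    """Method a typical browser uses for the request that follows a redirect."""
    method = method.upper()
    if status in (307, 308):
        return method  # method and body must be preserved
    if status == 303:
        # 303 always switches to a retrieval request (HEAD stays HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Browsers historically rewrite POST to GET; other methods are kept.
        return "GET" if method == "POST" else method
    raise ValueError(f"{status} is not a redirect handled here")


for status in (301, 302, 303, 307, 308):
    print(status, method_after_redirect(status, "POST"))
```

Only 307 and 308 carry a POST through unchanged; 303 is the one code where the switch to GET is the stated purpose rather than a historical accident.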
AI Context
Google (Googlebot / Search Console)
Google treats 303 like a temporary redirect but respects that it should follow the new URI with a GET request, thus indexing the target page correctly without passing link equity like a 301.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models parsing HTTP status explanations may conflate 303 with generic temporary redirects, missing the POST-to-GET method switch nuance unless explicitly trained or prompted.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce checkout form returns a 303 redirect to a confirmation page using GET, preventing users from accidentally resubmitting the form on reload.
- API endpoint returns 303 after a POST to signal the client to fetch the updated resource elsewhere, confusing monitoring tools that expect 200 or 201 codes.
HTTP 304 Not Modified
The server tells your browser ‘nothing new here’, so the browser serves up the cached version instead of fetching fresh content.
Reality Check
Nearly every SEO beginner treats 304 like a magical green light for caching without understanding that excessive or improper 304 responses can silently sabotage content freshness and user experience. Yes, it’s "just caching", but ignore it at your peril.
Symptoms
- Search engines or users see outdated page content despite recent updates.
- Frequent 304 responses logged for pages known to change often.
- Slow or stale user experience due to cached content served instead of updated assets.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Improper Cache-Control or ETag headers: Your server’s telling browsers content hasn’t changed based on headers that are either misconfigured or too aggressive.
- Medium CMS or CDN caching rules overzealous: Content management systems or content delivery networks pushing 304 status even when content has updated.
- Low User’s browser cache interfering: Local cache on the user’s device stubbornly refusing to refresh despite server signals.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you seeing valid content updates on server but 304 responses still being served?
- Are your Cache-Control, Last-Modified and ETag headers configured correctly and consistently?
Fixes
Audit and correct your server’s Cache-Control, ETag and Last-Modified headers to ensure they accurately represent content changes. Avoid setting overly long max-age or stale-if-error values.
Review and adjust caching policies within your CMS plugins or CDN configuration to honour content freshness properly. Purge caches when content updates.
Advise users to clear browser cache or implement cache-busting techniques like versioned URLs for critical resources.
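To see why a validator must change whenever the content does, here is a minimal sketch of the server-side decision between 200 and 304. It assumes the ETag is derived from a hash of the response body, which is only one possible scheme; `make_etag` and `conditional_status` are illustrative names, not part of any particular server.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_status(body: bytes, if_none_match: Optional[str]) -> int:
    """Return 304 only if the client's validator matches the current body."""
    if if_none_match is None:
        return 200  # unconditional request: always send the full response
    current = make_etag(body)
    for tag in if_none_match.split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]  # If-None-Match uses weak comparison
        if tag == "*" or tag == current:
            return 304
    return 200


old_body = b"Price: 10.00"
new_body = b"Price: 12.00"
client_tag = make_etag(old_body)

print(conditional_status(old_body, client_tag))  # content unchanged
print(conditional_status(new_body, client_tag))  # content updated
```

If the ETag were tied to something that does not change with the content, such as a deploy ID, the second call would also return 304 and clients would keep the stale copy.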
AI Context
Google (Googlebot / Search Console)
Googlebot respects 304 responses as valid signals to reuse the content it already has. If your 304s mask real content updates, Google keeps indexing the stale version, and changes will not appear in search results until it fetches fresh content.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models do not directly parse HTTP statuses but rely on indexed content snapshots. If content behind 304s remains unchanged in the index, LLMs generate responses based on outdated information.
Case Patterns
Real-world scenarios seen in practice.
- A major retail site pushed aggressive ETag caching resulting in months-old product descriptions showing up in Google’s index.
- A news publisher’s CDN returned 304 for breaking news pages, causing search engines to miss critical updates and drop rankings.
HTTP 305 Use Proxy
The server insists you fetch the resource through a proxy specified in the Location header, no exceptions.
Reality Check
The 305 status code is about as common as a unicorn at a bus stop, yet plenty of 'SEO experts' pretend it’s some secret weapon. It isn’t. Most servers and browsers have long since abandoned support for it because it was a colossal security risk. If you see it, you’re either dealing with ancient tech or a misconfigured server begging for attention.
Symptoms
- Browser fails to load the requested resource directly
- Server responds with HTTP 305 status code pointing to a proxy URL
- Crawlers may ignore or misinterpret the response, causing indexing issues
Likely Causes
Ranked by probability. Highest probability cause first.
- High Legacy Server Misconfiguration: Some old or poorly maintained servers still send 305 responses, instructing clients to use proxies that may no longer exist or are insecure.
- Medium Proxy Enforcement: A deliberately configured proxy environment where access must go through a specified intermediary for monitoring or caching, though this is rarer these days.
- Low Malicious or Erroneous Setup: Server or network hijacking attempts could misuse 305 to redirect traffic via rogue proxies, or it’s simply an internal routing error.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your server or application explicitly set HTTP 305 in response headers?
- Is the Location header specifying a proxy URL that is accessible and trusted?
Fixes
Update or patch server software to stop sending 305 responses; migrate to standard HTTP 3xx redirects if proxying is necessary.
Ensure proxy URLs are valid, secure, and documented with clear user-agent instructions; consider modern alternatives like VPNs or application-layer proxies.
Audit security logs, disable rogue proxy settings, and tighten network controls to prevent injection of unsafe 305 responses.
AI Context
Google (Googlebot / Search Console)
Googlebot and most modern crawlers ignore HTTP 305 because it is deprecated and insecure. They expect direct access or standard redirects (301/302). Misuse of 305 can cause crawling and indexing failures.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models trained on web data recognise HTTP 305 as an outdated proxy directive. When assisting with SEO, they flag it as a potential misconfiguration or legacy artefact rather than a recommended practice.
Case Patterns
Real-world scenarios seen in practice.
- A government agency’s intranet still sends 305 due to legacy proxy requirements, causing external crawlers to drop pages.
- An old CMS incorrectly configured to enforce proxy use, leading to broken public-facing links and poor SEO performance.
HTTP 306 Switch Proxy
An obsolete, unused HTTP status code. It was once proposed for proxy switching; the code is now reserved and should not be sent.
Reality Check
If you’re chasing HTTP 306, you’re either stuck in a time warp or following some dusty RFC dead end. The internet moved on decades ago. Most SEOs haven’t even heard of it, let alone seen it in the wild.
Symptoms
- Server responds with status code 306, which modern browsers do not recognise.
- Proxy switching instructions are ignored or cause errors.
- Unexpected or confusing HTTP responses in logs referencing status 306.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Legacy or misconfigured server software: Some ancient or misapplied server setups might still emit this code despite it being deprecated.
- Medium Misunderstood documentation or custom implementations: Developers or engineers playing fast and loose with HTTP status codes might erroneously deploy 306.
- Low Faulty proxy or caching appliances: Some proxies could be misreporting status codes due to firmware bugs or outdated protocols.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you seeing HTTP 306 responses in current production or staging environments?
- Is your server or proxy software recently updated or custom-configured to handle proxy switching?
Fixes
Disable or update any modules or plugins that emit HTTP 306; revert to standard proxy handling methods.
Educate your team on the current HTTP spec. Replace 306 with appropriate, standard-compliant status codes like 302 or 307.
Update firmware or replace the offending hardware; ensure proxies adhere strictly to modern HTTP standards.
AI Context
Google (Googlebot / Search Console)
Google’s crawlers ignore HTTP 306 as obsolete; it neither aids nor hinders indexing but may cause crawling anomalies if present.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models treat HTTP 306 as an archaic relic with no practical relevance; they are likely to flag its presence as an error or misconfiguration.
Case Patterns
Real-world scenarios seen in practice.
- A legacy corporate intranet server still emitting HTTP 306 when internal proxies switch, confusing modern browsers.
- A misconfigured caching proxy appliance incorrectly responding with HTTP 306 instead of standard 302, causing crawl failures.
HTTP 307 Temporary Redirect
A server telling your browser, ‘Go here instead - but keep everything exactly the same’, at least for now.
Reality Check
Nearly every SEO hack thinks a redirect is a redirect is a redirect. No. 307 insists your method and payload stay put, unlike the lazy 302s that lose their way. Yet, most SEOs blunder by treating it like a 302 or a 301, wrecking crawl budgets and user flows in the process.
Symptoms
- Unexpected redirect loops or failed form submissions
- Search engines not transferring link equity as expected
- User agents resubmitting POST requests unintentionally
Likely Causes
Ranked by probability. Highest probability cause first.
- High Misconfigured temporary redirects: The server issues a 307 when a 302 or 301 was the intended status, confusing browsers and bots alike.
- Medium Application logic preserving POST method: The site demands the HTTP method remains unchanged, but client or crawler behaviour isn’t aligned.
- Low Caching or CDN interference: Overzealous intermediaries mishandle the 307, causing stale or incorrect redirects.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the redirect intended to be temporary and method-preserving?
- Are user agents correctly handling POST requests after the redirect?
Fixes
Audit server configs to ensure 307 is only used when method preservation is required; otherwise default to 301 or 302.
Confirm client-side scripts and bots support 307 properly, or adjust workflows to avoid unnecessary POST redirects.
Purge caches, review CDN rules, and ensure intermediaries respect 307 responses without altering methods.
AI Context
Google (Googlebot / Search Console)
Googlebot crawls almost entirely with GET, so the method-preserving aspect of 307 rarely matters to it. It treats 307 as a temporary redirect, so signals are not consolidated on the target as they would be with a 301. Misuse can lead to crawling inefficiencies.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models digest 307 redirects as instructions to maintain request integrity but treat them as temporary moves. This nuance is often lost in AI-generated SEO advice, which defaults to generic redirect recommendations.
Case Patterns
Real-world scenarios seen in practice.
- A form submission page uses 307 to redirect POST requests to a payment processor temporarily, but SEO teams mistakenly recommend 301, breaking workflows and causing user drop-offs.
- An ecommerce site applies 307 redirects during A/B testing but neglects to inform bots, leading to partial indexation and crawling confusion.
HTTP 308 Permanent Redirect Misuse
The HTTP 308 status tells browsers and bots the resource has permanently moved, but unlike 301, it mandates that the original request method and body are preserved. If you botch this, expect chaos.
Reality Check
Most 'SEO experts' treat 308 like a 301 in a cheap suit, ignoring that clinging to POST or PUT methods can break crawlability and user experience faster than a toddler with scissors.
Symptoms
- Traffic drops on redirected URLs despite correct destination.
- Forms or API calls fail after redirection, causing errors or data loss.
- Crawlers report unexpected HTTP method errors or timeouts.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect use of 308 instead of 301: People slap a 308 on permanent moves without considering that browsers will repeat the original method and body, unlike 301’s forgiving GET fallback.
- Medium Server misconfiguration: Redirect rules that unintentionally force 308 where it makes no sense, especially on GET requests or static assets.
- Low API endpoints misusing 308: Using 308 without properly handling repeated POST or PUT requests, leading to duplicated submissions or failed transactions.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you redirecting POST or PUT requests that include payloads?
- Does your server and client expect to repeat the original method and body without alteration?
Fixes
Audit your redirects. Replace 308 with 301 for simple URL moves that do not require method retention. Reserve 308 strictly for APIs or form submissions that must repeat the exact request.
Review and tighten redirect rules. Ensure 308 is not applied globally or to static assets.
Implement idempotency keys and server-side safeguards to handle repeated POST/PUT caused by 308 redirects gracefully.
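The idempotency-key safeguard can be sketched in a few lines. This toy handler assumes the client sends a unique key with each logical submission and that results are kept in memory; a real system would need persistent storage and key expiry. `PaymentHandler` and its fields are hypothetical names.

```python
class PaymentHandler:
    """Toy handler that processes each idempotency key at most once."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.charges = 0    # how many times a charge was actually made

    def handle_post(self, idempotency_key, amount):
        # A repeated request (for example, one replayed after a 308)
        # gets the stored response instead of triggering a second charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges += 1
        response = {"status": 201, "charged": amount}
        self._results[idempotency_key] = response
        return response


handler = PaymentHandler()
first = handler.handle_post("order-123", 50)
replay = handler.handle_post("order-123", 50)
print(first == replay, handler.charges)
```

The replayed POST returns the same response and the charge count stays at one, which is the behaviour you want when a permanent redirect repeats the request body.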
AI Context
Google (Googlebot / Search Console)
Google documents 308 as equivalent to 301 for Search: a permanent redirect whose target becomes canonical. Googlebot crawls with GET, so method preservation is rarely a factor for it; the breakage tends to show up in forms and APIs rather than in crawling.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t natively interpret HTTP codes but learn from usage patterns that 308 implies method retention-though they often confuse it with 301 due to sparse practical examples and documentation.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site using 308 redirects on POST checkout forms caused payment failures after migration.
- A content network misapplied 308 for all asset redirects, causing crawler errors and SEO traffic collapse.
HTTP 400 Bad Request
The server barks ‘no’ because your request is gibberish or incomplete.
Reality Check
Most SEOs treat 400 errors like some mystical curse from the SEO gods. In reality, it’s just your website or client sending rubbish data and the server refusing to dance with it. Fix the input, fix the problem.
Symptoms
- Browser or client receives a 400 status code response
- Requests failing immediately without server processing
- Error messages stating malformed syntax or missing mandatory parameters
Likely Causes
Ranked by probability. Highest probability cause first.
- High Malformed URL or headers: The request syntax is botched: incorrect characters, missing URL encoding, or headers out of whack.
- Medium Missing or invalid query parameters: The server expects certain parameters that never arrive or are invalid, so it rejects the whole thing.
- Low Corrupt cookies or caching issues: Sometimes stale or corrupted cookies and cache confuse the server into thinking requests are invalid.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the URL or request syntax properly formatted and URL-encoded?
- Are all mandatory parameters correctly present and valid?
- Have you cleared cookies and cache related to the domain?
Fixes
Validate and encode URLs correctly; ensure headers conform to HTTP standards. Use browser developer tools or curl to test request syntax.
Review API or server documentation to identify required parameters, then ensure client sends them properly formatted.
Clear browser cookies and cache, or instruct users to do so. On server, consider adding cookie validation or cache control headers.
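For the encoding fix, a short sketch using Python's standard `urllib.parse` shows how to percent-encode a path and query string, plus a crude check for raw whitespace that often triggers a 400. The helper names `build_url` and `looks_malformed` and the example URL are illustrative only.

```python
from urllib.parse import quote, urlencode


def build_url(base, path, params):
    """Assemble a URL with the path and query string percent-encoded."""
    return f"{base}{quote(path)}?{urlencode(params)}"


def looks_malformed(url):
    """Cheap check for raw whitespace or control characters in a URL."""
    return any(ch.isspace() or ord(ch) < 32 for ch in url)


url = build_url("https://example.com", "/search results/", {"q": "red shoes & bags"})
print(url)
print(looks_malformed(url))
print(looks_malformed("https://example.com/search results/"))
```

The space in the path becomes `%20` and the ampersand in the query value becomes `%26`, so the server no longer sees a stray space or a phantom second parameter.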
AI Context
Google (Googlebot / Search Console)
A 400 status is a hard stop - Googlebot simply won’t process garbage requests. This can halt crawling if your site’s navigation or resources trigger 400s.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t ‘see’ HTTP status codes directly but rely on data fed to them. If a 400 blocks resource access, the model’s knowledge will be incomplete or outdated for that content.
Case Patterns
Real-world scenarios seen in practice.
- A client-side script generating malformed AJAX requests that the server rejects with 400, causing site features to break.
- A misconfigured reverse proxy stripping necessary headers, resulting in 400 errors on every request passing through it.
HTTP 401 Unauthorized
The server refuses access because the visitor either forgot to prove who they are or bungled the login details.
Reality Check
Most 'SEO experts' treat the 401 like a polite guest refusing entry rather than a bouncer with a clipboard checking IDs. They forget that half the time it's a basic setup error, not some complex mystery to be analysed in a boardroom.
Symptoms
- Visitors see a login prompt or an error page signalling lack of authentication.
- Crawlers receive 401 status and fail to index the page.
- Website sections intended to be public remain inaccessible or hidden from search engines.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Missing or incorrect credentials on protected resources: The site demands authentication but the headers or login details are absent or wrong – classic blunder.
- Medium Misconfigured authentication setup: The authentication mechanism, be it Basic, Digest, or token-based, is misapplied or broken in config files or server rules.
- Low Expired or revoked credentials: Credentials once valid no longer are; the user or crawler tries to access with stale tokens or passwords.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the page or resource intentionally protected behind authentication?
- Are your server or CMS authentication rules misconfigured?
- Are the credentials provided by the client correct and current?
Fixes
Ensure your login forms, API tokens, or HTTP Auth headers are correctly implemented and tested. Provide clear instructions to users and verify crawler access credentials if applicable.
Review your server (Apache, Nginx, IIS) or application-level authentication modules. Confirm .htaccess, config files, and CMS plugins are not conflicting or demanding credentials where none should be required.
Implement token refresh mechanisms or notify users promptly. For crawlers, use proper user-agent handling or switch to alternative authentication methods less prone to expiry issues.
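When testing HTTP Auth headers, it helps to know exactly what a correct one looks like. The sketch below builds a Basic `Authorization` header with the standard library and mimics the server's 401-or-200 decision. The credentials and the `status_for` helper are invented for the example, and a real server would compare credentials in a safer way.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def status_for(headers, expected_header):
    """Return 401 when the Authorization header is absent or wrong, else 200."""
    if headers.get("Authorization") != expected_header:
        return 401
    return 200


expected = basic_auth_header("editor", "s3cret")  # hypothetical credentials

print(status_for({}, expected))                           # no header sent
print(status_for({"Authorization": expected}, expected))  # correct header
```

A crawler such as Googlebot sends no `Authorization` header, so it always lands in the first case: any public page behind this check returns 401 to it.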
AI Context
Google (Googlebot / Search Console)
Googlebot gets a 401 response and treats the resource as off-limits, preventing indexing unless explicit authentication integration is provided (rare and discouraged).
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models recognise a 401 as a security gate – they infer content behind it is private or unavailable, and any summarisation or content generation must rely on accessible data, not the locked resource.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site mistakenly protected their product images folder with Basic Auth, killing image indexing and losing organic traffic.
- A CMS update reset authentication rules causing all category pages to return 401 errors, tanking visibility overnight.
HTTP 402 Payment Required
A placeholder status code signalling payment is required, except it isn’t actually used anywhere - yet.
Reality Check
The 402 code is the internet’s equivalent of a 'Reserved' parking space in a ghost town - all potential, zero delivery. Most SEOs waste time chasing it when it’s about as relevant as a fax machine in 2024.
Symptoms
- Server responds with 402 status code unexpectedly
- Payments or access controls seem tied to HTTP response
- Confusion over whether the site is demanding payment via HTTP protocol
Likely Causes
Ranked by probability. Highest probability cause first.
- Medium Misconfigured or experimental payment system: Someone tried to implement paywall logic using HTTP status codes and ended up with 402, despite it not being officially adopted.
- Medium Software bug or custom server script: Custom CMS or plugin sending 402 erroneously due to flawed logic or misunderstood status codes.
- Medium Placeholder status left in code: Developer left 402 in HTTP responses for testing or future use and forgot to remove it before going live.
Diagnostic Steps
Work through each question to identify the root cause.
- Is your website or server explicitly attempting to restrict access behind a paywall or payment system?
- Does your payment system or CMS documentation mention support for HTTP 402 responses as part of its access control?
Fixes
Disable or remove any custom code returning HTTP 402. Use standard payment gateway flows and status codes (e.g., 403 Forbidden or 401 Unauthorized combined with payment prompts).
Audit plugins, scripts, and CMS extensions for misuse of HTTP status codes. Replace 402 responses with an appropriate code such as 403, or remove them entirely.
Remove any test or placeholder HTTP 402 responses before deployment. Confirm production servers do not emit this status code unless specifically intended and supported.
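One way to confirm production is not emitting the code is to scan access logs for it. This sketch assumes logs in Common or Combined Log Format, where the status follows the quoted request line. The sample lines are made up, and the `UNUSUAL` set (which also includes the obsolete 305 and 306) is an arbitrary choice for illustration.

```python
import re
from collections import Counter

# Matches the status field that follows the quoted request line,
# e.g. ... "GET /x HTTP/1.1" 402 512
STATUS_RE = re.compile(r'"\s(\d{3})\s')

UNUSUAL = {305, 306, 402}  # assumption: codes worth flagging


def unusual_statuses(log_lines):
    """Count responses whose status code is in the UNUSUAL set."""
    counts = Counter()
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match and int(match.group(1)) in UNUSUAL:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /dashboard HTTP/1.1" 402 512',
    '203.0.113.5 - - [10/Oct/2024:13:55:37 +0000] "GET / HTTP/1.1" 200 1024',
    '203.0.113.9 - - [10/Oct/2024:13:56:01 +0000] "GET /legacy HTTP/1.1" 305 0',
]
print(dict(unusual_statuses(sample)))
```

Any non-empty result is a prompt to trace which plugin, script, or appliance produced the response.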
AI Context
Google (Googlebot / Search Console)
Google rarely encounters 402, and handles it like other 4xx client errors: the content is not indexed, and URLs already in the index are dropped if the response persists. There is no penalty beyond that, but any page returning 402 is effectively invisible to Search.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models recognise 402 as a reserved code, often flagged as ‘not implemented’ or ‘reserved for future use’ - they won’t infer any genuine payment requirement from it without additional context.
Case Patterns
Real-world scenarios seen in practice.
- A niche SaaS platform attempted to block users from accessing dashboards with HTTP 402, confusing crawlers and users alike, resulting in poor indexing and user frustration.
- A custom-built CMS plugin set HTTP 402 on certain content pages as a placeholder during development, but it leaked into production causing bizarre server responses and no actual payment enforcement.
HTTP 403 Forbidden
The server knows what you want but stubbornly refuses to let you have it.
Reality Check
Most SEOs treat 403 errors like some mystical black box, blaming Google, the hosting, or 'bad vibes' instead of admitting they stuffed up permissions or botched security rules.
Symptoms
- Visitors receive a ‘403 Forbidden’ error page instead of content.
- Search engines report crawl errors or inability to access pages.
- Legitimate users or crawlers are blocked unexpectedly.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect file or folder permissions: The server’s filesystem permissions deny read access to the requested resource, often due to overzealous security settings or careless server config.
- Medium IP or user-agent restrictions: Access control lists or firewall rules prevent certain IP addresses or user agents, such as Googlebot, from reaching content.
- Medium Misconfigured .htaccess or server directives: Rules intended to restrict access are set too aggressively or incorrectly, resulting in blanket denial rather than selective blocking.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the 403 error affecting all visitors, including yourself when logged in?
- Are server permissions and access rules correctly set for the affected resources?
Fixes
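Use your hosting control panel or SSH to set proper read permissions, typically 644 for files and 755 for directories on a standard Unix host. Avoid blanket 700 or 600 modes that lock out everyone except the owner, including the web server user.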
Whitelist trusted bots and IP ranges, especially Googlebot’s known IPs, and remove any unintended blocks in firewalls or server configs.
Audit your .htaccess or equivalent config files line by line, looking for ‘Deny from all’ or ‘Require all denied’ directives that are too broad and refine them to target only the necessary paths or users.
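For the permissions check, the sketch below uses Python's `stat` module to test whether a Unix mode grants read access to "other" users, which is usually what the web server process needs when it does not own the file. The function name `world_readable` is illustrative, and the check assumes a conventional Unix permission model.

```python
import stat


def world_readable(mode):
    """True if the 'other' read bit is set in a Unix permission mode."""
    return bool(mode & stat.S_IROTH)


for mode in (0o644, 0o755, 0o600, 0o700):
    print(oct(mode), world_readable(mode))
```

To check a real file, pass `os.stat(path).st_mode` to the function. Modes 644 and 755 pass; 600 and 700 fail, which matches the lockout described above.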
AI Context
Google (Googlebot / Search Console)
Googlebot encounters a 403 and understands it cannot crawl the page due to server refusal; it treats the page as inaccessible, often dropping it from index or reducing crawl priority.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t ‘see’ HTTP codes directly but can infer content access issues from metadata and error signals; retrieval-augmented generation tools will fail to fetch content behind 403s, resulting in incomplete or no response.
Case Patterns
Real-world scenarios seen in practice.
- A client upgraded security plugins on WordPress, inadvertently blocking Googlebot’s IP ranges, causing organic traffic to plummet overnight.
- Server migrations where file permission defaults reset to restrictive modes, locking out all public access until sysadmins intervene.
HTTP 404 Not Found
The server bluntly tells you it cannot find the resource you asked for - no excuses, no sugar-coating.
Reality Check
Most SEOs treat 404s like mysterious curses from the web gods, when in fact they’re just plain old broken links or deleted pages. Stop overcomplicating it; a 404 is a 404 because there’s nothing there. Simple.
Symptoms
- Visitors land on a page that says “404 Not Found” or a similarly unhelpful message.
- Search engines drop the URL from their index or rank it poorly.
- Internal or external links lead nowhere, causing user frustration and lost traffic.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Deleted or moved content without proper redirects: The most predictable culprit - someone removed or moved a page but forgot to tell the server how to handle the change.
- Low Typographical errors in URLs: A slip of a finger in a link or manual entry sends you chasing shadows.
- Low Incorrect linking from other sites or internal navigation: You might not be the one at fault; third parties linking to your site with bad URLs send visitors and crawlers to pages that never existed.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the missing URL correspond to content you intentionally removed or moved?
- Did you implement a proper 301 redirect from the old URL to the new location?
Fixes
Implement a 301 redirect from the old URL to the new or most relevant page. If the content no longer exists and there’s no replacement, ensure the 404 page is helpful and encourages user navigation elsewhere.
Audit your internal links and user-generated URLs. Use tools like Screaming Frog or Google Search Console to identify mistyped URLs and correct them.
Reach out to the referring site’s webmaster to fix the link or create a redirect on your end if the URL pattern is predictable.
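If you have a list of moved pages, the redirect rules can be generated rather than typed by hand. A minimal sketch, assuming an Apache host with mod_alias and simple paths without spaces; the function name `redirect_rules` and the example paths are made up for illustration.

```python
def redirect_rules(moved: dict[str, str]) -> list[str]:
    """Build one Apache mod_alias `Redirect 301` line per moved path."""
    rules = []
    for old_path, new_url in sorted(moved.items()):
        if old_path == new_url:
            continue  # a self-redirect would loop forever
        rules.append(f"Redirect 301 {old_path} {new_url}")
    return rules


# Hypothetical mapping of old paths to their replacements.
for line in redirect_rules({"/old-page": "/new-page"}):
    print(line)
```

Note that `Redirect` matches path prefixes, so test the generated rules before deploying them.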
AI Context
Google (Googlebot / Search Console)
A 404 status code signals to Google that the page does not exist; it will eventually be dropped from the index unless a redirect or replacement content is provided. Google’s crawlers don’t play guessing games.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t fetch pages themselves; the retrieval tools that feed them do. A tool that hits a 404 gets no content, so the model falls back on other sources or on whatever older copy of the page was in its training data.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site deletes discontinued product pages without redirects, causing organic traffic loss and user frustration.
- A blog migrates to a new CMS but fails to set up redirects, leaving hundreds of 404s polluting the search console.
HTTP 405 Method Not Allowed
Your server just told you to sod off because you tried to do something it explicitly forbids.
Reality Check
Most SEOs treat 405 errors like a polite suggestion when it’s really a blunt refusal. If your server rejects a method, it’s not shy – it’s screaming that your request is fundamentally wrong or misconfigured.
Symptoms
- Browser or crawler reports “405 Method Not Allowed” error status
- User or bot receives no content or an error page instead of expected resource
- Server logs show rejected HTTP methods like POST on GET-only endpoints
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect HTTP method used: Someone’s sending a POST to a URL that only accepts GET, or vice versa. Simple but common.
- Medium Server configuration forbids method: Web server or application firewall rules explicitly deny certain methods for security or policy reasons.
- Low API or resource misconfiguration: The backend endpoint is not properly set to handle the method requested, often due to sloppy code or incomplete implementation.
Diagnostic Steps
Work through each question to identify the root cause.
- Has the issue been reproduced consistently?
Fixes
Check the Allow header on the 405 response to see which methods the endpoint accepts, then correct the client, form, or link that sends the wrong one. If the method should be allowed, review server configuration and logs, and consult your hosting provider if the issue persists.
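A 405 response should carry an Allow header listing the methods the endpoint does accept. A minimal sketch of reading that header, assuming you already have its value as a string; `allowed_methods` and `method_permitted` are illustrative names.

```python
def allowed_methods(allow_header: str) -> set[str]:
    """Parse an Allow header value such as 'GET, HEAD' into a set of methods."""
    return {m.strip() for m in allow_header.split(",") if m.strip()}


def method_permitted(method: str, allow_header: str) -> bool:
    """HTTP method names are case-sensitive, so compare them exactly."""
    return method in allowed_methods(allow_header)
```

If the method you sent is missing from the set, the request is wrong, not the server.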
HTTP 406 Not Acceptable
The server refuses to deliver content because your browser’s accept headers are too fussy for it to handle.
Reality Check
Most SEOs treat 406 errors like a myth or some obscure relic from the dinosaur age of HTTP. Newsflash – if your server throws a 406, it’s because your content negotiation is bonkers or your client is being an entitled snob demanding formats your server can’t serve. Fix the headers, or get used to the error page.
Symptoms
- Browser or crawlers receive a 406 instead of the expected page.
- Content fails to load when accessed with certain devices or agents.
- Logs show requests ending abruptly with a 406 status.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Mismatched Accept Headers: The client’s Accept header demands content types your server isn’t configured to provide.
- Medium Server Misconfiguration: The server or application’s content negotiation setup is flawed – it might reject perfectly valid requests.
- Low Overzealous Security or Middleware: Some security layers or proxies block or alter Accept headers, resulting in a 406 response.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the user-agent or client send a restrictive Accept header?
- Does your server support the requested content-types in the Accept header?
Fixes
Adjust client or bot configurations to send standard Accept headers. Avoid over-specific or rare content types unless absolutely necessary.
Configure your web server or application to handle common content types correctly, or add fallback content types to cover broader Accept header requests.
Review security rules or proxy settings that may tamper with Accept headers and whitelist legitimate requests to prevent accidental 406 responses.
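To see why a given Accept header ends in a 406, it helps to walk through the negotiation by hand. This is a simplified sketch: it honours q-values and wildcards but ignores the specificity rules and media-type parameters a real server applies. The names `parse_accept`, `matches`, and `negotiate` are illustrative.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_range, q) pairs from an Accept header, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, content_type: str) -> bool:
    """True if a range like 'text/*' or '*/*' covers the content type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    c_type, _, c_sub = content_type.partition("/")
    return r_type == c_type and r_sub in ("*", c_sub)


def negotiate(accept: str, available: list[str]) -> Optional[str]:
    """Pick a type the client accepts; None means the server would answer 406."""
    if not accept.strip():
        return available[0] if available else None  # no header: anything goes
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for content_type in available:
            if matches(media_range, content_type):
                return content_type
    return None
```

A client that sends only `application/xml` to a server offering `text/html` gets `None` here, which is the 406 case. Adding `*/*` as a fallback range is what makes a typical browser header succeed.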
AI Context
Google (Googlebot / Search Console)
Googlebot expects your server to serve content in common formats such as HTML. If your server can’t deliver content matching the Accept header, Googlebot receives a 406 and cannot index the page. This rarely happens unless your server is misconfigured or you have unusual content negotiation rules.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models relying on retrieval-augmented generation (RAG) see a 406 as a dead-end – no content to ingest. They assume the source is unavailable in acceptable format, so they either ignore it or flag it as missing data.
Case Patterns
Real-world scenarios seen in practice.
- A client-side app sends Accept: application/xml exclusively, but the server only serves text/html or JSON, triggering a 406.
- A misconfigured Apache mod_negotiation module rejects requests with Accept headers containing uncommon or malformed MIME types, causing rare 406 errors in logs.
HTTP 407 Proxy Authentication Required
The proxy server demands proper credentials before it will let your request pass through – no free rides here.
Reality Check
This isn’t some mystical error that will tank your rankings overnight. Yet most SEOs treat it like a cryptic curse, failing to grasp that it’s a straightforward access gatekeeper – fix your proxy settings and move on.
Symptoms
- Web pages fail to load when behind a proxy
- Browser or crawler reports a 407 error instead of the requested content
- Search engine bots return incomplete or no indexing data due to blocked proxy requests
Likely Causes
Ranked by probability. Highest probability cause first.
- High Proxy requires authentication: The proxy server demands user credentials which are either missing or incorrect in the client request.
- Low Misconfigured proxy settings: Your application or crawler isn’t properly set up to supply authentication headers.
- Low Expired or revoked credentials: The credentials supplied were once valid but now rejected by the proxy.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you running your requests or crawlers behind a proxy that demands authentication?
- Have you configured your client or crawler to supply valid proxy credentials?
Fixes
Ensure your HTTP client, browser, or crawler includes the correct Proxy-Authorization header with valid credentials.
Review and correct your proxy configuration. This may mean adding authentication parameters in your crawler settings or system network settings.
Request updated credentials from your network administrator or proxy provider and update your configuration immediately.
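For a proxy that uses the Basic scheme, the Proxy-Authorization value is just the base64 of `username:password`. A minimal sketch under that assumption; check the proxy’s Proxy-Authenticate response header to confirm which scheme it actually wants. The function name `proxy_basic_auth` and the credentials are placeholders.

```python
import base64


def proxy_basic_auth(username: str, password: str) -> dict[str, str]:
    """Build a Proxy-Authorization header for the Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Proxy-Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
headers = proxy_basic_auth("user", "pass")
```

Basic credentials are only encoded, not encrypted, so send them over a trusted or TLS-protected connection.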
AI Context
Google (Googlebot / Search Console)
Googlebot does not supply proxy credentials. If a proxy or gateway in front of your site answers Googlebot with a 407, the crawl fails and the affected URLs end up poorly indexed or not indexed at all.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t interact with HTTP status codes directly. However, when contextualised in retrieval-augmented generation systems, 407 errors indicate blocked data sources at the proxy level, signalling an access issue, not content quality.
Case Patterns
Real-world scenarios seen in practice.
- A corporate environment deploying a proxy with mandatory authentication caused their internal crawler to fail, leading to partial site indexing and poor SEO visibility.
- A developer testing behind a proxy forgot to update crawler credentials after a password rotation, resulting in persistent 407 errors until corrected.
HTTP 408 Request Timeout
The server gave up waiting because your client took its sweet time to send the full request.
Reality Check
Most SEOs treat the 408 like a polite shrug rather than a flashing red warning. It’s not just a momentary hiccup; it’s a sign your site or client is either sluggish or misconfigured. Ignoring it is like ignoring a car’s check engine light because you don't like mechanics.
Symptoms
- Visitors experience slow-loading pages or outright failure to connect.
- Server logs show repeated 408 status codes for incoming requests.
- Crawlers frequently abandon requests mid-handshake, reducing crawl efficiency.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Slow client connections: The user’s browser or bot is dragging its heels sending the request, forcing the server to time out.
- Medium Server timeout settings too aggressive: The server is set with an unreasonably short wait period before killing the connection.
- Medium Network latency or congestion: Poor network conditions cause delays in the request transmission, triggering the timeout.
Diagnostic Steps
Work through each question to identify the root cause.
- Are 408 errors happening on multiple client types or just specific ones?
- Have you checked your server’s timeout settings against typical request times?
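The first question can be answered from access logs. A rough sketch, assuming the server writes the common "combined" log format; the regular expression and the function name `count_408_by_agent` are illustrative and may need adjusting for your log layout.

```python
import re
from collections import Counter

# Combined log format tail: "request" status bytes "referer" "user-agent"
LINE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"')


def count_408_by_agent(lines) -> Counter:
    """Count 408 responses per user-agent string."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and match.group("status") == "408":
            counts[match.group("agent")] += 1
    return counts
```

One agent dominating the counts points at that client; a spread across many agents points at the server or network. Timed-out requests may be logged with an empty or "-" user agent, in which case grouping by client IP may tell you more.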
Fixes
Advise users to update browsers or check client scripts. For bots, review crawl rate limits or IP reputation.
Adjust server configuration to allow more generous request timeouts, especially for slower connections.
Work with hosting or network providers to diagnose bottlenecks; consider using CDN or optimisation to reduce delays.
AI Context
Google (Googlebot / Search Console)
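Googlebot treats a 408 as a failed request. Google’s documentation groups it with other 4xx client errors (429 excepted): the response body is ignored, already-indexed URLs that keep returning it are eventually dropped, and the crawl rate is not adjusted the way it is for 429 or 5xx responses.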
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models never see the 408 itself. A retrieval tool whose request times out simply gets no content, which can leave retrieval-augmented answers stale or incomplete.
Case Patterns
Real-world scenarios seen in practice.
- A high-traffic e-commerce site had 408s because their firewall inspected requests too slowly, prompting premature timeouts.
- A regional news site suffered 408s for mobile users on flaky networks until they optimised server timeout settings.
HTTP 409 Conflict
The server refuses to complete your request because the resource is already in a state that contradicts what you want to do - welcome to a digital standoff.
Reality Check
Most SEOs treat 409 like a polite 'please try again later', but it’s actually the server’s way of saying you’re stepping on toes - ignorance here leads to repeated errors and wasted crawl budget.
Symptoms
- Server returns HTTP 409 status when updating or modifying a resource.
- User or bot receives an error indicating a 'conflict' during data submission.
- Content changes fail to apply despite valid requests.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Simultaneous edits clash: Two or more requests attempt contradictory updates on the same resource at once - the server throws a 409 to avoid corruption.
- Medium Version control mismatch: The request includes outdated or conflicting version identifiers, making the server reject the update.
- Low Resource state validation failure: The request violates logical rules tied to the resource’s current state, such as trying to change status from ‘archived’ to ‘active’ inappropriately.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you attempting to update or modify a resource that others might be editing simultaneously?
- Does your request include versioning info like ETags or timestamps?
Fixes
Implement optimistic concurrency controls. Use ETags or timestamp checks to verify resource version before applying changes. Introduce retry logic with back-off to avoid hammering the server.
Always fetch the latest resource version before submitting updates. Reject or refresh stale versions on the client side.
Audit your application’s business logic to confirm state transitions are valid. Ensure server and client share the same validation rules to avoid conflicts.
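Optimistic concurrency is easier to reason about with a toy model. The sketch below is an in-memory stand-in, not any real CMS or API: `VersionedStore` is a hypothetical class in which a write succeeds only when the caller quotes the version it last read.

```python
class VersionedStore:
    """In-memory resource store that rejects writes based on a stale version."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); an unknown key is version 0 with no value."""
        return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Apply the write only if expected_version matches the stored version."""
        current_version, _ = self.get(key)
        if expected_version != current_version:
            return 409, current_version  # conflict: caller must re-fetch
        self._data[key] = (current_version + 1, value)
        return 200, current_version + 1


store = VersionedStore()
version, _ = store.get("page")              # both editors read version 0
first = store.put("page", "A", version)     # first write wins
second = store.put("page", "B", version)    # second write is stale
```

The second editor gets a 409 plus the current version, re-fetches, and retries. Over HTTP the version would typically travel as an ETag.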
AI Context
Google (Googlebot / Search Console)
Googlebot rarely triggers 409s itself, because ordinary crawling does not modify resources. If a URL does return 409, Google treats it like any other 4xx client error: the content is not indexed until the URL returns a successful response again.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models, when fed 409 errors in datasets, typically associate them with concurrent modification issues. They rely on RAG systems to fetch the latest state before suggesting changes, mimicking human best practices.
Case Patterns
Real-world scenarios seen in practice.
- A CMS with multiple editors simultaneously updating metadata fields returns 409 errors until optimistic locking is implemented.
- An e-commerce API rejects inventory updates with 409 due to mismatched version tokens from cached client data.
HTTP 410 Gone
The server is bluntly telling you the resource you want has vanished for good, with no forwarding address.
Reality Check
Most SEOs treat a 410 like a polite request to leave. It’s not. It’s a full-on eviction notice from the server saying ‘don’t bother coming back’. If you’re using 410s to tidy up your site but ignoring the crawl implications, you’re basically throwing away equity without a second thought.
Symptoms
- Google’s index removes the URL somewhat faster than it would after a 404.
- Traffic to the URL drops to zero abruptly.
- Backlinks pointing to the URL start losing value as the page is de-indexed.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Intentional removal of obsolete content: You’ve deliberately pulled a page because it’s outdated, redundant or irrelevant, and want it gone for good.
- Medium Site migration or restructuring without proper redirects: You deleted the page but forgot to replace it with a suitable redirect, causing confusion in the crawl.
- Low Accidental deletion or server misconfiguration: A tech slip-up where the server returns 410 instead of 404 or 200, signalling permanent removal incorrectly.
Diagnostic Steps
Work through each question to identify the root cause.
- Was the content intentionally removed with no replacement?
- Is there a suitable alternative page or redirect available?
Fixes
Ensure the page returns 410 to signal permanent deletion and avoid soft 404 confusion.
Implement 301 redirects from old URLs to relevant new pages to retain SEO value.
Audit server and CMS configurations to correct status codes, reverting to 404 or 200 where appropriate.
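The three fixes boil down to one decision per removed URL. A minimal sketch of that decision, with a hypothetical function name and arguments; how you record replacements and intent is up to your CMS.

```python
from typing import Optional


def status_for_removed_url(replacement: Optional[str], removed_on_purpose: bool):
    """Choose the response for a URL whose original content is gone."""
    if replacement:
        return 301, replacement   # a relevant successor exists: redirect
    if removed_on_purpose:
        return 410, None          # deliberately retired, nothing to point to
    return 404, None              # missing for unknown reasons: investigate
```

Redirect only when the replacement genuinely covers the same need; otherwise a plain 410 is the honest answer.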
AI Context
Google (Googlebot / Search Console)
Google treats 410 as a clear instruction that the page is gone permanently. It may drop these URLs from its index slightly faster than 404s, though Google has described the difference as small; in both cases it recrawls the URL less and less often over time.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models like ChatGPT or Gemini don’t see HTTP codes directly but rely on scraped data. If the page is gone and unredirected, their training data won’t include it or will flag it as deprecated content.
Case Patterns
Real-world scenarios seen in practice.
- A news publisher deletes outdated event pages and returns 410 to avoid cluttering search results with irrelevant content.
- An ecommerce site removes discontinued product pages, serving 410 instead of redirecting, causing a sharp drop in inbound link equity.
HTTP 411 Length Required
The server flatly refuses to process your request because you failed to specify the Content-Length header, like some amateur who forgot to tell the bouncer how many guests they’re bringing.
Reality Check
Nearly every SEO worth their salt overlooks this one because it’s a server-level snobbery issue, not a flashy algorithm tweak. You can’t just wing it with empty headers and expect the internet to play nice.
Symptoms
- Server responds with HTTP 411 status code.
- Request fails to complete or upload data.
- APIs or forms that require payload size outright reject the request.
Likely Causes
Ranked by probability. Highest probability cause first.
- Medium Missing Content-Length header: Your HTTP request does not declare the size of the payload, so the server won’t even bother.
- Medium Improper client or script configuration: Custom code or tools that send requests without properly setting Content-Length.
- Low Proxy or firewall interference: Network layers stripping or altering headers, causing the server to see a missing Content-Length.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your HTTP request include a Content-Length header specifying the payload size?
- Is there any proxy, firewall or intermediary between your client and server that could remove or alter headers?
Fixes
Modify your HTTP client or code to include an accurate Content-Length header reflecting the byte size of your request body. This is not optional.
Update or patch your HTTP libraries or tools. If using a custom script, ensure it calculates and sets Content-Length correctly before sending.
Consult your network administrator to whitelist or preserve HTTP headers, especially Content-Length, rather than stripping them out as some misguided security measure.
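A common way to get Content-Length wrong in custom scripts is to count characters instead of bytes. A minimal sketch of the correct calculation; `content_length` is an illustrative helper, and most HTTP libraries do this for you when you hand them the body.

```python
def content_length(body: str, encoding: str = "utf-8") -> int:
    """Content-Length is the size of the encoded body in bytes, not characters."""
    return len(body.encode(encoding))


text = "caf\u00e9"                                 # four characters, one non-ASCII
headers = {"Content-Length": str(content_length(text))}
```

Here the string has four characters but encodes to five bytes in UTF-8, so the header must say 5.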
AI Context
Google (Googlebot / Search Console)
Googlebot and similar crawlers rarely send POST requests needing Content-Length, but server software behind the scenes demands it for any request with a body. Ignoring this causes outright rejection, wasting crawl budget.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models trained on web data understand this error as a protocol-level blocking point - essentially a handshake failure due to missing payload size information. When generating code or suggestions, they flag missing Content-Length as a common HTTP programming pitfall.
Case Patterns
Real-world scenarios seen in practice.
- A site migration caused a custom CMS plugin to send AJAX requests without Content-Length, triggering 411 errors that broke critical form submissions overnight. Fix was as simple as patching a cURL call.
- Automated SEO tools sending malformed batch requests hit 411 from strict corporate firewalls that enforce header completeness, leading to incomplete data fetches and reporting errors.
HTTP 412 Precondition Failed
The server rejected your request because one or more conditions you insisted on were not met.
Reality Check
Most 'SEO experts' will panic at a 412 like it’s the end of the world, then scramble to blame mysterious server bugs or Google’s mood swings. It’s almost always your own conditional headers behaving badly - so stop blaming the tech and fix your request logic.
Symptoms
- Server responds with HTTP 412 status code instead of expected data
- Failed resource update or retrieval when using conditional headers
- Unexpected refusal of request despite apparent correctness
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect or stale conditional headers: You sent If-Match, If-Unmodified-Since, or similar headers that don’t match the current server state.
- Medium Client cache or version conflicts: Your local copy or cached resource version is out of date, triggering the precondition failure.
- Low Misconfigured API or server expectations: Server rules for preconditions are strict or set incorrectly, causing false negatives.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you sending conditional headers like If-Match or If-Unmodified-Since with your request?
- Do the values in your conditional headers accurately reflect the current state of the resource on the server?
Fixes
Fetch the latest ETag or Last-Modified timestamp first, then resend the request with updated conditional headers.
Clear or refresh your client cache to align local state with server, avoiding outdated precondition triggers.
Review and adjust server-side precondition logic or consult your API documentation to ensure compliance with expected header formats.
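To check whether your If-Match value should have passed, you can reproduce the server’s comparison. This is a simplified sketch of the rule as HTTP defines it: If-Match uses strong comparison, so weak ETags (prefixed `W/`) never match. It splits on commas, which would break on an ETag that itself contains a comma. The function name `if_match_passes` is illustrative.

```python
from typing import Optional


def if_match_passes(if_match: str, current_etag: Optional[str]) -> bool:
    """Return False where a server would answer 412 Precondition Failed."""
    if if_match.strip() == "*":
        return current_etag is not None   # '*' only needs the resource to exist
    if current_etag is None or current_etag.startswith("W/"):
        return False                      # missing or weak tags never match
    candidates = [tag.strip() for tag in if_match.split(",")]
    return current_etag in candidates
```

If this returns False for the ETag you cached, fetch the resource again and resend with the fresh value.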
AI Context
Google (Googlebot / Search Console)
A 412 signals that the request’s conditional requirement failed, so the server refuses to perform the operation - no penalty, just a straightforward rejection based on resource state.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models recognise 412 as a conditional failure, often advising to synchronise client state or update conditional headers - they have little patience for vague 'server issues'.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site using ETags to prevent cart tampering sends outdated If-Match headers, causing 412 rejections during checkout updates.
- API clients aggressively cache resource metadata but neglect to refresh it, resulting in failed updates with HTTP 412 errors.
HTTP 413 Request Entity Too Large
The server flat-out refuses to swallow your oversized request because it’s beyond its configured digestive capacity.
Reality Check
Most SEOs treat 413 errors like a personal mystery novel. Spoiler alert – it’s nearly always about the server’s set limits, not some arcane SEO voodoo. Stop overcomplicating what’s basically a server saying ‘No, thanks’ to your oversized payload.
Symptoms
- Uploads or form submissions fail abruptly with a 413 status.
- Server returns an error page indicating request size limits.
- Unexpected drop in inbound requests or crawl issues around large payloads.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Server Request Size Limit: The web server or reverse proxy is configured to reject requests exceeding a certain byte size, commonly set low by default.
- Medium Application-Level Restrictions: The backend application or CMS imposes stricter upload or request size caps than the server itself.
- Low Misconfigured Client Requests: The client or crawler sends unnecessarily bloated requests, often including large headers or payloads that trigger the limit.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the 413 error occurring during file uploads or large form submissions?
- Is the server’s configured max request size lower than the actual payload size?
Fixes
Adjust your web server configuration (e.g., nginx’s client_max_body_size, Apache’s LimitRequestBody) to accommodate larger requests, but keep sensible caps to avoid resource exhaustion.
Review and increase application or CMS upload limits, ensuring they align with server settings to prevent mismatch rejections.
Minimise request sizes by optimising payloads, compressing data, or splitting uploads into smaller chunks.
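Splitting an upload is simple on the client side. A minimal sketch, assuming the receiving endpoint supports chunked uploads and reassembles the parts, which you need to confirm for your server; `split_into_chunks` is an illustrative name.

```python
def split_into_chunks(payload: bytes, max_bytes: int) -> list[bytes]:
    """Cut payload into consecutive pieces no larger than max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return [payload[i:i + max_bytes] for i in range(0, len(payload), max_bytes)]


chunks = split_into_chunks(b"x" * 25, 10)   # a 25-byte body under a 10-byte limit
```

Keep each chunk comfortably below the configured limit, because headers and multipart framing add to the request size.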
AI Context
Google (Googlebot / Search Console)
Googlebot respects server-imposed request size limits; if a 413 is encountered, it simply fails to fetch the resource, potentially dropping it from the index or delaying re-crawl.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models with retrieval-augmented generation see 413 errors as a hard stop: no content retrieved means no input data to process, leading to gaps in knowledge or incomplete answers.
Case Patterns
Real-world scenarios seen in practice.
- A client attempts to upload a 50MB image to a CMS with a 10MB server limit, resulting in repeated 413 errors and frustrated users.
- An API receives large JSON payloads beyond its configured max size, triggering automated blocking and disrupting integrations.
HTTP 414 Request-URI Too Long
The server refuses to process a request because the URL is so absurdly long it’s clearly outstaying its welcome.
Reality Check
Most SEOs treat URLs like limitless conveyor belts for tracking parameters and keyword stuffing, ignoring that servers have actual limits. Spoiler: your server isn’t a bottomless pit.
Symptoms
- The server returns a 414 status code when attempting to load a page.
- Pages or resources fail to load, often after clicking a link with an excessively long URL.
- Analytics or tracking parameters pushed to extremes cause functional breakdowns.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Overly long query strings on URLs: Tracking tags, affiliate parameters, or session IDs balloon URLs beyond server tolerance.
- Medium Improper use of GET requests with large payloads: Using GET instead of POST to transmit data results in bloated URIs.
- Low Misconfigured rewrite or redirect rules: Server or CMS rules concatenating parameters or routing in a way that unintentionally lengthens the URI.
Diagnostic Steps
Work through each question to identify the root cause.
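- Are your URLs exceeding typical length thresholds (around 2,000 characters is a common safe ceiling, and servers such as Apache and nginx reject request lines beyond roughly 8 KB by default)?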
- Are long URLs generated by tracking parameters or GET requests?
Fixes
Cut the fat. Use shorter parameter names, drop unnecessary tracking info, or switch to POST requests to carry the data. Avoid URL bloat by moving tracking state into cookies or local storage.
Refactor forms and API calls to use POST methods when transmitting large amounts of data instead of stuffing everything into the URL.
Audit and correct server or CMS rewrite rules to prevent accidental URI inflation. Regularly test redirects and clean up malformed URLs.
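Stripping tracking parameters is easy to automate. A minimal sketch using the standard library; the parameter list is an assumption for illustration (`sessionid` in particular is a made-up name), so replace it with whatever your own analytics setup appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative lists; adjust to the parameters your own tools add.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"gclid", "fbclid", "sessionid"}


def strip_tracking(url: str) -> str:
    """Return url with known tracking parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parameters that change page content, such as filters, survive; only the tracking noise goes.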
AI Context
Google (Googlebot / Search Console)
Googlebot respects server limits and will not crawl URLs that return 414 errors, effectively ignoring those pages and causing indexation issues.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models like ChatGPT or Gemini cannot retrieve content from a URL that returns 414, so their answers fall back on older or incomplete information about the page.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site using overenthusiastic tracking parameters ended up with URLs over 4000 characters, triggering 414 errors on various product pages.
- A poorly coded search form used GET instead of POST, sending full search filters as part of the URL and causing 414 errors on complex queries.
HTTP 415 Unsupported Media Type
The server bluntly refuses your request because it doesn’t recognise the format of the data you’re sending.
Reality Check
Most 'SEO experts' treat 415 like some rare unicorn and toss vague advice about content types. In truth, it’s usually a simple header mismatch or a clueless client sending the wrong payload. It’s not rocket science – fix the bloody content-type and move on.
Symptoms
- Receiving a 415 status code when submitting data to an API or web server
- Server refuses to process uploads or POST requests citing unsupported media
- No helpful error message beyond ‘Unsupported Media Type’
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect Content-Type header: The request’s Content-Type header specifies a media type the server does not accept or understand. Usually a typo or a client defaulting to something inappropriate.
- Medium Server misconfiguration: The server is configured to handle only certain media types and rejects others, even valid ones, due to restrictive rules or outdated settings.
- Low Unsupported file format or payload: The actual data sent does not conform to a media type the server supports, such as sending XML when only JSON is acceptable.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the request include a Content-Type header?
- Is the Content-Type header value one the server explicitly supports?
Fixes
Ensure your client or code sets the Content-Type header accurately – for example, ‘application/json’ for JSON data or ‘multipart/form-data’ for file uploads. Avoid generic defaults like ‘text/plain’.
Review server settings, particularly MIME type allowances and API endpoint specifications. Update configurations to accept necessary media types or negotiate with developers to extend supported formats.
Validate the payload format before sending. Use proper serializers or converters to match the accepted media type. If the server only accepts JSON, do not send XML or proprietary formats.
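The header check behind these fixes can be sketched in a few lines of Python. This is a minimal illustration, not any particular server's logic: the function name `check_content_type` and the `ACCEPTED_TYPES` set are hypothetical, and treating a missing header as a 415 is an assumed policy.

```python
# Hypothetical accepted list; a real API documents its own.
ACCEPTED_TYPES = {"application/json", "multipart/form-data"}


def check_content_type(header_value, accepted=ACCEPTED_TYPES):
    """Return 415 if the Content-Type is missing or unsupported, else 200."""
    if not header_value:
        return 415  # assumed policy: no header means no guessing
    # Drop parameters such as "; charset=utf-8" or "; boundary=...".
    media_type = header_value.split(";", 1)[0].strip().lower()
    return 200 if media_type in accepted else 415
```

Running a failing request's header through a check like this shows quickly whether the problem is a typo, a stray default such as ‘text/plain’, or a type the server never accepted.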
AI Context
Google (Googlebot / Search Console)
Googlebot rarely triggers 415; it expects standard HTML or known resource types. If your API or site returns 415 for critical resources, it will simply skip or fail to index those endpoints.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models see 415 as a signal the client-server communication protocol is broken at the media negotiation level. In retrieval-augmented generation, an unsupported media type means the source content is inaccessible, leading to incomplete or failed context retrieval.
Case Patterns
Real-world scenarios seen in practice.
- A client app sends ‘text/plain’ Content-Type for JSON payloads, causing a stubborn API to reject the request with 415 until headers were corrected.
- A CMS plugin uploads images but fails to set ‘multipart/form-data’, resulting in 415 errors on the server which only accepted proper multipart uploads.
HTTP 416 Requested Range Not Satisfiable
The server refuses your range request because you’ve asked for a piece of the file that simply doesn’t exist.
Reality Check
Most SEOs treat 416 errors like some mysterious, rare beast to be feared or ignored. In truth, it’s usually a client-side overreach - the user or bot is asking for something that’s not there. You don’t fix 416 by throwing more code at it; you fix it by knowing what the hell is being requested.
Symptoms
- Browser or crawler receives a 416 status code instead of data.
- Partial content requests fail, causing incomplete media loads.
- Error logs show ‘Requested Range Not Satisfiable’ entries.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect Range Header from Client: The client requests bytes outside the file’s actual size, often due to buggy download managers or broken scripts.
- Medium Corrupted or Zero-length Files on Server: The server’s file has no content or metadata is wrong, so the requested range can’t be honoured.
- Low Outdated Cache or Resume Requests: The client tries to resume a download from a position that no longer exists, for example after a file update or truncation.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the client specifically requesting a byte range?
- Does the requested range fall outside the file size on the server?
Fixes
Advise users or scripts to reset downloads, clear caches or update their download tools to avoid invalid range requests.
Verify the file physically exists and has proper length; replace or repair files as needed.
Implement cache validation headers and instruct clients to perform fresh downloads rather than resuming from invalid positions.
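To see whether a given request should have produced a 416, it helps to compare the Range header against the file size. The Python sketch below is a simplification that handles a single byte range only; the function name `status_for_range` is hypothetical, and multi-range or unparseable headers are simply treated as ignored (a plain 200).

```python
import re

_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def status_for_range(range_header, file_size):
    """Return 206, 416, or 200 (header ignored) for a single byte range."""
    match = _SINGLE_RANGE.match(range_header.strip())
    if not match:
        return 200  # unparseable or multi-range: this sketch ignores the header
    first, last = match.groups()
    if first == "" and last == "":
        return 200  # "bytes=-" is not a valid range
    if first == "":
        # Suffix range ("last N bytes") is satisfiable when N is non-zero.
        return 206 if int(last) > 0 else 416
    if last != "" and int(last) < int(first):
        return 200  # invalid range spec, ignored
    # Satisfiable only if the first byte position lies inside the file.
    return 206 if int(first) < file_size else 416
```

Feeding it the Range value from the log line and the current file length tells you whether the client overreached or the file shrank underneath it.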
AI Context
Google (Googlebot / Search Console)
Googlebot rarely triggers 416 errors unless it’s aggressively using partial content requests or encountering server misconfigurations. The engine ignores invalid range requests and will reattempt normal fetches.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models don’t interact with HTTP codes directly but can infer 416 as a sign of ‘partial content request failure’, often suggesting user error or server inconsistency when explaining the problem.
Case Patterns
Real-world scenarios seen in practice.
- A media hosting site’s download manager plugin repeatedly issued invalid range requests, causing frustrated users and spikes in 416 errors.
- An old CDN edge server served truncated files, leading clients to request ranges beyond file ends and triggering 416 responses.
HTTP 417 Expectation Failed
The server refuses to meet the client's ridiculous 'Expect' header demands and tells it to sod off.
Reality Check
If you think the 'Expect' header is your friend, you’re probably the one setting unreasonable expectations. Most SEOs ignore this because it shows up like a bad guest: rare, and usually harmless unless you’ve broken something first.
Symptoms
- The client receives a 417 status code instead of the expected response.
- Requests using the 'Expect' header stall or fail outright.
- Logs show "Expectation Failed" errors linked to HTTP requests.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Client sends unsupported 'Expect' header: The client demands something your server won’t or can’t deliver, most commonly '100-continue', which your server doesn’t honour.
- Medium Server misconfiguration: The server or intermediary (proxy/load balancer) mishandles or outright rejects the 'Expect' header.
- Low Custom software or firewall interference: Some security appliances or custom code block or alter the header, causing the server to balk.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your client send an 'Expect' header in the HTTP requests?
- Is the 'Expect' header set to '100-continue'?
Fixes
Configure your client not to send the 'Expect' header unless necessary, or disable '100-continue' handling.
Review server and proxy settings to ensure they accept or properly handle 'Expect' headers; updates or patches may be required.
Identify and whitelist legitimate traffic, or adjust firewall rules to allow 'Expect' headers.
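On the client side, the usual recovery is to repeat the request without the 'Expect' header. A minimal Python sketch, assuming request headers are held in a plain dict; both function names are hypothetical.

```python
def headers_without_expect(headers):
    """Return a copy of the request headers with any Expect header removed."""
    return {k: v for k, v in headers.items() if k.lower() != "expect"}


def headers_for_retry(status, headers):
    """After a 417, retry without Expect; otherwise keep the headers as sent."""
    if status == 417:
        return headers_without_expect(headers)
    return dict(headers)
```

If the retry succeeds, the server or a proxy in front of it is the party that cannot handle '100-continue', and the client default is worth changing.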
AI Context
Google (Googlebot / Search Console)
Googlebot rarely, if ever, sends 'Expect' headers. If it does encounter a 417, Google handles it like other 4xx client errors apart from 429: the URL is not considered for indexing, and an already indexed URL that keeps returning 417 is eventually dropped.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models do not crawl or fetch HTTP headers directly. However, in retrieval-augmented generation (RAG) scenarios, APIs returning 417 can cause failed data fetches, leading to incomplete or stale responses.
Case Patterns
Real-world scenarios seen in practice.
- A corporate API client stubbornly sending 'Expect: 100-continue' with every POST request to a legacy server that never learnt to handle it, resulting in failed integrations.
- Misconfigured load balancers stripping or mishandling 'Expect' headers causing 417 errors for otherwise well-formed requests.
HTTP 418 I'm a Teapot
A deliberately whimsical HTTP status code signalling ‘I refuse to brew coffee because I am, in fact, a teapot.’
Reality Check
If you’re seeing 418 in your server logs and you think it’s anything other than an elaborate April Fools’ gag from 1998, congratulations on joining the elite club of people who confuse RFC jokes with production issues.
Symptoms
- Unexpected 418 responses when requesting coffee-brewing services or similar absurd endpoints
- Confused logs filled with ‘I’m a teapot’ messages that defy all practical logic
- No impact on actual SEO rankings or site functionality beyond amusement, assuming you’re not brewing coffee via HTTP
Likely Causes
Ranked by probability. Highest probability cause first.
- High Testing or prank code present: Someone with too much time on their hands has deployed joke code referencing RFC 2324.
- Medium Misconfigured custom error handlers: A developer has mapped unknown errors to 418 as a placeholder or Easter egg.
- Low Malformed or spoofed requests: Automated scanners or bots sending odd requests triggering obscure status codes.
Diagnostic Steps
Work through each question to identify the root cause.
- Are your server responses legitimately returning 418 status codes?
- Is your application or server intentionally using 418 for some error handling or novelty?
Fixes
Search your codebase and server configs for ‘418’, ‘teapot’, or references to RFC 2324 and remove them. Replace with valid HTTP status codes.
Audit your error handling routines. Ensure unknown errors map to standard codes like 404, 500 or 503, not whimsical April Fools’ relics.
Implement request validation and bot filtering. Ignore the requests that trigger 418; they are noise.
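The codebase search can be done with any grep-like tool. As a rough Python sketch (the function name and pattern are illustrative, not a standard utility), this scans lines of source or config for a standalone 418, the word "teapot", or a mention of RFC 2324:

```python
import re

# Matches a standalone 418, the word "teapot", or "RFC 2324".
TEAPOT_PATTERN = re.compile(r"\b418\b|teapot|RFC\s*2324", re.IGNORECASE)


def find_teapot_references(lines):
    """Return (line_number, text) pairs for lines that mention the joke code."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if TEAPOT_PATTERN.search(line)
    ]
```

The word boundaries keep unrelated numbers such as port 4180 out of the results.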
AI Context
Google (Googlebot / Search Console)
Google’s crawlers will likely ignore 418 responses outright as they are non-standard and provide no meaningful content. This status code neither helps nor harms SEO but flags a certain level of server misconfiguration.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models trained on web data recognise 418 as a joke code from RFC 2324. They treat it as a humorous artefact, not a valid HTTP status for production environments.
Case Patterns
Real-world scenarios seen in practice.
- A developer mistakenly deployed a novelty API endpoint returning 418 as a ‘joke response’ causing confusion in staging environments.
- A misconfigured reverse proxy returns 418 when backend services are unavailable, baffling monitoring tools.
HTTP 421 Misdirected Request
The server you asked for threw its hands up because it wasn’t expecting your request at all.
Reality Check
Most SEOs treat HTTP status codes like some sacred alphabet soup, but 421 is one of those obscure oddities that only crops up when your server configuration is a dog's breakfast or your load balancer is on holiday. It’s rare, avoidable, and usually a sign you’re barking up the wrong server.
Symptoms
- Browser or crawler receives HTTP 421 response when requesting a resource.
- Content fails to load with an unusual error.
- Logs show requests directed at a server not configured to handle them.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Misconfigured reverse proxy or load balancer: Your front-end server is forwarding requests to a backend that has no clue what to do with them.
- Medium Improper use of HTTP/2 connection coalescing: Multiple hostnames served from a single connection cause the server to reject requests out of sheer confusion.
- Low Incorrect virtual host setup: Server is expecting a different Host header and refuses to process the request.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you using a reverse proxy, load balancer, or HTTP/2 connection coalescing?
- Is the Host header matching the expected virtual host on the backend server?
Fixes
Ensure requests are forwarded only to servers configured to handle the Host header in question. Check proxy_pass, upstreams, or equivalent directives.
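HTTP/2 connection coalescing is a client behaviour, so you cannot switch it off from the server. Either prevent it (for example, do not share one certificate and IP address across hostnames that different backends serve) or configure your servers to accept requests for all hostnames on the shared connection.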
Verify that your webserver’s virtual host definitions align precisely with the Host headers sent by clients.
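The Host header comparison at the heart of these fixes is simple to reproduce. A minimal Python sketch, assuming you can list the hostnames a backend is configured for; the function name and example hostnames are hypothetical, and IPv6 literals in the Host header are not handled.

```python
def status_for_host(host_header, served_hosts):
    """Return 421 when the Host header names a host this backend does not serve."""
    # Strip an optional ":port" and normalise case before comparing.
    hostname = host_header.strip().lower().split(":", 1)[0]
    served = {host.lower() for host in served_hosts}
    return 200 if hostname in served else 421
```

Run it with the Host value from a failing request and the virtual hosts defined on the backend that received it; a 421 result points at routing rather than at the backend itself.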
AI Context
Google (Googlebot / Search Console)
Googlebot expects a clean, predictable response. A 421 means it couldn’t fetch the resource because the server architecture isn’t aligned. Google handles it like other 4xx client errors apart from 429: the URL is not considered for indexing, and an already indexed URL that keeps returning 421 is eventually dropped.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models don’t parse HTTP codes but rely on retrieved content. If a 421 prevents content retrieval, the models cannot index or summarise that page, making it invisible to AI-powered SEO tools.
Case Patterns
Real-world scenarios seen in practice.
- A client’s WordPress site behind a poorly configured NGINX reverse proxy returns 421 errors because backend servers reject requests for non-default hostnames.
- A load balancer distributing HTTP/2 traffic to multiple backend servers, none set up for connection coalescing, causing intermittent 421 responses on certain requests.
HTTP 422 Unprocessable Entity
The server understood your perfectly crafted request, but the content inside it was a semantic disaster it could not process.
Reality Check
Most SEOs treat 422 like some mysterious HTTP unicorn, when in fact it’s just a polite way for servers to say 'Your data makes no sense – try again.' If you’re ignoring it, you’re ignoring a clear sign your form validation or API payload is botched.
Symptoms
- Server responds with a 422 status code
- User-submitted data rejected despite correct format
- Error message indicates semantic or validation failure
Likely Causes
Ranked by probability. Highest probability cause first.
- High Semantically invalid data in request body: Your request passed syntax muster but the content doesn’t meet business rules or validation criteria.
- Medium API expects specific semantic rules: The API endpoint is picky about the meaning or context of your data, not just the structure.
- Low Client-side validation missing or incomplete: The client is sending rubbish that should have been caught before hitting the server.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the server respond with a 422 status after a well-formed request?
- Are your submitted data fields meeting the API or server’s semantic validation rules?
Fixes
Review API documentation or validation rules and fix the request payload to comply exactly with expected data formats and business logic.
Implement or enhance client-side validation to catch semantic errors before submission.
Add robust validation controls to user input forms to prevent semantic nonsense from ever reaching your server.
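The difference between a syntax failure and a semantic failure is easiest to see in code. The Python sketch below assumes a hypothetical registration payload with `email` and `birth_date` fields; the field names and the function are illustrative only. Broken JSON gets a 400, while well-formed JSON that breaks the rules gets a 422.

```python
import json
from datetime import date

REQUIRED_FIELDS = ("email", "birth_date")  # hypothetical registration fields


def validate_registration(body):
    """Return (status, errors): 400 for broken JSON, 422 for semantic failures."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return 400, ["body is not valid JSON"]
    if not isinstance(data, dict):
        return 422, ["expected a JSON object"]
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "birth_date" in data:
        try:
            date.fromisoformat(data["birth_date"])  # rejects dates like 30 February
        except (TypeError, ValueError):
            errors.append("birth_date is not a real calendar date")
    return (422, errors) if errors else (200, [])
```

Returning the list of errors alongside the status is a design choice worth copying: a 422 with no explanation leaves the client guessing which rule it broke.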
AI Context
Google (Googlebot / Search Console)
Google’s crawlers expect status codes to reflect content accessibility and indexing viability. A 422 is a server-level signal that content is broken semantically, so the crawler will likely skip or flag the resource.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models see 422 as a clear indicator that the input, while syntactically correct, is semantically flawed. Retrieval-augmented generation systems will treat the content as unreliable or incomplete until corrected.
Case Patterns
Real-world scenarios seen in practice.
- A client sends JSON with correct syntax but missing required fields for user registration, triggering 422 responses.
- API accepts date fields but rejects logically invalid dates like February 30th, returning 422 instead of 400.
HTTP 423 Locked
The server refuses access because the target resource is locked, usually to prevent concurrent modifications or conflicts.
Reality Check
Most SEOs treat HTTP 423 like a ghost in the machine - rare, irrelevant, and thus ignored. The truth is, when it pops up, it often signals sloppy resource management or misconfigured version control on your server, but nobody bothers to check because it ‘looks technical’.
Symptoms
- Access attempts to a resource return a 423 Locked status code.
- Users or bots are prevented from editing or retrieving content.
- Unexpected lock behaviour on files or web resources that were previously accessible.
Likely Causes
Ranked by probability. Highest probability cause first.
- High WebDAV or CMS locks enabled: Many servers use WebDAV or content management systems that lock resources to prevent editing conflicts - sometimes these locks don’t release properly.
- Medium Concurrent editing conflicts: Multiple users or processes trying to modify the same resource simultaneously can trigger a lock.
- Low Improper permissions or server misconfiguration: Occasionally, server settings or permission errors mimic a locked state by denying access with a 423.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you running WebDAV or a CMS that supports resource locking?
- Is there evidence of simultaneous or overlapping edits on the resource?
Fixes
Use server or CMS tools to clear stale locks. Check logs to identify why locks persist longer than necessary.
Implement proper locking mechanisms or revise workflow to avoid overlap; educate users on editing protocols.
Review file and directory permissions; ensure server modules handling locks are correctly configured and updated.
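Spotting stale locks usually comes down to comparing lock age against a sensible timeout. A rough Python sketch, assuming you can export locks as a mapping of resource path to the time the lock was taken (in epoch seconds); that data shape, the function name and the one-hour default are all assumptions, not any CMS's API.

```python
def stale_locks(locks, now, max_age_seconds=3600):
    """Return the paths whose lock is older than max_age_seconds."""
    return [
        path
        for path, locked_at in locks.items()
        if now - locked_at > max_age_seconds
    ]
```

Anything this flags is a candidate for manual release through your server or CMS tooling, after checking that nobody is genuinely still editing.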
AI Context
Google (Googlebot / Search Console)
Googlebot treats HTTP 423 as a valid but blocked response, effectively preventing crawling or indexing of the locked resource until access is restored.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models, when used in retrieval-augmented generation scenarios, recognise 423 as an access restriction but cannot bypass it, meaning any referenced locked content is effectively off limits.
Case Patterns
Real-world scenarios seen in practice.
- A corporate intranet CMS locking documents during edits, resulting in 423 errors for external crawlers attempting indexing.
- Misbehaving WebDAV clients leaving stale locks on shared files, causing repeated 423 Locked responses until manual unlock.
HTTP 424 Failed Dependency
The server refuses to complete your request because an earlier, essential request in the chain has already failed.
Reality Check
Most SEOs treat HTTP status codes like horoscopes - vague, mystic, and open to interpretation. The 424 is not a cryptic riddle, it’s a straightforward signal that your request depends on something that has already gone belly up. If you’re ignoring this, you’re just inviting chaos.
Symptoms
- Your request does not complete and returns the 424 status.
- Dependent API calls or resource fetches in a sequence fail.
- Cascading failures in multi-step server interactions.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Chained request failure: The initial operation that your current request hinges on has failed - think of it as dominoes; if the first falls, the rest don’t stand.
- Medium Improper error handling: Your server logic fails to manage dependency failures properly, so it shoots a 424 without a clear fallback or error message.
- Low Misconfigured API or workflow: The API or multi-request workflow incorrectly flags dependencies as failed when they are not, often due to bad coding or misaligned service endpoints.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your request depend on a prior request or operation?
- Did the preceding request in the chain fail or return an error?
Fixes
Identify and resolve the original failed request - fix underlying bugs, ensure prerequisite resources are available and operational.
Implement robust error checks and graceful degradation in your server or API logic to handle failed dependencies without blanket 424 responses.
Audit your API calls and workflow configuration for logical or routing errors causing false failures; correct the sequence and status propagation.
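The domino effect is easy to model. In this Python sketch each step is a callable returning an HTTP-style status, and once one step fails every later step is reported as 424 without being run. The function name and the step names in the usage note are hypothetical; real workflows will have their own orchestration.

```python
def run_chain(steps):
    """Run (name, action) steps in order; report steps after a failure as 424."""
    results = {}
    failed = False
    for name, action in steps:
        if failed:
            results[name] = 424  # skipped: an earlier dependency failed
            continue
        status = action()
        results[name] = status
        failed = status >= 400
    return results
```

For a checkout where the inventory reservation returns 409, `run_chain([("reserve_inventory", lambda: 409), ("authorise_payment", lambda: 200)])` reports the payment step as 424. The status worth debugging is the first non-424 failure in the results, not the 424s that follow it.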
AI Context
Google (Googlebot / Search Console)
Google’s crawler doesn’t often encounter 424 since it’s a low-frequency status tied to multi-step API calls, but when it does, it treats the resource as unavailable due to upstream failures, skipping indexing until resolved.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models interpret 424 as a dependency chain failure and may suggest fixing prior related requests or dependencies, but they cannot debug code - you still need a human to untangle the mess.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce checkout API fails to complete payment authorisation because the inventory reservation call previously failed, returning 424 for the payment step.
- A multi-stage content publishing system returns 424 when the media upload step fails, causing the subsequent metadata update to abort.
HTTP 426 Upgrade Required
The server refuses to communicate over the current protocol and demands you upgrade to a newer, usually more secure, one.
Reality Check
Most SEOs treat the 426 like an obscure relic, ignoring it until browsers complain - meanwhile, the site’s visitors have already bounced off because their connection was politely told to sod off without explanation.
Symptoms
- Visitors encounter abrupt connection refusals or error messages.
- Automated tools flag protocol mismatches or outdated security settings.
- Search engines struggle to crawl or index pages served over deprecated protocols.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Server enforces mandatory protocol upgrade: The server refuses to serve the request over the current protocol and names the one it wants, such as HTTP over TLS or a newer HTTP version, in an 'Upgrade' response header.
- Medium Client uses obsolete protocol: Browsers or bots connect with a protocol the server no longer serves, such as HTTP/1.0 or plain HTTP, triggering the server's upgrade requirement. An outdated TLS version such as TLS 1.0 normally fails at the handshake instead, before any status code is sent.
- Low Misconfigured server headers: Incorrect or overly aggressive server settings send 426 responses unnecessarily, often due to misapplied security configurations.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the client attempting connection with an outdated protocol (e.g., HTTP/1.0 or plain HTTP)?
- Does the 426 response carry an 'Upgrade' header naming the protocol the server requires?
Fixes
Update server software to support current protocols; ensure 426 is sent only when absolutely necessary.
Update browsers, bots, or clients to support modern protocols; patch legacy systems.
Audit server configuration files; remove overly strict or incorrect upgrade requirements.
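A server sending 426 is expected to name the protocol it wants in an 'Upgrade' response header, so reading that header is the quickest diagnostic. A minimal Python sketch, assuming response headers arrive as a plain dict; the function name is hypothetical.

```python
def protocols_to_upgrade_to(status, headers):
    """Return the protocols listed in the Upgrade header of a 426 response."""
    if status != 426:
        return []
    for name, value in headers.items():
        if name.lower() == "upgrade":
            return [token.strip() for token in value.split(",") if token.strip()]
    return []  # a 426 without Upgrade suggests a misconfigured server
```

An empty result on a genuine 426 is itself a finding: the server is refusing the request without saying what it wants, which points at configuration rather than at the client.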
AI Context
Google (Googlebot / Search Console)
Google documents no special handling for 426. Like other 4xx responses apart from 429, it means the URL is not considered for indexing, and persistent 426s lead to already indexed pages dropping out.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models rely on text-based data and do not directly process HTTP status codes but infer issues from error logs or site health reports mentioning 426 responses.
Case Patterns
Real-world scenarios seen in practice.
- A financial services site insisted on TLS 1.3, causing legacy clients and scrapers to fail silently, tanking organic traffic overnight.
- An e-commerce platform returned 426 due to a misconfigured reverse proxy, blocking Googlebot and lowering search visibility.
HTTP 428 Precondition Required
The server insists your request must include a conditional header before it'll even consider processing it.
Reality Check
Most SEOs treat 428 like a mysterious unicorn, but really it’s just a polite server telling you to stop sauntering in recklessly without proving you’ve checked first. Ignore it, and you get nowhere fast.
Symptoms
- Requests to modify or delete resources are refused outright.
- Server responds with HTTP 428 status instead of the expected 200 or 204.
- Browser or client shows errors during update or PUT operations.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Missing conditional headers: The request did not include headers like ‘If-Match’ or ‘If-Unmodified-Since’, which the server demands to prevent unintended overwrites.
- Medium Misconfigured client requests: The client software or API call does not correctly implement HTTP precondition logic.
- Low Strict server policy: Server administrators have set rigid rules to avoid race conditions or lost updates, requiring all write operations to be conditional.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your request include conditional headers such as ‘If-Match’ or ‘If-Unmodified-Since’?
- Is your client or API correctly formatting and sending these headers?
Fixes
Ensure your HTTP client includes ‘If-Match’ with the correct ETag value or ‘If-Unmodified-Since’ with the proper timestamp before sending modification requests.
Update your API calls or client library to properly attach conditional headers as per the server’s requirements.
Consult your server admin or hosting provider to adjust or clarify precondition requirements, if business processes allow.
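The server-side rule can be sketched as follows. This Python example is an illustration under stated assumptions: the function name is hypothetical, only 'If-Match' is actually evaluated ('If-Unmodified-Since' is accepted but not checked), and a mismatched ETag returns 412 Precondition Failed, the standard companion to 428.

```python
WRITE_METHODS = {"PUT", "PATCH", "DELETE"}


def check_preconditions(method, headers, current_etag):
    """Return 428 for unconditional writes, 412 for a stale ETag, else 200."""
    if method.upper() not in WRITE_METHODS:
        return 200  # reads are not required to be conditional
    sent = {name.lower(): value for name, value in headers.items()}
    if "if-match" not in sent and "if-unmodified-since" not in sent:
        return 428  # the client did not prove it checked first
    if "if-match" in sent:
        wanted = [tag.strip() for tag in sent["if-match"].split(",")]
        if "*" not in wanted and current_etag not in wanted:
            return 412  # the resource changed since the client last read it
    return 200
```

The practical client flow is: GET the resource, keep its ETag, then send the update with that value in 'If-Match'.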
AI Context
Google (Googlebot / Search Console)
Googlebot rarely interacts with 428 responses since it does not perform state-changing requests like PUT or DELETE. For crawling, 428 is irrelevant, but it signals technical correctness and safe update practices on the server side.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models see 428 as a protocol enforcement detail - not an error per se but a gatekeeper instructing the client to prove it’s playing nicely before proceeding.
Case Patterns
Real-world scenarios seen in practice.
- A content management system’s API rejecting updates due to missing ‘If-Match’ headers, preventing accidental overwrites in collaborative editing.
- A poorly coded client application ignoring 428 responses, repeatedly sending unsafe writes causing frustration and failed deployments.
HTTP 429 Too Many Requests
The server is slapping you down because you’ve been a greedy pest, bombarding it with too many requests in too short a time.
Reality Check
Most SEOs treat 429 like a mere nuisance or believe it’s a rare event. Newsflash: it’s a glaring red flag that your site or bots are behaving like digital bullies and Google doesn’t appreciate being overwhelmed, no matter how ‘important’ you think your crawling schedule is.
Symptoms
- Search console warnings about excessive crawl rate or blocked resources
- Sudden drop in Googlebot crawl frequency
- User complaints about slow responses or being blocked intermittently
Likely Causes
Ranked by probability. Highest probability cause first.
- High Overzealous crawling: Your own bots, or third parties, are pinging your server so often it throws up its hands and says ‘enough’.
- Medium API or service abuse: Automated tools or integrations hammering endpoints without proper rate limiting.
- Low Misconfigured rate limits: Server or firewall rules set thresholds too low, triggering 429 unnecessarily.
Diagnostic Steps
Work through each question to identify the root cause.
- Are your analytics or logs showing a spike in request volume from particular IPs or user agents?
- Is the traffic legitimate (e.g. Googlebot, your own crawlers) or suspicious (e.g. unknown bots, scrapers)?
Fixes
Google retired Search Console’s crawl rate limiter in early 2024, and Googlebot ignores the robots.txt ‘crawl-delay’ directive, so control load at the server: disallow crawling of heavy, low-value URLs in robots.txt and implement server-side rate limiting that favours genuine crawlers.
Introduce API keys, strict rate limits, and monitor usage with alerts to nip abuse in the bud.
Audit your server and firewall configurations. Raise thresholds sensibly to match your typical traffic patterns without inviting abuse.
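Server-side rate limiting can take many forms. The Python sketch below shows one of them, a sliding-window limiter keyed by client; the class name and thresholds are hypothetical, and the clock is passed in explicitly so the behaviour is easy to reason about. The second value it returns is a suggested wait in seconds, which a server could send in a 'Retry-After' header.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of allowed requests

    def check(self, client_id, now):
        """Return (status, retry_after_seconds) for a request arriving at `now`."""
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429, self.window_seconds - (now - hits[0])
        hits.append(now)
        return 200, 0
```

Keying by verified crawler identity rather than raw IP address lets you give genuine search bots a more generous allowance than unknown scrapers.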
AI Context
Google (Googlebot / Search Console)
Google respects your server limits. When greeted with 429s, it backs off dramatically, reducing crawl frequency to avoid penalising your site or wasting resources. Persistent 429s can mean slower indexing or overlooked content.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI Overviews draw on Google’s index, and other assistants rely on their own crawlers, which can be rate limited in the same way. If 429s cause content to drop out of those sources, AI-generated responses become stale or incomplete, reflecting gaps rather than up-to-date information.
Case Patterns
Real-world scenarios seen in practice.
- A large e-commerce site’s aggressive internal crawler triggered 429s, causing Googlebot to retreat, tanking organic traffic overnight. Simple crawl rate adjustments fixed it.
- An API integration for keyword data repeatedly hit rate limits, returning 429s and breaking automated reporting workflows until throttling was properly configured.
HTTP 431 Request Header Fields Too Large
The server throws its toys out the pram because your request headers are bloated beyond its tolerance.
Reality Check
Most SEOs treat this like some mystical beast only developers can fix. Spoiler alert - it’s usually a matter of trimming down your cookie fat or reining in rogue plugins. No magic, just good housekeeping.
Symptoms
- Server responds with HTTP 431 status code.
- Requests fail intermittently or consistently with header size complaints.
- Logs indicate header fields exceed server or proxy limits.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Excessive Cookies: Browsers or applications stacking up cookies like hoarders storing junk, pushing header sizes over the line.
- Medium Overly Verbose Headers: Custom headers or user-agent strings ballooning unnecessarily.
- Low Proxy or Server Configuration Limits: The server or upstream proxy has strict header size caps set too low.
Diagnostic Steps
Work through each question to identify the root cause.
- Have you checked the size of your request headers, especially cookies?
- Are cookies or other headers unusually large or numerous?
Fixes
Clear unnecessary cookies, reduce cookie scope and lifespan, and audit your site’s scripts for cookie bloat.
Simplify or remove custom headers. Don’t send more metadata than strictly needed.
Increase allowed header size in your web server or proxy settings (e.g., Apache’s ‘LimitRequestFieldSize’ or Nginx’s ‘large_client_header_buffers’).
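Before raising limits, measure what is actually being sent. This Python sketch reports any header line larger than a given threshold. The function name is hypothetical and the 8190-byte default is an assumption chosen for illustration; check the limit your own server or proxy enforces.

```python
def oversized_headers(headers, max_field_bytes=8190):
    """Return (name, size_in_bytes) for each header line above the limit."""
    oversized = []
    for name, value in headers.items():
        size = len(f"{name}: {value}".encode("utf-8"))
        if size > max_field_bytes:
            oversized.append((name, size))
    return oversized
```

In practice the offender is almost always 'Cookie', which is why trimming cookies usually beats raising the server limit.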
AI Context
Google (Googlebot / Search Console)
Googlebot won’t proceed if your headers trigger a 431 error. It’s a hard stop – no crawling, no indexing. The bot expects clean, lean requests.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t care about HTTP headers. They parse content. But if your site is inaccessible due to 431 errors, there’s no content for them to ingest in the first place.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site whose persistent cart cookies ballooned over years of unregulated expiry dates, triggering 431 errors on checkout.
- A SaaS app with verbose authentication tokens in headers exceeding proxy limits after a recent API update.
HTTP 500 Internal Server Error
The server threw a wobbly and couldn’t complete your request, leaving you with a vague ‘something went wrong’ message.
Reality Check
If you think a 500 error is just a momentary hiccup or a rare blip, you’re already behind the curve. Too many SEOs treat it like a minor inconvenience rather than the red flag it is: a sign the underlying server or application is banging its head against a wall.
Symptoms
- Visitor sees a generic ‘500 Internal Server Error’ page instead of the content.
- Complete failure to load key website resources or pages.
- Random, inconsistent failures on otherwise stable pages.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Server-side application crash: Code errors, unhandled exceptions, or runtime failures in scripts or applications cause the server to fail silently.
- Medium Configuration errors: Misconfigured .htaccess files, PHP settings, or other server directives trip the server up.
- Medium Resource exhaustion: Overloaded server memory, CPU, or exhausted PHP workers leave the server unable to handle requests.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the server running your own code or a CMS?
- Are there recent changes or deployments on the site?
- Are server resource limits being hit (RAM, CPU, PHP workers)?
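Before guessing at causes, it helps to see which URLs are actually failing. The sketch below counts 5xx responses per path from access log lines. It assumes the common/combined log format (request in quotes, followed by the status code); the function name and the sample lines in the usage note are made up for illustration, so adjust the pattern to your own log format.

```python
import re
from collections import Counter

# Matches the quoted request and the status code in common/combined log format,
# e.g. "GET /checkout HTTP/1.1" 500 1234
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_server_errors(lines):
    """Count 5xx responses, keyed by (path, status)."""
    errors = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status").startswith("5"):
            errors[(match.group("path"), int(match.group("status")))] += 1
    return errors
```

Calling `count_server_errors(open("access.log"))` and then `.most_common(10)` on the result gives a quick list of the worst offenders (the file name is a placeholder). Errors clustered on one path point to application code; errors spread across every path point to configuration or resource exhaustion.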
Fixes
Inspect server error logs immediately. Identify and fix bugs or unhandled exceptions in the codebase. Use proper error handling and debugging tools.
Validate all server and application config files, especially .htaccess and php.ini. Restore from backup if misconfiguration is suspected.
Monitor server resource metrics. Optimise inefficient code and database queries. Consider scaling hosting resources or moving to a more robust environment.
AI Context
Google (Googlebot / Search Console)
Googlebot treats HTTP 500 as a temporary failure and will retry crawling later. Persistent 500 errors signal site instability, leading to crawling deprioritisation and indexation issues.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Large language models have no use for a 500 response: it signals a server-side breakdown, so no content is retrievable or analysable at that moment. The page is effectively a ‘content black hole’.
Case Patterns
Real-world scenarios seen in practice.
- A busy ecommerce site suffers intermittent 500s after a plugin update, tanking traffic and sales until the faulty plugin is disabled.
- A news website’s custom PHP script hits memory limits during peak hours, causing 500 errors that only resolve after server upgrade.
HTTP 501 Not Implemented
The server bluntly refuses to handle your request because it does not support the required functionality.
Reality Check
The majority of SEOs treat 501 errors like some mystical 'server tantrum' to be tiptoed around, when in truth it’s a lazy or outdated server admitting it can’t be bothered with your fancy request.
Symptoms
- Server returns a 501 status code in response to a client request
- Certain HTTP methods (like PUT or DELETE) trigger the error, while GET works fine
- APIs or advanced features fail to operate due to unsupported server commands
Likely Causes
Ranked by probability. Highest probability cause first.
- High Unsupported HTTP method: The server software doesn’t recognise or support the HTTP method you’re trying to use, often because it’s an unusual or non-standard verb.
- Medium Minimal or outdated server software: The server is running an old or stripped-down stack that lacks implementation for newer HTTP features.
- Low Misconfigured reverse proxy or gateway: A proxy or gateway in front of the server incorrectly passes requests it cannot handle, resulting in a 501 response.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you using an HTTP method other than GET, POST, or HEAD?
- Does your server or API documentation confirm support for this HTTP method?
Fixes
Switch to supported methods where possible or update server software to one that supports the required HTTP verbs.
Upgrade to a modern server platform or enable modules/extensions that implement the missing methods.
Audit proxy settings to ensure it correctly forwards all HTTP methods; adjust configuration or replace faulty middleware.
AI Context
Google (Googlebot / Search Console)
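Googlebot crawls almost entirely with GET requests, so it rarely triggers a 501. If it does receive one, it treats it like any other 5xx server error: it retries later, and a URL that keeps failing can eventually drop out of the index. A 501 on a method Googlebot never sends, such as PUT or DELETE, has no effect on Search.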
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models do not interpret status codes. If the crawler or retrieval system feeding them receives a 501, it gets no content, so the page is simply absent from whatever the model summarises.
Case Patterns
Real-world scenarios seen in practice.
- A client sends PATCH requests to a server whose handler does not implement PATCH, resulting in persistent 501 errors.
- A minimal IoT device’s web server returns 501 to all methods except GET, breaking API integrations expecting standard REST methods.
HTTP 502 Bad Gateway
Your server, playing proxy, got rubbish back from the upstream server and decided to throw a tantrum.
Reality Check
Nearly every 'SEO expert' blames 502 errors on vague 'server issues' and recommends witchcraft fixes. Meanwhile, the real problem is almost always misconfigured proxies or overloaded backends. Stop panicking and start diagnosing.
Symptoms
- Visitors see a 502 Bad Gateway error page instead of your content.
- Crawlers report inaccessible pages or dropped indexing attempts.
- Intermittent outages or slow load times coupled with error spikes.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Upstream server unresponsive or crashing: The backend your proxy or gateway depends on is down, overloaded, or timing out.
- Medium Misconfigured reverse proxy or firewall: Proxy settings or firewall rules blocking or mangling requests.
- Low DNS resolution failures: The proxy cannot resolve the upstream server’s IP correctly, sending requests to nowhere.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the upstream server reachable and healthy?
- Are proxy or firewall settings correctly forwarding requests?
Fixes
Restart or scale your backend services. Check logs to identify crashes, memory leaks, or CPU spikes. Implement proper timeout and retry configurations.
Audit your proxy configurations (Nginx, Apache, HAProxy) for correct backend server addresses, ports, and SSL settings. Verify firewall rules aren’t blocking essential traffic.
Confirm DNS records are accurate and TTLs are appropriate. Use direct IP testing to isolate DNS issues.
AI Context
Google (Googlebot / Search Console)
Googlebot treats 502 errors as server failures, often retrying later. Persistent 502s lead to deindexing or ranking drops because the content is effectively invisible.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models see 502 errors as gaps in available content. Retrieval-augmented generation (RAG) systems might skip or flag these sources as unreliable or unavailable.
Case Patterns
Real-world scenarios seen in practice.
- An ecommerce site experienced 502 errors because their load balancer wasn’t properly routing to newly added backend servers after a deployment.
- A news portal suffered intermittent 502s due to a firewall update that blocked incoming traffic on the proxy’s communication port.
HTTP 504 Gateway Timeout
Your server acted as a diligent middleman but got stood up when the upstream server failed to respond in time.
Reality Check
Nearly every SEO in the industry treats 504 errors like a polite suggestion rather than a critical roadblock. If your gateway times out, Google’s patience isn’t the issue – your architecture is.
Symptoms
- Visitors get a blank or error page citing a 504 Gateway Timeout.
- Intermittent unavailability of your website or specific resources.
- Crawlers fail to access pages, leading to a drop in indexing or rankings.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Upstream Server Overload: The server your gateway relies on is overwhelmed, slow, or outright unresponsive – classic signs of under-provisioned infrastructure or spikes in traffic.
- Medium Network Connectivity Issues: Communication between your gateway and upstream server is interrupted due to flaky connections, firewall misconfigurations, or routing problems.
- Medium Misconfigured Timeout Settings: Your gateway or proxy sets timeout limits too low, killing legitimate but slower responses before they arrive.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the upstream server responsive when accessed directly?
- Is the upstream server under heavy load or crashing?
Fixes
Audit backend performance, scale horizontally or vertically, optimise queries and code, implement caching strategies to reduce load.
Check your firewall rules, ensure stable routing between gateway and upstream, verify SSL handshakes if applicable.
Adjust gateway and proxy timeout thresholds to balance patience and resource utilisation – don’t be trigger-happy killing slow but valid responses.
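To pick a timeout with evidence rather than guesswork, compare candidate thresholds against upstream response times you have actually measured. The sketch below is one simple way to do that; the function names are hypothetical and the latency figures in the usage note are invented sample data, not measurements.

```python
def timeout_failure_rate(latencies_s, timeout_s):
    """Fraction of observed upstream responses that would exceed the timeout."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    too_slow = sum(1 for latency in latencies_s if latency > timeout_s)
    return too_slow / len(latencies_s)


def smallest_acceptable_timeout(latencies_s, max_failure_rate, candidates_s):
    """Return the lowest candidate timeout whose failure rate is tolerable.

    Returns None if even the largest candidate cuts off too many responses,
    which suggests the backend needs fixing rather than the timeout.
    """
    for timeout_s in sorted(candidates_s):
        if timeout_failure_rate(latencies_s, timeout_s) <= max_failure_rate:
            return timeout_s
    return None
```

With sample latencies of `[0.4, 0.9, 1.2, 4.8, 6.5, 7.0, 2.1, 0.7, 5.5, 3.0]` seconds, a 5-second timeout would cut off 3 in 10 responses, while a 10-second timeout would cut off none. If only an absurdly long timeout passes, the real fix is on the backend.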
AI Context
Google (Googlebot / Search Console)
A 504 signals a failure to access content promptly. Googlebot will mark the page as temporarily unreachable and reduce crawl rate, risking ranking drops if persistent.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t directly deal with HTTP codes, but retrieval-augmented generation systems will see missing data or stale caches, impacting answer accuracy.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site running complex inventory checks on a slow database saw 504 errors spike during peak sales hours, tanking conversions.
- A news aggregator’s reverse proxy had a 5-second timeout on upstream fetches, causing 504 errors whenever partner APIs slowed down.
HTTP 505 HTTP Version Not Supported
The server throws its toys out of the pram because you dared to use an HTTP version it neither recognises nor supports.
Reality Check
Nearly all 'SEO experts' waste time chasing fancy fixes for 505 errors when the truth is your client’s server is stuck in the digital dark ages or your request is outright malformed. The web moves on – so should your expectations.
Symptoms
- Server response includes HTTP status code 505.
- Browser or crawler fails to load or index the requested URL.
- Error logs explicitly mention unsupported HTTP protocol version.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Obsolete or non-compliant client request: Your user agent or crawler is using a bizarre or legacy HTTP version nobody in their right mind supports any longer.
- Medium Legacy or misconfigured server software: The server runs outdated software that refuses modern HTTP requests or is misconfigured to reject standard versions.
- Low Proxy or intermediary device interference: A proxy or firewall in the request chain mishandles or downgrades the HTTP version, triggering the error.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the offending request specify an HTTP version other than 1.0 or 1.1, such as HTTP/2 or HTTP/3?
- Is the server environment known to support modern HTTP versions?
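For HTTP/1.x-style traffic, the version sits at the end of the request line, so it can be read straight from a captured request or a raw log entry. The sketch below is a minimal check under that assumption; the function names and the default supported set are illustrative. HTTP/2 and HTTP/3 use binary framing rather than a text request line, so this only helps with text-based requests.

```python
import re

# Request line shape: METHOD SP request-target SP HTTP/<digit>.<digit>
REQUEST_LINE = re.compile(
    r"^(?P<method>[A-Z]+) (?P<target>\S+) HTTP/(?P<major>\d)\.(?P<minor>\d)$"
)


def http_version(request_line):
    """Return (major, minor) from a request line, or None if it is malformed."""
    match = REQUEST_LINE.match(request_line.strip())
    if not match:
        return None
    return int(match.group("major")), int(match.group("minor"))


def would_trigger_505(request_line, supported=((1, 0), (1, 1))):
    """True if the line is well formed but names a version outside `supported`.

    A malformed line returns False: that is a 400 problem, not a 505 one.
    """
    version = http_version(request_line)
    return version is not None and version not in supported
```

Run it over the request lines your script or crawler actually sends. Anything flagged points at the client side of the problem rather than the server.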
Fixes
Update user agents, crawlers, or scripts to use standard HTTP/1.1 or supported versions. Avoid hacking HTTP headers with non-standard protocol strings.
Patch or upgrade server software to a version that supports at least HTTP/1.1. Verify server configuration files do not explicitly reject modern HTTP versions.
Audit the request chain for proxies or firewalls that manipulate HTTP versions. Configure or replace devices that cause incompatibility.
AI Context
Google (Googlebot / Search Console)
Googlebot expects standard HTTP protocols. If the server responds with 505, Google simply marks the URL as inaccessible and excludes it from indexing until fixed.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models do not directly parse HTTP responses but rely on data from crawlers; repeated 505 errors result in incomplete or missing information about the URL’s content.
Case Patterns
Real-world scenarios seen in practice.
- A bespoke CMS running an obsolete server stack returned 505 errors when the site transitioned to HTTP/2, killing crawl access.
- A misconfigured corporate proxy stripped HTTP/2 headers, causing all external requests to receive 505 responses from a modern server.
HTTP 506 Variant Also Negotiates
The server has stuffed up its own content negotiation by making a chosen variant negotiate again, causing an internal configuration cock-up.
Reality Check
Most SEOs never even hear of this status because it’s about server-level misconfigurations that only the server admin should be screwing around with. Yet those who do run into it tend to blame Google instead of their broken backend.
Symptoms
- Server responds with HTTP 506 status instead of delivering content
- Visitors see error pages instead of expected variants (language, format)
- Crawlers fail to access resources due to negotiation loops
Likely Causes
Ranked by probability. Highest probability cause first.
- High Misconfigured content negotiation: The server’s variant resource is mistakenly set to negotiate again, creating an infinite loop the server can’t untangle.
- Medium Faulty server or proxy settings: Reverse proxies or intermediaries mismanaging negotiation headers can trigger this error.
- Low Software bugs or updates gone wrong: Sometimes a botched update or incompatible modules cause negotiation logic to break unexpectedly.
Diagnostic Steps
Work through each question to identify the root cause.
- Is your server configured to perform content negotiation on this resource?
- Does the chosen variant attempt its own negotiation or redirect?
Fixes
Disable content negotiation on the variant resource itself; ensure only the parent resource negotiates content.
Audit proxies, CDNs, and intermediaries to stop them from altering negotiation headers improperly.
Roll back recent changes or update server software to a stable version that handles negotiation correctly.
AI Context
Google (Googlebot / Search Console)
Googlebot treats HTTP 506 as a server error; it cannot crawl or index the resource until the server stops tripping over its own negotiation logic.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models see the error as a meta-problem; no content reached, so the AI has nothing to process or summarise.
Case Patterns
Real-world scenarios seen in practice.
- A multilingual news site with misconfigured language negotiation doubled up content negotiation on variants, killing crawlability.
- An ecommerce platform’s proxies mangled negotiation headers, causing intermittent 506 errors on product pages.
HTTP 507 Insufficient Storage
The server has run out of space and can’t store what your request demands – basically, it’s full up and waving a white flag.
Reality Check
Most SEOs treat HTTP 507 like a mythical beast, ignoring it until clients scream bloody murder. Newsflash – if your server can’t store data, your site might as well be offline. Yet the usual suspects spend hours chasing ghosts instead of checking disk space.
Symptoms
- Server returns HTTP 507 status instead of normal response.
- Pages or resources fail to load or update properly.
- Logs show storage-related errors or warnings about disk space limits.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Disk space exhausted: The most obvious – your server’s storage is full, leaving no room for temporary files, caches, or uploads.
- Medium Quota restrictions: User or application-specific quotas limit storage, triggering 507 when limits are hit, even if the physical disk has space.
- Low Filesystem or partition issues: Misconfigured or corrupted partitions can report full storage while physical space exists.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the server’s disk space nearly or completely full?
- Are quotas or storage limits in place for this user or process?
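The first question takes seconds to answer from the server itself. The sketch below uses Python's standard `shutil.disk_usage` to report how full a filesystem is; the helper names and the 90% threshold are arbitrary assumptions. It reports the filesystem as a whole and knows nothing about per-user or per-application quotas, so a healthy result does not rule out the second question.

```python
import shutil


def disk_usage_percent(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def is_nearly_full(path="/", threshold_percent=90.0):
    """True if usage is at or above the (assumed) warning threshold."""
    return disk_usage_percent(path) >= threshold_percent
```

Point `path` at the partition that holds uploads, caches, or temporary files, since that is the one most likely to fill up first.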
Fixes
Delete unnecessary files, clear caches, rotate logs, or upgrade storage capacity. Don’t pretend a reboot will fix a full drive.
Review and increase user or application quotas in your server management settings.
Run filesystem checks, repair corrupted partitions, and ensure correct mount options.
AI Context
Google (Googlebot / Search Console)
Googlebot simply hits a wall – it cannot retrieve or update content if the server’s storage is insufficient. This translates to poor crawl budget use and possible deindexing if persistent.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models rely on up-to-date indexed data; if the source site returns 507 errors, the data becomes stale or unavailable, leading to lower quality or missing information in AI outputs.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site failed to upload product images for days due to a full storage partition, causing dozens of 507 errors and lost sales.
- A CMS hit user quota limits after a large content migration, returning 507 errors for editors trying to save pages.
HTTP 508 Loop Detected
The server has thrown in the towel after spotting an infinite loop during request processing, refusing to churn forever.
Reality Check
Most SEOs treat this like a myth or an exotic beast only appearing in lab environments. In truth, it’s a glaring sign your server config or CMS plugins are stuck in some sad, recursive dance - but nobody bothers to look because ‘it’s rare’. Rare doesn’t mean irrelevant; it means you’re ignoring a ticking time bomb.
Symptoms
- Server returns HTTP 508 status code instead of the page you requested.
- Crawlers and users get stuck in endless redirects or error loops.
- Site performance plummets and server logs show repeated requests cycling without resolution.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Faulty Redirect Chains: Redirect rules that point back to themselves or form a circular path, causing the server to loop indefinitely.
- Medium Misconfigured CMS Plugins or Extensions: Extensions that hook into request processing and inadvertently cause repeated internal calls.
- Low Recursive Script or API Calls: Server-side scripts triggering themselves or external APIs looping requests without exit conditions.
Diagnostic Steps
Work through each question to identify the root cause.
- Are you seeing repeated redirects or requests cycling in server logs?
- Are your redirects or server-side calls configured to eventually terminate or do they reference each other in a cycle?
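If you can export your redirect rules as simple source-to-target pairs, checking for a cycle is mechanical. The sketch below is a minimal illustration: `redirects` is a hypothetical dictionary you would build from your own rules or crawl export, and the function follows the chain from a starting URL until it either ends or revisits a URL.

```python
def find_redirect_loop(redirects, start):
    """Follow redirects from `start`; return the looping chain or None.

    `redirects` maps each source URL to its redirect target. The returned
    list begins and ends with the URL where the cycle closes.
    """
    seen = []
    url = start
    while url in redirects:
        if url in seen:
            # We have been here before: everything since then is the loop.
            return seen[seen.index(url):] + [url]
        seen.append(url)
        url = redirects[url]
    return None  # the chain ended at a URL that does not redirect
```

For a pair of rules that send `http://example.com/` to `http://www.example.com/` and back again, the function returns the three-step chain that starts and ends on the first URL. A chain that terminates normally returns `None`.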
Fixes
Review and simplify redirect rules. Use tools like curl or Screaming Frog to map redirects. Remove or correct any redirects that point back to themselves or form loops.
Disable plugins/extensions one by one to identify the culprit. Update or replace problematic plugins. Audit custom hooks or filters that may trigger recursive requests.
Examine server-side code for calls that trigger themselves without exit conditions. Add proper stopping criteria or timeouts to prevent infinite recursion.
AI Context
Google (Googlebot / Search Console)
Googlebot treats a 508 like any other 5xx server error. It won’t waste crawl budget looping forever: it slows crawling and retries later, and URLs that keep failing can eventually drop out of the index.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models don’t understand HTTP codes natively but rely on contextual hints. Tools powered by LLMs incorporating Retrieval Augmented Generation (RAG) might detect the loop from logs or error messages, but won’t diagnose the root cause without explicit human-provided data.
Case Patterns
Real-world scenarios seen in practice.
- A popular CMS site with poorly constructed redirect plugins had its homepage bouncing endlessly between www and non-www versions, triggering HTTP 508 errors in server logs.
- An e-commerce platform where a custom API integration triggered repeated internal requests due to missing exit conditions, causing server overload and 508 responses.
HTTP 510 Not Extended
The server refuses to serve your resource because your request failed to meet its arbitrary extension policy.
Reality Check
Most SEOs and devs never even see this status, yet somehow they insist it’s ‘critical’. It’s the internet’s polite way of saying ‘your request is too basic for me’.
Symptoms
- Server responds with HTTP 510 status code.
- Resource access denied with an explanation about missing extensions.
- Client software or crawler receives a message about unmet policy requirements.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Missing required extensions in request headers: The server demands certain protocol extensions or headers that your client didn’t send because you’re using default or outdated tools.
- Medium Misconfigured server expecting unnecessary extensions: Your server is set up to require extensions nobody really uses, a classic ‘clever’ sysadmin blunder.
- Low Client-server protocol mismatch: The client and server aren’t speaking the same ‘language’ or protocol version, leading to denied access.
Diagnostic Steps
Work through each question to identify the root cause.
- Does your client request contain all required protocol extensions or headers specified by the server?
- Is the server configuration custom or unusually strict about extensions?
Fixes
Modify your HTTP request to include the required extensions or headers. Consult the API or server documentation to know exactly what is missing.
Adjust your server settings to relax extension requirements. Unless you’re running a NASA launch control, this ‘feature’ is overkill.
Update or configure your client software to support the server’s required protocol and extensions. Using legacy tools in 2024 is asking for trouble.
AI Context
Google (Googlebot / Search Console)
Googlebot rarely, if ever, encounters HTTP 510. If it does, the resource is simply not fetched or indexed because the request is incomplete according to server policy.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models like ChatGPT or Gemini see HTTP 510 as a protocol-level refusal. They rely on external data retrieval systems; if those fail with 510, the LLM gets no content to process.
Case Patterns
Real-world scenarios seen in practice.
- An overly zealous API endpoint rejecting simple GET requests because it expects custom headers, locking out basic crawlers.
- Legacy enterprise servers refusing connections from modern clients that omit deprecated protocol extensions.
HTTP 511 Network Authentication Required
The client must authenticate to access the network before proceeding with any web requests.
Reality Check
Most 'SEO experts' wouldn’t spot this one in a lineup - they treat it like a mere server hiccup rather than the network bouncer demanding ID before you even get through the door. Welcome to the world beyond standard HTTP codes.
Symptoms
- Browser or crawler receives a 511 status code instead of the expected content.
- Unable to access the site without passing through a captive portal or network login.
- Automated SEO tools report failed fetches or blocked pages despite server being "up".
Likely Causes
Ranked by probability. Highest probability cause first.
- High Captive portal or network gateway requiring login: Your client is stuck behind a network that insists on authentication before allowing any traffic through, common in hotels, airports, or corporate setups.
- Medium Proxy or firewall intercepts requests: A proxy device returns 511 to enforce policy or credential checks, disrupting normal SEO crawling.
- Low Misconfigured network or ISP settings: Rare but can happen if network authentication is accidentally enforced on public or semi-public servers.
Diagnostic Steps
Work through each question to identify the root cause.
- Is your content or crawler behind a network that requires manual login or authentication (e.g., hotel Wi-Fi, corporate VPN)?
- Can you access the site manually through this network without authentication?
Fixes
Authenticate properly with the network before crawling or accessing content. For persistent SEO crawling, arrange static IP whitelisting or alternative network access.
Configure the proxy or firewall to allow unauthenticated access for legitimate crawlers or whitelist IP ranges.
Consult network administrators to disable unnecessary authentication on public-facing servers.
AI Context
Google (Googlebot / Search Console)
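Googlebot crawls from Google’s own network, so it should never sit behind a captive portal. If it does receive a 511, something in front of your server (a proxy, firewall, or gateway) is wrongly demanding network authentication. Googlebot cannot bypass that barrier, so the URL is not fetched or indexed.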
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models have no direct network access and depend on documented or cached content; a 511 status means content is effectively unreachable and thus opaque to the model.
Case Patterns
Real-world scenarios seen in practice.
- An SEO crawl run over hotel Wi-Fi was intercepted by the captive portal, so every URL in the audit came back as 511 even though the site itself was fine.
- Enterprise networks blocking external SEO tools until proper VPN or login credentials are provided.
Crawl Budget Waste
Googlebot is spending its allocated time on your site crawling low-value URLs instead of your important pages.
Reality Check
If you have fewer than 10,000 URLs, you probably don't have a crawl budget problem — you have a site quality problem. But if you have faceted navigation, infinite scroll, or poorly implemented tags, you can accidentally create millions of useless URLs that trap Googlebot in an endless loop.
Symptoms
- Important new pages take weeks to get indexed despite being submitted in the sitemap.
- Search Console's Crawl Stats report shows high crawl activity on parameter URLs (?sort=price, ?color=red), tags, or internal search results.
- Server logs show Googlebot hitting non-canonical or broken pages frequently.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Faceted Navigation/Filters: E-commerce sites allow endless combinations of filters, creating a near-infinite number of crawlable URLs.
- Medium Session IDs/Tracking Parameters: URLs that append unique IDs for every user visit, making Googlebot think each visit is a new page.
- Medium Soft Error Pages/Infinite Spaces: Pages that return 200 OK but have no content, or calendars/archives that extend indefinitely.
- Low Hacked Pages/Spam: Your site was compromised and thousands of spam URLs were injected and are now being crawled.
Diagnostic Steps
Work through each question to identify the root cause.
- Look at your GSC Crawl Stats report. Are the most-crawled URLs your actual, valuable pages?
- What type of URLs are dominating the crawl: parameter URLs, tag/category archives, or internal search results?
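To put numbers on that second question, bucket the crawled URLs from your logs or Crawl Stats export by type. The sketch below is a rough classifier. The path prefixes (`/search`, `/tag/`, `/category/`) are assumptions about a typical site structure and will need changing to match yours.

```python
from collections import Counter
from urllib.parse import urlsplit


def classify_url(url):
    """Assign a crawled URL to a coarse bucket (path prefixes are assumptions)."""
    parts = urlsplit(url)
    path = parts.path.lower()
    if path.startswith("/search"):
        return "internal_search"
    if path.startswith(("/tag/", "/category/")):
        return "tag_or_category"
    if parts.query:
        return "parameter"
    return "other"


def crawl_breakdown(urls):
    """Count crawled URLs per bucket."""
    return Counter(classify_url(url) for url in urls)
```

If `parameter` or `internal_search` dwarfs `other`, Googlebot is spending most of its visits on URLs you never wanted crawled.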
Fixes
Use robots.txt to explicitly block Googlebot from crawling parameter URLs. Do not rely solely on canonical tags — Googlebot still has to crawl the page to see the canonical.
Remove tracking parameters from internal links. Use cookies or local storage for session tracking instead of URL parameters. Block them in robots.txt.
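Ensure error pages return a true 404 or 410 status code. Apply noindex tags or robots.txt rules to infinite calendars, paginated archives beyond depth 2, or empty filter combinations. Do not combine the two on the same URL: Googlebot cannot see a noindex tag on a page it is blocked from crawling.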
Clean the hack immediately, return 410 Gone status codes for the spam URLs, and secure the vulnerability that allowed the injection.
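Removing tracking parameters from internal links can be done at the point where links are generated. The sketch below shows the idea with Python's standard `urllib.parse`. The function name is hypothetical and the parameter list is an assumption: it includes common analytics parameters plus a made-up `sessionid`, so replace it with whatever your own site appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that do not change page content.
TRACKING_PARAMS = frozenset({
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "gclid", "fbclid", "sessionid",
})


def strip_tracking(url, tracking=TRACKING_PARAMS):
    """Return `url` with tracking parameters removed and the rest preserved."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in tracking
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parameters that do change the page, such as a colour filter, are left alone, so those still need a robots.txt decision of their own.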
AI Context
Google (Googlebot / Search Console)
Google sees this as a massive waste of its own server resources and electricity. It will aggressively throttle crawling of a site if it detects endless low-value URL spaces.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs generally rely on sitemaps or specific seed URLs for RAG retrieval. They are less likely to get trapped in infinite spaces, but if your core content is buried under layers of parameters, they may never find it.
Case Patterns
Real-world scenarios seen in practice.
- A clothing retailer's site went from 5,000 indexed pages to 2 million because they allowed Googlebot to crawl every combination of size, colour, and price filter. Their core product pages stopped getting indexed entirely.
- A real estate site let Googlebot crawl their internal search results page, creating a new URL for every possible combination of location and property type a user ever searched.
Page Not Being Indexed
Google knows the page exists but has actively chosen not to add it to the search index.
Reality Check
"Discovered - currently not indexed" does not mean Google hasn't found your page. It means Google found it, looked at your crawl budget or site quality, and decided it wasn't worth the server resources to crawl it right now. "Crawled - currently not indexed" is worse: Google read it and decided it wasn't good enough to keep.
Symptoms
- The URL appears in Search Console as "Discovered - currently not indexed" or "Crawled - currently not indexed".
- Searching site:yourdomain.com/your-exact-url returns zero results.
- The page has been live for more than 14 days with no indexation status change.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Low Overall Site Quality: Google doesn't trust your domain enough to index deep pages.
- Medium Crawl Budget Exhaustion: Your site has too many low-value URLs (tags, filters, pagination) eating up Googlebot's time.
- Medium Thin or Duplicate Content: The specific page offers nothing new compared to what is already in the index.
- Low Technical Blockers: A rogue noindex tag, a canonical tag pointing to another URL, or a robots.txt block.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the URL pass a Live Test in Google Search Console?
- Is the status "Discovered - currently not indexed" or "Crawled - currently not indexed"?
Fixes
You cannot fix this at the page level. Improve E-E-A-T signals across the domain, strengthen internal linking to the page, and consider consolidating weaker content elsewhere on the site.
Audit your site architecture. Block parameter URLs, tag pages, and low-value programmatic pages using robots.txt to force Googlebot to focus on your core pages.
Consolidate the page with another relevant page, or significantly upgrade the information gain: add unique data, expert quotes, or original research.
Remove the noindex tag, fix the canonical tag to self-reference, or update robots.txt. Retest with the URL Inspection tool after each change.
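The technical blockers can be spotted in the page source before retesting. The following is a rough sketch using Python's built-in `html.parser`; the class and function names are hypothetical. It only inspects the HTML it is given, so it will not see an `X-Robots-Tag` header or a robots.txt rule.

```python
from html.parser import HTMLParser


class BlockerScanner(HTMLParser):
    """Collects the robots meta directive and canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {key: (value or "") for key, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attributes.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href")


def find_blockers(html: str, page_url: str) -> list:
    """Return a list of human-readable indexing blockers found in the HTML."""
    scanner = BlockerScanner()
    scanner.feed(html)
    problems = []
    if scanner.noindex:
        problems.append("noindex meta tag present")
    # Ignore a trailing-slash difference when comparing canonical and page URL.
    if scanner.canonical and scanner.canonical.rstrip("/") != page_url.rstrip("/"):
        problems.append(f"canonical points elsewhere: {scanner.canonical}")
    return problems
```

An empty list means neither of these two blockers was found in the markup; it does not prove the page is indexable.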
AI Context
Google (Googlebot / Search Console)
A mathematical resource allocation decision based on expected utility versus server cost. Googlebot has a limited crawl budget per domain and will skip URLs it considers low-value.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs do not index the web like Googlebot; they rely on training data cutoffs and real-time retrieval (RAG). If a page isn't in Google or Bing's index, RAG systems cannot retrieve it to ground their answers — making indexation a prerequisite for AI citation.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site with 10,000 products had 80,000 "Discovered - currently not indexed" URLs because their faceted navigation created endless parameter URLs.
- A blog published 50 AI-generated articles in a week; all stuck in "Crawled - currently not indexed" because the content lacked information gain versus the existing index.
Google Business Profile Not Ranking Locally
A Google Business Profile exists and is verified but is not appearing in the local pack (map results) for relevant local queries.
Reality Check
This is a more specific version of the local SEO ranking problem. The Business Profile itself is the primary lever — Google's documentation explicitly states that businesses with complete and accurate information are more likely to appear in local results.
Symptoms
- Profile appears for branded searches but not category searches.
- Profile appears in some locations but not others.
- Profile was previously ranking but has dropped out of the local pack.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Profile Suspended or Pending Reinstatement: A policy violation or verification issue has suspended the profile.
- high probability Incomplete Profile: Missing primary category, address, phone number, or business hours.
- medium probability Category Mismatch: Primary category does not match the query type.
- medium probability Insufficient Reviews: Significantly fewer reviews than competitors in the local pack.
- medium probability NAP Inconsistency: Name, Address, and Phone number on the profile do not match the website or other citations.
- low probability Distance Factor: The business address is too far from the search origin for the query type.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the profile showing any suspension or policy violation notices in Business Profile Manager?
- Is the profile 100% complete — primary category, address, phone, hours, description, photos?
- Does the primary category exactly match the type of business users are searching for?
- How does your review count compare to the top 3 local results?
AI Context
Google (Googlebot / Search Console)
The Business Profile is the primary data source for local search results. An incomplete or inconsistent profile reduces Google's confidence in the business's information and suppresses local visibility.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems increasingly use Business Profile data to generate local recommendations. An incomplete profile means incomplete data in AI-generated local responses.
Local SEO Rankings Not Appearing
Your business is not appearing in the Google local pack (map results) for relevant local queries despite having a Google Business Profile.
Reality Check
Local ranking is determined by three factors Google documents explicitly: relevance, distance, and prominence. If you are not appearing, one or more of these is insufficient — not a technical glitch.
Symptoms
- Business appears for branded searches but not category searches ("dentist near me").
- Rankings appear inconsistently — present for some queries, absent for others.
- Competitors with fewer reviews and less content consistently outrank you.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Incomplete Business Profile: Missing primary category, incomplete address, no business description, or few photos.
- high probability Distance Factor: The user's search location is too far from your registered address for Google to include you.
- medium probability Insufficient Prominence: Low review count, few citations in local directories, weak web presence.
- medium probability Unverified or Suspended Profile: Profile exists but has not been verified, or has been suspended for a policy violation.
- low probability Category Mismatch: Primary category does not match the query type.
Diagnostic Steps
Work through each question to identify the root cause.
- Is your Business Profile verified in Google Business Profile Manager?
- Is your primary category the most specific and accurate description of your business type?
- How many reviews does your profile have compared to the top 3 local results?
- Is your NAP (Name, Address, Phone) consistent across your website, Google Business Profile, and major directories?
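The NAP comparison in the last step can be partly automated. This is a minimal sketch, not a definitive matcher: the field normalisation rules, the `website` reference key, and the choice to compare only the last ten phone digits (to ignore country and trunk prefixes) are all assumptions made for illustration.

```python
import re

FIELDS = ("name", "address", "phone")


def normalise_nap(name: str, address: str, phone: str) -> tuple:
    """Lower-case text, collapse punctuation, and reduce the phone to digits."""
    def clean(text: str) -> str:
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

    digits = re.sub(r"\D", "", phone)
    # Assumption: the last ten digits identify the number regardless of prefix.
    return (clean(name), clean(address), digits[-10:])


def nap_mismatches(listings: dict, reference: str = "website") -> dict:
    """Return {source: [fields that differ from the reference listing]}."""
    expected = normalise_nap(*listings[reference])
    report = {}
    for source, nap in listings.items():
        if source == reference:
            continue
        actual = normalise_nap(*nap)
        differing = [f for f, a, b in zip(FIELDS, expected, actual) if a != b]
        if differing:
            report[source] = differing
    return report
```

Note that the sketch treats "St" and "Street" as different, which is deliberate here: abbreviation differences across citations are worth reviewing by hand.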
AI Context
Google (Googlebot / Search Console)
A business that has not fully completed its profile is signalling lower relevance and prominence than competitors who have. The algorithm has no way to distinguish between a business that is genuinely less relevant and one that simply hasn't completed its profile.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Local business data from Google Business Profile is increasingly used to populate AI-generated local recommendations. An incomplete profile means incomplete data in AI responses.
Brand Name Not Appearing in Search
Searching for a brand name in Google does not return the brand's own website as the first result, or the brand does not appear at all.
Reality Check
A brand that does not rank for its own name has a fundamental trust or indexing problem. This is almost always either a technical issue (site not indexed, manual action) or a brand identity problem (the brand name is too generic or too similar to a more established entity).
Symptoms
- The brand's website does not appear in the first page of results for a branded query.
- A Knowledge Panel does not appear for the brand despite it being an established business.
- Competitors are bidding on the brand name in paid search and appearing above organic results.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Site Not Indexed: The site has a robots.txt block or noindex tag preventing indexing.
- medium probability Manual Action: A site-wide manual action has removed the site from search results.
- medium probability Generic Brand Name: The brand name is a common word or phrase that Google does not associate with the specific business.
- medium probability New Domain: The site is new and has not yet accumulated sufficient authority for Google to rank it confidently for branded queries.
- low probability Negative SEO or Brand Suppression: A competitor is actively working to suppress the brand's search visibility.
Diagnostic Steps
Work through each question to identify the root cause.
- Does `site:yourdomain.com` return any results in Google?
- Is there a manual action in Search Console?
- Is the brand name a generic word (e.g., "Apple," "Amazon," "Blue")?
- How old is the domain?
AI Context
Google (Googlebot / Search Console)
A brand that ranks for its own name has demonstrated sufficient entity authority for Google to confidently associate the domain with the brand. Failure to rank for branded queries is a signal of insufficient entity recognition.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems use entity recognition to identify and cite brands. A brand with weak entity signals is less likely to be mentioned in AI-generated responses about its industry.
Competitor Outranking on Brand Terms
A competitor's website is appearing above your own website in organic search results when users search for your brand name.
Reality Check
A competitor outranking you for your own brand name in organic search is a serious signal — it means Google has more confidence in the competitor's page as the answer to a query about your brand than in your own page. This is almost always an authority or entity recognition problem, not a technical one.
Symptoms
- A competitor's comparison page ("Brand X vs. Brand Y") ranks above your homepage for your brand name.
- Review sites (Trustpilot, G2, Capterra) consistently outrank your homepage for branded queries.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Weak Domain Authority: Your domain has insufficient authority for Google to confidently rank it first for branded queries.
- high probability Competitor Targeting Your Brand: A competitor has created content specifically targeting your brand name.
- medium probability Review/Directory Sites: Third-party review or directory sites have more authority than your domain for informational branded queries.
- medium probability Brand Name Ambiguity: Your brand name is shared with another entity that has more authority.
- low probability Technical Issues: Your homepage is not properly optimised for your brand name.
Diagnostic Steps
Work through each question to identify the root cause.
- What type of page is outranking you — a competitor's site, a review site, or a directory?
- What is your domain's authority relative to the outranking page?
AI Context
Google (Googlebot / Search Console)
A brand that cannot rank first for its own name has insufficient entity authority. Google's confidence in the association between the brand name and the domain is lower than its confidence in the competing page.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems that generate brand comparisons or recommendations will cite the pages that rank for brand-related queries. A competitor outranking you for your brand name means AI systems may cite the competitor when users ask about your brand.
Featured Snippet Lost or Never Won
A page that previously held a featured snippet has lost it, or a page that should be winning a featured snippet is not appearing in position zero.
Reality Check
Featured snippets are selected automatically by Google. You cannot claim one — you can only structure your content to make selection more probable. Losing a snippet does not mean your content got worse; it may mean a competitor's content got better structured.
Symptoms
- A page ranking in positions 2-5 for a query that triggers a featured snippet is not winning the snippet.
- Featured snippet impressions have dropped in Search Console's Search Appearance report.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Competitor Restructured Their Content: A competitor rewrote their answer to be more direct and concise, displacing your snippet.
- high probability Content Format Mismatch: Your content is in prose but the query deserves a list, or vice versa.
- medium probability Answer Too Long: Paragraph snippets are typically 40-60 words. Longer answers are less likely to be selected.
- medium probability Heading Doesn't Mirror the Query: The heading above your answer doesn't match the query language Google is trying to answer.
- low probability Core Update Re-evaluation: A core update changed which page Google considers the best answer for the query.
Diagnostic Steps
Work through each question to identify the root cause.
- What format is the current featured snippet (paragraph, list, or table)?
- Does your heading above the answer mirror the exact query language?
- Is your answer in the first 1-2 sentences of the section, or buried in the middle?
- Is your page ranking in the top 5 for the query?
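Two of the checks above (answer length and heading wording) are easy to script. The sketch below is a rough Python illustration; the function names are hypothetical, the 40-60 word range is the typical figure quoted under Likely Causes rather than a documented Google limit, and "mirrors the query" is approximated as "every query word appears in the heading".

```python
import re


def tokens(text: str) -> list:
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def snippet_check(query: str, heading: str, answer: str) -> dict:
    """Report answer length and whether the heading contains every query word."""
    count = len(answer.split())
    return {
        "word_count": count,
        "within_typical_range": 40 <= count <= 60,
        "heading_covers_query": set(tokens(query)) <= set(tokens(heading)),
    }
```

Pass in the first paragraph under the heading as `answer`; a result outside the range is a prompt to tighten the paragraph, not proof that it cannot win the snippet.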
AI Context
Google (Googlebot / Search Console)
The featured snippet is Google's best attempt to answer the query without requiring a click. The page that provides the clearest, most direct answer in the correct format wins.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Featured snippet optimisation and AI Overview citation optimisation are closely related. Content structured for featured snippets is also structured for AI retrieval.
Keyword Rankings Fluctuating Wildly
Rankings for target keywords are moving significantly (5+ positions) on a daily or weekly basis without any corresponding changes to the site.
Reality Check
Some ranking fluctuation is normal — Google's results are dynamic by design. Wild fluctuation (10+ position swings daily) is typically a sign of one of three things: a query where Google has not settled on a preferred result, a site with thin authority competing in a competitive space, or an active algorithm update rolling out.
Symptoms
- Rank tracking tools show inconsistent data across different data centres.
- Rankings appear to stabilise briefly then fluctuate again.
- Fluctuation is concentrated on specific pages or query types.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Algorithm Update Rolling Out: Google updates often cause temporary volatility as the new signals propagate across data centres.
- high probability Query Volatility: Some queries are inherently volatile because Google has not determined a stable preferred result.
- medium probability Thin Authority: A site with borderline authority for a competitive query will fluctuate as Google's confidence in the ranking changes.
- medium probability Competitor Activity: Competitors are actively optimising for the same queries, causing the competitive landscape to shift.
- low probability Rank Tracking Methodology: The rank tracking tool is measuring from different locations or devices, producing inconsistent data.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the fluctuation coinciding with a known algorithm update?
- Is the fluctuation concentrated on specific pages or query types?
- How does your page compare to the pages that are displacing it?
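To separate normal movement from significant fluctuation, it can help to quantify day-over-day swings from a rank tracker export. This is a minimal sketch assuming one position reading per day; the function names are hypothetical and the default threshold of 5 positions comes from the definition at the top of this entry.

```python
def daily_swings(positions: list) -> list:
    """Absolute position change between each pair of consecutive days."""
    return [abs(today - yesterday) for yesterday, today in zip(positions, positions[1:])]


def volatile_days(positions: list, threshold: int = 5) -> list:
    """Indexes (0-based, into positions) of days whose swing meets the threshold."""
    return [
        index + 1
        for index, swing in enumerate(daily_swings(positions))
        if swing >= threshold
    ]
```

Comparing the flagged days against known update dates, page by page, addresses the first two questions above with data rather than impressions from a chart.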
AI Context
Google (Googlebot / Search Console)
Ranking volatility is a normal feature of a dynamic ranking system. Google's results change as the web changes, as user behaviour changes, and as Google's systems are updated.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems that use search data for citation will cite whichever page is ranking at the time of their data collection. Volatile rankings mean inconsistent AI citation.
Traffic Drop After Algorithm Update
Your site lost significant organic visibility immediately following a confirmed or unconfirmed Google algorithm update.
Reality Check
Algorithm updates are not penalties; they are re-evaluations of the entire web graph. You weren't punished — someone else was simply determined to be a better answer for the query based on the new weights Google applied to its ranking signals. Recovery is measured in months, not days.
Symptoms
- Sudden, sharp decline in organic sessions (often 30–60%) correlating exactly with a known Google update date.
- Rankings drop across the board, not just on a single page or keyword.
- Impressions in Google Search Console fall off a cliff across the entire domain.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Helpful Content Update (HCU) Re-evaluation: Google decided your site's content is primarily "search engine-first" rather than "people-first".
- High Core Update Shift: Google changed how it weights relevance, authority, or user intent for your core topics.
- Medium Spam Update Hit: You were caught using manipulative tactics — scaled content abuse, expired domains, site reputation abuse.
- Low Technical or Tracking Error: Your analytics broke exactly when an update happened.
Diagnostic Steps
Work through each question to identify the root cause.
- Did the traffic drop happen exactly on the date of a named Google update?
- Did you lose traffic across the entire domain or just specific folders and page types?
Fixes
Fundamentally change your content strategy. Delete or consolidate unhelpful, derivative content. Add unique information gain, expert quotes, and original research to your core pages. This takes months — there are no shortcuts.
Analyse the pages that replaced you in the SERPs. What format are they using? What intent are they serving? Adapt your content to match the new reality of the SERP, not the old one.
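Stop the manipulative tactic immediately. Disavow toxic links, remove scaled AI content, and clean up your site architecture. If Search Console shows a manual action, submit a reconsideration request only after all violations are resolved. An algorithmic spam-update demotion has no reconsideration process and lifts only when Google's systems reassess the site.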
Cross-reference the GSC impressions chart (which is independent of your analytics) against the drop. If GSC shows no change, your tracking broke — not your rankings.
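The cross-check in the last fix can be expressed as a simple comparison of percentage changes. The sketch below is illustrative only: the function names, the before/after totals you feed it, and the 10% tolerance are assumptions, not thresholds from Google.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after."""
    return (after - before) / before * 100


def diagnose_drop(gsc_before: float, gsc_after: float,
                  analytics_before: float, analytics_after: float,
                  tolerance: float = 10.0) -> str:
    """Classify a drop by comparing Search Console data with analytics sessions."""
    gsc = pct_change(gsc_before, gsc_after)
    analytics = pct_change(analytics_before, analytics_after)
    if analytics <= -tolerance and gsc > -tolerance:
        return "tracking"   # analytics fell but Search Console did not
    if analytics <= -tolerance and gsc <= -tolerance:
        return "rankings"   # both sources fell together
    return "no significant drop"
```

Use equal-length periods on either side of the update date, and keep the same day-of-week mix in both, so that seasonality does not masquerade as a drop.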
AI Context
Google (Googlebot / Search Console)
A necessary recalibration of the entire web graph's quality signals. Entire site categories can be re-evaluated simultaneously based on shifts in how Google models "helpful content."
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs don't have "algorithm updates" in the same way, but they do have model updates and retrieval preference shifts. If your content is suddenly excluded from RAG retrieval, the new model likely prefers more authoritative or more efficiently structured sources.
Case Patterns
Real-world scenarios seen in practice.
- A product review affiliate site lost 80% of its traffic during an HCU update because all its content summarised Amazon reviews without any original testing or photography.
- A local business lost rankings for "best plumber near me" because Google decided a directory site better served the user intent for that specific query type.
Back Button Hijacking Penalties
A site is penalised because it, or a third-party script running on it, manipulates the browser history API to prevent users from navigating back to the search results.
Reality Check
This is not a technical glitch. Google classifies back button hijacking as a malicious spam practice. Enforcement begins June 15, 2026. The cause is almost always a rogue third-party script or ad network, not your core codebase. You are responsible for everything running on your pages.
Symptoms
- Users complain they cannot use the browser back button to leave your site.
- Clicking "Back" reloads the current page or redirects to an unsolicited URL.
- A sudden, severe drop in organic traffic (automated demotion).
- A manual action notification in Google Search Console citing "Malicious behaviour" or "Spam policies violation."
- Unusually high bounce rates coupled with extremely low time-on-page metrics.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Rogue Ad Networks: Low-tier advertising platforms injecting malicious scripts to artificially inflate pageviews or force ad impressions.
- High Third-Party Widgets: Compromised or aggressive third-party tools (pop-ups, lead capture forms) manipulating the browser history API.
- High Malware Infection: The site has been compromised and attackers have injected scripts designed to trap users.
- High Aggressive Exit-Intent Scripts: Poorly coded scripts attempting to prevent users from leaving by manipulating `history.pushState`.
Diagnostic Steps
Work through each question to identify the root cause.
- Replicate the Behaviour: Open your site in an incognito window. Navigate to a few pages, then attempt to use the browser back button to return to the search results. If you are trapped or redirected, the issue is present.
- Check Search Console: Look for manual actions under "Security & Manual Actions." A penalty here confirms Google has detected the behaviour.
- Isolate the Cause: Disable all third-party scripts, plugins, and ad networks. Repeat the navigation test. If the issue disappears, re-enable them one by one until the hijacking behaviour returns.
- Audit History API Usage: Use Chrome DevTools (Console) to monitor calls to `history.pushState` or `history.replaceState`. Excessive or unexpected calls indicate manipulation.
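As a complement to the DevTools check, a static scan of the script files a page loads can show where History API calls originate. The following is a rough sketch with hypothetical names; it takes script sources you have already downloaded as a mapping of file name to text. It will miss obfuscated or dynamically built calls, and legitimate single-page apps also call `pushState`, so a hit is a lead to investigate rather than proof of hijacking.

```python
import re

# Matches history.pushState( and history.replaceState(, with or without "window.".
HISTORY_CALL = re.compile(r"\bhistory\s*\.\s*(pushState|replaceState)\s*\(")


def count_history_calls(scripts: dict) -> dict:
    """Return {script name: number of History API calls found in its source}."""
    report = {}
    for name, source in scripts.items():
        hits = len(HISTORY_CALL.findall(source))
        if hits:
            report[name] = hits
    return report
```

Third-party files with several calls, especially inside loops or `popstate` handlers, are the first candidates to disable in the isolation step.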
Fixes
Once identified (usually an ad network or widget), immediately remove the script from your site.
If an ad network is responsible, terminate the relationship. Only use reputable advertising partners.
If using exit-intent popups, ensure they trigger via mouse movement, not by manipulating browser history.
If you received a manual action, remove the malicious code, document the steps taken to secure the site, and submit a reconsideration request via Google Search Console.
AI Context
Google (Googlebot / Search Console)
AI search agents and evaluators heavily weigh user experience signals. A site that traps users is a strong negative signal for trustworthiness and usability.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems will likely flag this behaviour, leading to exclusion from AI-generated summaries and generative search experiences, which prioritise safe, authoritative destinations.
Case Patterns
Real-world scenarios seen in practice.
- A publisher implements a new, high-yield ad network and sees a traffic drop two weeks later after the network's scripts begin hijacking browser navigation.
- An affiliate site uses an aggressive script to force users through a funnel, resulting in a manual action after Google's June 2026 enforcement begins.
Core Web Vitals Failing
Your site is failing Google's real-world user experience metrics for loading speed, interactivity, or visual stability.
Reality Check
Core Web Vitals are a tie-breaker, not a primary ranking factor. Fixing a failing CWV score will not jump you from page 2 to page 1 if your content or authority is lacking. However, a failing score will cost you conversions and user trust, which indirectly harms your SEO over time.
Symptoms
- Google Search Console's "Core Web Vitals" report shows "Poor" or "Needs Improvement" URLs for Mobile or Desktop.
- PageSpeed Insights shows failing field data (CrUX) for LCP (Largest Contentful Paint), INP (Interaction to Next Paint), or CLS (Cumulative Layout Shift).
- High bounce rate, especially on mobile devices, despite strong content quality.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Unoptimised Images/Media: Loading massive hero images, uncompressed videos, or missing width/height attributes causing layout shifts.
- High Heavy JavaScript Execution: Third-party scripts (ads, tracking, chat widgets) or massive JS bundles delaying interactivity (INP).
- Medium Slow Server Response Time: Cheap hosting, lack of a CDN, or unoptimised database queries delaying the initial HTML response (TTFB).
- Medium Render-Blocking Resources: Loading CSS or synchronous JavaScript in the <head> that prevents the browser from painting the page.
Diagnostic Steps
Work through each question to identify the root cause.
- Run your page through PageSpeed Insights (pagespeed.web.dev). Are you failing the Field Data (CrUX) or just the Lab Data (Lighthouse)?
- Which specific metric is failing in the Field Data?
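To answer the second question consistently, the field values can be rated against Google's published Core Web Vitals thresholds, which are assessed at the 75th percentile: LCP 2.5 s / 4 s, INP 200 ms / 500 ms, and CLS 0.1 / 0.25. The sketch below encodes those boundaries; the function and variable names are illustrative.

```python
# (good upper bound, needs-improvement upper bound); LCP and INP in milliseconds.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric: str, p75_value: float) -> str:
    """Rate a 75th-percentile field value as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if p75_value <= good_max:
        return "good"
    if p75_value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


def failing_metrics(field_data: dict) -> dict:
    """Return only the metrics that are not rated good."""
    ratings = {metric: rate(metric, value) for metric, value in field_data.items()}
    return {metric: rating for metric, rating in ratings.items() if rating != "good"}
```

Feed it the field (CrUX) values, not the Lighthouse lab numbers; a page can pass in the lab and still fail in the field.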
Fixes
Optimise and compress your largest hero image. Use modern formats (WebP/AVIF). Preload the LCP image with <link rel="preload">. Implement a CDN and upgrade your hosting to improve server response time (TTFB).
Audit and remove unnecessary third-party scripts. Defer non-critical JavaScript with async/defer attributes. Break up long tasks in your main thread execution using the Scheduler API or requestIdleCallback.
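Always include explicit width and height attributes on images and videos. Reserve space for dynamically injected content like ads or banners. Ensure web fonts do not shift the layout when they load: font-display: swap prevents invisible text but still swaps the web font in late, so choose a fallback font with similar metrics or use font-display: optional.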
Inline critical CSS. Load non-critical CSS and JavaScript asynchronously or deferred. Move render-blocking scripts to the bottom of the <body> or use the defer attribute.
AI Context
Google (Googlebot / Search Console)
A measure of user frustration. A slow, janky site provides a poor experience even if the content is excellent. Google uses real-world Chrome User Experience (CrUX) data, not synthetic lab scores.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs and AI agents don't care about your visual stability or paint times. However, if your heavy JavaScript completely blocks the agent from rendering the DOM, you have a critical discoverability failure that overlaps with the AI Overview citation problem.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site failed CLS because their promotional banner was injected via JavaScript after the main content loaded, pushing the entire page down just as users tried to click a product. Reserving space for the banner fixed it immediately.
- A news publisher failed INP because they had 15 different ad networks and tracking pixels executing on the main thread every time a user clicked a link.
Duplicate Content Issues
Identical or very similar content appears on multiple URLs within your site or across different domains.
Reality Check
There is no such thing as a "duplicate content penalty." Google doesn't penalise you for having the same content on two URLs; it simply picks one to rank and ignores the other. The real problem is that duplicate content dilutes your link equity and wastes your crawl budget.
Symptoms
- Only one version of a page ranks, while other versions with the same content are completely ignored.
- Search Console shows "Duplicate without user-selected canonical" or "Duplicate, Google chose different canonical than user."
- Link equity (backlinks) is split between multiple URLs that serve the same purpose.
- Internal search results, parameter URLs, and session IDs are being indexed.
Likely Causes
Ranked by probability. Highest probability cause first.
- High URL Parameters: E-commerce filters, tracking tags, and session IDs create multiple URLs for the exact same page.
- Medium HTTP/HTTPS and WWW/Non-WWW: Your server doesn't enforce a single protocol, so all four versions of your domain are accessible.
- Low Printer-Friendly/PDF Versions: Separate URLs for printing or PDF versions of your content that are also being indexed.
- Low Syndicated Content: You published an article verbatim on Medium or LinkedIn without a cross-domain canonical.
Diagnostic Steps
Work through each question to identify the root cause.
- Is the duplicate content on your own domain or a different domain?
- What is causing the duplicate URLs on your domain — URL parameters, or HTTP/HTTPS protocol inconsistency?
Fixes
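Implement self-referencing canonical tags on your main pages. Point all parameter variations back to the main URL using rel="canonical". Do not also block those parameter URLs in robots.txt while you rely on the canonical: Googlebot cannot read a canonical tag on a page it is not allowed to crawl, so choose one mechanism per URL pattern.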
Set up permanent 301 redirects at the server level to force all traffic to the secure, preferred version of your domain (https://www.yourdomain.com). Ensure your CMS is set to the preferred URL.
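The redirect itself belongs in your server or CDN configuration. As a rough sketch of the rule it should implement, the function below computes where each of the four variants ought to 301 to. The function name is hypothetical, and `yourdomain.com` is the placeholder from the fix above.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.yourdomain.com"  # placeholder preferred hostname
BARE_HOST = "yourdomain.com"           # placeholder non-www hostname


def redirect_target(url: str) -> Optional[str]:
    """Return the 301 target for a non-preferred variant, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in {PREFERRED_HOST, BARE_HOST}:
        return None  # not one of our hostnames, nothing to enforce
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    # Keep path and query so deep links land on the equivalent page in one hop.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, ""))
```

Sending every variant straight to the final URL avoids redirect chains such as http non-www to https non-www to https www.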
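Add a noindex directive to the printer-friendly or PDF versions of your pages: a robots meta tag for HTML print pages, and an X-Robots-Tag: noindex HTTP response header for PDFs, which cannot carry a meta tag. They serve users but should not consume index space.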
Ask every syndication partner to add a cross-domain canonical tag pointing to your original URL. If they won't, ask them to noindex their copy instead. If they refuse both, accept that you're giving away your ranking signal.
AI Context
Google (Googlebot / Search Console)
A canonicalisation challenge. Google has to figure out which version is the "master" copy to consolidate ranking signals — and it may not choose the version you prefer.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Duplicate content adds noise to the retrieval systems LLMs rely on. A clean, canonicalised site architecture makes it more likely that the LLM retrieves the correct, authoritative version of your content rather than a parameter-laden copy.
Case Patterns
Real-world scenarios seen in practice.
- A massive e-commerce site had their link equity split between category/shoes and category/shoes?sort=price. Implementing proper canonical tags consolidated the signals and boosted the main category page's rankings.
- A blogger republished all their articles on Medium to get more reach, but Medium outranked their own blog because they forgot to use the cross-domain canonical tag.
FAQ Rich Results Disappeared
Valid FAQ schema markup is present on the page, but Google no longer displays the FAQ rich snippet in the search results.
Reality Check
This is almost certainly not a technical error. In August 2023, Google restricted FAQ rich results to well-known, authoritative government and health websites. Unless your site is one of those, Google will ignore your FAQ schema for rich snippet purposes. You cannot force the snippet to appear through technical fixes; it is now an authority-gated feature.
Symptoms
- FAQ rich snippets that previously appeared for your pages have vanished from the SERPs.
- The Rich Results Test shows the FAQ schema is perfectly valid with no errors.
- Google Search Console's Performance report shows a decline in impressions and clicks attributed to the "FAQ rich results" search appearance type.
- Competitors in the same industry have also lost their FAQ snippets.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Algorithmic Deprecation: Google intentionally restricted FAQ rich results to well-known, authoritative government and health websites in August 2023 to simplify the search results page.
- High Lack of Source Authority: The site does not meet the extreme authority threshold now required to trigger this specific rich result type.
- Low Technical Error (Rare): The schema markup is broken or the FAQ content is not visible on the page to users.
Diagnostic Steps
Work through each question to identify the root cause.
- Check Site Category: Is your site a recognised government entity or a major, authoritative health organisation? If not, the feature is disabled algorithmically for your site.
- Verify Schema Validity: Use the Rich Results Test. If the schema is valid but not displaying, the issue is algorithmic, not technical.
- Check Search Console: Look for manual actions or structured data errors. If none exist, the issue is algorithmic.
- Check Competitors: If other non-government, non-health sites in your niche have also lost FAQ snippets, this confirms the algorithmic restriction.
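To rule out the rare technical cause (FAQ content in the markup but not visible on the page), a rough check like the one below can help. It is a sketch under stated assumptions: the function name is hypothetical, it reads only top-level FAQPage objects in JSON-LD (not `@graph` wrappers or microdata), and it cannot tell whether text is hidden with CSS.

```python
import json
from html.parser import HTMLParser


class _PageScan(HTMLParser):
    """Collects JSON-LD blocks and the text outside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.jsonld = []    # raw JSON-LD strings
        self.text = []      # text fragments from the page body
        self._mode = None   # None (page text), "jsonld", or "skip"
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_jsonld = (dict(attrs).get("type") or "").lower() == "application/ld+json"
            self._mode = "jsonld" if is_jsonld else "skip"
            self._buffer = []
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag == "script" and self._mode == "jsonld":
            self.jsonld.append("".join(self._buffer))
        if tag in ("script", "style"):
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buffer.append(data)
        elif self._mode is None:
            self.text.append(data)


def faq_questions_missing_from_page(html: str) -> list:
    """Return FAQPage question names that do not appear in the page text."""
    scan = _PageScan()
    scan.feed(html)
    scan.close()
    page_text = " ".join(" ".join(scan.text).split()).lower()
    missing = []
    for raw in scan.jsonld:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # broken JSON-LD is a separate problem; the Rich Results Test reports it
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict) or item.get("@type") != "FAQPage":
                continue
            entities = item.get("mainEntity", [])
            if isinstance(entities, dict):
                entities = [entities]
            for question in entities:
                if not isinstance(question, dict):
                    continue
                name = " ".join(str(question.get("name", "")).split())
                if name and name.lower() not in page_text:
                    missing.append(name)
    return missing
```

An empty result means every marked-up question also appears in the page text, which points back to the algorithmic restriction rather than a markup fault.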
Fixes
FAQ rich results are no longer a viable strategy for most sites. Do not invest development resources trying to fix valid schema.
While unused FAQ schema causes no harm, removing it reduces markup bloat. Schedule this for your next technical audit rather than treating it as urgent.
Shift focus from FAQ schema to building genuine topical authority and E-E-A-T signals that improve overall rankings.
Well-structured FAQ content on the page (even without schema) remains highly valuable for AI Overviews and chat-based search agents, which use clear Q&A formats for synthesis.
AI Context
Google (Googlebot / Search Console)
Google has withdrawn the visual FAQ snippet for most sites, but the FAQ content itself is still crawled and indexed as normal. AI Overviews and chat-based search agents can draw on clear, concise Q&A content when extracting facts.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Maintaining a well-structured FAQ section on your page may increase the probability of being cited by AI systems, even if the rich snippet no longer appears in standard search results.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site spends weeks auditing their FAQ schema after snippets disappear, only to discover the feature was algorithmically restricted for their industry.
- A local service business loses its FAQ snippets but retains strong rankings because the underlying content quality is high.
Hacked Content and Spam Injections
Malicious code or spam content inserted into your site without your knowledge, used by attackers to rank their own pages via your domain's authority.
Reality Check
Hacked sites are not just a security problem; they are an active SEO liability. Attackers exploit your domain authority to rank pharmaceutical, gambling, or malware pages in Google. Google's spam systems will detect the injected content and can issue a manual action, remove affected pages, or delist the entire domain. Full recovery after a manual action often takes weeks or months, not days.
Symptoms
- Google Search Console shows a "Security issue" alert or a manual action for "Hacked content".
- Searching site:yourdomain.com shows pages in languages you don't publish, or pages about pharmaceuticals, gambling, or adult content.
- Googlebot's crawl discovers pages and links you did not create.
- Visitors report browser security warnings when visiting your site.
- Server access logs show unusual POST requests or file modification timestamps on core files.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Outdated CMS, themes, or plugins: known vulnerabilities in unpatched software exploited by automated scanners.
- Medium Weak or compromised credentials: brute-forced admin passwords, reused passwords from breached accounts, or stolen session tokens.
- Low Unvetted third-party scripts: tag manager containers or inline scripts loading external JS that has been compromised at the source.
Diagnostic Steps
Work through each question to identify the root cause.
- Check Google Search Console under Security & Manual Actions. Are there any security warnings or manual action notifications?
- Has a full server-side security scan been performed and has the attack vector been identified and closed?
Fixes
Update everything immediately. Remove unused plugins and themes entirely — deactivated plugins still present attack surface. Enable automatic security updates where possible. Run Wordfence, Sucuri, or equivalent post-update.
Rotate all admin passwords. Enforce two-factor authentication for all accounts with write access. Audit user accounts for unknown additions. Regenerate API keys and secret salts.
Audit every script loaded on the site. Remove scripts from unknown or untrusted sources. Implement a Content Security Policy (CSP) header to restrict which external domains can execute scripts. Review your tag manager container for unauthorised tags.
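A minimal sketch of the script audit and the matching CSP value, assuming you maintain an allowlist of approved script hosts. The function names and the example hosts are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _ScriptHosts(HTMLParser):
    """Collects the hostnames of externally loaded <script src=...> tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src") or ""
            host = urlsplit(src).hostname  # None for relative (same-site) paths
            if host:
                self.hosts.add(host.lower())


def unapproved_script_hosts(html: str, allowed_hosts: set) -> list:
    """Return external script hosts found in the HTML that are not on the allowlist."""
    finder = _ScriptHosts()
    finder.feed(html)
    finder.close()
    return sorted(finder.hosts - {host.lower() for host in allowed_hosts})


def csp_header_value(allowed_hosts: set) -> str:
    """Build a Content-Security-Policy value limiting scripts to self plus the allowlist."""
    sources = " ".join(["'self'"] + [f"https://{host}" for host in sorted(allowed_hosts)])
    return f"script-src {sources}; object-src 'none'; base-uri 'self'"
```

A policy like this also blocks inline scripts, so it is worth trialling with the Content-Security-Policy-Report-Only header before enforcing it.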
AI Context
Google (Googlebot / Search Console)
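Google's spam detection systems actively identify hacked content patterns including cloaking (different content shown to Googlebot vs users), injected links, and known malware signatures. Discovery leads to an algorithmic demotion, a security issue flag, or a manual action. A manual action needs a reconsideration request after cleanup, and a security issue needs a review request; these reviews typically take days to weeks. An algorithmic demotion has no request process and lifts only as Google recrawls the cleaned site.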
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Hacked content that reaches training data pipelines feeds low-quality, misleading, or malicious text into model training. For retrieval-augmented systems, injected spam pages may be retrieved and surfaced in AI answers, polluting outputs. Domains known for spam injections may also be deprioritised in curated datasets used for fine-tuning.
Case Patterns
Real-world scenarios seen in practice.
- A corporate site running an unpatched WordPress 5.6 plugin was compromised via a known file upload vulnerability. Over 2,000 pharmaceutical spam pages were injected. Google issued a manual action within 3 weeks. Recovery required a full server wipe, CMS reinstall from backup, and a 90-day reconsideration timeline.
- A legal firm's site had its tag manager container accessed via a reused password from a breached service. An attacker injected a script that redirected mobile visitors to a gambling site. The site received a "sneaky redirects" manual action.
Image Search Traffic Missing
Images on the site are not appearing in Google Images search results, or image search traffic has dropped significantly.
Reality Check
Image search is a significant traffic source for visual industries (photography, food, fashion, home décor, travel). For other industries, it is a secondary channel. The fixes are straightforward and the effort is low relative to the potential gain.
Symptoms
- Images that should appear in image search results are not indexed.
- Image search traffic has dropped following a site migration or CMS change.
- Competitor images appear in image search but yours do not.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Missing or Generic Alt Text: Images have no alt text or generic alt text ("image1.jpg") that provides no context.
- high probability Images Blocked in robots.txt: The image directory or CDN domain is blocked from crawling.
- medium probability Lazy Loading Without Proper Implementation: Images use lazy loading but without the correct attributes, preventing Googlebot from seeing them.
- medium probability Low-Quality or Duplicate Images: Images are too small, too compressed, or duplicates of images on other sites.
- low probability noindex on Image Pages: The pages containing the images have noindex tags.
Diagnostic Steps
Work through each question to identify the root cause.
- Do images have descriptive alt text?
- Are images blocked in robots.txt?
- Are images using lazy loading?
- Are images served from a CDN domain?
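The first two questions can be checked roughly in code. This is a sketch, not a complete audit: the function names are hypothetical, the "generic" patterns are assumptions you should tune, and Python's `urllib.robotparser` does not reproduce every detail of Google's robots.txt matching (wildcards in particular).

```python
import re
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Assumed patterns for alt text that carries no context.
FILENAME_LIKE = re.compile(r"\.(jpe?g|png|gif|webp|svg)$", re.IGNORECASE)
PLACEHOLDER = re.compile(r"^(image|img|photo|picture)[\s_-]*\d*$", re.IGNORECASE)


class _ImageAudit(HTMLParser):
    """Records <img> tags whose alt text is missing, empty, or generic."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt")
        if alt is None:
            self.issues.append((src, "missing"))
        elif not alt.strip():
            # Acceptable for purely decorative images, a problem for content images.
            self.issues.append((src, "empty"))
        elif FILENAME_LIKE.search(alt.strip()) or PLACEHOLDER.match(alt.strip()):
            self.issues.append((src, "generic"))


def alt_text_issues(html: str) -> list:
    """Return (src, issue) pairs for images with missing, empty, or generic alt text."""
    audit = _ImageAudit()
    audit.feed(html)
    audit.close()
    return audit.issues


def image_crawl_allowed(robots_txt: str, image_url: str, agent: str = "Googlebot-Image") -> bool:
    """Check an image URL against the robots.txt of the host that serves it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, image_url)
```

If images are served from a CDN domain, pass that domain's own robots.txt: rules apply per hostname, not per site.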
AI Context
Google (Googlebot / Search Console)
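Images are indexed separately from pages. Alt text is one of the strongest signals Google uses to understand image content, alongside the filename, the surrounding page text, and its own image analysis. Without descriptive alt text, images are much harder to rank in image search.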
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems that make visual content recommendations may draw on image search data. Images with strong metadata are more likely to be surfaced in AI-generated visual recommendations.
Improper Use of Canonical Tags
Misapplied canonical tags cause search engines to ignore important pages or consolidate ranking signals incorrectly, removing pages from the index.
Reality Check
Canonicals are not a magic bullet — slap them on wrongly and you will shoot your rankings in the foot. A canonical tag is a declaration: you are telling Google which URL is the definitive version of a piece of content. Google treats this as a strong hint. Point the canonical at the wrong page and you have effectively asked Google to ignore the page you care about.
Symptoms
- Key pages drop out of the index after a canonical tag change or CMS update.
- Unexpected pages rank instead of the intended canonical URL — a paginated version or a filtered variant ranks instead of the main page.
- Loss of organic traffic despite unchanged content and no algorithmic update.
- Google Search Console's Page indexing report (formerly Coverage) shows pages as "Duplicate, submitted URL not selected as canonical" on URLs you want indexed.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Self-referencing canonicals missing: pages without self-canonicals invite crawlers to infer duplicates, especially on sites with URL parameters.
- Medium Canonical pointing to an unrelated or low-value page: directing a canonical to a different topic or to a higher-level category page tells Google to drop the page carrying the tag in favour of the target, so the page you care about stops ranking.
- Low Cross-domain canonicals misapplied: declaring canonicals to external domains passes link equity externally and may cause the originating page to be deindexed.
Diagnostic Steps
Work through each question to identify the root cause.
- In Google Search Console, open URL Inspection for the affected page. Does the "Google-selected canonical" differ from your declared canonical?
- Do all pages on your site carry a self-referencing canonical tag pointing to their own clean URL (without parameters or session identifiers)?
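The second check can be sketched as follows, assuming you already have each page's URL and HTML. The function names and status labels are hypothetical, and "clean" here simply means no query string or fragment.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class _CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in the document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.hrefs.append(attributes.get("href") or "")


def clean_url(url: str) -> str:
    """Strip the query string and fragment, and lowercase the hostname."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", "", ""))


def canonical_status(page_url: str, html: str) -> str:
    """Classify a page as 'missing', 'conflicting', 'ok', or 'points-elsewhere'."""
    finder = _CanonicalFinder()
    finder.feed(html)
    finder.close()
    if not finder.hrefs:
        return "missing"
    if len(set(finder.hrefs)) > 1:
        return "conflicting"
    declared = urljoin(page_url, finder.hrefs[0])  # resolve relative hrefs
    return "ok" if declared == clean_url(page_url) else "points-elsewhere"
```

A "points-elsewhere" result is not automatically wrong (a genuine duplicate should point at its preferred version), but each one deserves a manual look.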
Fixes
Insert a self-referencing canonical on every page that points to its own clean, preferred URL. This is the baseline canonical implementation. Configure your CMS or framework to inject this automatically so new pages receive it by default.
Audit canonical URLs across your site using Screaming Frog's canonical report. For each canonical, verify that the target page is the genuine duplicate or preferred version. Correct any that point to category pages, homepages, or dissimilar content.
Remove cross-domain canonicals unless you have a specific and intentional reason for them (e.g. syndicating content and wanting the original to retain credit). Use 301 redirects if the goal is to consolidate two domains permanently.
AI Context
Google (Googlebot / Search Console)
Google treats canonical tags as strong hints but not absolute directives. When Google overrides a declared canonical, it is because its content analysis disagrees with the site's declaration, usually because the declared canonical points to a page with substantially different content. Canonical mismanagement can keep pages out of the index for as long as the mistake persists, without any error message in Search Console.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Canonical mismanagement does not affect LLM training data directly, but pages removed from Google's index as a result are effectively invisible to retrieval-augmented AI systems. Content that is de-indexed due to a bad canonical cannot be surfaced in AI-generated answers that rely on live search data.
Case Patterns
Real-world scenarios seen in practice.
- A publishing site canonicalised all blog posts to the homepage during a template migration, intending to set up canonicals later. They forgot. Every blog post lost its individual rankings within three weeks as Google treated the homepage as the canonical for all content.
- An e-commerce site added canonicals pointing filtered URLs to category pages. The filter pages had built up significant backlinks from product comparison sites. After canonicalisation, the equity from those backlinks did not consolidate on the category page as expected: some crawlers treated the canonical as a redirect signal, and the equity was lost.
International Targeting Errors
The wrong language or country version of a page is appearing in search results for users in a specific country or language market.
Reality Check
hreflang is one of the most error-prone aspects of technical SEO. The most common error, missing self-referential annotations, is invisible to most site owners because it doesn't cause a crawl error; it can simply cause Google to ignore the hreflang set.
Symptoms
- A site crawl or hreflang validator reports hreflang errors. (Search Console's International Targeting report, which used to list them, has been retired.)
- Users in target markets are landing on the wrong language version of the site.
- The same URL appears in multiple countries' search results simultaneously.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Missing Self-Referential hreflang: Each page must declare its own hreflang — this is the most commonly missed requirement.
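- high probability Incorrect Language or Country Codes: Using `en-EN` or `en-UK` instead of `en-GB`, or joining the parts with an underscore (`fr_FR`) instead of a hyphen (`fr-FR`). A language-only value such as `fr` is valid.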
- medium probability hreflang Pointing to Non-Canonical URLs: hreflang annotations pointing to redirected or paginated URLs rather than the canonical version.
- medium probability Incomplete Annotation Set: Not all language variants reference each other — the set must be complete and reciprocal.
- low probability No x-default Annotation: Missing fallback for users whose language/region is not explicitly covered.
Diagnostic Steps
Work through each question to identify the root cause.
- Does a site crawl or hreflang validator report errors? (Search Console no longer offers an International Targeting report.)
- Does every page include a self-referential hreflang annotation?
- Do all hreflang annotations use correct ISO 639-1 language codes and ISO 3166-1 Alpha 2 country codes?
- Do hreflang annotations point to canonical, non-redirected URLs?
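The code-format question can be checked mechanically. The sketch below uses small illustrative subsets of the ISO lists, not the full standards; the function name is hypothetical, and a real check should load the complete code lists.

```python
from typing import Optional

# Illustrative subsets only; load the full ISO 639-1 and ISO 3166-1 Alpha-2 lists in practice.
LANGUAGES = {"en", "fr", "de", "es", "it", "nl", "pt", "ja", "zh"}
REGIONS = {"GB", "US", "FR", "DE", "AT", "ES", "IT", "NL", "PT", "BR", "JP", "CA", "AU", "IE"}


def hreflang_problem(value: str) -> Optional[str]:
    """Return a description of what is wrong with an hreflang value, or None if it looks valid."""
    if value.lower() == "x-default":
        return None
    if "_" in value:
        return "use a hyphen, not an underscore"
    parts = value.split("-")
    if parts[0].lower() not in LANGUAGES:
        return f"unknown language code '{parts[0]}'"
    for subtag in parts[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            continue  # script subtag such as Hant; not validated in this sketch
        if subtag.upper() not in REGIONS:
            return f"unknown region code '{subtag}'"
    return None
```

Run it over every hreflang value a crawl extracts; any non-None result names the value to fix.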
AI Context
Google (Googlebot / Search Console)
hreflang is a hint, not a directive. If the implementation is incorrect, Google will fall back to its own language detection, which may not match your intent.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems that serve multiple language markets will use the language signals from your pages to determine which version to cite. Incorrect hreflang means the wrong version may be cited in the wrong language market.
JavaScript-Rendered Content Not Indexed
Key content loaded via JavaScript after the initial HTML response fails to appear in search indexes because crawlers cannot or do not execute the rendering step.
Reality Check
Google is decent at rendering JavaScript but relying on it exclusively is a recipe for invisibility. Googlebot renders JavaScript in a second step, queued after the initial crawl. The delay is often short, but on large or resource-heavy sites it can stretch to days or weeks, and until rendering happens your JavaScript-injected content does not exist in the index. On pages with complex JavaScript, Googlebot may time out or encounter errors before rendering completes. Any content that is not in the initial HTML response is at risk.
Symptoms
- Pages rank poorly or not at all despite content that is clearly visible to users in a browser.
- Google Search Console URL Inspection shows a rendered page that is missing body content, product descriptions, or key text.
- View Source (not DevTools) of the page shows empty div containers where content should be, populated only after JavaScript executes.
- The URL Inspection Live Test (the replacement for the retired Fetch as Google tool) returns an incomplete page rendering.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Client-side rendering only: all meaningful content is injected into the DOM by JavaScript after the initial HTML response is received.
- Medium Render-blocking scripts or JavaScript errors: scripts fail silently, throw errors, or block the rendering pipeline, preventing content from loading within Googlebot's rendering budget.
- Low No server-side or pre-rendering fallback: the site architecture assumes all visitors (including crawlers) can execute JavaScript perfectly.
Diagnostic Steps
Work through each question to identify the root cause.
- Right-click the page and select "View Page Source" (not "Inspect Element"). Is your main content — headlines, body text, product descriptions — visible in the raw HTML?
- Use Google Search Console URL Inspection > Live Test. Does the rendered screenshot show all the content a user sees in a browser?
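The View Source check can be automated for a list of key phrases. A minimal sketch, assuming `raw_html` is the server's response body fetched without executing JavaScript (for example with `urllib.request`); the function name is hypothetical.

```python
from html.parser import HTMLParser


class _TextOutsideScripts(HTMLParser):
    """Collects text from the HTML, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def phrases_missing_from_raw_html(raw_html: str, phrases: list) -> list:
    """Return the phrases that are absent from the text of the initial HTML response."""
    parser = _TextOutsideScripts()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [phrase for phrase in phrases if " ".join(phrase.split()).lower() not in text]
```

Script contents are skipped on purpose: a phrase that exists only inside a JavaScript string is exactly the content that depends on rendering.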
Fixes
Implement server-side rendering (SSR) or static site generation (SSG) for all content that matters for SEO. Frameworks like Next.js, Nuxt, SvelteKit, and Astro support SSR natively. For existing single-page applications where full SSR is not immediately feasible, implement dynamic rendering as a stopgap: serve a pre-rendered HTML snapshot to crawlers while continuing to serve the JavaScript SPA to real users. Google describes dynamic rendering as a workaround rather than a long-term solution.
Audit JavaScript errors using the JavaScript console in Chrome DevTools and Lighthouse. Ensure all required APIs, endpoints, and third-party scripts load successfully. Use resource hints (preload, prefetch) for critical JavaScript. Audit third-party tag manager containers for scripts that block rendering.
Evaluate your technology stack for SSR capability. If a full architecture change is not feasible, use a pre-rendering service (Prerender.io, Rendertron) to serve static HTML snapshots to crawlers. Use user-agent detection to deliver the pre-rendered snapshot to bots, and make sure the snapshot contains the same content users see. Showing Google different content than users is cloaking.
AI Context
Google (Googlebot / Search Console)
Google's rendering pipeline processes JavaScript in a second wave that is resource-constrained and asynchronous. Pages with complex client-side rendering may wait in the rendering queue, in some cases for days or weeks. Content that has not rendered when Googlebot's rendering times out is not indexed from that crawl. Very large JavaScript bundles may also exhaust the resources Googlebot allots to a page, causing partial renders.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Retrieval-augmented AI systems rely on content that is indexed and available. JavaScript-rendered content that fails to be indexed is invisible to these systems. Additionally, LLMs trained on web crawl data (Common Crawl and similar) face the same rendering limitations as Googlebot — content that requires JavaScript execution to appear is systematically underrepresented in training datasets.
Case Patterns
Real-world scenarios seen in practice.
- A React-based e-commerce site found that product description text — injected client-side via an API call after initial load — was not indexed. Customers searching for specific product features found competitors instead. Implementing SSR for product pages via Next.js resolved the indexing gap within five weeks of deployment.
- A job board built on a Vue.js SPA had zero job listings indexed despite thousands of live postings. The listings were loaded from an API call that Googlebot consistently failed to complete within its rendering budget. Dynamic rendering via Prerender.io delivered static HTML to crawlers and resulted in 4,000 listings being indexed within four weeks.
Manual Action Received
Google's Search Console shows a manual action — a human-applied penalty for a specific spam policy violation — that is suppressing or removing the site from search results.
Reality Check
A manual action is the most serious SEO issue a site can face. It is also one of the most recoverable — because it is specific, documented, and has a defined remediation process. The worst response is to panic and make broad changes. The correct response is to read the action description carefully and fix exactly what it describes.
Symptoms
- Organic traffic has dropped to near zero or has been eliminated for specific sections.
- Site is not appearing in search results for branded queries (site-wide action).
- Specific pages are not appearing in search results (partial action).
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Link Spam: Purchased links, link exchanges, or automated link building.
- high probability Scaled Content Abuse: Large volumes of AI-generated or templated content with no added value.
- medium probability Cloaking: Different content served to Googlebot than to human users.
- medium probability Site Reputation Abuse: Third-party content published on your domain that exploits your site's authority.
- medium probability Hacked Content: Malicious content injected by a third party.
- low probability User-Generated Spam: Spammy content in comments, forums, or user profiles.
Diagnostic Steps
Work through each question to identify the root cause.
- What does the manual action description say?
- Is the action site-wide or partial?
- Can you identify the specific pages or patterns that triggered the action?
AI Context
Google (Googlebot / Search Console)
A manual action is Google's most direct communication that a site has violated its policies. The reconsideration process is designed to give sites the opportunity to demonstrate genuine remediation.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
A site under a manual action has reduced or zero search visibility. AI systems that use search data for citation will not cite pages that do not appear in search results.
Misconfigured hreflang Tags
Incorrect hreflang attributes confuse search engines about language and regional targeting, causing the wrong page version to rank in the wrong country.
Reality Check
Most SEOs slap on hreflang tags without bothering to test if they actually work or conflict with sitemaps. The spec requires every page to carry reciprocal tags — if page A declares hreflang pointing to page B, page B must declare hreflang pointing back to page A. Miss one tag and Google is entitled to ignore the entire cluster. International SEO that relies on broken hreflang is international SEO that does not work.
Symptoms
- International pages not ranking in target regions despite having localised content.
- Duplicate content warnings across country versions in Google Search Console.
- Language-specific organic traffic drops with no corresponding ranking changes in the primary market.
- Google Search Console URL Inspection shows a Google-selected canonical that belongs to a different country version than the page you inspected.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Incorrect language or region codes: using non-standard or malformed language-region combinations (e.g. "en-UK" instead of "en-GB") misleads Google.
- Medium Missing return tags: a page declares hreflang pointing to its alternate but the alternate does not carry a reciprocal tag pointing back.
- Low Conflicting signals from canonicals or sitemaps: canonical tags or sitemap entries contradict the hreflang directives, causing Google to distrust the entire hreflang cluster.
Diagnostic Steps
Work through each question to identify the root cause.
- Use a tool such as Merkle's hreflang Tags Testing Tool or Screaming Frog's hreflang report. Do any pages show validation errors such as missing return tags, invalid codes, or mismatched URLs?
- Do your hreflang tags use correct ISO 639-1 language codes paired with ISO 3166-1 Alpha-2 region codes where needed (e.g. en-GB, fr-FR, de-AT)?
Fixes
Replace all hreflang values with correctly formatted ISO 639-1 and ISO 3166-1 Alpha-2 codes, joined by a hyphen. Use the published ISO code lists as your reference and validate with automated testing before and after deployment.
Ensure every hreflang tag on one page is mirrored on all of its counterparts. If page A has hreflang pointing to pages B and C, then B must point to A and C, and C must point to A and B. A CMS-level template or a generated XML sitemap is the most reliable way to enforce this at scale; avoid injecting hreflang through a tag manager, because tags added by JavaScript are only seen after rendering.
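The reciprocity rule is easy to verify once the annotations are extracted. A minimal sketch, assuming a crawl has produced a mapping of each page URL to its hreflang alternates; the data shape and function names are hypothetical.

```python
def missing_return_tags(annotations: dict) -> list:
    """Return (source, target) pairs where the target page does not link back to the source.

    annotations maps each page URL to {hreflang_value: alternate_url}.
    """
    missing = []
    for source, alternates in annotations.items():
        for target in alternates.values():
            if target == source:
                continue  # the self-reference needs no return tag
            if source not in annotations.get(target, {}).values():
                missing.append((source, target))
    return sorted(missing)


def missing_self_reference(annotations: dict) -> list:
    """Return pages whose hreflang set does not include the page itself."""
    return sorted(page for page, alternates in annotations.items()
                  if page not in alternates.values())
```

A target that was never crawled shows up as a missing return tag too, which is usually worth knowing: the alternate may be blocked, redirected, or absent.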
Audit canonical tags and sitemap entries to ensure they align with hreflang declarations. Remove contradictions. The canonical URL on each page should match the self-referencing hreflang URL for that page.
AI Context
Google (Googlebot / Search Console)
Google uses hreflang to determine which version of a page to serve to users in specific language or regional contexts. When hreflang tags are broken or contradictory, Google falls back to its own content analysis and geolocation signals, which are less accurate. Broken hreflang clusters are often ignored entirely in favour of the page Google deems canonical.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Language models do not process hreflang attributes directly. However, misconfigured hreflang that causes the wrong regional content to rank means that LLMs trained on or retrieving from that content may surface region-specific information in the wrong context — for example, citing UK pricing or regulations in a US query.
Case Patterns
Real-world scenarios seen in practice.
- An e-commerce site targeting both UK and US markets found only the US version ranking across both regions. Investigation revealed that UK pages carried hreflang declarations pointing to US pages but US pages lacked reciprocal tags. Adding return tags recovered UK-specific rankings within four weeks.
- A news portal serving European markets lost significant organic traffic from Germany and France after a CMS migration. The migration had corrupted hreflang language codes, inserting "de-DE" as "de_DE" (underscore instead of hyphen). Fixing the separator character resolved the issue.
Missing XML Sitemap or Sitemap Errors
Absence of a valid XML sitemap or errors within it hinder Google's ability to discover, crawl, and index all important pages efficiently.
Reality Check
Crawlers do not magically discover every page, and ignoring sitemaps is SEO laziness with real consequences. A sitemap is the explicit declaration of which URLs you want crawled and when each one last changed. A missing sitemap is not fatal for small sites with strong internal linking, but for large sites or sites with orphaned content, it is a material handicap. A sitemap full of errors is arguably worse than no sitemap: it gives Google's systems reason to distrust your crawl signals.
Symptoms
- Inconsistent indexing of pages — some sections are well-indexed while others are not discovered for months.
- Crawl errors or sitemap submission warnings in Google Search Console Sitemaps report.
- New pages take unusually long to appear in search results despite being linked from the homepage or key navigation.
- Sitemap validator tools return XML errors, invalid URLs, or blocked URL warnings.
Likely Causes
Ranked by probability. Highest probability cause first.
- High No sitemap exists, or it has not been submitted to Search Console: the site has never set up an XML sitemap, or has one that Google was never told about.
- Medium Malformed sitemap: XML syntax errors, invalid URL formats, or URLs that return 4xx errors are present in the sitemap file.
- Low Sitemap not updated or incomplete: the sitemap was generated once at launch and never updated, omitting new pages or including removed ones.
Diagnostic Steps
Work through each question to identify the root cause.
- Check Google Search Console under Indexing > Sitemaps. Is a sitemap submitted and does it show a valid processed status with URL counts matching expectations?
- Does your sitemap include any URLs that return 4xx errors, are blocked by robots.txt, or carry noindex tags?
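The second question can be partly answered offline. The sketch below assumes you already have the sitemap XML and the robots.txt body as strings; `audit_sitemap` and the sample data are hypothetical. It flags non-https entries and entries blocked by robots.txt. Checking for 4xx responses and noindex tags needs live requests and is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return (url, reason) pairs for sitemap entries that look wrong."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    issues = []
    root = ET.fromstring(sitemap_xml)  # raises ParseError on malformed XML
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            issues.append((url, "not an absolute https URL"))
        elif not robots.can_fetch(user_agent, url):
            issues.append((url, "blocked by robots.txt"))
    return issues


# Hypothetical inputs for illustration.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>http://example.com/old</loc></url>
  <url><loc>https://example.com/private/report</loc></url>
</urlset>"""
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

for url, reason in audit_sitemap(SITEMAP_XML, ROBOTS_TXT):
    print(f"{url}: {reason}")
```

Python's standard robots.txt parser does not implement Google's wildcard matching, so treat its verdicts as an approximation of Googlebot's behaviour.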
Fixes
Generate an XML sitemap using your CMS sitemap feature, a plugin (e.g. Yoast SEO, Rank Math for WordPress), or a dedicated sitemap generator. Validate the XML structure before submission. Submit the sitemap URL (typically /sitemap.xml or /sitemap_index.xml) in Google Search Console under Indexing > Sitemaps.
Validate your sitemap using Google Search Console's Sitemaps report and an XML validator. Fix all XML syntax errors. Ensure all URLs use the correct protocol (https if the site is served over HTTPS). Remove any URLs that return errors, are blocked by robots.txt, or are non-canonical.
Configure your CMS to generate sitemaps dynamically so new and removed pages are reflected automatically. For static sites, add sitemap generation to your deployment pipeline. Set lastmod dates accurately: Google uses them to prioritise recrawling only when they prove consistently reliable, so update lastmod when the page content actually changes, not on every build.
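For a static site, sitemap generation in a deployment pipeline can be as small as the following sketch. The `build_sitemap` function and the page list are hypothetical; a real pipeline would read URLs and modification dates from the build output or the CMS.

```python
import xml.etree.ElementTree as ET
from datetime import date


def build_sitemap(pages):
    """pages: iterable of (absolute_url, last_modified_date) tuples."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped on output
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; lastmod should reflect real content changes.
EXAMPLE_PAGES = [
    ("https://example.com/", date(2024, 5, 1)),
    ("https://example.com/search?q=a&p=2", date(2024, 5, 3)),
]

print(build_sitemap(EXAMPLE_PAGES))
```

Building the file through an XML library rather than string concatenation avoids the unescaped-ampersand errors that commonly break hand-rolled sitemaps.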
AI Context
Google (Googlebot / Search Console)
Google uses sitemaps as crawl guidance, not as indexing guarantees. Submitting a sitemap signals which URLs exist and when they were last modified, helping Googlebot prioritise crawl scheduling. Sitemaps with errors give Google reason to distrust the submitted URL data. For large sites (10,000+ pages), a well-maintained sitemap index helps keep crawl coverage even across sections. A single sitemap file is capped at 50,000 URLs or 50 MB uncompressed, so sites beyond that size must split their sitemaps and list them in an index.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems generally do not use sitemaps to select or rank content. However, sitemaps help ensure content is discovered and indexed by search engines such as Google. Pages that are not indexed are not available to retrieval-augmented AI systems that draw on that index, including AI Overviews. A comprehensive, accurate sitemap is therefore an indirect prerequisite for AI discoverability.
Case Patterns
Real-world scenarios seen in practice.
- A corporate site with 3,000 product pages submitted a sitemap that had not been updated since the initial site build. 800 new product pages added over two years were not in the sitemap and had no internal links. After regenerating and resubmitting the sitemap, all 800 pages were indexed within three weeks.
- A news blog's auto-generated sitemap included 1,200 URLs that had been deleted but not redirected, all returning 404 errors. Google Search Console showed sitemap errors. After purging the dead URLs from the sitemap, Googlebot's crawl efficiency improved significantly and crawl stats showed more time allocated to live content.
Mobile Usability Failures
Mobile-specific usability problems that hurt rankings under Google's mobile-first indexing and damage user experience on small screens.
Reality Check
Google has used mobile-first indexing by default for new sites since July 2019 and finished moving existing sites over in 2023. This means the mobile version of your page is the version Google indexes and uses for ranking, not the desktop version. A site that looks perfect on desktop but has usability issues on mobile is being evaluated by Google at its worst.
Symptoms
- A Lighthouse audit, or an archived export of Search Console's "Mobile Usability" report (Google retired that report in December 2023), flags specific error types: clickable elements too close, text too small, content wider than screen.
- Mobile bounce rate in analytics is significantly higher than desktop bounce rate for the same pages.
- Mobile rankings are noticeably lower than desktop rankings for the same queries.
- A mobile Lighthouse audit in Chrome DevTools returns failures or warnings for key pages (Google retired the standalone Mobile-Friendly Test in December 2023).
Likely Causes
Ranked by probability. Highest probability cause first.
- High Touch targets too small or too close together: buttons, links, and form elements do not meet the roughly 48x48px target size with 8px spacing that Google's Lighthouse guidance recommends.
- Medium Viewport not configured or incorrectly configured: the meta viewport tag is missing, uses a fixed pixel width, or prevents user scaling.
- Low Content wider than screen: fixed-width HTML or CSS elements overflow the viewport, requiring horizontal scrolling.
Diagnostic Steps
Work through each question to identify the root cause.
- Run a mobile Lighthouse audit in Chrome DevTools on each key template (the Search Console Mobile Usability report was retired in December 2023). Are any viewport, text-size, or touch-target issues reported?
- Using Chrome DevTools Device Toolbar (set to a 375px width iPhone simulation), do your primary landing pages display correctly without horizontal scrolling, and are all interactive elements easy to tap?
Fixes
Set minimum button height to 44–48px and minimum tap target spacing to 8px using CSS. Review navigation links, CTAs, form submit buttons, and pagination. Use a Lighthouse audit in Chrome DevTools to help flag undersized touch targets, then confirm on a real device.
Add <meta name="viewport" content="width=device-width, initial-scale=1"> to every page's <head>. Do not use a fixed pixel value or user-scalable=no. Verify the tag is present in the actual rendered HTML (not just in templates) using URL Inspection.
Add img { max-width: 100%; height: auto; } to your global stylesheet. Audit all HTML elements for inline width attributes. Replace fixed-width containers with max-width + width: 100%. Test all advertising and embedded third-party widgets for overflow.
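The viewport rule in particular is easy to verify automatically across templates. The following is a minimal sketch using only the Python standard library; `viewport_problems` is a hypothetical helper, and it assumes you pass it the rendered HTML of the page.

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Records the content attribute of <meta name="viewport">, if present."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "viewport":
            self.content = attributes.get("content") or ""


def viewport_problems(html):
    finder = ViewportFinder()
    finder.feed(html)
    finder.close()
    if finder.content is None:
        return ["missing viewport meta tag"]
    # Split "width=device-width, initial-scale=1" into key/value settings.
    settings = {}
    for part in finder.content.split(","):
        key, _, value = part.partition("=")
        settings[key.strip().lower()] = value.strip().lower()
    problems = []
    if settings.get("width") != "device-width":
        problems.append("width is not device-width")
    if settings.get("user-scalable") in ("no", "0"):
        problems.append("user scaling is disabled")
    return problems


GOOD = '<head><meta name="viewport" content="width=device-width, initial-scale=1"></head>'
BAD = '<head><meta name="viewport" content="width=1024, user-scalable=no"></head>'

print(viewport_problems(GOOD))
print(viewport_problems(BAD))
```

This checks only the tag itself; overflow caused by fixed-width elements still needs a visual or device-emulation test.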
AI Context
Google (Googlebot / Search Console)
Google's mobile-first indexing means the Googlebot Smartphone crawler is the primary crawler whose view of the page determines ranking. Mobile usability failures often coincide with poor Core Web Vitals on mobile. Google describes page experience as something its ranking systems can take into account, which matters most when competing pages offer content of similar quality.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Mobile usability does not directly affect LLM training data quality, but it can matter indirectly: pages that lose rankings because of mobile failures tend to accumulate fewer backlinks and citations over time, reducing their authority signals. AI retrieval systems that weight authority and citation count are then likely to deprioritise these pages.
Case Patterns
Real-world scenarios seen in practice.
- A B2C services company built on a legacy desktop CMS had 1,400 pages flagged in Search Console for "clickable elements too close together". Their mobile CTR was 40% lower than desktop despite identical rankings. Fixing touch target spacing across their template lifted mobile conversions by 18% over 8 weeks.
- An international news site had a fixed-width article template (960px) that forced horizontal scrolling on mobile. After switching to a fluid max-width layout, mobile bounce rate dropped by 22% and mobile rankings improved across their top 50 keywords within one crawl cycle.
Neglecting Robots.txt Configuration
Incorrect or missing robots.txt directives either block crawling of essential pages or permit indexing of sensitive content that should remain private.
Reality Check
Robots.txt is not a magic wand: misconfiguration either locks your site in a dungeon or throws the door wide open. The point most often misunderstood is that robots.txt controls crawling, not indexing. A URL blocked in robots.txt can still appear in search results if it is linked from other pages; Google just cannot see what is on it. This creates a particularly damaging situation where your page ranks for nothing useful but appears in results with no snippet.
Symptoms
- Important pages not indexed despite being well-linked — URL Inspection in Search Console shows "Blocked by robots.txt".
- Staging, development, or admin URLs appearing in Google search results.
- A site-wide crawl tool shows entire sections of the site returning "blocked" status.
- New site launch where the "coming soon" robots.txt blocking all crawlers was never reverted.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Overzealous Disallow rules: a wildcard or directory-level Disallow blocks important content sections — commonly deployed accidentally during development and never reverted.
- Medium No restrictions on sensitive directories: private areas, admin panels, or staging subfolders accessible to crawlers without authentication.
- Low Syntax errors or incorrect file placement: the robots.txt file contains formatting errors, uses paths whose case does not match the real URLs (paths are case-sensitive, user-agent names are not), or is not located at the domain root.
Diagnostic Steps
Work through each question to identify the root cause.
- Fetch your robots.txt file directly (yourdomain.com/robots.txt) and review all Disallow rules. Are any rules blocking important site sections — product directories, blog posts, category pages, or the entire site?
- Are there directories on your site — admin areas, /wp-admin/, /staging/, /dev/, CMS internals — that are not blocked by robots.txt?
Fixes
Remove or narrow blocking rules to only cover URLs that genuinely should not be crawled. Replace broad directory blocks with specific path patterns. Test every change before deployment with a robots.txt parser, and confirm afterwards in Search Console's robots.txt report (Settings > robots.txt), which replaced the retired robots.txt Tester. After correcting, request recrawl of affected pages via URL Inspection.
Add Disallow entries for all internal-only directories: /admin/, /wp-admin/, /staging/, /cgi-bin/, and any other server paths that should not be crawled. Combine with server-level access controls: robots.txt is a polite request that Googlebot honours, but malicious bots ignore it, and because the file is public, listing a path also advertises that it exists.
Ensure robots.txt is located at exactly yourdomain.com/robots.txt (not in a subdirectory). Validate syntax using Search Console's robots.txt report, which replaced the retired robots.txt Tester. Each rule block must start with "User-agent:" followed by "Disallow:" or "Allow:" lines. Comments use the # character. Empty Disallow: lines mean "allow all".
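Rule changes can be tested before deployment with Python's standard-library parser. The sketch below is one way to do that; `check_robots`, the sample rules, and the URL lists are hypothetical. Note the assumption: `urllib.robotparser` does not implement Google's wildcard or longest-match behaviour, so it approximates Googlebot rather than reproducing it.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, must_allow, must_block, user_agent="Googlebot"):
    """Return a list of URLs whose crawlability contradicts expectations."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for url in must_allow:
        if not parser.can_fetch(user_agent, url):
            failures.append(f"blocked but should be crawlable: {url}")
    for url in must_block:
        if parser.can_fetch(user_agent, url):
            failures.append(f"crawlable but should be blocked: {url}")
    return failures


# Hypothetical rules: "/blog" is an overzealous prefix, "/staging/" is missing.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /blog
"""

FAILURES = check_robots(
    RULES,
    must_allow=["https://example.com/blog/post-1", "https://example.com/products/"],
    must_block=["https://example.com/admin/login", "https://example.com/staging/"],
)
for failure in FAILURES:
    print(failure)
```

Running a check like this in CI against a short list of must-crawl URLs is a cheap guard against a development-time Disallow: / reaching production.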
AI Context
Google (Googlebot / Search Console)
Googlebot respects robots.txt directives for crawling but not for indexing. A URL blocked by robots.txt can still be added to Google's index if it is linked from external pages — Google simply cannot see its content. This produces index entries with no snippet and no ranking potential. Conversely, failing to block development environments means Google may index duplicate or unfinished content, creating duplicate content problems.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Robots.txt is irrelevant to how LLMs process content once it has been retrieved. However, accidentally blocking pages prevents them from being crawled, so their content cannot be read, quoted, or cited by retrieval-augmented AI systems. Failing to block development or staging content risks that content being indexed and potentially surfaced in AI-generated answers, where inaccurate or placeholder content could mislead users.
Case Patterns
Real-world scenarios seen in practice.
- A company launched a new website with Disallow: / in robots.txt to prevent Google indexing during development. The development block was never removed before launch. The entire site had zero organic traffic for six weeks before the issue was identified during a technical audit.
- A media company's staging environment was accidentally made public. The staging site — containing draft articles and template pages — was indexed by Google, creating thousands of duplicate content instances. Adding robots.txt to the staging domain and submitting a removal request cleared the duplicates over four weeks.
Orphan Pages
Pages on your site that have no internal links pointing to them, making them effectively invisible to both users and search engines.
Reality Check
If you cannot find a page from your own navigation or from a contextually relevant post, neither can Google — regardless of how good the content is. Googlebot follows links. If no link points to the page, Googlebot has no path to walk. A sitemap entry is not a substitute for internal links; it is a clue, not a highway.
Symptoms
- Pages exist and return 200 but receive zero organic traffic after months of being live.
- Screaming Frog or Ahrefs Site Audit reports a large number of URLs with zero inbound internal links.
- The page appears in your sitemap.xml but has no referring internal URLs in a crawl export.
- Search Console confirms impressions are zero despite the page being indexed.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Poor site architecture: navigation and content hierarchy were designed without considering deep pages.
- Medium Content published without integration: editors create new pages but never link to them from existing posts or category pages.
- Low CMS migration or platform change: pages were imported but their contextual links in body content were not recreated.
Diagnostic Steps
Work through each question to identify the root cause.
- Run a full crawl of your site (Screaming Frog, Sitebulb, or Ahrefs). Do any URLs appear in your sitemap but show zero inbound internal links in the crawl?
- For each orphaned page you want to keep: does it belong to a logical topic cluster that already has a hub or category page on your site?
Fixes
Audit your site structure. Map every important page to a parent category or hub. Rebuild navigation so every tier-2 and tier-3 page has at least one link from a tier-1 page.
Implement a publishing checklist: before any post goes live, the author must identify and add at least two internal links from existing content to the new page and vice versa.
After any migration, run a before-and-after crawl comparison. Re-create missing body-content links manually. Do not rely solely on automated redirect mapping.
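The sitemap-versus-crawl comparison behind these fixes reduces to a reachability check. The sketch below assumes you have exported the sitemap URLs and an internal link graph from a crawler; `find_orphans` and the sample data are hypothetical.

```python
from collections import deque


def find_orphans(sitemap_urls, internal_links, start_url):
    """Return sitemap URLs that cannot be reached by following internal links.

    internal_links maps each crawled URL to the URLs it links to.
    """
    reachable = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in internal_links.get(page, ()):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)
    return sorted(set(sitemap_urls) - reachable)


# Hypothetical crawl export: two sitemap entries have no inbound link path.
LINKS = {"/": ["/shop", "/blog"], "/shop": ["/shop/a"], "/blog": []}
SITEMAP = ["/", "/shop", "/shop/a", "/shop/b", "/blog/old-guide"]

print(find_orphans(SITEMAP, LINKS, "/"))
```

Starting the traversal from the homepage means a page linked only from another orphan is also reported, which matches how a crawler following links would experience the site.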
AI Context
Google (Googlebot / Search Console)
Googlebot discovers pages by following links. An orphaned page without inbound links may eventually be found via a sitemap submission, but it will receive minimal crawl priority and will struggle to accumulate PageRank. Google treats link equity as a trust signal; pages with no internal PageRank flow are implicitly low-value.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
LLMs and retrieval-augmented generation (RAG) systems surface content based on what the broader web links to and indexes. An orphaned page that never earned links or citations is statistically unlikely to appear in training data or in retrieval results. Fixing orphan pages improves both traditional SEO and the odds of being cited by AI-generated answers.
Case Patterns
Real-world scenarios seen in practice.
- A large e-commerce site completed a category reorganisation and orphaned 600 product pages. Six months later those pages had zero impressions. Re-linking them from the new category pages recovered 80% of their prior traffic within 10 weeks.
- A B2B SaaS blog had 120 legacy "how-to" articles published before the current topic-cluster strategy was in place. None were linked from the new pillar pages. An internal linking audit and update lifted the entire cluster's average position by 4 places.
Overuse of Noindex Tags
Excessive or incorrectly applied noindex directives remove valuable pages from search results, causing significant and often silent traffic losses.
Reality Check
Noindex is a blunt instrument: apply it liberally and you might as well delete your site from Google. The insidious thing about overuse of noindex is that Search Console does not flag it as an error. The pages are listed as excluded and are simply absent from results. Sites discover the problem months later, wondering why their content has vanished from search results with no algorithmic update to blame.
Symptoms
- Significant unexplained drop in organic traffic with no corresponding algorithmic update or ranking change.
- Google Search Console's Page indexing report (formerly Coverage) shows large numbers of URLs under "Excluded by 'noindex' tag".
- Key commercial pages, blog posts, or category pages are absent from search results.
- Site search or audit tools reveal noindex tags on pages that should be indexable.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Site-wide noindex applied by mistake: a global robots meta tag or X-Robots-Tag HTTP header set during development was never removed for production.
- Medium Noindex applied incorrectly to important content types: a CMS plugin setting, theme option, or template misconfiguration adds noindex to categories, tags, product pages, or blog posts en masse.
- Low Conflicting noindex and canonical directives: applying both noindex and a canonical tag creates confusion — the noindex is honoured, meaning the content is excluded from the index, while the canonical points elsewhere unnecessarily.
Diagnostic Steps
Work through each question to identify the root cause.
- Check your most important pages for noindex tags using View Page Source or a browser extension like SEO Meta in 1 Click. Are noindex directives present on pages you want indexed?
- Is the noindex tag being injected by a plugin, CMS setting, or global template rather than page-specific configuration?
Fixes
Search your codebase and server configuration for any global noindex directives. Check robots meta tags in layout templates, HTTP header configurations, and .htaccess or nginx configuration files. If running WordPress, verify Settings > Reading > "Discourage search engines" is unchecked. Deploy the fix and use Google Search Console's URL Inspection to verify pages are now indexable.
Audit your SEO plugin configuration for each content type (posts, pages, categories, tags, custom post types, archives). Enable indexing for all content types with SEO value. Set noindex only for: thank-you pages, account/login pages, internal search results, duplicate parameter URLs, and admin-facing content.
Do not use noindex and canonical together on the same page. If you want a URL excluded from the index, use noindex and remove the canonical. If you want link equity consolidated to another URL, use canonical alone and remove noindex. Mixing both sends contradictory signals and the noindex wins, meaning the content is excluded.
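Because noindex can arrive through either a meta tag or an HTTP header, an audit needs to look at both. The following is a minimal sketch; `noindex_sources` is a hypothetical helper that takes the page HTML and a dict of response headers, and it ignores user-agent-scoped header values such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def noindex_sources(html, headers):
    """Return where a noindex directive comes from, if anywhere."""
    sources = []
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    # "none" is shorthand for "noindex, nofollow".
    if "noindex" in finder.directives or "none" in finder.directives:
        sources.append("meta robots tag")
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives or "none" in directives:
                sources.append("X-Robots-Tag header")
    return sources


PAGE = '<head><meta name="robots" content="noindex, follow"></head>'
print(noindex_sources(PAGE, {"X-Robots-Tag": "noindex"}))
```

Reporting the source matters for the fix: a meta tag usually points to a template or plugin setting, while a header points to server or CDN configuration.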
AI Context
Google (Googlebot / Search Console)
Google honours noindex directives but continues crawling noindexed pages to check if the directive has been removed. A page with noindex is excluded from search results but still consumes crawl budget. Noindex does not pass PageRank — internal links to noindexed pages lead to dead ends in the link equity flow. Widespread incorrect noindex is particularly damaging for large sites where the loss may take months to identify.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Noindex is irrelevant to how LLMs generate content, but pages that are noindexed and therefore not in Google's index are invisible to retrieval-augmented AI systems. Noindexed content also tends to be excluded from major web crawls used for training data, meaning the content does not contribute to how AI models represent a domain's expertise.
Case Patterns
Real-world scenarios seen in practice.
- A developer set noindex on a new e-commerce site during staging and deployed without removing it. The site launched and received no organic traffic for 11 weeks. The issue was discovered only when the SEO team noticed zero impressions in Search Console despite a marketing campaign driving paid traffic.
- A WordPress blog using Yoast SEO had "noindex" enabled for the Posts category in the plugin's Search Appearance settings. Over 400 blog posts were effectively invisible to Google. Enabling indexing for the Posts type and requesting a recrawl recovered 65% of the lost traffic within six weeks.
Page Speed Issues Not Resolving
Core Web Vitals scores remain poor despite optimisation attempts, or PageSpeed Insights scores improve but Search Console field data does not.
Reality Check
Google uses field data (real user measurements from Chrome) for ranking, not lab data (Lighthouse scores). A page can score 95 in PageSpeed Insights and still have poor Core Web Vitals in Search Console if real users on mobile connections experience it differently.
Symptoms
- PageSpeed Insights lab score is high but Search Console field data remains poor.
- LCP, INP, or CLS scores are not improving after implementing recommended fixes.
- Mobile CWV scores are significantly worse than desktop scores.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Lab vs. Field Data Mismatch: Optimisations improved lab scores but real users on slower connections still experience poor performance.
- high probability Third-Party Scripts: Analytics, advertising, or chat scripts are blocking the main thread and causing INP failures.
- high probability Unoptimised Images: Images are not compressed, not in modern formats (WebP/AVIF), or not lazy-loaded.
- medium probability Render-Blocking Resources: CSS or JavaScript files are blocking the browser from rendering the page.
- medium probability Slow Server Response: Time to First Byte (TTFB) is high, delaying all subsequent loading metrics.
Diagnostic Steps
Work through each question to identify the root cause.
- Which metric is failing — LCP, INP, or CLS?
- Is the failing metric worse in field data (Search Console) than in lab data (PageSpeed Insights)?
- Are there third-party scripts loading on the page?
- What is the Time to First Byte (TTFB)?
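The first two questions can be answered mechanically once you have the numbers. The sketch below rates a metric against Google's published Core Web Vitals thresholds (assessed at the 75th percentile of field data) and flags the lab-versus-field mismatch described above. The `rate` and `compare` helpers and the sample values are hypothetical. INP has no direct lab equivalent, so the comparison is only meaningful for LCP and CLS.

```python
# (upper bound for "good", upper bound for "needs improvement") per metric.
THRESHOLDS = {
    "LCP": (2500, 4000),  # milliseconds
    "INP": (200, 500),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless layout-shift score
}


def rate(metric, value):
    """Classify a metric value as good, needs improvement, or poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


def compare(metric, field_p75, lab_value):
    """Return (field rating, lab rating, mismatch flag).

    A mismatch means the lab run looks fine while real users do not.
    """
    field, lab = rate(metric, field_p75), rate(metric, lab_value)
    return field, lab, field != "good" and lab == "good"


# Hypothetical numbers: lab LCP of 1.9 s, field p75 LCP of 4.3 s.
print(compare("LCP", 4300, 1900))
```

A mismatch flag is a prompt to test on throttled mobile connections and real devices rather than to keep tuning the lab score.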
AI Context
Google (Googlebot / Search Console)
Core Web Vitals are measured using real Chrome user data. A page that is fast in a lab environment but slow for real users will be evaluated on the real-user data.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Page speed is not directly visible to LLMs, but slow pages that frustrate users tend to earn fewer links and mentions over time, which may reduce the probability of future citation.
Rendering Issues and JavaScript SEO
Search engines fail to fully render JavaScript-dependent content, leaving key page text, links, or metadata invisible to crawlers.
Reality Check
Googlebot can execute JavaScript, but it renders pages in a deferred step after the initial crawl. The delay is often short, but it is not guaranteed and can stretch to days or weeks. Content that only exists in the DOM after JS execution is invisible until rendering completes, if it completes at all. For sites where revenue depends on fast indexing (news, e-commerce launches, time-sensitive content), client-side rendering is a serious SEO risk.
Symptoms
- Google's URL Inspection tool shows the rendered HTML is missing content that appears in your browser.
- Pages are indexed but rankings are poor despite rich, relevant content that is JS-rendered.
- Internal links rendered by JavaScript are not followed by Googlebot — discovered URLs are lower than expected.
- Search Console reports "Crawled - currently not indexed" for JS-heavy pages despite them being technically accessible.
Likely Causes
Ranked by probability. Highest probability cause first.
- High Heavy client-side rendering (CSR): the full page content is generated by JavaScript in the browser, leaving the initial HTML response nearly empty.
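- Medium Incorrect lazy loading: images or content blocks load only after scroll or interaction events, which Googlebot does not perform, or IntersectionObserver is configured so that the content never intersects the viewport Googlebot renders with.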
- Low JS resources blocked in robots.txt: Googlebot cannot fetch the JavaScript files needed to render the page.
Diagnostic Steps
Work through each question to identify the root cause.
- Use Google Search Console's URL Inspection tool to fetch and render a representative page. Does the rendered HTML shown in the tool contain the same body content visible in your browser?
- Check your robots.txt. Are any JS file paths (e.g. /_next/static/, /static/js/) blocked with Disallow rules?
Fixes
Migrate critical pages to server-side rendering (SSR) or static site generation (SSG). Frameworks: Next.js (React), Nuxt (Vue), SvelteKit. The server should return a fully populated HTML response without requiring JavaScript execution for core content.
Set IntersectionObserver rootMargin to "200px" to trigger loading before content enters the viewport. For images, add loading="lazy" only to below-fold images. Test the rendered output via URL Inspection after changes.
Remove Disallow rules for JS, CSS, and image directories. Google needs to access these to render modern pages. Test using the URL Inspection tool in Search Console (the successor to the retired Fetch as Google) and compare the rendered output to the browser view.
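A quick way to see whether core content depends on JavaScript is to check the raw HTML response, before any script runs, for phrases that should be on the page. The sketch below is one approach using only the standard library; `missing_from_initial_html` and the sample markup are hypothetical, and it assumes you have already saved the unrendered HTML.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def missing_from_initial_html(raw_html, expected_phrases):
    """Return the phrases that do not appear in the server-delivered text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [p for p in expected_phrases if p.lower() not in text]


# Hypothetical responses: a client-rendered shell and a server-rendered page.
CSR = '<html><body><div id="root"></div><script>render("Blue widget")</script></body></html>'
SSR = "<html><body><h1>Blue widget</h1><p>20 EUR</p></body></html>"

print(missing_from_initial_html(CSR, ["Blue widget"]))
print(missing_from_initial_html(SSR, ["Blue widget", "20 EUR"]))
```

Any phrase reported as missing exists only after JavaScript execution, which makes that page a candidate for SSR or pre-rendering.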
AI Context
Google (Googlebot / Search Console)
Googlebot uses a headless Chromium instance to render JavaScript, but rendering is queued separately from crawling because of compute constraints. The delay is often short but can stretch to days or weeks. During this window, JS-dependent content is invisible to ranking systems. Google has repeatedly advised developers to use SSR or pre-rendering for SEO-critical content.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI crawlers used to build training datasets often lack full JavaScript rendering capability. Content that requires JS execution may be entirely absent from LLM training data. Sites adopting client-side rendering for content — not just UI enhancement — risk being underrepresented in AI model knowledge and retrieval indices.
Case Patterns
Real-world scenarios seen in practice.
- A React SPA built without SSR had its product pages indexed with only the navigation shell — body content was absent from GSC's rendered HTML view. After migrating to Next.js with SSR, product pages went from "Crawled - currently not indexed" to fully indexed within 3 weeks.
- A headless e-commerce build had product descriptions loaded via a GraphQL call in the browser. The rendered output in URL Inspection showed only the page scaffold. Adding a server-side product data pre-fetch that populated the HTML before delivery resolved the issue.
Site Migration Caused Ranking Drop
Organic rankings dropped significantly following a website migration — a change of domain, URL structure, CMS, or protocol.
Reality Check
Some ranking fluctuation is normal and expected during a migration. A drop of 10-20% in the first 2-4 weeks is not necessarily a sign of failure. A sustained drop beyond 8 weeks, or a drop exceeding 30%, indicates a specific technical problem that needs diagnosis.
Symptoms
- Pages that previously ranked are no longer appearing in search results.
- Search Console shows a spike in 404 errors or redirect errors after migration.
- New URLs are not being indexed despite being live.
Likely Causes
Ranked by probability. Highest probability cause first.
- high probability Incomplete Redirect Mapping: Old URLs are returning 404 errors instead of redirecting to their new equivalents.
- high probability Redirect Chains: Old URLs redirect to intermediate URLs which redirect to new URLs. Each hop adds latency and crawl overhead, and crawlers stop following chains that run too long.
- medium probability New Site Not Indexed: The new site was launched with noindex tags or robots.txt blocks that were not removed.
- medium probability Internal Links Not Updated: Internal links still point to old URLs, creating redirect chains throughout the site.
- low probability Change of Address Tool Not Used: For domain migrations, the Search Console Change of Address tool was not submitted.
Diagnostic Steps
Work through each question to identify the root cause.
- Are old URLs returning 404 errors or 301 redirects?
- Are the 301 redirects going directly to the new URL, or through intermediate redirects?
- Are the new URLs indexed in Google Search Console?
- Was the Change of Address tool submitted in Search Console (for domain migrations)?
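The first two questions can be checked against the redirect map before launch. The sketch below assumes the map is available as a dictionary of old path to new path and that you know which paths are live on the new site; `trace_redirects` and the sample data are hypothetical.

```python
def trace_redirects(url, redirects, live_urls, max_hops=10):
    """Follow url through the redirect map and return (status, path taken).

    status is "ok", "404", "loop", or "too many hops".
    """
    path = [url]
    while url in redirects:
        url = redirects[url]
        if url in path:
            return "loop", path + [url]
        path.append(url)
        if len(path) - 1 > max_hops:
            return "too many hops", path
    return ("ok" if url in live_urls else "404"), path


# Hypothetical migration map: /old-b goes through an intermediate URL,
# and /old-c was never mapped.
REDIRECTS = {"/old-a": "/new-a", "/old-b": "/interim-b", "/interim-b": "/new-b"}
LIVE = {"/new-a", "/new-b"}

for old in ["/old-a", "/old-b", "/old-c"]:
    status, path = trace_redirects(old, REDIRECTS, LIVE)
    hops = len(path) - 1
    print(f"{old}: {status}, {hops} hop(s): {' -> '.join(path)}")
```

Any result with more than one hop is a chain to flatten, and any "404" is an old URL that still needs a mapping.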
AI Context
Google (Googlebot / Search Console)
A migration is a signal that the site's identity has changed. Google needs to re-crawl, re-index, and re-evaluate the new URLs. This takes time. Broken redirects are a direct signal of poor technical execution.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems that have cached your old URLs will continue to reference them until they are updated. Correct redirects ensure that users following AI-cited links reach the correct destination.
Slow Server Response Times
Lagging server response delays page load for both users and Googlebot, directly damaging Core Web Vitals scores and reducing crawl efficiency.
Reality Check
No amount of on-page SEO compensates for a sluggish server: Google hates waiting as much as your users do. Time To First Byte (TTFB) is the first metric in the loading chain. A slow TTFB delays every subsequent asset load: CSS, JavaScript, images. Google's web.dev guidance rates a TTFB of 800 ms or less as good and anything above 1,800 ms as poor. Sites consistently above 800 ms face structural disadvantages in Core Web Vitals assessments.
Symptoms
- Time To First Byte (TTFB) consistently above 600ms when measured via Chrome DevTools or WebPageTest.
- Poor Largest Contentful Paint (LCP) scores in Core Web Vitals report despite optimised images and assets.
- Google Search Console crawl stats show Googlebot spending a disproportionate amount of time downloading responses rather than discovering new URLs.
- Elevated bounce rates on entry pages that correlate with traffic spikes, suggesting the server struggles under load.
Likely Causes
Ranked by probability. Highest probability cause first.
- High **Underpowered or shared hosting:** The server lacks CPU, memory, or network resources to handle concurrent requests efficiently.
- Medium **Unoptimised backend processes:** Slow database queries, missing query caching, or inefficient application code adds latency before the first byte is sent.
- Low **Geographic distance between server and users:** A server located far from the primary audience adds network round-trip latency that no amount of code optimisation can fully overcome.
Diagnostic Steps
Work through each question to identify the root cause.
- Measure TTFB using WebPageTest (webpagetest.org) from a location matching your primary audience. Is TTFB consistently above 600ms on repeated tests?
- Does TTFB improve significantly when tested from a server geographically close to your hosting location versus a distant location?
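For a quick repeatable check alongside WebPageTest, the measurement can be scripted. The following is a rough sketch using only the Python standard library; `measure_ttfb_ms`, `summarise`, and `rate_ttfb` are illustrative names. It opens the connection before starting the timer, so it approximates server processing time plus one network round trip rather than the full navigation TTFB that WebPageTest reports. The 800ms and 1,800ms boundaries are the web.dev thresholds for good and poor TTFB.

```python
import http.client
import statistics
import time
from typing import Dict, Iterable


def measure_ttfb_ms(host: str, path: str = "/", use_tls: bool = True) -> float:
    """Milliseconds from sending the request until status line and headers arrive."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, timeout=15)
    try:
        conn.connect()  # DNS, TCP and TLS setup happen before the timer starts
        start = time.perf_counter()
        conn.request("GET", path, headers={"User-Agent": "ttfb-check"})
        conn.getresponse()  # returns once the response headers have been read
        return (time.perf_counter() - start) * 1000.0
    finally:
        conn.close()


def summarise(samples_ms: Iterable[float]) -> Dict[str, float]:
    """Median and worst case, so a single outlier does not drive the verdict."""
    ordered = sorted(samples_ms)
    return {"median": statistics.median(ordered), "worst": ordered[-1]}


def rate_ttfb(ms: float) -> str:
    """Bucket a TTFB value using the web.dev thresholds."""
    if ms <= 800:
        return "good"
    if ms <= 1800:
        return "needs improvement"
    return "poor"
```

Running `summarise(measure_ttfb_ms("www.example.com") for _ in range(5))` from machines in different regions gives a first answer to the geographic question: a large gap between near and far medians points at network distance, while a uniformly slow median points at the server or application.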
Fixes
Upgrade to a VPS, dedicated server, or cloud hosting tier with guaranteed CPU and memory resources. For high-traffic sites, evaluate auto-scaling infrastructure. Benchmark TTFB before and after migration to confirm improvement.
Profile your application stack to identify slow queries. Add database indexes on frequently queried columns. Implement server-side caching (Redis, Memcached) for expensive operations. Enable opcode caching for PHP applications. Cache rendered HTML for pages that change infrequently.
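The idea behind caching rendered HTML can be sketched with a small in-memory cache. This is an illustration of the pattern only: `TTLCache` and `get_or_render` are hypothetical names, and a production setup would normally put Redis, Memcached, or a reverse proxy in this role rather than a per-process dictionary.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep rendered HTML for a fixed time so repeat requests skip the backend."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no database or template work
        html = render()  # the expensive path: queries plus templating
        self._store[key] = (now + self.ttl, html)
        return html
```

A page that changes once a day can safely use a TTL of several minutes; the first request after expiry pays the full rendering cost and every other request is served from memory.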
Deploy a CDN (Cloudflare, Fastly, AWS CloudFront) to serve cached assets and, where possible, edge-cached HTML from nodes close to your audience. For dynamic content that cannot be cached, consider a hosting provider with data centres in your primary market.
AI Context
Google (Googlebot / Search Console)
Googlebot adjusts its crawl rate to how quickly and reliably the server responds. Slow server responses reduce the number of pages Googlebot crawls per day and increase the time between recrawls. Google's page experience signals include Core Web Vitals; TTFB is not one of them, but it feeds directly into LCP. Pages with consistently poor loading performance face ranking disadvantages compared to competitors with equivalent content and faster servers.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
AI systems that retrieve from a search index depend on that index being current. Slow servers reduce Googlebot's crawl frequency, so content updates take longer to appear in the index and therefore longer to influence AI systems that retrieve content from Google's index. For real-time retrieval-augmented systems that fetch pages directly, a slow server adds latency to the retrieval pipeline.
Case Patterns
Real-world scenarios seen in practice.
- A news portal running on shared hosting saw TTFB spike to 2.4 seconds during breaking news traffic surges. Googlebot crawl stats showed a 60% drop in pages crawled per day during peak periods. Migrating to a cloud provider with auto-scaling restored TTFB to under 400ms and crawl frequency returned to baseline within two weeks.
- A small business site on a budget shared hosting plan suffered a persistent 1.8-second TTFB for all pages. The root cause was a misconfigured WordPress database generating full table scans on every page load. Adding query caching and fixing the database configuration reduced TTFB to 320ms without changing hosting.
Soft 404s
Pages that display a "not found" or empty message but return an HTTP 200 OK status code, misleading search engines into treating them as valid content.
Reality Check
A soft 404 is not a crawl error — it is a lie. The server tells Google "everything is fine here" while the user sees an empty or useless page. Google has to spend compute time detecting this discrepancy. Sites with many soft 404s train Google to trust them less and crawl them less efficiently.
Symptoms
- Google Search Console lists the affected URLs under "Soft 404" in the Page indexing report (formerly the Coverage report).
- Pages return HTTP 200 but display messages like "No results found", "This product is no longer available", or "Coming soon".
- Crawl tools report the page as healthy (200) but content analysis reveals minimal or no body text.
- Rankings for affected URLs drop steadily despite no change in on-page content.
Likely Causes
Ranked by probability. Highest probability cause first.
- High **Custom error pages returning 200:** The server or CMS is configured to show a styled "not found" page but does not set the correct HTTP status code.
- Medium **Empty or placeholder pages with generic messaging:** Product, event, or job pages are left live after the content is removed.
- Low **Redirect chains terminating on non-existent content:** A 302 chain ends at a page that has no content but still returns 200.
Diagnostic Steps
Work through each question to identify the root cause.
- Check the HTTP status of the suspect URL using a tool like httpstatus.io or the browser's developer tools Network tab. Does it return 200 even though the page shows an error or empty state?
- Is the content genuinely gone (deleted product, expired event, removed article) with no plans to replace it?
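The first check above can be run across a crawl export rather than one URL at a time. The following is a rough heuristic sketch, not a reproduction of Google's detection: `soft_404_signals` is an illustrative name, the phrase list is taken from the symptoms above plus a generic "not found", and the 100-word threshold is an assumption that should be tuned per site.

```python
import re
from typing import List

# Phrases that suggest an error or empty state (illustrative, extend per site).
ERROR_PHRASES = ("not found", "no results found", "no longer available", "coming soon")
MIN_WORDS = 100  # assumption: pages below this are treated as thin


def visible_text(html: str) -> str:
    """Crude tag stripper: drop script/style blocks, then all remaining tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    return re.sub(r"(?s)<[^>]+>", " ", html)


def soft_404_signals(status: int, html: str, min_words: int = MIN_WORDS) -> List[str]:
    """Return reasons a 200 response looks like a soft 404 (empty list if none)."""
    if status != 200:
        return []  # a real error status is not a soft 404
    text = " ".join(visible_text(html).lower().split())
    signals = []
    if len(text.split()) < min_words:
        signals.append("thin body")
    signals.extend(f"phrase: {phrase}" for phrase in ERROR_PHRASES if phrase in text)
    return signals
```

URLs that return one or more signals are candidates for manual review; a long article that happens to contain the words "not found" produces a phrase signal without "thin body" and is usually a false positive.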
Fixes
Update your server configuration, CMS template, or application code to return the correct HTTP status code for error states. For "not found" content, return 404. For permanently removed content, return 410.
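What "return the correct status code" means in application code can be shown with a minimal WSGI sketch. It assumes a Python application; `CATALOG`, `REMOVED`, and the example paths are hypothetical stand-ins for a real content store, and the same branching applies in any framework or CMS template.

```python
from http import HTTPStatus

# Hypothetical content store: live pages and permanently removed paths.
CATALOG = {"/widgets/blue": "<h1>Blue widget</h1><p>In stock.</p>"}
REMOVED = {"/widgets/red"}


def app(environ, start_response):
    """WSGI app that pairs every error template with a matching status code."""
    path = environ.get("PATH_INFO", "/")
    if path in CATALOG:
        status, body = HTTPStatus.OK, CATALOG[path]
    elif path in REMOVED:
        # Content existed and is gone for good: 410, not a friendly 200.
        status, body = HTTPStatus.GONE, "<h1>This page has been removed</h1>"
    else:
        # Styled "not found" page, but with a real 404 status.
        status, body = HTTPStatus.NOT_FOUND, "<h1>Page not found</h1>"
    payload = body.encode("utf-8")
    start_response(
        f"{status.value} {status.phrase}",
        [("Content-Type", "text/html; charset=utf-8"),
         ("Content-Length", str(len(payload)))],
    )
    return [payload]
```

The user still sees a designed error page in both error branches; only the status line changes, which is the part search engines read.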
Audit all pages with very low word counts (under 100 words). For discontinued products, redirect to the closest equivalent product or category page. For expired events with no lasting value, return 410. Never leave a live URL serving an empty template.
Trace redirect chains to their final destination. If the destination is empty, fix the destination first (add content or return proper status), then shorten the redirect chain to a single hop.
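Shortening chains to a single hop is a small graph problem once the redirect rules are available as a source-to-target mapping. The following is a minimal sketch under that assumption; `flatten_redirects` is an illustrative name, and exporting the rules from the server or CMS is left out.

```python
from typing import Dict


def flatten_redirects(redirects: Dict[str, str]) -> Dict[str, str]:
    """Map every source URL straight to its final destination."""
    flat = {}
    for source, target in redirects.items():
        seen = {source}
        # Keep following while the target is itself redirected elsewhere.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {source}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat
```

The flattened mapping can be written back as the new rule set. The final destinations still need a status check, because a single hop to an empty page is still a soft 404.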
AI Context
Google (Googlebot / Search Console)
Google's quality systems actively detect soft 404s using content analysis heuristics. Pages identified as soft 404s are removed from the index or demoted. Soft 404s also waste crawl budget because Googlebot must request the page, parse the HTML, and run quality checks before concluding the page is empty.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
Soft 404 pages that Google de-indexes are effectively invisible to retrieval-augmented AI systems that rely on Google's index. Pages that slip through with low-quality "not found" content may still be crawled by other systems and add low-signal noise to what they know about the domain.
Case Patterns
Real-world scenarios seen in practice.
- A news publisher removed 4,000 articles but left the URLs live returning 200 with a "story not available" template. Google flagged the entire domain as low-quality. After returning 410 for removed articles and 301-redirecting relevant ones, crawl budget recovered within 6 weeks.
- An e-commerce site's discontinued product pages showed "currently unavailable" while returning 200. 8,000 URLs were being crawled regularly with zero ranking value. Implementing proper 404s and redirecting pages with 10+ backlinks recovered 12% of previously lost organic revenue.
Structured Data Not Triggering Rich Results
Structured data (Schema.org markup) has been implemented on a page but the expected rich results — star ratings, FAQs, product prices, breadcrumbs — are not appearing in search results.
Reality Check
Structured data is a hint to Google, not a directive. Implementing valid markup does not guarantee rich results. Google applies additional quality checks before displaying rich results, and eligibility can be lost for policy violations even when the markup is technically correct.
Symptoms
- Rich results appeared previously but have stopped appearing.
- Search Console's Rich Results report shows errors or warnings.
- Structured data is implemented but the Rich Results Test shows it is not eligible.
Likely Causes
Ranked by probability. Highest probability cause first.
- High **Markup Doesn't Match Page Content:** The structured data describes content that is not visible on the page, which Google treats as a spam violation.
- High **Rich Result Policy Violation:** The page violates Google's rich result policies (e.g., fake reviews, misleading ratings).
- Medium **Markup Errors:** Required properties are missing or incorrectly formatted.
- Medium **Page Quality Issues:** Google does not consider the page high enough quality to display rich results.
- Low **Incorrect Structured Data Type:** Using a Schema.org type that does not have a corresponding Google rich result.
Diagnostic Steps
Work through each question to identify the root cause.
- Does the Rich Results Test show any errors or warnings?
- Does the structured data accurately describe the visible content on the page?
- Does Search Console's Rich Results report show the page as eligible?
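The second question above, whether the markup matches the visible page, is the one the Rich Results Test does not answer directly, and it can be approximated with a script. The following is a minimal sketch using only the standard library. `JsonLdExtractor` and `audit_structured_data` are illustrative names, and `REQUIRED_PROPS` is a hypothetical rule set: the real required properties must come from Google's documentation for each rich result type. The sketch handles only JSON-LD blocks that contain a single top-level object with a string `@type`, and it raises on invalid JSON.

```python
import json
from html.parser import HTMLParser
from typing import List

# Hypothetical rules for illustration; check Google's docs per feature.
REQUIRED_PROPS = {"Product": ("name", "offers")}


class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD blocks and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self.visible = []
        self._hidden_tag = None  # "script" or "style" while inside one
        self._is_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_tag = tag
            script_type = (dict(attrs).get("type") or "").lower()
            self._is_jsonld = tag == "script" and script_type == "application/ld+json"
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._hidden_tag:
            if self._is_jsonld:
                self.blocks.append(json.loads("".join(self._buffer)))
            self._hidden_tag = None
            self._is_jsonld = False

    def handle_data(self, data):
        if self._hidden_tag is None:
            self.visible.append(data)
        elif self._is_jsonld:
            self._buffer.append(data)


def audit_structured_data(html: str) -> List[str]:
    """Flag missing properties and names that do not appear in the visible text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    page_text = " ".join(" ".join(parser.visible).split()).lower()
    problems = []
    for item in parser.blocks:
        kind = item.get("@type") if isinstance(item, dict) else None
        if not isinstance(kind, str):
            continue  # arrays, @graph and multi-typed items are out of scope here
        for prop in REQUIRED_PROPS.get(kind, ()):
            if prop not in item:
                problems.append(f"{kind}: missing {prop}")
        name = item.get("name")
        if isinstance(name, str) and name.lower() not in page_text:
            problems.append(f"{kind}: name not visible on page")
    return problems
```

An empty result does not mean the page is eligible; it only rules out the two cheapest causes before moving on to policy and quality questions.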
AI Context
Google (Googlebot / Search Console)
Structured data helps Google understand the content of a page. Even when it does not trigger visible rich results, it contributes to Google's entity understanding of the page.
LLMs (ChatGPT, Claude, Perplexity, AI Overviews)
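Structured data is machine-readable metadata that AI systems can use to understand and extract information from pages. Google does not require any special schema for its generative AI features, so treat correct markup as something that may help systems interpret the page, not as a guarantee of citation, whether or not rich results appear.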