Technical SEO Reference Architecture for D2C

Technical SEO stopped being a checklist in 2024. With Google's Helpful Content System, Search Generative Experience, ChatGPT Search, Perplexity, and Claude each weighing different signals when deciding what to surface, the architecture under the website now determines visibility more than the keyword strategy on top of it.

This piece is the technical SEO reference architecture we apply on D2C engagements above Rs 25 crore ARR. It covers Core Web Vitals targets in 2026, schema markup that drives rich results, faceted-navigation crawl-budget control, and the architecture that gets cited by AI search engines, not just Google.

What technical SEO actually means in 2026

Three layers stacked on top of each other:

Layer	What lives here	Who owns it
Discovery	Sitemap.xml, robots.txt, crawl-budget control, canonical handling	Engineering + SEO
Rendering	Server-side render, hydration strategy, Core Web Vitals	Engineering
Indexability	Schema markup, internal linking, content quality signals	SEO + content

Each layer breaks differently. A great content strategy fails when the rendering layer tanks Largest Contentful Paint. A clean rendering setup wastes itself when faceted navigation drains crawl budget into infinite filter combinations.

Why the AI-search shift changed the rules

Pre-2024, Google's crawler was the only audience that mattered for organic visibility. Post-2024:

Google's own SGE / AI Overviews pull from your content to answer queries directly. Content gets cited; clicks shrink; visibility shifts from rankings to mentions.
ChatGPT Search + Perplexity + Claude crawl independently. Each has its own signals; structured data + schema-rich content appears to get cited more.
Bing's Copilot uses Bing's index but layers LLM-based selection on top.

The technical SEO architecture now serves multiple audiences with overlapping but distinct preferences. Single-crawler optimisation is over.
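To see which of these audiences actually visits a storefront, server logs can be bucketed by crawler. The sketch below is a minimal example, not a definitive list: the user-agent substrings are assumptions based on publicly documented crawler names and should be checked against each vendor's current documentation, and the function names are hypothetical. User-agent strings can be spoofed, so treat the counts as indicative.

```typescript
// Substrings are assumptions based on publicly documented crawler names;
// verify them against each vendor's documentation before relying on them.
const CRAWLER_TOKENS: Record<string, string[]> = {
  google: ["Googlebot"],
  bing: ["bingbot"],
  openai: ["OAI-SearchBot", "GPTBot"],
  perplexity: ["PerplexityBot"],
  anthropic: ["ClaudeBot"],
};

// Maps one user-agent string to an audience bucket, or "other" if nothing matches.
function classifyCrawler(userAgent: string): string {
  const ua = userAgent.toLowerCase();
  for (const audience of Object.keys(CRAWLER_TOKENS)) {
    const tokens = CRAWLER_TOKENS[audience];
    if (tokens.some((token) => ua.indexOf(token.toLowerCase()) !== -1)) {
      return audience;
    }
  }
  return "other";
}

// Counts requests per audience for a batch of logged user-agent strings.
function countByAudience(userAgents: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ua of userAgents) {
    const audience = classifyCrawler(ua);
    counts[audience] = (counts[audience] || 0) + 1;
  }
  return counts;
}
```

Feeding a day of log lines through countByAudience gives a rough split of crawl activity across Google, Bing, and the AI-search crawlers.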

Core Web Vitals: the 2026 thresholds

Google's numeric Core Web Vitals thresholds held steady through 2024-2025; what changed is the metric set. The current pass thresholds, assessed at the 75th percentile of page loads, for mobile (the primary audience for most Indian D2C):

Metric	Good	Needs Improvement	Poor
Largest Contentful Paint (LCP)	<= 2.5s	<= 4.0s	> 4.0s
Interaction to Next Paint (INP)	<= 200ms	<= 500ms	> 500ms
Cumulative Layout Shift (CLS)	<= 0.1	<= 0.25	> 0.25
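The table above translates directly into a small rating helper, which is handy when bucketing field data in a dashboard. This is a minimal sketch; the type and function names are illustrative, and LCP and INP are assumed to be in milliseconds.

```typescript
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// [upper bound for Good, upper bound for Needs Improvement].
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Rates a single metric value against the thresholds in the table.
function rate(metric: Metric, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= needsImprovement) return "needs-improvement";
  return "poor";
}
```

For example, rate("INP", 350) returns "needs-improvement".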

The bigger 2024 change: INP replaced FID as the third Core Web Vital in March 2024. INP observes EVERY interaction across the page's lifetime (not just the first) and reports one of the slowest, so a slow second-tap-to-respond now hurts your score where FID would have ignored it.

What moves the needle on each metric

LCP is dominated by the hero image. The fixes:

Serve the LCP image in next-gen format (AVIF or WebP)
Pre-load the LCP image via <link rel="preload">
Use fetchpriority="high" on the LCP image element
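Serve LCP images from your own domain through an image pipeline such as Cloudflare Images or Vercel Image Optimisation, rather than from a separate third-party host that costs an extra DNS lookup + connection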
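The preload and fetchpriority fixes above can be sketched as two small markup helpers. This is an illustrative example, not a framework API: the HeroImage shape and the function names are hypothetical, and in a Next.js project the framework's own image component may already emit equivalent markup.

```typescript
interface HeroImage {
  src: string;    // URL of the AVIF or WebP asset
  alt: string;
  width: number;  // intrinsic pixel size, which also reserves layout space
  height: number;
}

// Escapes characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Goes in <head>: asks the browser to start fetching the hero image early.
function preloadTag(img: HeroImage): string {
  return `<link rel="preload" as="image" href="${escapeAttr(img.src)}" fetchpriority="high">`;
}

// Goes in the body: the LCP element itself, with no lazy loading.
function heroImgTag(img: HeroImage): string {
  return (
    `<img src="${escapeAttr(img.src)}" alt="${escapeAttr(img.alt)}" ` +
    `width="${img.width}" height="${img.height}" fetchpriority="high">`
  );
}
```

The explicit width and height also help CLS, since the browser can reserve the image's box before the file arrives.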

INP is dominated by long JavaScript tasks during user interaction. The fixes:

Break large client-side handlers into smaller chunks
Defer non-critical JavaScript until after first interaction
Reduce React re-renders via memoisation + state-slice discipline
Audit third-party scripts (Klaviyo modal, Hotjar, Mouseflow, FullStory): each adds main-thread work that can delay the response to an interaction
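Breaking a large handler into chunks usually means doing a slice of the work, yielding so the browser can respond to input and paint, then continuing. The sketch below shows one way to do that with a zero-delay timer; the function names and the default chunk size are assumptions for illustration, and the right chunk size depends on how expensive each item is.

```typescript
// Resolves on a later task, giving the browser a chance to handle input and paint.
function yieldToMain(): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => resolve(), 0);
  });
}

// Applies `handle` to every item, yielding between chunks so that
// no single task holds the main thread for the whole list.
async function processInChunks<T, R>(
  items: T[],
  handle: (item: T) => R,
  chunkSize: number = 50,
): Promise<R[]> {
  const results: R[] = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    for (const item of items.slice(start, start + chunkSize)) {
      results.push(handle(item));
    }
    // Skip the yield after the final chunk; there is nothing left to interleave.
    if (start + chunkSize < items.length) {
      await yieldToMain();
    }
  }
  return results;
}
```

A filter handler that re-scores a few thousand products could call processInChunks instead of a single synchronous loop, so a tap that lands mid-way is handled between chunks rather than after the whole list.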

CLS is dominated by layout shift from late-loading content. The fixes:

Reserve space for images via explicit width + height attributes
Reserve space for embedded video / iframe
Avoid inserting content above existing content (e.g., cookie banners appearing AFTER the above-the-fold content has painted)
Use CSS font-display: swap with size-adjusted fallback fonts

For Next.js + Vercel D2C stacks, hitting Good across all three is realistic. The shop should aim for Good at p75, not just at median.
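The gap between median and p75 is easy to see with a handful of field samples. The sketch below uses a nearest-rank percentile; the sample values are made up for illustration, and real-user monitoring tools may interpolate slightly differently.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical mobile LCP samples in milliseconds.
const lcpSamplesMs = [1800, 1900, 2000, 2200, 2400, 3100, 3600, 3900];

const medianLcp = percentile(lcpSamplesMs, 50); // 2200, inside the Good band
const p75Lcp = percentile(lcpSamplesMs, 75);    // 3100, Needs Improvement
```

Here the median passes while p75 does not, which is exactly the case a median-only dashboard hides.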

Schema markup: the layer most stores skip

Schema markup (JSON-LD) is structured data that tells crawlers + LLMs the SHAPE of your content. Without it, the crawler infers; with it, the crawler knows.

The schema types every D2C store should ship

Schema type	Where it lives	What it enables
Organization	Site-wide footer / global script	Knowledge panel eligibility, brand entity recognition
Product	Every product page	Rich results in search, price + availability + review snippets
BreadcrumbList	Every non-home page	Breadcrumb display in search results
FAQPage	FAQ pages + relevant blog posts	Explicit question-answer structure for crawlers + LLMs (Google now limits FAQ rich results to well-known government and health sites)
Article	Every blog post + news post	Article rich results, top-stories eligibility
Review + AggregateRating	Product pages with reviews	Star ratings in search results
WebSite	Site-wide	Site name display in Google results (Google retired the sitelinks search box in late 2024, so SearchAction no longer drives a visible feature)

Schema patterns that get over-engineered

Stacked schema: putting Product + ItemList + BreadcrumbList + Review + Organization all in one page in a tangled @graph structure. Cleaner to ship separate <script type="application/ld+json"> blocks per schema type.
Mismatched data: schema says price Rs 1,499; visible page says Rs 1,299. Google can detect the mismatch, and the page risks losing rich-result eligibility for that markup. The fix: schema generation must source from the same data the page uses, not separate static files.
Reviews schema without reviews: declaring AggregateRating on a product with no actual reviews on the page. This is a manual action waiting to happen.
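The mismatch and empty-review problems above both go away when the JSON-LD is generated from the same product record that renders the page. The sketch below is one way to do that; the ProductRecord shape and function names are hypothetical, and the field set is deliberately minimal rather than a complete Product markup.

```typescript
// Hypothetical product record; in practice this is whatever the page template already reads.
interface ProductRecord {
  name: string;
  url: string;
  priceInr: number;
  inStock: boolean;
  reviewCount: number;
  averageRating: number; // only meaningful when reviewCount > 0
}

function productJsonLd(p: ProductRecord): Record<string, unknown> {
  const data: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: p.name,
    url: p.url,
    offers: {
      "@type": "Offer",
      price: p.priceInr.toFixed(2),
      priceCurrency: "INR",
      availability: p.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
  // Declare AggregateRating only when the product actually has reviews.
  if (p.reviewCount > 0) {
    data.aggregateRating = {
      "@type": "AggregateRating",
      ratingValue: p.averageRating,
      reviewCount: p.reviewCount,
    };
  }
  return data;
}

// One <script> block per schema type. "<" is escaped so that text inside
// the JSON cannot close the script tag early.
function jsonLdScriptTag(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Because the price in the markup and the price on the page come from the same priceInr field, a price change cannot update one without the other.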

What changed for AI search

Beyond Google rich results, schema markup increasingly determines whether your content gets CITED by ChatGPT / Perplexity / Claude when users ask product or comparison questions. The mechanism is opaque, but well-marked structured data correlates with citation frequency.

The most useful schema for AI-search citation is Article + FAQPage: clear authorship + clear question-answer structure. Add author with a Person URL and publisher with an Organization URL.
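A minimal sketch of that Article + FAQPage pairing is shown below. The option names and function names are hypothetical; the property names (author, publisher, mainEntity, acceptedAnswer) are standard schema.org vocabulary. Each object would be serialised into its own JSON-LD script block.

```typescript
interface Faq {
  question: string;
  answer: string;
}

interface ArticleOptions {
  headline: string;
  url: string;
  datePublished: string; // ISO 8601 date, e.g. "2026-01-15"
  authorName: string;
  authorUrl: string;     // the author's Person page
  publisherName: string;
  publisherUrl: string;  // the company's Organization page
}

function articleJsonLd(opts: ArticleOptions) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: opts.headline,
    mainEntityOfPage: opts.url,
    datePublished: opts.datePublished,
    author: { "@type": "Person", name: opts.authorName, url: opts.authorUrl },
    publisher: { "@type": "Organization", name: opts.publisherName, url: opts.publisherUrl },
  };
}

// Only include questions and answers that are visible on the page itself.
function faqPageJsonLd(faqs: Faq[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((faq) => ({
      "@type": "Question",
      name: faq.question,
      acceptedAnswer: { "@type": "Answer", text: faq.answer },
    })),
  };
}
```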

Faceted navigation: the crawl budget killer

Shopify, Magento, and Salesforce Commerce Cloud all generate faceted navigation URLs (/collections/skincare?filter.color=blue&filter.size=medium&sort_by=price-ascending). Without controls, the crawler discovers + indexes thousands of URL combinations that all show essentially the same content.

The damage:

Crawl budget gets consumed by low-value filter URLs
High-value URLs (product pages, category pages, blog posts) get crawled less often
Duplicate / near-duplicate content signals confuse ranking algorithms

The five controls every D2C should ship

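robots.txt disallow on filter-only URL patterns. Block crawlers from filter parameter patterns (for example, Disallow: /*?*filter.) at the source. A blocked URL is never fetched, so crawlers cannot read a canonical or noindex tag on it; reserve this control for patterns you never want crawled.
rel="canonical" pointing to the clean category URL. Tell crawlers that the faceted version is a variant of the main category page.
noindex meta tag on any faceted URL that stays crawlable and that Google might still discover via internal linking.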
hreflang declarations for region-specific category pages (en-IN, en-AE, en-GB).
Sitemap.xml curation: submit ONLY the URLs you want indexed. Do not submit every filter combination.

For headless setups (Next.js + Shopify Storefront API), all five controls require deliberate engineering. They are not free.
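For a headless build, the canonical, noindex, and sitemap controls can share one definition of what counts as a faceted URL. The sketch below assumes the filter.* and sort_by parameter names from the example URL above; the function names are hypothetical. robots.txt rules are left out because a URL blocked there is never fetched, so tags on it would not be read. Whether to combine canonical and noindex on the same URL is a judgment call; the sketch simply follows the list above.

```typescript
// Query parameters treated as facets: anything starting with "filter." plus "sort_by".
const FACET_PARAM = /^(filter\.|sort_by$)/;

function isFaceted(rawUrl: string): boolean {
  const keys = Array.from(new URL(rawUrl).searchParams.keys());
  return keys.some((key) => FACET_PARAM.test(key));
}

// Strips facet parameters and the fragment; other parameters (e.g. page) are kept.
function canonicalFor(rawUrl: string): string {
  const url = new URL(rawUrl);
  for (const key of Array.from(url.searchParams.keys())) {
    if (FACET_PARAM.test(key)) url.searchParams.delete(key);
  }
  url.hash = "";
  return url.toString();
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// <head> tags for a page: always a canonical, plus noindex on faceted variants.
function headTags(rawUrl: string): string[] {
  const tags = [`<link rel="canonical" href="${escapeXml(canonicalFor(rawUrl))}">`];
  if (isFaceted(rawUrl)) {
    tags.push('<meta name="robots" content="noindex, follow">');
  }
  return tags;
}

// Sitemap containing only non-faceted URLs.
function sitemapXml(urls: string[]): string {
  const entries = urls
    .filter((u) => !isFaceted(u))
    .map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Keeping the facet rule in one place means a new filter parameter only has to be added once for all three controls to pick it up.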

Internal linking: the most under-used signal

Internal links from one page to another tell crawlers about page-importance hierarchy + content topical clustering. Most D2C sites under-do internal linking by an order of magnitude.

Patterns that work

Hub-and-spoke topic clusters. A pillar page (e.g., "skincare routines for Indian climates") links out to 8-15 spoke pages (specific routines, ingredient deep-dives, regional considerations). Each spoke links back to the pillar.
Related products from category + product pages. Standard ecommerce affordance; under-used on blog + content pages.
Contextual links in body content. Every blog post should link to 3-5 related Dcrayons URLs (this very blog has internal links to glossary, related blogs, and a contact CTA).
Footer / mega-menu links. Cross-cutting navigation; carries less weight per link but high reach.

Patterns to avoid

Sitewide same-anchor links to one URL. Anchor diversity signals topical breadth; a single repeated anchor signals manipulation.
Orphan pages. A page no other page links to is hard for crawlers to discover + for users to navigate to. Either link it or retire it.
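Orphan pages are straightforward to detect once a crawl export lists which page links to which. The sketch below is a minimal example; the Link shape and function name are hypothetical, and it assumes the page list comes from the sitemap or CMS while the link list comes from a site crawl.

```typescript
interface Link {
  from: string;
  to: string;
}

// Returns pages that no other page links to. The home page is excluded
// because it is the crawl entry point rather than a link target.
function findOrphans(pages: string[], links: Link[], home: string): string[] {
  const linkedTo = new Set<string>();
  for (const link of links) {
    // A page linking to itself does not make it discoverable.
    if (link.from !== link.to) linkedTo.add(link.to);
  }
  return pages.filter((page) => page !== home && !linkedTo.has(page));
}
```

Running this on each deploy gives a short list of pages to either link from a hub page or retire.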

The AI-search visibility playbook

Getting cited by ChatGPT, Perplexity, and Claude requires different signals than ranking on Google. Three patterns we see working:

Comprehensive coverage of a specific question

The query "how do I migrate from Recharge to Stay AI" sees citations to pages that comprehensively answer THAT specific question. Generic "subscription platforms overview" pages do not get cited. The deeper + more specific the page, the more citation-worthy.

Our Recharge to Stay AI migration playbook is structured this way: every section answers a sub-question. AI search engines extract those answers + cite the source.

Schema-rich Article + FAQ structure

Pages that ship Article schema with clear authorship + FAQPage schema for the question blocks get cited more often than pages without. Whether causation or correlation, the cost of adding schema is low; the upside is real.

Author + publisher entity signals

Pages whose author field links to a real Person entity (with credentials + bio + social profiles) carry more authority signal than pages with no author or "admin" as author. The same applies to publisher linking to a real Organization with sameAs references to LinkedIn, Wikipedia, Crunchbase.

For Indian D2C brands, the simplest version: every blog post has a real human byline with a Person schema entity, the company has an Organization entity, both link to each other + to real external profiles (LinkedIn at minimum).
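That simplest version can be sketched as two entities that reference each other by @id. The option names and function name below are hypothetical; worksFor, employee, and sameAs are standard schema.org properties. A site with several authors would emit one Person per author and list them all under employee.

```typescript
interface EntityOptions {
  orgId: string;          // stable identifier, e.g. "https://example.com/#organization"
  orgName: string;
  orgUrl: string;
  orgSameAs: string[];    // real external profiles, LinkedIn at minimum
  personId: string;       // e.g. "https://example.com/team/a-writer#person"
  personName: string;
  personJobTitle: string;
  personSameAs: string[];
}

function entityJsonLd(opts: EntityOptions) {
  const organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": opts.orgId,
    name: opts.orgName,
    url: opts.orgUrl,
    sameAs: opts.orgSameAs,
    employee: { "@id": opts.personId }, // Organization points at the author
  };
  const person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": opts.personId,
    name: opts.personName,
    jobTitle: opts.personJobTitle,
    sameAs: opts.personSameAs,
    worksFor: { "@id": opts.orgId },    // author points back at the Organization
  };
  return { organization, person };
}
```

Article markup on each post can then reference the same @id values for author and publisher instead of repeating the full entities.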

Production checklist

For a D2C technical SEO programme at the Rs 25 crore+ ARR scale:

Core Web Vitals at Good for p75 mobile across LCP + INP + CLS
Schema markup: Organization + Product + BreadcrumbList + Article + FAQPage + Review where applicable
Schema data sourced from the same source as visible content (no mismatch)
Faceted navigation locked: robots.txt + canonical + noindex + sitemap discipline
Sitemap.xml curated: only indexable URLs submitted
Internal linking: every blog has 3-5 related-content links; topic clusters defined
hreflang on multi-region category pages (en-IN, en-AE, en-GB)
Author + publisher schema entities established + cross-linked to real external profiles
Crawl-budget monitoring via Search Console (Crawl Stats + Page indexing, formerly Index Coverage)
AI-search visibility tracking: monitor citations on ChatGPT Search, Perplexity, Claude monthly
Quarterly technical-SEO audit covering all of the above
CI gate enforcing Lighthouse 90+ on every PR that touches the storefront
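The CI gate in the last item can be as small as a function that reads a Lighthouse JSON report and lists the categories below the bar. This is a sketch under stated assumptions: the checklist does not say which categories the 90+ applies to, so performance and seo are assumed here, and the interface covers only the report fields the gate reads. Lighthouse reports category scores on a 0 to 1 scale.

```typescript
// Minimal shape of the parts of a Lighthouse JSON report that this gate reads.
interface LighthouseReport {
  categories: Record<string, { score: number | null } | undefined>;
}

// Returns the names of required categories that fall below minScore (0-100).
// A missing category or a null score counts as a failure.
function failingCategories(
  report: LighthouseReport,
  minScore: number = 90,
  required: string[] = ["performance", "seo"], // assumption: categories to gate on
): string[] {
  return required.filter((name) => {
    const category = report.categories[name];
    if (!category || category.score === null) return true;
    return Math.round(category.score * 100) < minScore;
  });
}
```

In the pipeline, the PR check would parse the report file, call failingCategories, and fail the build when the returned list is non-empty.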

References + linked context

Dcrayons glossary: edge-runtime, isr, content-modelling
Dcrayons CMS architectures: Enterprise Vercel reference architecture, Enterprise Contentful + Next.js
Mid-market practitioner guide: How to Pick a Headless CMS in 2026

Technical SEO in 2026 is the foundation under everything else. If your D2C storefront is fighting a Core Web Vitals plateau, faceted-navigation crawl drain, or AI-search visibility gap, reach out via the contact form for a 30-minute review of your current setup.

