Technical SEO stopped being a checklist in 2024. With Google's Helpful Content System, Search Generative Experience, ChatGPT Search, Perplexity, and Claude all picking different signals to index, the architecture under the website now determines visibility more than the keyword strategy on top of it.
This piece is the technical SEO reference architecture we apply on D2C engagements above Rs 25 crore ARR. It covers Core Web Vitals targets in 2026, schema markup that drives rich results, faceted-navigation crawl-budget control, and the architecture that gets cited by AI search engines, not just Google.
What technical SEO actually means in 2026
Three layers stacked on top of each other:
| Layer | What lives here | Who owns it |
|---|---|---|
| Discovery | Sitemap.xml, robots.txt, crawl-budget control, canonical handling | Engineering + SEO |
| Rendering | Server-side render, hydration strategy, Core Web Vitals | Engineering |
| Indexability | Schema markup, internal linking, content quality signals | SEO + content |
Each layer breaks differently. A great content strategy fails when the rendering layer tanks Largest Contentful Paint. A clean rendering setup wastes itself when faceted navigation drains crawl budget into infinite filter combinations.
Why the AI-search shift changed the rules
Pre-2024, Google's crawler was the only audience that mattered for organic visibility. Post-2024:
- Google's own SGE / AI Overviews pull from your content to answer queries directly. Content gets cited; clicks shrink; visibility shifts from rankings to mentions.
- ChatGPT Search + Perplexity + Claude crawl independently. Each has its own signals; structured data + schema-rich content gets cited more.
- Bing's Copilot uses Bing's index but layers LLM-based selection on top.
The technical SEO architecture now serves multiple audiences with overlapping but distinct preferences. Single-crawler optimisation is over.
Core Web Vitals: the 2026 thresholds
Google's Core Web Vitals thresholds themselves did not tighten in 2024-2025; what changed was the metric set. The current pass thresholds are the same on mobile and desktop, and mobile is the primary audience for most Indian D2C:
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| Largest Contentful Paint (LCP) | <= 2.5s | <= 4.0s | > 4.0s |
| Interaction to Next Paint (INP) | <= 200ms | <= 500ms | > 500ms |
| Cumulative Layout Shift (CLS) | <= 0.1 | <= 0.25 | > 0.25 |
The key 2024 change: INP replaced First Input Delay (FID) as the third Core Web Vital in March 2024. INP observes EVERY interaction on the page (not just the first) and reports one of the slowest, so a slow second-tap-to-respond now hurts your score where FID would have ignored it.
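To make the table concrete, here is a minimal sketch of a rating helper. The threshold values are copied from the table above; the names `THRESHOLDS` and `rateMetric` are our own for illustration and are not part of any Google library.

```typescript
// Thresholds from the table above. Units: milliseconds for LCP and INP, unitless for CLS.
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

const THRESHOLDS: Record<Metric, { good: number; poor: number }> = {
  LCP: { good: 2500, poor: 4000 },
  INP: { good: 200, poor: 500 },
  CLS: { good: 0.1, poor: 0.25 },
};

// Classify a single measured value into one of the three buckets.
function rateMetric(metric: Metric, value: number): Rating {
  const { good, poor } = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= poor) return "needs-improvement";
  return "poor";
}
```

A helper like this is useful in dashboards or alerting, where raw field values need to be bucketed the same way Google's tooling buckets them.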
What moves the needle on each metric
LCP is dominated by the hero image. The fixes:
- Serve the LCP image in next-gen format (AVIF or WebP)
- Pre-load the LCP image via `<link rel="preload">`
- Use `fetchpriority="high"` on the LCP image element
- Self-host LCP images via Cloudflare Images or Vercel Image Optimisation rather than third-party CDNs
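The LCP fixes above can be sketched as two small tag builders. This is an illustration under assumptions: the `HeroImage` shape, the function names, and the example URL are hypothetical, and a real storefront would more likely use its framework's image component.

```typescript
// Hypothetical description of the hero image; not a framework type.
interface HeroImage {
  src: string;
  width: number;
  height: number;
  alt: string;
  mimeType: "image/avif" | "image/webp";
}

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// <link rel="preload"> for the document head, so the fetch starts early.
function lcpPreloadTag(img: HeroImage): string {
  return `<link rel="preload" as="image" href="${escapeAttr(img.src)}" type="${img.mimeType}" fetchpriority="high">`;
}

// The <img> element itself: high fetch priority, explicit dimensions, no lazy loading.
function lcpImgTag(img: HeroImage): string {
  return `<img src="${escapeAttr(img.src)}" width="${img.width}" height="${img.height}" alt="${escapeAttr(img.alt)}" fetchpriority="high">`;
}

const hero: HeroImage = {
  src: "https://shop.example/img/hero.avif", // hypothetical URL
  width: 1200,
  height: 600,
  alt: "Skincare range on a marble counter",
  mimeType: "image/avif",
};
console.log(lcpPreloadTag(hero));
console.log(lcpImgTag(hero));
```

Note that the hero image deliberately omits `loading="lazy"`; lazy-loading the LCP element delays the very paint being optimised.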
INP is dominated by long JavaScript tasks during user interaction. The fixes:
- Break large client-side handlers into smaller chunks
- Defer non-critical JavaScript until after first interaction
- Reduce React re-renders via memoisation + state-slice discipline
- Audit third-party scripts (Klaviyo modal, Hotjar, Mouseflow, FullStory); each adds main-thread work that can delay INP
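The first fix, breaking a long handler into smaller chunks, can be sketched as follows. This is one possible approach, assuming a plain `setTimeout` yield; the helper names `chunk`, `yieldToMain`, and `processInChunks` are ours, and the default batch size of 50 is an arbitrary starting point to tune against real traces.

```typescript
// Split a list into fixed-size batches.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Give control back to the event loop so pending input can be handled.
function yieldToMain(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Run `handle` over every item, yielding between batches instead of
// holding the main thread for one long task.
async function processInChunks<T>(
  items: readonly T[],
  handle: (item: T) => void,
  size = 50,
): Promise<void> {
  for (const batch of chunk(items, size)) {
    batch.forEach(handle);
    await yieldToMain();
  }
}
```

The total work is unchanged; what improves is that a tap arriving mid-way waits for one batch rather than the whole loop.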
CLS is dominated by layout shift from late-loading content. The fixes:
- Reserve space for images via explicit width + height attributes
- Reserve space for embedded video / iframe
- Avoid inserting content above existing content (e.g., cookie banners appearing AFTER fold paint)
- Use CSS `font-display: swap` with size-adjusted fallback fonts
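For the "reserve space" fixes, one hedged sketch is a helper that turns known intrinsic dimensions into an inline style for an image or iframe wrapper. The function name `reservedBoxStyle` is hypothetical, and it assumes the intrinsic width and height are known at render time.

```typescript
// Greatest common divisor, used to reduce the ratio to its simplest form.
function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

// Build an inline style that reserves the box before the media loads,
// so content below it does not shift when the media arrives.
function reservedBoxStyle(width: number, height: number): string {
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    throw new RangeError("width and height must be positive integers");
  }
  const divisor = gcd(width, height);
  return `aspect-ratio: ${width / divisor} / ${height / divisor}; width: 100%; height: auto;`;
}

console.log(reservedBoxStyle(1920, 1080)); // aspect-ratio: 16 / 9; width: 100%; height: auto;
```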
For Next.js + Vercel D2C stacks, hitting Good across all three is realistic. The shop should aim for Good at p75, not just at median.
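To illustrate why p75 and median can disagree, here is a small sketch using the nearest-rank percentile method. That method is an assumption on our part: field-data tools may compute percentiles differently, so treat the result as an approximation. The sample values are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("samples must not be empty");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// True when the 75th-percentile value is within the "Good" threshold.
function passesAtP75(samples: readonly number[], goodThreshold: number): boolean {
  return percentile(samples, 75) <= goodThreshold;
}

// Illustrative LCP samples in milliseconds.
const lcpSamples = [2000, 2100, 2200, 2300, 2400, 2600, 2700, 3000];
console.log(percentile(lcpSamples, 50), percentile(lcpSamples, 75), passesAtP75(lcpSamples, 2500));
```

In this sample set the median sits under 2.5s while p75 does not, which is exactly the gap the p75 target is meant to expose.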
Schema markup: the layer most stores skip
Schema markup (JSON-LD) is structured data that tells crawlers + LLMs the SHAPE of your content. Without it, the crawler infers; with it, the crawler knows.
The schema types every D2C store should ship
| Schema type | Where it lives | What it enables |
|---|---|---|
| Organization | Site-wide footer / global script | Knowledge panel eligibility, brand entity recognition |
| Product | Every product page | Rich results in search, price + availability + review snippets |
| BreadcrumbList | Every non-home page | Breadcrumb display in search results |
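| FAQPage | FAQ pages + relevant blog posts | Explicit question-answer structure, People Also Ask coverage; note that since 2023 Google shows FAQ rich results only for a narrow set of authoritative government + health sites |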
| Article | Every blog post + news post | Article rich results, top-stories eligibility |
| Review + AggregateRating | Product pages with reviews | Star ratings in search results |
| WebSite (with SearchAction) | Site-wide | Site name + site entity signals; Google retired the sitelinks search box in late 2024, so SearchAction no longer earns that feature |
Schema patterns that get over-engineered
- Stacked schema: putting Product + ItemList + BreadcrumbList + Review + Organization all in one page in a tangled @graph structure. Cleaner to ship separate `<script type="application/ld+json">` blocks per schema type.
- Mismatched data: schema says price Rs 1,499; visible page says Rs 1,299. Google can detect the mismatch and may disregard the markup for rich results. The fix: schema generation must source from the same data the page uses, not separate static files.
- Reviews schema without reviews: declaring AggregateRating on a product with no actual reviews on the page. This is a manual action waiting to happen.
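A minimal sketch of how the last two pitfalls can be avoided in code: the JSON-LD is generated from the same product record the page renders, and AggregateRating is emitted only when reviews exist. The `ProductRecord` shape, function names, and example URLs are hypothetical; the schema.org property names are real.

```typescript
// Hypothetical product record; assumed to be the same object the page template renders.
interface ProductRecord {
  name: string;
  sku: string;
  url: string;
  imageUrl: string;
  priceInr: number;
  inStock: boolean;
  reviewCount: number;
  ratingValue?: number;
}

function productJsonLd(p: ProductRecord): Record<string, unknown> {
  const data: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: p.name,
    sku: p.sku,
    url: p.url,
    image: p.imageUrl,
    offers: {
      "@type": "Offer",
      price: p.priceInr.toFixed(2), // same number the visible price is formatted from
      priceCurrency: "INR",
      availability: p.inStock ? "https://schema.org/InStock" : "https://schema.org/OutOfStock",
      url: p.url,
    },
  };
  // Only declare a rating when the page actually shows reviews.
  if (p.reviewCount > 0 && p.ratingValue !== undefined) {
    data.aggregateRating = {
      "@type": "AggregateRating",
      ratingValue: p.ratingValue,
      reviewCount: p.reviewCount,
    };
  }
  return data;
}

// Serialise into its own script block; "<" is escaped so the JSON cannot close the tag early.
function toScriptTag(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Because the schema and the visible price both derive from `priceInr`, the two cannot drift apart the way a separately maintained static file can.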
What changed for AI search
Beyond Google rich results, schema markup increasingly determines whether your content gets CITED by ChatGPT / Perplexity / Claude when users ask product or comparison questions. The mechanism is opaque, but well-marked structured data correlates with citation frequency.
The most useful schema for AI-search citation is Article + FAQPage: clear authorship + clear question-answer structure. Add author with a Person URL and publisher with an Organization URL.
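As a rough sketch of the Article + FAQPage pairing, the builders below produce two separate JSON-LD objects. The input shapes and function names are our own assumptions; the schema.org types and properties (`Article`, `FAQPage`, `Question`, `acceptedAnswer`, `author`, `publisher`) are standard vocabulary.

```typescript
interface Faq {
  question: string;
  answer: string;
}

// Hypothetical input shape for a blog post.
interface ArticleInput {
  headline: string;
  url: string;
  datePublished: string; // ISO 8601 date
  authorName: string;
  authorUrl: string;
  publisherName: string;
  publisherUrl: string;
}

function articleJsonLd(a: ArticleInput) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: a.headline,
    mainEntityOfPage: a.url,
    datePublished: a.datePublished,
    author: { "@type": "Person", name: a.authorName, url: a.authorUrl },
    publisher: { "@type": "Organization", name: a.publisherName, url: a.publisherUrl },
  };
}

function faqPageJsonLd(faqs: Faq[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((f) => ({
      "@type": "Question",
      name: f.question,
      acceptedAnswer: { "@type": "Answer", text: f.answer },
    })),
  };
}
```

The question and answer text should be the same text that is visible on the page, for the same reason prices must match.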
Faceted navigation: the crawl budget killer
Shopify, Magento, and Salesforce Commerce Cloud all generate faceted navigation URLs (/collections/skincare?filter.color=blue&filter.size=medium&sort_by=price-ascending). Without controls, the crawler discovers + indexes thousands of URL combinations that all show essentially the same content.
The damage:
- Crawl budget gets consumed by low-value filter URLs
- High-value URLs (product pages, category pages, blog posts) get crawled less often
- Duplicate / near-duplicate content signals confuse ranking algorithms
The five controls every D2C should ship
- `robots.txt` disallow on filter-only URL patterns. Block crawlers from `?filter.*` patterns at the source.
- `rel="canonical"` pointing to the clean category URL. Tell crawlers that the faceted version is a variant of the main category page.
- `noindex` meta tag on any faceted URL Google might still discover via internal linking.
- `hreflang` declarations for region-specific category pages (en-IN, en-AE, en-GB).
- Sitemap.xml curation: submit ONLY the URLs you want indexed. Do not submit every filter combination.
One interaction to plan for: a URL blocked in robots.txt is never fetched, so crawlers cannot read a canonical or noindex placed on it. Those two tags only take effect on faceted URLs you leave crawlable.
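The canonical and noindex controls can be sketched as a single per-URL policy function. This is a sketch under assumptions: the parameter pattern matches the Shopify-style `filter.*` and `sort_by` parameters from the example URL above, `shop.example` is a placeholder domain, and `indexingPolicy` is a hypothetical helper name.

```typescript
// Query parameters treated as facets: anything starting with "filter." plus "sort_by".
const FACET_PARAM = /^(filter\.|sort_by$)/;

interface IndexingPolicy {
  canonical: string;
  robotsMeta: "index,follow" | "noindex,follow";
  inSitemap: boolean;
}

function indexingPolicy(rawUrl: string): IndexingPolicy {
  const url = new URL(rawUrl);
  const kept = new URLSearchParams();
  let faceted = false;
  // Keep non-facet parameters (for example pagination); drop facet ones.
  url.searchParams.forEach((value, key) => {
    if (FACET_PARAM.test(key)) {
      faceted = true;
    } else {
      kept.append(key, value);
    }
  });
  const query = kept.toString();
  return {
    canonical: `${url.origin}${url.pathname}${query ? `?${query}` : ""}`,
    robotsMeta: faceted ? "noindex,follow" : "index,follow",
    inSitemap: !faceted,
  };
}

console.log(
  indexingPolicy(
    "https://shop.example/collections/skincare?filter.color=blue&filter.size=medium&sort_by=price-ascending",
  ),
);
```

Keeping the decision in one function means the page template and the sitemap generator cannot disagree about which URLs are faceted.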
For headless setups (Next.js + Shopify Storefront API), all five controls require deliberate engineering. They are not free.
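In a headless build the sitemap is typically generated in code, which makes curation explicit. A minimal sketch, assuming a hypothetical `PageEntry` list supplied by the CMS or catalogue, and assuming that any URL carrying a query string should be left out:

```typescript
// Hypothetical page record; `indexable` is assumed to be decided upstream.
interface PageEntry {
  url: string;
  indexable: boolean;
  lastModified?: string; // ISO 8601 date
}

function xmlEscape(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Emit only indexable, query-free URLs in sitemaps.org format.
function buildSitemap(pages: PageEntry[]): string {
  const entries = pages
    .filter((p) => p.indexable && new URL(p.url).search === "")
    .map((p) => {
      const lastmod = p.lastModified ? `<lastmod>${xmlEscape(p.lastModified)}</lastmod>` : "";
      return `  <url><loc>${xmlEscape(p.url)}</loc>${lastmod}</url>`;
    });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```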
Internal linking: the most under-used signal
Internal links from one page to another tell crawlers about page-importance hierarchy + content topical clustering. Most D2C sites under-do internal linking by an order of magnitude.
Patterns that work
- Hub-and-spoke topic clusters. A pillar page (e.g., "skincare routines for Indian climates") links out to 8-15 spoke pages (specific routines, ingredient deep-dives, regional considerations). Each spoke links back to the pillar.
- Related products from category + product pages. Standard ecommerce affordance; under-used on blog + content pages.
- Contextual links in body content. Every blog post should link to 3-5 related Dcrayons URLs (this very blog has internal links to glossary, related blogs, and a contact CTA).
- Footer / mega-menu links. Cross-cutting navigation; carries less weight per link but high reach.
Patterns to avoid
- Sitewide same-anchor links to one URL. Anchor diversity signals topical breadth; a single repeated anchor signals manipulation.
- Orphan pages. A page no other page links to is hard for crawlers to discover + for users to navigate to. Either link it or retire it.
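Orphan detection is straightforward once the internal link graph is available, for example from a crawl export. A minimal sketch, assuming a hypothetical `LinkGraph` that maps each page path to the internal paths it links to:

```typescript
// page path -> internal paths it links out to (assumed to come from a site crawl)
type LinkGraph = Record<string, string[]>;

// Pages with no inbound link from any other page. Entry points such as the
// home page are treated as reachable even without inbound links.
function findOrphans(graph: LinkGraph, entryPoints: string[] = ["/"]): string[] {
  const linked = new Set<string>(entryPoints);
  for (const from of Object.keys(graph)) {
    for (const to of graph[from]) {
      if (to !== from) linked.add(to); // self-links do not count
    }
  }
  return Object.keys(graph)
    .filter((page) => !linked.has(page))
    .sort();
}

const site: LinkGraph = {
  "/": ["/collections/skincare", "/blog/routines"],
  "/collections/skincare": ["/products/serum"],
  "/products/serum": ["/collections/skincare"],
  "/blog/routines": ["/products/serum"],
  "/blog/old-launch-post": ["/"],
};
console.log(findOrphans(site)); // [ '/blog/old-launch-post' ]
```

This only checks for inbound links. A page linked solely from another orphan would pass, so a fuller audit would walk the graph from the entry points instead.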
The AI-search visibility playbook
Getting cited by ChatGPT, Perplexity, and Claude requires different signals than ranking on Google. Three patterns we see working:
Comprehensive coverage of a specific question
The query "how do I migrate from Recharge to Stay AI" sees citations to pages that comprehensively answer THAT specific question. Generic "subscription platforms overview" pages do not get cited. The deeper + more specific the page, the more citation-worthy.
Our Recharge to Stay AI migration playbook is structured this way: every section answers a sub-question. AI search engines extract those answers + cite the source.
Schema-rich Article + FAQ structure
Pages that ship Article schema with clear authorship + FAQPage schema for the question blocks get cited more often than pages without. Whether causation or correlation, the cost of adding schema is low; the upside is real.
Author + publisher entity signals
Pages whose author field links to a real Person entity (with credentials + bio + social profiles) carry more authority signal than pages with no author or "admin" as author. The same applies to publisher linking to a real Organization with sameAs references to LinkedIn, Wikipedia, Crunchbase.
For Indian D2C brands, the simplest version: every blog post has a real human byline with a Person schema entity, the company has an Organization entity, both link to each other + to real external profiles (LinkedIn at minimum).
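One way to express that cross-linking is with `@id` references between a Person node and an Organization node. The sketch below is illustrative only: the input shapes and the `entityPair` name are hypothetical, and `worksFor` / `employee` are just one valid pair of schema.org properties for linking the two.

```typescript
interface PersonInput {
  id: string; // stable URL used as @id, e.g. the author page with a #person fragment
  name: string;
  jobTitle: string;
  profiles: string[]; // external profile URLs (LinkedIn at minimum)
}

interface OrgInput {
  id: string;
  name: string;
  url: string;
  profiles: string[];
}

// Build the two entities so each one points at the other by @id.
function entityPair(person: PersonInput, org: OrgInput) {
  const personNode = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": person.id,
    name: person.name,
    jobTitle: person.jobTitle,
    sameAs: person.profiles,
    worksFor: { "@id": org.id },
  };
  const orgNode = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": org.id,
    name: org.name,
    url: org.url,
    sameAs: org.profiles,
    employee: [{ "@id": person.id }],
  };
  return { personNode, orgNode };
}
```

Each node can then be shipped in its own JSON-LD script block, with the author page carrying the Person node and the site-wide layout carrying the Organization node.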
Production checklist
For a D2C technical SEO programme at the Rs 25 crore+ ARR scale:
- Core Web Vitals at Good for p75 mobile across LCP + INP + CLS
- Schema markup: Organization + Product + BreadcrumbList + Article + FAQPage + Review where applicable
- Schema data sourced from the same source as visible content (no mismatch)
- Faceted navigation locked: robots.txt + canonical + noindex + sitemap discipline
- Sitemap.xml curated: only indexable URLs submitted
- Internal linking: every blog has 3-5 related-content links; topic clusters defined
- hreflang on multi-region category pages (en-IN, en-AE, en-GB)
- Author + publisher schema entities established + cross-linked to real external profiles
- Crawl-budget monitoring via Search Console (Crawl Stats + Index Coverage)
- AI-search visibility tracking: monitor citations on ChatGPT Search, Perplexity, Claude monthly
- Quarterly technical-SEO audit covering all of the above
- CI gate enforcing Lighthouse 90+ on every PR that touches the storefront
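The CI gate in the last item can be sketched as a check over a Lighthouse JSON report, where category scores are reported on a 0 to 1 scale. The `gateFailures` helper and the choice of categories are our assumptions; how the report is produced and loaded in the pipeline is left out.

```typescript
// Minimal slice of a Lighthouse JSON report: category id -> score in [0, 1], or null.
interface LighthouseReport {
  categories: Record<string, { score: number | null }>;
}

// Return a description of every category scoring below the minimum.
// An empty array means the gate passes.
function gateFailures(
  report: LighthouseReport,
  minScore = 90,
  categoryIds: string[] = ["performance", "seo"],
): string[] {
  const failures: string[] = [];
  for (const id of categoryIds) {
    const category = report.categories[id];
    if (!category || category.score === null) {
      failures.push(`${id}: no score`);
      continue;
    }
    const score = Math.round(category.score * 100);
    if (score < minScore) failures.push(`${id}: ${score}`);
  }
  return failures;
}

const sample: LighthouseReport = {
  categories: { performance: { score: 0.93 }, seo: { score: 0.88 } },
};
console.log(gateFailures(sample)); // [ 'seo: 88' ]
```

In a pipeline, a non-empty result would fail the job. Lab scores vary between runs, so it may be worth gating on the median of several runs rather than a single one.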
References + linked context
- Dcrayons glossary: edge-runtime, isr, content-modelling
- Dcrayons CMS architectures: Enterprise Vercel reference architecture, Enterprise Contentful + Next.js
- Mid-market practitioner guide: How to Pick a Headless CMS in 2026
Technical SEO in 2026 is the foundation under everything else. If your D2C storefront is fighting a Core Web Vitals plateau, faceted-navigation crawl drain, or AI-search visibility gap, reach out via the contact form for a 30-minute review of your current setup.



