Production Claude on AWS Mumbai for Enterprise Data Residency: Reference Architecture, ZDR Boundaries, and the Audit-Trail Pattern We Run

May 31, 2026 | 10 min read

Anjali (Technical Content Writer), reviewed by Pallavi (Senior Content Strategist)

Content Writer at Dcrayons

Context: why Indian and GCC enterprises hit this wall

We see the same pattern in every enterprise Claude evaluation that crosses our desk in 2026. The product team has built an impressive prototype. The CISO and the compliance lead read the proposal and ask three questions. Where does the inference run? What logs do we keep? What happens to the prompts and completions after the response is returned?

If those questions do not have crisp, documented answers backed by deployment evidence, the project does not ship. The Indian Digital Personal Data Protection Act (DPDP) makes data residency a board-level concern. SOC 2, ISO 27001, and the GCC equivalents (UAE PDPL, Saudi PDPL) raise the bar on audit trails. And the engineering team, regardless of the regulatory landscape, has a real interest in knowing what its AI-augmented features are doing in production.

This piece is the reference architecture Dcrayons deploys for enterprise Claude in India and the GCC. It covers four things: where the model runs (data-residency routing), what Anthropic does and does not retain (the Zero Data Retention feature boundary), how we keep an immutable audit trail without exfiltrating prompts unnecessarily, and the governance baseline we apply before any model change ships. None of this is theoretical. It is the pattern we run on Dcrayons-AI engagements today.

Architecture: the deployment topology

The unit of deployment is a stateless API service that wraps Claude calls, fronted by an enterprise authentication layer and logged to a write-once audit store. Three deployment shapes cover the bulk of what we see in production:

Shape A: direct Anthropic API + AWS Mumbai compute. The client app calls our API service on AWS Mumbai (ap-south-1), which forwards to Anthropic's Claude API. The Claude API supports the inference_geo parameter, which accepts "global" (default routing) or "us" (US-only inference). For India-bound customer data we set inference_geo explicitly on every model call rather than relying on the default, log the chosen value in our audit trail, and Anthropic routes inference accordingly. This shape is the simplest and the most common; it gives data-residency awareness without a separate cloud hop.

Shape B: Claude on Amazon Bedrock + AWS Mumbai compute. For enterprises where the procurement, billing, and contractual relationship must all be with AWS rather than Anthropic directly, we deploy Claude via Bedrock in the AWS region of choice. Bedrock supports Claude (Opus 4.7, Sonnet 4.6, Haiku 4.5 family) and gives the customer AWS-native billing, IAM integration, and AWS-operated regional inference. The trade-off is that not every Claude feature ships on Bedrock at the same time as on Anthropic's own API, so the feature matrix needs to be checked per use case at procurement time.

Shape C: Claude on Microsoft Foundry or Vertex AI for cross-cloud customers. This is less common in our Indian work but real for enterprises with Microsoft or Google cloud as their primary. The model is the same Claude; the integration plane is the cloud provider's. We support it but recommend AWS-Mumbai-direct for the latency and operational simplicity wins in the Indian context.

In all three shapes the wrapper API service runs on AWS Mumbai (typically EC2 m6i or Lightsail medium-bundle behind an ALB for high availability, with auto-scaling for predictable load shapes). The wrapper authenticates the calling application via OAuth2 client-credentials, enforces per-tenant rate limits, attaches the residency parameter, logs the call to an audit store, and forwards to Anthropic or Bedrock.
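The wrapper's rate-limit and residency steps can be sketched as follows. This is a minimal illustration, not the Dcrayons implementation: TenantConfig, FixedWindowLimiter, and build_request are hypothetical names, the limiter is in-memory only, and placing inference_geo as a top-level field of the request body is an assumption to verify against the current Anthropic API reference.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# The two inference_geo values named in this article; verify against current docs.
ALLOWED_INFERENCE_GEO = {"global", "us"}


@dataclass
class TenantConfig:
    tenant_id: str
    inference_geo: str        # residency choice recorded for this tenant
    requests_per_minute: int  # per-tenant rate limit


@dataclass
class FixedWindowLimiter:
    """Per-tenant fixed-window counter (in-memory, single process only)."""
    window_seconds: int = 60
    counts: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def allow(self, tenant: TenantConfig, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = (tenant.tenant_id, int(now // self.window_seconds))
        used = self.counts.get(key, 0)
        if used >= tenant.requests_per_minute:
            return False
        self.counts[key] = used + 1
        return True


def build_request(tenant: TenantConfig, model: str,
                  messages: List[dict], max_tokens: int = 1024) -> dict:
    """Build the outbound request body with the tenant's residency choice attached."""
    if tenant.inference_geo not in ALLOWED_INFERENCE_GEO:
        raise ValueError(f"unsupported inference_geo: {tenant.inference_geo!r}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
        # Assumption: inference_geo is sent as a top-level request field.
        "inference_geo": tenant.inference_geo,
    }
```

Keeping the residency value on the tenant record, rather than letting callers pass it, means an application cannot override it per request. A multi-instance deployment would need a shared counter store in place of the in-memory dictionary.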

ZDR boundaries: what Anthropic does and does not retain

Anthropic's Zero Data Retention (ZDR) commitment is the technical anchor for most enterprise governance conversations. For ZDR-eligible features, Anthropic does not store the prompts or completions after the response is returned. Eligibility is documented per feature in the Claude platform docs, and that matters because not every feature is ZDR-eligible.

A pragmatic enterprise feature-eligibility table for production-time decisions:

Feature | ZDR eligible | Notes for enterprise use
--- | --- | ---
Standard Messages API (text, vision, PDF) | Yes | The default request shape. Safe for sensitive content.
Adaptive thinking + Extended thinking | Yes | Reasoning traces not retained beyond the response.
Prompt caching (5m + 1h) | Yes | Cache contents stored ephemerally per the cache window only.
Citations | Yes | Source-attribution mode preserves ZDR posture.
Structured outputs (JSON / strict tool) | Qualified | Prompts and outputs not stored; JSON schemas cached up to 24h since last use.
Web search + Web fetch | Qualified | ZDR-eligible except when dynamic filtering is enabled. Verify per use case.
Batch processing | Not ZDR | Designed for asynchronous bulk workloads where the cost saving is the point. Treat as non-ZDR for sensitive content.
Code execution sandbox | Not ZDR | Sandbox environment necessarily persists state during run. Treat as non-ZDR.
Files API | Not ZDR | Uploaded files persist for retrieval. Treat as non-ZDR.
MCP connector | Not ZDR | Connector calls to external MCP servers cross trust boundary. Treat as non-ZDR.
Agent Skills | Not ZDR | Skill execution involves persistent capability. Treat as non-ZDR.

The pattern we apply is: every enterprise tenant has a default-ZDR-only configuration on the wrapper API. Non-ZDR features are opt-in per integration after a written governance review. The opt-in is logged. This avoids the failure mode where a developer enables Files API for convenience and the next compliance audit surfaces that customer documents are being retained.
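A default-ZDR-only gate with logged opt-ins could look like the sketch below. The feature keys, TenantPolicy, and authorize are hypothetical names chosen for illustration, and treating the "Qualified" rows as review-required is an assumption; the eligibility sets simply mirror the table above and should be re-checked against the platform docs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Mirrors the eligibility table above; the keys are illustrative, not API names.
ZDR_ELIGIBLE = {"messages", "thinking", "prompt_caching", "citations"}
REVIEW_REQUIRED = {
    # Assumption: "Qualified" features are gated like non-ZDR ones.
    "structured_outputs", "web_search", "web_fetch",
    "batch", "code_execution", "files_api", "mcp_connector", "agent_skills",
}


@dataclass
class TenantPolicy:
    tenant_id: str
    # feature -> reference to the written governance review that approved it
    opt_ins: Dict[str, str] = field(default_factory=dict)


def authorize(policy: TenantPolicy, features: Iterable[str],
              opt_in_log: List[dict]) -> None:
    """Allow ZDR-eligible features; require a recorded review for the rest."""
    for feature in features:
        if feature in ZDR_ELIGIBLE:
            continue
        if feature not in REVIEW_REQUIRED:
            raise ValueError(f"unknown feature: {feature!r}")
        review_ref = policy.opt_ins.get(feature)
        if review_ref is None:
            raise PermissionError(
                f"{feature!r} is not ZDR-eligible and tenant "
                f"{policy.tenant_id!r} has no approved opt-in")
        # Every use of an opted-in feature is logged with its review reference.
        opt_in_log.append({"tenant_id": policy.tenant_id,
                           "feature": feature, "review_ref": review_ref})
```

Failing closed on unknown feature names means a newly released feature stays blocked until someone classifies it.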

Audit-trail pattern: every prompt, every completion, immutable

The audit-trail discipline matters as much as the residency choice. The pattern we deploy:

1. Pre-call logging. Before the wrapper API calls Anthropic, it writes a record to the audit store: timestamp, tenant ID, user ID (or service ID for agent calls), model identifier, residency parameter, ZDR-eligibility flag, hashed prompt content (sha256), and prompt length in tokens. The full prompt is also stored, optionally PII-redacted if the integration is on the sensitive-content path.

2. Post-call logging. When the response returns, the wrapper writes a paired record: latency, completion length, token counts (input + output + cached), cost per Anthropic's pricing, success or error code, and the full completion content (again optionally PII-redacted).

3. Storage choice. The audit store is an append-only, encrypted-at-rest log. Two patterns work in production:

  • AWS S3 with object-lock enabled, encrypted with a customer-managed KMS key, with a 90-day-minimum retention policy. Cheap, durable, queryable via Athena.
  • A dedicated PostgreSQL table (in our reference architecture, sitting alongside the application database on the same RDS instance) with a write-only role for the wrapper API and a separate read-only role for compliance queries. Faster query path, more expensive at high volume.

We use S3 + object-lock for high-volume, compliance-led customers and the PostgreSQL pattern for moderate-volume, ops-led customers who want SQL queryability without the Athena cost.

4. PII redaction policy. For the customer-data integrations that involve known-PII fields (email, phone, address, customer ID), the wrapper API applies a pre-call redaction pass: regex-detected PII tokens are replaced with stable keyed identifiers in the audit-store record. The Anthropic API call still receives the original prompt (otherwise the model can't reason about the customer's actual question), but the audit log holds the redacted version. An identifier can be linked back to the original value only with a customer-held key, which means even our own engineering team cannot tie an audit-log entry to a specific customer without explicit access to that key.
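The redaction pass in step 4 can be sketched as below. This is an illustration under stated assumptions rather than the production code: the two regexes are deliberately simple, redact_for_audit and pseudonym are hypothetical names, and HMAC-SHA256 is swapped in as one way to get stable identifiers that only a key holder can link back to a known value. HMAC cannot be decrypted, so if the original value must be recoverable from the log itself, encryption under the customer-held key would be needed instead.

```python
import hashlib
import hmac
import re

# Simplified detectors for illustration; real deployments need broader coverage.
PII_RE = re.compile(
    r"(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<phone>\+?\d[\d\s-]{8,}\d)"
)


def pseudonym(value: str, key: bytes, kind: str) -> str:
    """Stable keyed identifier: same value and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"<{kind}:{digest[:16]}>"


def redact_for_audit(text: str, key: bytes) -> str:
    """Return the audit-log version of text; the model still gets the original."""
    def replace(match: "re.Match[str]") -> str:
        # lastgroup is the name of the alternative that matched (email or phone).
        return pseudonym(match.group(0), key, match.lastgroup)

    # One pass over the text, so generated tokens are never re-scanned.
    return PII_RE.sub(replace, text)
```

Because the identifier is stable per value, a compliance reviewer can still count how many audit records concern the same customer without seeing who the customer is.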

This is the layer that consistently surprises enterprise architects. The standard "log everything" pattern leaks PII into the audit store, which then becomes its own compliance liability. The pattern we run resolves it.
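The paired pre-call and post-call records from steps 1 and 2 can be sketched as plain dictionaries serialized one per line. The field names and the call_id used to pair the two records are assumptions for illustration; the field list itself follows the steps above.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def pre_call_record(tenant_id: str, user_id: str, model: str, inference_geo: str,
                    zdr_eligible: bool, prompt: str, prompt_tokens: int,
                    stored_prompt: Optional[str] = None) -> dict:
    """Record written before the model call. stored_prompt is the redacted text, if any."""
    return {
        "record_type": "pre_call",
        "call_id": str(uuid.uuid4()),  # pairs this record with its post-call record
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "user_id": user_id,
        "model": model,
        "inference_geo": inference_geo,
        "zdr_eligible": zdr_eligible,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_tokens": prompt_tokens,
        "prompt": prompt if stored_prompt is None else stored_prompt,
    }


def post_call_record(call_id: str, latency_ms: float, input_tokens: int,
                     output_tokens: int, cached_tokens: int, cost_usd: float,
                     status: str, completion: str) -> dict:
    """Record written after the response returns (or fails)."""
    return {
        "record_type": "post_call",
        "call_id": call_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "latency_ms": latency_ms,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cached_tokens": cached_tokens,
        "cost_usd": cost_usd,
        "status": status,
        "completion": completion,
    }


def to_log_line(record: dict) -> str:
    """Serialize deterministically so identical records produce identical lines."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

Writing the pre-call record before the request leaves the wrapper means a call that times out or crashes still leaves evidence that it was attempted.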

Cost discipline: caching, batching, model-tier selection

Enterprise Claude bills get expensive fast if the deployment is naive. Three levers we apply on every Dcrayons engagement:

Prompt caching. Any prompt component that is stable across requests (system prompt, RAG-retrieved background, conversation history) is marked cacheable. Anthropic's cache reads cost roughly 10 percent of the base input price; the 1-hour extended cache has the same read price and holds longer for less frequently accessed contexts, at a higher cache-write price than the 5-minute cache. On a typical Dcrayons enterprise agentic loop, prompt caching cuts effective input-token spend by 60-80 percent. We measure cache hit rate as a weekly KPI; a sub-50 percent hit rate is treated as a prompt-architecture defect.
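A rough model shows how hit rate drives the saving. It is a simplification under stated assumptions: each request either reads the cacheable prefix from cache or pays to write it, the 0.10 read multiplier is the figure quoted above, and the 1.25 write multiplier is an assumed value for the 5-minute cache that should be checked against current pricing.

```python
def relative_input_cost(cacheable_share: float, hit_rate: float,
                        read_multiplier: float = 0.10,
                        write_multiplier: float = 1.25) -> float:
    """Input-token cost relative to an uncached deployment (1.0 means no saving).

    cacheable_share: fraction of input tokens in the stable, cacheable prefix.
    hit_rate: fraction of requests whose prefix is served from cache.
    write_multiplier is an assumption; confirm it against the pricing page.
    """
    for name, value in (("cacheable_share", cacheable_share), ("hit_rate", hit_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Hits pay the read price on the prefix; misses pay the write price.
    prefix_cost = hit_rate * read_multiplier + (1.0 - hit_rate) * write_multiplier
    return (1.0 - cacheable_share) + cacheable_share * prefix_cost
```

Under these assumptions, with 90 percent of input tokens cacheable, a 90 percent hit rate gives a relative cost of about 0.29 (roughly a 71 percent saving), while a 50 percent hit rate gives about 0.71. That gap is why a low hit rate is worth treating as a defect rather than a tuning detail.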

Batch processing for non-interactive workloads. Anthropic's Batch API runs async at 50 percent of standard pricing. Any workload that does not require an immediate response (nightly document analysis, bulk classification, training-data generation, periodic report writing) routes through Batch. We have seen enterprise customers cut their Claude bill by 30-40 percent purely by reclassifying which workloads can tolerate batch latency.
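The arithmetic behind that range is simple enough to check. The sketch below assumes the 50 percent batch price applies uniformly to whatever share of spend is moved, and ignores any interaction with caching; the function names are illustrative.

```python
def bill_reduction(batch_share: float, batch_price_ratio: float = 0.5) -> float:
    """Fraction of the total bill saved when batch_share of spend moves to Batch."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return batch_share * (1.0 - batch_price_ratio)


def required_batch_share(target_reduction: float,
                         batch_price_ratio: float = 0.5) -> float:
    """Share of spend that must tolerate batch latency to hit a target saving."""
    share = target_reduction / (1.0 - batch_price_ratio)
    if not 0.0 <= share <= 1.0:
        raise ValueError("target is not reachable with batch pricing alone")
    return share
```

On this model, a 30-40 percent reduction corresponds to moving roughly 60-80 percent of spend to Batch, which is a useful sanity check when estimating what reclassification can deliver for a given workload mix.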

Model-tier selection. Not every call needs Opus 4.7. The Dcrayons default is Sonnet 4.6 for the workhorse 70 percent of features, Opus 4.7 reserved for the high-stakes reasoning tasks (legal review, agentic loops, multi-document synthesis), and Haiku 4.5 for high-volume latency-sensitive surfaces (autocomplete, classification, chat acknowledgement). The Models API is the source of truth for what your account can actually call; we encode tier selection per integration in the wrapper API configuration rather than in application code, so model changes are governance-controlled.
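Keeping tier selection in wrapper configuration might look like the following sketch. The integration names are invented for illustration, the Opus identifier is the example string used elsewhere in this article, and the Sonnet and Haiku identifiers are placeholders to be filled from what the Models API reports for the account.

```python
# Tier -> pinned model identifier. Only the Opus string appears in this article
# (as an example); the other two are placeholders, not real identifiers.
TIER_MODELS = {
    "opus": "claude-opus-4-7-20260315",
    "sonnet": "<pinned-sonnet-4.6-id>",
    "haiku": "<pinned-haiku-4.5-id>",
}

# Integration -> tier. Hypothetical integration names for illustration.
INTEGRATION_TIERS = {
    "legal_review": "opus",
    "multi_doc_synthesis": "opus",
    "autocomplete": "haiku",
    "ticket_classification": "haiku",
}

DEFAULT_TIER = "sonnet"  # the workhorse tier for everything not listed above


def resolve_model(integration: str) -> str:
    """Return the pinned model for an integration, falling back to the default tier."""
    tier = INTEGRATION_TIERS.get(integration, DEFAULT_TIER)
    return TIER_MODELS[tier]
```

Application code asks for a model by integration name only, so changing a tier or a pinned version is a configuration change that can go through the governance gate without touching the calling service.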

Governance: change management, red-team gate, model-version pinning

A production Claude integration is a production dependency. We treat model and prompt changes the same way we treat database schema changes: through a documented change-management gate.

Model-version pinning. All Claude calls in production specify the exact model version (e.g., claude-opus-4-7-20260315 rather than claude-opus-latest). New model versions are evaluated in a staging environment against the enterprise customer's actual eval suite before any production deployment. We have seen non-trivial behaviour changes between minor version updates; pinning is the only safe default.
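A pinning rule like this can be enforced mechanically, for example in CI or at wrapper start-up. The check below is a sketch that assumes pinned identifiers end in a YYYYMMDD date, as in the example above; is_pinned is a hypothetical helper, and the pattern would need adjusting if identifiers follow a different shape.

```python
import re
from datetime import datetime

# Assumption: a pinned identifier looks like claude-<name parts>-<YYYYMMDD>.
PINNED_RE = re.compile(r"^claude-[a-z0-9]+(?:-[a-z0-9]+)*-(\d{8})$")


def is_pinned(model_id: str) -> bool:
    """True if model_id ends in a valid calendar date rather than a moving alias."""
    match = PINNED_RE.match(model_id)
    if match is None:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False  # eight digits that are not a real date
    return True


def assert_all_pinned(model_ids) -> None:
    """Raise if any configured model is an alias; intended as a deploy-time gate."""
    unpinned = [m for m in model_ids if not is_pinned(m)]
    if unpinned:
        raise ValueError(f"unpinned model identifiers: {unpinned}")
```

Running assert_all_pinned over the wrapper's configured models before each deploy turns the pinning policy into a failing build instead of a review comment.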

Eval-suite-led upgrades. Every Dcrayons enterprise engagement ships with a customer-specific eval suite (typically 50-200 evaluation examples covering the integration's use cases, scored against the customer's success criteria). When Anthropic releases a new model version, we run the eval suite against the new version, present the diff to the customer's product + compliance owners, and get a written go-ahead before the production switch.
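The diff presented to the customer can be as simple as a per-example comparison of pass/fail results. The sketch below assumes each eval example reduces to a boolean pass against the customer's success criteria; EvalDiff and diff_eval_runs are illustrative names, and real suites with graded scores would need a richer comparison.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EvalDiff:
    regressed: List[str]   # passed on the pinned version, fail on the candidate
    improved: List[str]    # failed on the pinned version, pass on the candidate
    baseline_pass_rate: float
    candidate_pass_rate: float


def diff_eval_runs(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> EvalDiff:
    """Compare two runs of the same eval suite, keyed by example ID."""
    if not baseline:
        raise ValueError("eval suite is empty")
    if baseline.keys() != candidate.keys():
        raise ValueError("both runs must cover the same eval examples")
    regressed = sorted(k for k in baseline if baseline[k] and not candidate[k])
    improved = sorted(k for k in baseline if not baseline[k] and candidate[k])
    total = len(baseline)
    return EvalDiff(
        regressed=regressed,
        improved=improved,
        baseline_pass_rate=sum(baseline.values()) / total,
        candidate_pass_rate=sum(candidate.values()) / total,
    )
```

Listing regressions by example ID matters more than the headline pass rate: a candidate can score higher overall while breaking a case the compliance owner cares about.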

Red-team gate. For any prompt-level change that could affect output behaviour at scale (system prompt edit, tool-list change, RAG retrieval-strategy change), a structured red-team review runs before deployment. The red-team checks for prompt-injection vulnerability, jailbreak surface, and brand-safety regressions using a Dcrayons-maintained library of adversarial inputs.

Audit-log compliance review. Every quarter (or whatever the customer's compliance cycle requires), the audit logs are reviewed for unexpected access patterns, PII leakage, and cost outliers. Findings flow back into the prompt-change and tooling-change pipeline.

Production checklist (the rollout sequence we run)

For an enterprise Claude integration on the AWS Mumbai + direct-API shape:

  1. Procurement: Anthropic enterprise contract signed with data-processing addendum specifying ZDR-eligible features only by default
  2. AWS environment: dedicated VPC, KMS customer-managed key, S3 audit bucket with object-lock, RDS PostgreSQL for application data
  3. Wrapper API: deployed (Lightsail or EC2), OAuth2 client-credentials issuance, per-tenant rate limits, per-tenant model-tier configuration, residency parameter enforcement
  4. Audit store wired: pre-call + post-call logging functions tested full-scope, PII redaction validated against a sensitive-content fixture
  5. Eval suite built: 50-200 evaluation examples scored against customer success criteria, runnable on demand
  6. Prompt-change pipeline: red-team review process documented, owner assigned per change type
  7. Cost monitoring: cache hit rate, daily cost, per-tenant cost breakdown, alerts on +20 percent week-over-week deviation
  8. Model-version pinning: exact model identifier in all production code, upgrade SOP documented
  9. Compliance handover: audit-log access pattern documented for the customer's compliance team, quarterly review cadence agreed
  10. Runbook: on-call escalation path, Anthropic incident-channel subscription, fallback model strategy if a model version regresses
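The week-over-week alert in item 7 can be sketched as below. It assumes the alert fires on increases above the threshold (the checklist says "+20 percent") and compares the most recent seven daily totals with the seven before them; the function names are illustrative.

```python
from typing import Sequence


def week_over_week_change(daily_costs: Sequence[float]) -> float:
    """Relative change between the last 7 days and the 7 days before them."""
    if len(daily_costs) < 14:
        raise ValueError("need at least 14 daily cost values")
    previous = sum(daily_costs[-14:-7])
    current = sum(daily_costs[-7:])
    if previous <= 0:
        raise ValueError("previous week has no spend to compare against")
    return (current - previous) / previous


def should_alert(daily_costs: Sequence[float], threshold: float = 0.20) -> bool:
    """True when spend rose by more than the threshold week over week."""
    return week_over_week_change(daily_costs) > threshold
```

Running the same check per tenant, on the per-tenant cost breakdown, helps separate one noisy integration from a deployment-wide regression such as a drop in cache hit rate.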

References + linked context

  • Anthropic Claude platform docs: build-with-claude overview, ZDR feature eligibility, data residency
  • AWS Bedrock + Claude integration documentation
  • Dcrayons internal pattern: D:/Projects/dotnetcrm/docs/AI-CONTENT-WORKFLOW.md (the audit-trail pattern we use on our own production)
  • Dcrayons glossary: prompt caching, MCP connector, compaction, citations

If your enterprise Claude programme has hit the data-residency or audit-trail wall, this is the architecture we deploy. Reach out via the contact form for a 30-minute architecture review against your current setup.
