The Backend Should Not Be the Risk Surface

Most secure product architectures fail in a predictable way. They treat privacy as a feature of encryption rather than as a property of the system boundary.

That is usually the wrong abstraction.

If the backend still stores plaintext evidence, plaintext phone numbers, or enough metadata to reconstruct who received what, the product is not meaningfully private under pressure. It may be encrypted in transit. It may even be encrypted at rest. But operationally, the server is still the place where sensitive data can be recovered, logged, subpoenaed, or mishandled.

So the design goal for Anchor's vault backend was narrower and more practical: keep the backend simple, keep it serverless, and make sure it never becomes the long-lived holder of the most sensitive data. In practice, that meant one Cloudflare Worker, ciphertext-only object storage, relational data only where relational data actually mattered, and per-incident alert state isolated in Durable Objects. It also meant being disciplined about what the server could do, and just as important, what it could not do.

One Worker was the architecture, not a convenience

The backend is a single TypeScript Cloudflare Worker, anchor-vault-api. That one service handles vault initialization and signed URLs for uploads, trusted-contact and incident-access records, push token registration, SMS opt-in ciphertext storage, and per-incident delivery orchestration through Durable Objects. There are no VMs, no idle compute layer, and no separate fleet for "media," "notifications," or "auth glue." State lives in four places: R2, KV, D1, and Durable Object storage.
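The binding surface of that single Worker can be sketched as one environment interface. The KV and D1 binding names below come from the description; the R2 bucket name, Durable Object namespace name, and comments are assumptions for illustration, not the actual bindings.

```typescript
import type {
  R2Bucket,
  KVNamespace,
  D1Database,
  DurableObjectNamespace,
} from "@cloudflare/workers-types";

// Hypothetical Env for anchor-vault-api. One Worker, four kinds of state.
interface Env {
  VAULT_BUCKET: R2Bucket;              // ciphertext chunks and manifests (name assumed)
  ALERT_KV: KVNamespace;               // install state, push tokens, RSA material
  IDENTITY_KV: KVNamespace;            // incidentId → userId path resolution
  DELIVERY_CACHE: KVNamespace;         // { contactKey, incidentId, token } only
  OPT_OUT_KV: KVNamespace;             // keyed by HMAC(E.164)
  TRUSTED_CONTACTS_DB: D1Database;     // invites, relationships, incident access
  METRICS_DB?: D1Database;             // optional short-retention telemetry
  INCIDENT_DO: DurableObjectNamespace; // per-incident coordination (name assumed)
  INCIDENT_ID_ENC_KEY: string;         // secret used for enc_incident_id
}
```

One interface is the whole deploy surface: every privacy review starts and ends with what this type exposes.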

That matters for more than cost. A single Worker forces architectural clarity. There is one deploy surface, one request path, and one place to reason about privacy boundaries. The common mistake in systems like this is premature service decomposition. Teams split storage, delivery, and access logic into separate services before they have nailed the data boundary. That tends to increase failure modes faster than it increases reliability.

Here, the tradeoff was straightforward: use Cloudflare's storage primitives intentionally enough that a single Worker can coordinate the system without ever becoming the place where plaintext accumulates.

R2 stores ciphertext, not "files"

R2 is the vault. But the important detail is what kind of vault it is.

The Worker does not proxy media bytes. Clients request signed PUT URLs for upload and signed GET URLs for playback. The object keys follow a simple path pattern such as u/{userId}/{incidentId}/chunk_{index}.enc, plus a manifest when needed. The client uploads ciphertext directly to R2 and later fetches ciphertext directly from R2. The Worker issues URLs and coordinates access, but it is not in the body path for chunk reads or writes.
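The key pattern above can be captured in a small helper. This is a sketch, not the actual implementation; the path-safety check and the `manifest` object name are assumptions.

```typescript
// Build the R2 object key for an encrypted chunk, mirroring the
// u/{userId}/{incidentId}/chunk_{index}.enc pattern described above.
function chunkKey(userId: string, incidentId: string, index: number): string {
  if (!Number.isInteger(index) || index < 0) throw new Error("bad chunk index");
  // IDs are assumed to be opaque, path-safe identifiers (no slashes).
  if (userId.includes("/") || incidentId.includes("/")) throw new Error("unsafe id");
  return `u/${userId}/${incidentId}/chunk_${index}.enc`;
}

// Hypothetical manifest location alongside the chunks.
function manifestKey(userId: string, incidentId: string): string {
  return `u/${userId}/${incidentId}/manifest`;
}
```

The Worker signs URLs for these keys; the bytes themselves flow client ↔ R2.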

That design does three useful things at once.

First, it keeps infra simple. You do not need a separate media tier just to move bytes between client and object storage.

Second, it keeps cost predictable. R2 is the durable store, and the Worker stays in the control plane.

Third, and most important, it keeps the backend out of the decryption path. Vault chunks are already encrypted client-side with a key derived from the 6-digit access code plus incidentId using PBKDF2-HMAC-SHA256. The server stores ciphertext and serves ciphertext. It never has the vault key.
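The client-side derivation can be sketched in a few lines. Only "access code plus incidentId via PBKDF2-HMAC-SHA256" comes from the design above; the salt construction, iteration count, and key length here are assumptions for illustration.

```typescript
import { pbkdf2Sync } from "node:crypto";

// Sketch of deriving the vault key on the client. Parameters are assumed.
function deriveVaultKey(accessCode: string, incidentId: string): Buffer {
  // Using the incidentId as salt binds the key to one incident, so the
  // same 6-digit code yields a different key for every incident.
  return pbkdf2Sync(accessCode, incidentId, 100_000, 32, "sha256");
}
```

Because the server never sees `accessCode`, it can never run this derivation, which is the whole point of the boundary.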

This is not academic zero knowledge. It is operational boundary design. Under compulsion, the server can produce encrypted chunks and signed URLs. It cannot produce decrypted evidence because it was never given the key in the first place.

KV is for fast lookup, not identity leakage

KV handles the distributed lookup cases that do not need relational joins.

ALERT_KV stores install state, push tokens, optional install metadata, RSA public keys for SMS opt-in, and RSA-OAEP ciphertexts for phone numbers. It explicitly does not store plaintext contacts or smsEnabledContactE164 in new or updated install blobs.

IDENTITY_KV holds mappings such as incidentId → userId where that helps path resolution.

DELIVERY_CACHE stores only { contactKey, incidentId, token }, with no plaintext destination number.

OPT_OUT_KV uses HMAC(E.164) as the key, which means the system can answer "is this number opted out?" when given a number, but it cannot enumerate the full phone-number set from storage.
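The OPT_OUT_KV keying scheme is easy to sketch. The HMAC-of-E.164 idea comes from the description above; the secret handling and hex encoding are assumptions.

```typescript
import { createHmac } from "node:crypto";

// Derive the OPT_OUT_KV key for a phone number. A lookup requires already
// knowing the number; listing keys yields only digests, never numbers.
function optOutKey(e164: string, secret: string): string {
  return createHmac("sha256", secret).update(e164).digest("hex");
}

// Hypothetical usage inside the Worker:
//   const optedOut = (await env.OPT_OUT_KV.get(optOutKey(number, secret))) !== null;
```

This is what makes the store a one-way membership check instead of a phone directory.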

That distinction is easy to miss. Privacy failures rarely come from one spectacular breach. They usually come from quiet convenience fields that were "temporarily" persisted and then never removed. The install blob, delivery cache, and opt-out store were designed specifically to avoid that pattern.

D1 stores relationships, hashes, and encrypted identifiers

Relational data still matters. Trusted contacts, incident access records, and viewer tokens have real relationships and lifecycle. That is what D1 is for.

TRUSTED_CONTACTS_DB stores anchor invites, trusted-contact relationships, and incident access records. The shape is the point: token_hash = SHA256(token), code_hash = SHA256(accessCode), and enc_incident_id encrypted with INCIDENT_ID_ENC_KEY. What it does not store is plaintext E.164. The optional METRICS_DB holds short-retention operational telemetry such as alert-chain latency.
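The hash-at-rest shape translates directly into code. The column semantics (`token_hash = SHA256(token)`, `code_hash = SHA256(accessCode)`) come from the text; the hex encoding and the comparison helper are assumptions sketched here for illustration.

```typescript
import { createHash } from "node:crypto";

// What gets stored: a digest, never the token or code itself.
const sha256Hex = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// Verify a presented token against the stored token_hash.
// A production implementation would use a constant-time compare
// (e.g. crypto.timingSafeEqual) rather than string equality.
function verifyToken(presented: string, storedTokenHash: string): boolean {
  return sha256Hex(presented) === storedTokenHash;
}
```

The database can answer "is this the right token?" but cannot answer "what is the token?".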

This is the right use of a relational database in a sensitive system. Store what must be queryable. Hash what only needs verification. Encrypt what must be recoverable for control-plane behavior. Do not let the schema drift into becoming a shadow archive of user identity.

Durable Objects are the consistency layer

The most important state in this backend is not "user state." It is incident state.

Each incident gets its own Durable Object, with the object name derived from incidentId via HMAC so it is not enumerable. That Durable Object handles the event lifecycle: ARM, DISARM, HEARTBEAT, FINALIZE, SEND_NOW, CONTACTS_UPDATED, and EVIDENCE_READY. It is the coordination point for push delivery and SMS delivery, which is exactly the sort of per-key serialized workflow Durable Objects are good at.
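The non-enumerable naming can be sketched as follows. HMAC-derived names come from the description above; the secret binding and hex encoding are assumptions, and `INCIDENT_DO` is a hypothetical namespace name.

```typescript
import { createHmac } from "node:crypto";

// Derive the Durable Object name from the incidentId, so knowing the
// namespace alone does not let anyone walk the incident space.
function incidentDOName(incidentId: string, hmacSecret: string): string {
  return createHmac("sha256", hmacSecret).update(incidentId).digest("hex");
}

// Hypothetical Worker-side routing using the standard DO API:
//   const id = env.INCIDENT_DO.idFromName(incidentDOName(incidentId, secret));
//   const stub = env.INCIDENT_DO.get(id);
//   await stub.fetch(request);
```

Each name maps to exactly one object, which gives the per-incident serialization the lifecycle events rely on.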

The subtle but important design choice is how SMS delivery works. The server stores only RSA public keys and RSA-OAEP ciphertexts for opted-in phone numbers. At arm time, the app fetches those ciphertexts, decrypts them locally with the inviter's private key, validates E.164, and sends sendListE164 in the arm request. The Worker passes that list into the incident Durable Object. The DO keeps it only for the delivery retry window, uses it for Twilio sends, and wipes it in cleanupTerminal. It is not written to KV or D1.
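The ephemeral handling of that list can be reduced to a small state machine. This is an illustrative sketch, not the Durable Object code; only `sendListE164` and `cleanupTerminal` are names from the text.

```typescript
// Minimal model of how the incident DO holds plaintext numbers:
// present only between arm and terminal cleanup, never persisted elsewhere.
class IncidentDelivery {
  private sendListE164: string[] = [];

  arm(sendListE164: string[]): void {
    // Kept only for the delivery retry window (e.g. Twilio sends).
    this.sendListE164 = [...sendListE164];
  }

  cleanupTerminal(): void {
    // Wipe plaintext numbers once the incident reaches a terminal state.
    this.sendListE164 = [];
  }

  pendingCount(): number {
    return this.sendListE164.length;
  }
}
```

The invariant worth testing in a real system is exactly the one modeled here: after terminal cleanup, no code path can observe a phone number.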

That is the difference between "we encrypt phone numbers" and "we do not have a long-lived plaintext phone-number store." The first is table stakes. The second is architecture.

The flows are simple because the boundaries are strict

The vault flow is intentionally boring: POST /api/vault/init, get incidentId and signed PUT URLs, upload ciphertext chunks directly to R2, finalize when done. The alert flow is similarly direct: register push token and optional SMS public key, decrypt opted-in phone numbers on device, start the incident with sendListE164, let the Durable Object coordinate push and SMS, and give recipients a token plus a 6-digit code to fetch playback URLs and decrypt on the client side.

This simplicity is not aesthetic. It is what keeps the privacy model legible. You can trace where plaintext exists, when it exists, and when it is supposed to be gone.

At scale, that is what architecture reviews should optimize for. Not abstraction count. Not service count. Comprehensibility under failure.

Observability is part of the privacy model

One of the easier mistakes in secure systems is to do the crypto correctly and then leak the sensitive parts in logs.

That is why observability was treated as part of the architecture, not a separate ops concern. Worker logs persist, but invocation_logs = false prevents request and response bodies from being captured. More broadly, the operating rule is that logs are exportable and persistent, so sensitive data must never show up there in the first place. That means no signed URLs, no auth tokens, no phone numbers, and no incident IDs as correlation handles. Use status codes, endpoint names, hostname-only URLs, timings, and synthetic debug labels instead.
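The hostname-only rule can be enforced with a tiny helper rather than reviewer vigilance. The function name and log format here are illustrative assumptions; the rule itself comes from the text.

```typescript
// Reduce any URL to its hostname before it can reach a log line.
// Signed URLs carry credentials in the path and query, so both are dropped.
function loggableUrl(raw: string): string {
  return new URL(raw).hostname;
}

// Hypothetical log line: status, endpoint name, host, timing. No tokens,
// no phone numbers, no incident IDs as correlation handles.
function logLine(endpoint: string, status: number, durMs: number, url: string): string {
  return `${endpoint} status=${status} host=${loggableUrl(url)} dur_ms=${durMs}`;
}
```

Routing every URL through a choke point like this makes the "no signed URLs in logs" rule structural instead of aspirational.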

That is an operational discipline, not a polishing pass. Systems do not become privacy-preserving because the database is encrypted. They become privacy-preserving when the whole workflow, including debugging, respects the same boundary.

The tradeoff is deliberate

There are still accepted tradeoffs here. Push tokens are stored because alert delivery requires them. The access code is 6 digits because the product needs a recovery path recipients can actually use. Incident access still requires enough metadata to resolve token-to-incident and issue playback URLs. This is not a claim of perfect secrecy. It is a claim that the backend was designed so the most sensitive user data does not sit in long-lived plaintext storage and the server is not part of the evidence decryption path.

That is the architecture lesson I'd emphasize to a hiring manager.

The interesting part is not that the stack is modern. Plenty of systems use Workers, object storage, KV, and SQLite. The interesting part is that the storage model, control-plane design, and delivery workflow all line up around one principle: the backend should coordinate access and delivery without becoming the place where sensitive truth lives.

That is what makes a privacy story credible. Not the claim that the system is secure, but the fact that the system is structurally limited in what it can reveal.
