Most systems are designed around the assumption that the device will still be there in five minutes. This one was not.
The core constraint was simpler and harsher: the device may be lost or taken at any time. That changes the architecture immediately. Recording cannot depend on network. Encryption cannot wait for the backend. Upload cannot be a best-effort afterthought. And if only one segment ever makes it off the device, that first segment has to be the one that matters.
That is why the system was built as an offline-first evidence pipeline with a resilient vault backup path. The app records locally, encrypts locally, segments locally, and only then opportunistically pushes ciphertext to remote storage. The server does not orchestrate the upload queue. It issues signed URLs and stores encrypted chunks. The client owns durability.
In practice, this is the difference between a feature and a system. A feature assumes good conditions. A system is designed for hostile ones.
Start with the failure mode, not the happy path
Teams usually design media flows from the center outward. They think about capture quality, playback polish, and backend organization. The common mistake is treating connectivity as the default and disruption as the exception.
Here, disruption was the default case.
Recording starts and runs with no internet. Camera, mic, encoder, and segmenter all run on-device. Output is written as encrypted fMP4 segments into the app sandbox, not the Photos gallery. Saving to Photos is explicit, not automatic, because privacy and survivability matter more than convenience in the default path. The core recording flow also does not require login or backend auth just to begin capture. Identity can be attached later when network is available.
That decision sounds narrow, but it cascades into everything else. Once recording is independent of network, you stop treating the backend as part of capture. It becomes what it should be: a remote durability layer for ciphertext, not a prerequisite for the user to create evidence at all.
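To make the capture-side boundary concrete, here is a minimal sketch of the per-segment encryption step, assuming AES-256-GCM and hypothetical names (the source does not specify the cipher or API): each finalized fMP4 segment is encrypted on-device before it ever touches the upload queue, so only ciphertext can leave.

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Hypothetical sketch: encrypt a finalized fMP4 segment with AES-256-GCM
// before it is queued. The key never leaves the device; remote storage
// only ever receives the ciphertext produced here.
interface EncryptedSegment {
  index: number;      // chunk index; 0 is the highest-priority segment
  iv: Buffer;         // unique nonce per segment
  authTag: Buffer;    // GCM integrity tag
  ciphertext: Buffer; // the bytes that go to remote storage
}

function encryptSegment(index: number, plaintext: Buffer, key: Buffer): EncryptedSegment {
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { index, iv, authTag: cipher.getAuthTag(), ciphertext };
}

function decryptSegment(seg: EncryptedSegment, key: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, seg.iv);
  decipher.setAuthTag(seg.authTag); // reject tampered ciphertext
  return Buffer.concat([decipher.update(seg.ciphertext), decipher.final()]);
}
```

The authenticated mode matters here: a vault that cannot detect tampering is a weaker evidentiary claim than one that can.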
The first chunk matters more than the last one
The most important design choice in the vault path was not encryption. It was ordering.
The queue persists its state to upload_queue.enc in the app sandbox so it survives restarts and process death. But persistence alone is not enough. If the device disappears thirty seconds into a session, the system does not get credit for having an elegant queue if the wrong data was uploaded first. So the upload policy is explicit: chunk 0 first, then remaining chunks in index order, then the manifest.
That priority rule captures the actual product logic. Earliest evidence is highest value because it is the least replaceable. A later manifest is useful. Chunk 7 is useful. But neither matters if chunk 0 never left the device.
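The priority rule is small enough to state as a comparator. A sketch with a hypothetical item type (the real queue items surely carry more fields): chunk 0 first, remaining chunks by index, manifest last.

```typescript
// Hypothetical queue item: either a media chunk or the manifest.
type QueueItem =
  | { kind: "chunk"; index: number }
  | { kind: "manifest" };

// Lower rank uploads first: chunk 0, then chunks 1..n in order, then manifest.
function rank(item: QueueItem): number {
  return item.kind === "manifest" ? Number.MAX_SAFE_INTEGER : item.index;
}

function uploadOrder(items: QueueItem[]): QueueItem[] {
  return [...items].sort((a, b) => rank(a) - rank(b));
}
```

Encoding the policy as a single ordering function also means persistence stays simple: restore the queue from upload_queue.enc, re-sort, and the survivability guarantee is intact across process death.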
This is where a lot of otherwise competent systems fall apart. They optimize for throughput when they should optimize for survivability. Those are not the same thing.
The upload flow itself stays deliberately simple. The client initializes the vault, gets an incidentId plus signed PUT URLs, requests upload URLs per chunk, and PUTs ciphertext directly to R2. After all segments are uploaded, it finalizes the vault. The server never sees the decryption key and never needs to proxy media bytes. That keeps the server path operationally simpler and keeps plaintext out of long-lived backend storage.
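The shape of that flow, sketched with hypothetical endpoint names and response types (the real API paths may differ): initialize, request a signed URL per chunk, PUT ciphertext directly to storage, finalize. The transport is injected so the sequence itself is testable.

```typescript
// Hypothetical request shapes; the server only mints ids and signed URLs.
type Http = (url: string, init: { method: string; body?: Uint8Array }) => Promise<any>;

async function uploadVault(
  api: string,
  chunks: Uint8Array[],
  manifest: Uint8Array,
  http: Http
): Promise<string> {
  // 1. Initialize the vault; the server mints an incidentId.
  const { incidentId } = await http(`${api}/vault/init`, { method: "POST" });

  // 2. PUT each ciphertext chunk straight to storage via a signed URL.
  //    Index order matters: chunk 0 leaves the device first.
  for (let i = 0; i < chunks.length; i++) {
    const signed = await http(`${api}/vault/${incidentId}/chunks/${i}/url`, { method: "POST" });
    await http(signed.url, { method: "PUT", body: chunks[i] }); // ciphertext only
  }

  // 3. Manifest last, then finalize. The server never sees a decryption key
  //    and never proxies media bytes.
  const signed = await http(`${api}/vault/${incidentId}/manifest/url`, { method: "POST" });
  await http(signed.url, { method: "PUT", body: manifest });
  await http(`${api}/vault/${incidentId}/finalize`, { method: "POST" });
  return incidentId;
}
```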
Reliability is mostly retry discipline
Offline-first systems do not fail because they lack retries. They fail because their retry model is naive.
This queue uses a clear policy. Transient failures such as network errors, 5xx responses, 408s, and 429s go through exponential backoff with jitter. Terminal 4xx failures, excluding 408 and 429, do not retry at all. Retry delays are fixed and understandable: 5 seconds, 15 seconds, 45 seconds, then 2 minutes, capped at 2 minutes thereafter, each with ±20% jitter to avoid synchronization effects. Max retries per item is 5.
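The whole policy fits in a few lines. A sketch with illustrative names: the stated delays are a 3x exponential on a 5-second base, capped at 2 minutes, with ±20% uniform jitter, and a transient-vs-terminal split on status codes.

```typescript
const BASE_MS = 5_000;    // first retry delay: 5 seconds
const CAP_MS = 120_000;   // delays cap at 2 minutes
const MAX_RETRIES = 5;    // per-item retry ceiling

// 5xx, 408, and 429 are transient; other 4xx are terminal.
function isTransient(status: number): boolean {
  return status >= 500 || status === 408 || status === 429;
}

// attempt is 1-based: 1 → 5s, 2 → 15s, 3 → 45s, then capped at 120s.
// rand is injectable so the jitter is testable.
function retryDelayMs(attempt: number, rand: () => number = Math.random): number {
  const base = Math.min(BASE_MS * 3 ** (attempt - 1), CAP_MS);
  const jitter = 1 + (rand() * 0.4 - 0.2); // uniform in [0.8, 1.2]
  return Math.round(base * jitter);
}

function shouldRetry(status: number, attempt: number): boolean {
  return isTransient(status) && attempt < MAX_RETRIES;
}
```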
That matters for two reasons.
First, predictability beats cleverness in failure handling. You want operators and engineers to know what the queue will do without reading a novel. Second, offline is not treated as a retry-worthy failure. If the user is offline, the queue pauses. It does not burn through attempts while the environment is behaving exactly as expected.
That distinction is easy to miss and expensive to get wrong. Counting intentional offline time as retry failure is how systems create false exhaustion, bad UX, and support debt.
The state machine is equally disciplined: Queued → Uploading → Complete, with Retrying and Failed as the exit paths, plus a Paused state when the device is offline or, optionally, under low battery. A single mutex-protected loop picks the next eligible item, respects concurrency limits, and advances state after each attempt. On WiFi, it allows up to two concurrent uploads. On cellular, one. That is not because higher concurrency is impossible. It is because thermal load, battery pressure, and network contention are real product constraints, not theoretical footnotes.
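The eligibility rules can be sketched as a single pure function, with illustrative names and the caller assumed to hold the queue mutex. Note how offline falls out naturally: a concurrency limit of zero is a paused queue.

```typescript
type UploadState = "queued" | "uploading" | "retrying" | "complete" | "failed";
type Network = "wifi" | "cellular" | "offline";

interface Item {
  id: string;
  state: UploadState;
  nextAttemptAt: number; // epoch ms; backoff gate for retrying items
}

// WiFi allows 2 concurrent uploads, cellular 1, offline 0 (queue paused).
function concurrencyLimit(net: Network): number {
  return net === "wifi" ? 2 : net === "cellular" ? 1 : 0;
}

// Pick the next eligible item, or null if paused, at the limit, or drained.
// Items are assumed already sorted by the chunk-0-first priority rule.
function pickNext(items: Item[], net: Network, now: number): Item | null {
  const inFlight = items.filter((i) => i.state === "uploading").length;
  if (inFlight >= concurrencyLimit(net)) return null;
  return (
    items.find(
      (i) => (i.state === "queued" || i.state === "retrying") && i.nextAttemptAt <= now
    ) ?? null
  );
}
```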
There is also stalled-upload handling. If an item sits in Uploading too long, it can be marked retrying and re-queued. This is another unglamorous decision that matters more in production than in demos. Users do not care whether the state machine was elegant. They care whether a hung upload recovered.
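The stall reaper is similarly small. A sketch with an illustrative timeout value (the source does not state the actual threshold): anything stuck in Uploading past the deadline goes back to Retrying instead of wedging the queue.

```typescript
const STALL_MS = 60_000; // illustrative timeout; the real value is a product choice

interface InFlight {
  id: string;
  state: "uploading" | "retrying" | "queued";
  startedAt: number; // epoch ms when the upload attempt began
}

// Requeue anything stuck in "uploading" past the stall deadline, so a hung
// network call cannot hold its concurrency slot forever.
function reapStalled(items: InFlight[], now: number): InFlight[] {
  return items.map((i) =>
    i.state === "uploading" && now - i.startedAt > STALL_MS
      ? { ...i, state: "retrying" as const }
      : i
  );
}
```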
fMP4 is what makes the whole model viable
The media container choice is not an implementation detail here. It is central to the resilience model.
Regular MP4 is optimized for completed files. The moov atom is typically at the end, which means you generally need the full recording before playback is practical unless you remux. That is the wrong shape for a system trying to get meaningful evidence off-device as early as possible. fMP4 changes that. With the moov metadata at the start and independently playable fragments, segments can be finalized and uploaded while recording is still in progress.
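The container point can be made concrete with a tiny top-level box walker. This is a sketch, not a real parser (it ignores 64-bit and to-end-of-file box sizes): in an fMP4 stream, ftyp and moov arrive before the first moof/mdat fragment pair, which is exactly why a prefix of the stream is already meaningful.

```typescript
// Walk the top-level box headers of an ISO-BMFF buffer: each box is a
// 4-byte big-endian size followed by a 4-character type code.
function topLevelBoxes(buf: Uint8Array): string[] {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const boxes: string[] = [];
  let off = 0;
  while (off + 8 <= buf.length) {
    const size = view.getUint32(off);
    const type = String.fromCharCode(buf[off + 4], buf[off + 5], buf[off + 6], buf[off + 7]);
    boxes.push(type);
    if (size < 8) break; // size 0 (to end) and 1 (64-bit) omitted in this sketch
    off += size;
  }
  return boxes;
}
```

In a regular MP4 the moov would appear after the mdat, which is why a truncated regular MP4 is often unplayable while a truncated fMP4 usually is not.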
That means the system can start pushing chunk 0 while chunk 1 is still being recorded. If the device is gone after thirty seconds, there is still a plausible chance the earliest segment is already in the vault and playable. That is the real value of segmented recording here. Not elegance. Not streaming purity. Survivability.
This is also why the architecture keeps upload client-driven. There is no server-side queue trying to reconstruct partial media state. The client has the segments, the encryption context, the queue order, and the local knowledge of what is ready. The backend issues signed URLs and stores ciphertext. That is the right boundary.
Alerts and evidence have to decouple cleanly
A subtle but important part of the design is that upload progress and alerting are related, but not tightly coupled.
When the first segment or a sentinel is uploaded, the app can notify the backend that evidence is ready. That allows the incident's Durable Object to transition state and, if alerts are already armed, proceed with sending the vault link and code. But the upload queue runs independently of arming. If the user arms after some chunks are already uploaded, evidence-ready can be sent again so the backend catches up to the actual vault state.
That separation is good systems design. It avoids making user safety flows brittle just because network timing is messy. Capture, upload, and alerting are coordinated, but they are not allowed to deadlock each other.
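The convergence property can be sketched as two small transitions, with hypothetical state names standing in for the Durable Object's real ones: evidence-ready may arrive before or after arming, and may arrive more than once, and the incident ends up in the same state either way.

```typescript
type IncidentState = "created" | "evidence_ready" | "alerted";

interface Incident {
  state: IncidentState;
  armed: boolean;
  uploadedChunks: number; // how much ciphertext has actually landed in the vault
}

// Evidence-ready is idempotent: re-sending it only ever moves state forward.
function onEvidenceReady(inc: Incident): Incident {
  if (inc.uploadedChunks === 0) return inc;           // nothing in the vault yet
  if (inc.armed) return { ...inc, state: "alerted" }; // safe to send link + code
  return { ...inc, state: "evidence_ready" };         // hold until armed
}

// Arming after chunks have landed catches up immediately rather than waiting
// for another notification, so the two flows cannot deadlock each other.
function onArm(inc: Incident): Incident {
  const armed = { ...inc, armed: true };
  return armed.state === "evidence_ready" ? onEvidenceReady(armed) : armed;
}
```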
The real takeaway
What I would want a hiring manager to see in this architecture is not just that it uses encryption, retries, and signed URLs. Lots of systems do that. The point is that the design starts from a real product constraint and then stays honest all the way down.
If the device may disappear, recording must work offline. If early evidence matters most, chunk 0 must upload first. If mobile networks are unreliable, offline must pause rather than consume retries. If partial survival matters, fMP4 is the right container. If the backend should not become a liability, it should store ciphertext and issue URLs, not own the queue. Those are not isolated technical choices. They are one coherent argument expressed through the architecture.
That is usually what separates a prototype from a production system. Prototypes optimize for capability. Production systems optimize for the failure mode that matters most.
And in this case, that failure mode was clear from the start: the device may be gone, but the evidence should not be.