Plan: File Upload & Storage
1. Goal & scope
Add a general-purpose media upload/storage capability to the platform core: a server endpoint that accepts files, a pluggable storage backend, a Media entity in the event-sourced model, a schema field type so any service can declare file fields, and the SDK/Admin UI wiring to make the existing MediaUploader real.
In scope: multipart upload, local-disk + S3-compatible backends, Media metadata model, download/serve, auth + validation, tenant scoping, codegen field type, SDK method, dogfood + conformance tests.
Non-goals (deferred, documented): image transforms/thumbnails, virus scanning, resumable/chunked uploads, CDN integration.
2. Architecture — mirror the registry pattern
Atomo already has a precedent: the worker token registry (WorkerTokenStore + job_routes + read-only blob dir). The upload feature follows that shape plus a write path.
2a. Storage abstraction (crates/atomo_server/src/storage.rs)
trait StorageBackend: Send + Sync {
async fn put(&self, key: &str, bytes: Bytes, content_type: &str) -> Result<()>;
async fn get(&self, key: &str) -> Result<Option<(Bytes, String)>>;
async fn delete(&self, key: &str) -> Result<()>;
fn presign_get(&self, key: &str, ttl: Duration) -> Option<String>; // S3 only
}
- LocalStorage: files under a blob_dir (reuse RegistryStore's tokio::fs + relative-path approach). Storage key = {tenant}/{yyyy}/{mm}/{uuid}{ext} (never the user's filename, which rules out path traversal).
- S3Storage: behind a Cargo feature storage-s3 using aws-sdk-s3 (new dep).
- Selected via env (see §6).
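The storage key rule above can be sketched with the standard library alone. This is a minimal illustration, not the code in storage.rs: the function names (storage_key, safe_extension, key_belongs_to_tenant) are hypothetical, and the 8-character extension limit is an assumption. The caller passes in the date parts and the generated id so the functions stay pure.

```rust
/// Hypothetical helper: builds `{tenant}/{yyyy}/{mm}/{uuid}{ext}`.
/// Only the extension is derived from the client filename, and only after filtering.
fn storage_key(tenant: &str, year: u16, month: u8, id: &str, client_filename: &str) -> String {
    let ext = safe_extension(client_filename);
    format!("{tenant}/{year:04}/{month:02}/{id}{ext}")
}

/// Keeps a short, purely alphanumeric extension (lowercased, with leading dot).
/// Anything containing separators or other characters is dropped entirely.
fn safe_extension(client_filename: &str) -> String {
    match client_filename.rsplit_once('.') {
        Some((_, ext))
            if !ext.is_empty()
                && ext.len() <= 8 // assumed limit, not from the plan
                && ext.chars().all(|c| c.is_ascii_alphanumeric()) =>
        {
            format!(".{}", ext.to_ascii_lowercase())
        }
        _ => String::new(),
    }
}

/// True when the first path segment is exactly the tenant and no segment
/// is empty, `.` or `..`.
fn key_belongs_to_tenant(key: &str, tenant: &str) -> bool {
    key.split('/').next() == Some(tenant)
        && key.split('/').all(|s| !s.is_empty() && s != "." && s != "..")
}
```

Because the directory part is built only from server-side values, a hostile filename such as ../../etc/passwd.PNG can contribute at most the suffix .png. The same prefix check is the kind of test a commit step could use to confirm a presigned key stays inside the caller's tenant.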
2b. Metadata model — MediaStore
Idempotent DDL like RegistryStore::init():
CREATE TABLE IF NOT EXISTS media (
    id TEXT PRIMARY KEY, tenant_id TEXT, filename TEXT, content_type TEXT,
    size BIGINT, storage_key TEXT NOT NULL, checksum TEXT,
    uploaded_by TEXT, created_at TIMESTAMPTZ DEFAULT NOW(), deleted_at TIMESTAMPTZ
);
Emit MediaUploaded / MediaDeleted events through the existing event log so media participates in audit/history/projections. This is what makes Atomo's upload event-sourced.
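To make the event-sourcing claim concrete, here is a small sketch of how a media row's current state could be rebuilt by replaying its events. The event payload fields and the MediaSnapshot type are assumptions for illustration; the real event log and projection code in Atomo may look quite different.

```rust
/// Assumed event shapes; only the two event names come from the plan.
#[derive(Debug, Clone, PartialEq)]
enum MediaEvent {
    MediaUploaded { id: String, storage_key: String, uploaded_by: String },
    MediaDeleted { id: String, deleted_by: String },
}

/// Hypothetical projection of one media entity.
#[derive(Debug, PartialEq)]
struct MediaSnapshot {
    id: String,
    storage_key: String,
    uploaded_by: String,
    deleted: bool,
}

/// Folds the events for one entity, in order, into its current state.
/// Returns None if no upload event was seen.
fn replay(events: &[MediaEvent]) -> Option<MediaSnapshot> {
    let mut state: Option<MediaSnapshot> = None;
    for event in events {
        match event {
            MediaEvent::MediaUploaded { id, storage_key, uploaded_by } => {
                state = Some(MediaSnapshot {
                    id: id.clone(),
                    storage_key: storage_key.clone(),
                    uploaded_by: uploaded_by.clone(),
                    deleted: false,
                });
            }
            MediaEvent::MediaDeleted { id, .. } => {
                // Soft delete: the snapshot stays, only the flag flips.
                if let Some(snapshot) = state.as_mut() {
                    if &snapshot.id == id {
                        snapshot.deleted = true;
                    }
                }
            }
        }
    }
    state
}
```

Replaying a prefix of the log gives the state at that point in history, which is the property an upload-then-delete history test would rely on.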
2c. Routes (upload_routes.rs, wired into create_router)
- POST /media: multipart upload → store bytes + insert metadata + emit event → {id, url, contentType, size}. Behind auth_middleware; also accepts an X-Worker-Token (via optional_worker_auth_middleware) so external workers can store generated artifacts without a user session (owner mapped to worker:{id}, no tenant).
- GET /media/{id}: serve bytes (local) or 302 → presigned URL (S3). Gated by read access + tenant scope. The local proxy path honors HTTP Range requests (206 / Content-Range, 416 for an unsatisfiable range) so video/audio can seek, advertises Accept-Ranges: bytes, and emits a strong ETag (the immutable media id) for conditional GETs (If-None-Match → 304). On the S3 redirect path, S3 serves Range natively against the presigned URL.
- POST /media/presign: (S3 only) get a presigned PUT URL for a large/out-of-band upload → { id, key, uploadUrl }. Returns 501 when the backend can't presign (local disk). The client PUTs bytes directly to uploadUrl (they never pass through the server), then calls commit.
- POST /media/commit: register a presigned upload: { id, key, filename?, contentType?, checksum? }. Validates that key belongs to the caller's tenant, confirms the object exists + measures it via S3 HEAD, dedups on checksum, records metadata + emits MediaCreated → { id, url, deduped }.
- DELETE /media/{id}: soft-delete + MediaDeleted event.
- Requires axum's multipart feature + DefaultBodyLimit size cap.
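The Range behavior on the local proxy path can be illustrated with a small single-range parser. This is a simplified sketch of the RFC 7233 rules, not the server's actual parser: RangeOutcome and parse_range are hypothetical names, multi-range requests and empty bodies simply fall back to a full 200 response, and malformed headers are ignored rather than rejected.

```rust
#[derive(Debug, PartialEq)]
enum RangeOutcome {
    /// No usable Range header: serve the whole body with 200.
    Full,
    /// Serve bytes `start..=end` with 206 and a Content-Range header.
    Partial { start: u64, end: u64 },
    /// Respond 416 with `Content-Range: bytes */{len}`.
    Unsatisfiable,
}

fn parse_range(header: Option<&str>, len: u64) -> RangeOutcome {
    let Some(spec) = header.and_then(|h| h.trim().strip_prefix("bytes=")) else {
        return RangeOutcome::Full;
    };
    // Simplification: empty bodies and multi-range requests get a plain 200.
    if len == 0 || spec.contains(',') {
        return RangeOutcome::Full;
    }
    let Some((first, last)) = spec.split_once('-') else {
        return RangeOutcome::Full;
    };
    let (first, last) = (first.trim(), last.trim());
    match (first.is_empty(), last.is_empty()) {
        // "-N": the final N bytes (the whole body if N exceeds its length).
        (true, false) => match last.parse::<u64>() {
            Ok(0) => RangeOutcome::Unsatisfiable,
            Ok(n) => RangeOutcome::Partial { start: len.saturating_sub(n), end: len - 1 },
            Err(_) => RangeOutcome::Full,
        },
        // "N-": from byte N to the end.
        (false, true) => match first.parse::<u64>() {
            Ok(s) if s < len => RangeOutcome::Partial { start: s, end: len - 1 },
            Ok(_) => RangeOutcome::Unsatisfiable,
            Err(_) => RangeOutcome::Full,
        },
        // "N-M": clamp M to the last byte; N > M is malformed and ignored.
        (false, false) => match (first.parse::<u64>(), last.parse::<u64>()) {
            (Ok(s), Ok(e)) if s > e => RangeOutcome::Full,
            (Ok(s), Ok(_)) if s >= len => RangeOutcome::Unsatisfiable,
            (Ok(s), Ok(e)) => RangeOutcome::Partial { start: s, end: e.min(len - 1) },
            _ => RangeOutcome::Full,
        },
        (true, true) => RangeOutcome::Full,
    }
}
```

A handler would map Partial to a 206 with Content-Range: bytes {start}-{end}/{len} and a body of end - start + 1 bytes, and Unsatisfiable to a 416.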
3. Schema-driven integration
- New field type in atomo_schema (typescript_parser.rs): a File type mapping to a TEXT column storing the media id (soft FK to media.id), surfaced in /meta/schema.
- Added through the unified parse_model_metadata path (not a new brace-walk) with a parser unit test, since the parse/codegen layer is the conformance plan's known fragile spot.
- A service declares avatar: File or photos: File[] and codegen + Admin UI handle it.
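As a rough picture of what the field type has to do, the sketch below maps a declared TypeScript type to a column type and a metadata type string. It is a toy stand-in for the real parser: the reduced FieldType enum, parse_field_type, column_type and the "array" and "string" metadata names are assumptions. Only the File to TEXT mapping, the "file" metadata name, and File[] as Array(File) backed by JSONB come from this plan.

```rust
/// Reduced, hypothetical version of the schema field type.
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Text,
    File,
    Array(Box<FieldType>),
}

/// Parses a declared type such as `File` or `File[]`.
fn parse_field_type(declared: &str) -> Option<FieldType> {
    let declared = declared.trim();
    if let Some(inner) = declared.strip_suffix("[]") {
        return parse_field_type(inner).map(|t| FieldType::Array(Box::new(t)));
    }
    match declared {
        "string" => Some(FieldType::Text),
        "File" => Some(FieldType::File),
        _ => None,
    }
}

/// Column type used for storage: a File is just the media id as text.
fn column_type(field: &FieldType) -> &'static str {
    match field {
        FieldType::Text | FieldType::File => "TEXT",
        FieldType::Array(_) => "JSONB",
    }
}

/// Type name surfaced to clients so a form renderer can pick an uploader.
fn field_type_str(field: &FieldType) -> &'static str {
    match field {
        FieldType::Text => "string",
        FieldType::File => "file",
        FieldType::Array(_) => "array", // assumed name
    }
}
```

Keeping File distinct from Text in the metadata while sharing the TEXT column is what lets the UI choose an uploader without any change to how the value is stored.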
4. SDK + Admin UI
- Admin UI: MediaUploader.tsx already POSTs multipart to uploadEndpoint and expects { url }. Point it at POST /media, send the auth token, render the returned url. Wire the dynamic form renderer for File-typed fields. Fix the faked retryUpload.
- TS SDK: add uploadMedia(file): Promise<{id,url}> and getMediaUrl(id).
- No Dart SDK (out of scope); document the raw POST /media contract.
5. Security (network-exposed write endpoint)
- Auth required on upload/delete; reads gated by model access + tenant_id scope.
- Content-type allow-list + magic-byte sniff; enforce maxFileSize server-side.
- Generated storage keys only; never use the client filename in the path.
- Presigned, expiring URLs for S3; private bucket.
- Rate-limit uploads (existing middleware.rs).
- A future "fetch-from-URL" variant = SSRF risk → host allow-list (deferred).
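The size cap, allow-list and magic-byte sniff combine naturally into one validation step. The sketch below shows one possible ordering using only the standard library. The function names, the UploadRejection enum and the handful of signatures covered are illustrative assumptions; the real check may recognize more formats or treat a declared/sniffed mismatch differently.

```rust
/// Returns the MIME type implied by the leading bytes, for a few common formats.
fn sniff_content_type(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("image/png")
    } else if bytes.starts_with(b"\xFF\xD8\xFF") {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    } else if bytes.starts_with(b"%PDF-") {
        Some("application/pdf")
    } else {
        None
    }
}

#[derive(Debug, PartialEq)]
enum UploadRejection {
    TooLarge,
    TypeNotAllowed,
    TypeMismatch,
}

/// Checks size first, then trusts the sniffed type over the client's claim.
fn validate_upload(
    bytes: &[u8],
    declared_type: &str,
    allow_list: &[&str],
    max_size: usize,
) -> Result<&'static str, UploadRejection> {
    if bytes.len() > max_size {
        return Err(UploadRejection::TooLarge);
    }
    // Unrecognized content is treated as not allowed in this sketch.
    let sniffed = sniff_content_type(bytes).ok_or(UploadRejection::TypeNotAllowed)?;
    if !allow_list.contains(&sniffed) {
        return Err(UploadRejection::TypeNotAllowed);
    }
    if !declared_type.eq_ignore_ascii_case(sniffed) {
        return Err(UploadRejection::TypeMismatch);
    }
    Ok(sniffed)
}
```

Returning the sniffed type lets the caller store what the bytes actually are rather than what the client claimed.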
6. Config (.env.example)
STORAGE_BACKEND=local # local | s3
STORAGE_LOCAL_DIR=./.atomo/media
STORAGE_MAX_FILE_SIZE=10485760
# S3 (when STORAGE_BACKEND=s3)
STORAGE_S3_BUCKET=...
STORAGE_S3_REGION=...
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
The backend is constructed once at boot (like OAuthManager::from_env()) and injected into AppState.
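A boot-time constructor for these settings might look like the sketch below. It takes a lookup closure instead of reading the process environment directly so it can be tested without touching real env vars. StorageConfig and storage_config_from are hypothetical names, and using the .env.example values as defaults is an assumption; the credentials are left to the AWS SDK and are not modeled here.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum StorageConfig {
    Local { dir: PathBuf, max_file_size: u64 },
    S3 { bucket: String, region: String, max_file_size: u64 },
}

/// Builds the config from any key lookup, e.g. `|k| std::env::var(k).ok()`.
fn storage_config_from(get: impl Fn(&str) -> Option<String>) -> Result<StorageConfig, String> {
    // Assumed default: the 10 MiB value shown in .env.example.
    let max_file_size = match get("STORAGE_MAX_FILE_SIZE") {
        Some(raw) => raw
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("STORAGE_MAX_FILE_SIZE: {e}"))?,
        None => 10_485_760,
    };
    let backend = get("STORAGE_BACKEND").unwrap_or_else(|| "local".to_string());
    match backend.as_str() {
        "local" => {
            let dir = get("STORAGE_LOCAL_DIR").unwrap_or_else(|| "./.atomo/media".to_string());
            Ok(StorageConfig::Local { dir: PathBuf::from(dir), max_file_size })
        }
        "s3" => {
            let bucket = get("STORAGE_S3_BUCKET")
                .ok_or_else(|| "STORAGE_S3_BUCKET is required".to_string())?;
            let region = get("STORAGE_S3_REGION")
                .ok_or_else(|| "STORAGE_S3_REGION is required".to_string())?;
            Ok(StorageConfig::S3 { bucket, region, max_file_size })
        }
        other => Err(format!("unknown STORAGE_BACKEND: {other}")),
    }
}
```

Failing fast on an unknown backend or a missing bucket at boot keeps misconfiguration from surfacing later as a failed upload.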
7. Phased delivery (each phase ends with a passing test)
- Phase A: Local backend, happy path. storage.rs trait + LocalStorage, MediaStore + migration, POST /media (auth + size + multipart) and GET /media/{id}. Test: upload a fixture, read it back, assert metadata. Enable axum multipart.
- Phase B: Event-sourcing + audit. Emit MediaUploaded/MediaDeleted; DELETE /media/{id} soft-delete. Test: upload→delete reconstructs via entity_history; audit records actor.
- Phase C: Schema field type. File type in the unified parser + /meta/schema. Test: parser unit test + a dogfood model field (Contact.avatar: File).
- Phase D: Admin UI + SDK. Repoint MediaUploader, wire File fields, add SDK uploadMedia. Test: extend Playwright e2e (upload on a CRM entity, assert url renders).
- Phase E: S3 backend. S3Storage behind storage-s3 + presigned reads. Test: #[ignore] integration gated on S3 creds (MinIO in CI), mirroring the pgvector pattern.
- Phase SEC: Hardening. Allow-list + magic-byte check, tenant-scope read enforcement, rate limit. Tests: reject oversized, reject disallowed type, cross-tenant read denied.
8. Risks / open questions
- Multi-file fields (File[]): ✅ parses to Array(File) (JSONB), renders the multi-file uploader.
- GraphQL vs REST split: uploads/serves stay REST (/media, multipart + bytes/redirect); GraphQL only stores/returns the media id (a TEXT File field), resolved to a URL by apiClient.getMediaUrl. Multipart over GraphQL is intentionally avoided.
- Orphan cleanup: MediaState::purge_deleted(older_than) GCs old soft-deleted rows (housekeeping; bytes are freed on soft-delete). Reference-based orphans (media whose referencing entity was deleted) are intentionally not auto-GC'd: that needs per-schema reference tracking and would risk deleting in-use media. Wire purge_deleted to a scheduler under a retention policy.
- Roadmap honesty: capability marked delivered in README with scope; S3 verified via MinIO.
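The retention rule behind purge_deleted can be shown on an in-memory list. This is only a sketch of the selection logic under assumed types: MediaRow here is a hypothetical two-field struct, and the real method presumably issues a DELETE against the media table rather than filtering a Vec.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical, reduced view of a media row.
struct MediaRow {
    id: String,
    deleted_at: Option<SystemTime>,
}

/// Removes rows soft-deleted at least `older_than` ago and returns how many
/// were purged. Live rows and recently deleted rows are kept.
fn purge_deleted(rows: &mut Vec<MediaRow>, now: SystemTime, older_than: Duration) -> usize {
    let before = rows.len();
    rows.retain(|row| match row.deleted_at {
        // A deletion timestamp in the future yields Err, so the row is kept.
        Some(at) => now.duration_since(at).map_or(true, |age| age < older_than),
        None => true,
    });
    before - rows.len()
}
```

Passing `now` in explicitly keeps the rule deterministic, which makes it straightforward for a scheduler to call and for a test to pin down.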
Smallest shippable slice with real value: Phases A + C + D (local storage, schema File type, Admin UI wired) — uploads work end-to-end through the dogfood; S3 + full hardening as fast-follows.
Delivery status (honesty)
- Phase A (local backend): ✅ done + tested (storage unit tests, HTTP lifecycle).
- Phase B (event-sourcing/audit): ✅ done (Media Created/Deleted events; DB-gated test).
- Phase C (schema File type): ✅ done (parser maps File/File[] to string-backed TEXT).
- Phase D (Admin UI + SDK): ✅ done: apiClient.uploadMedia/getMediaUrl + MediaUploader posts to real /media with auth + real retry. A FieldType::File variant (TEXT-backed in all codegen/DB paths) makes the server emit a distinct file metadata type, and FormField auto-renders MediaUploader for it. Test: field_type_str(File) == "file".
- Phase E (S3): ✅ implemented behind the storage-s3 feature and verified against MinIO: put/get/delete roundtrip + presigned-URL read both pass. GET /media/{id} 302-redirects to a short-lived presigned URL when the backend provides one (S3); local proxies bytes.
- Phase SEC: ✅ magic-byte content sniffing + opt-in tenant read scoping (STORAGE_PRIVATE_READS); rate limiting is inherited from the app-level middleware.
- Range / streaming serve: ✅ GET /media/{id} supports HTTP Range (RFC 7233) on the local proxy path: single-range 206 with Content-Range, 416 for unsatisfiable ranges, Accept-Ranges on every response, and a strong ETag (immutable media id) enabling If-None-Match → 304. This makes video/audio seekable. Tested by media_http_supports_range_requests + range_parsing_covers_rfc_cases.
- Content checksum + dedup: ✅ every upload records a sha256 checksum (returned in the upload response). Identical content for the same tenant dedups to the existing media id, so nothing is re-stored (e.g. re-uploading the same reference image is free). Dedup is tenant-scoped (never shares bytes across tenants) and ignores soft-deleted rows. Tested by media_http_dedups_identical_content_per_tenant.
- Presigned direct upload (S3): ✅ POST /media/presign returns a presigned PUT URL; the client uploads bytes directly to S3 (never through the server), then POST /media/commit validates the tenant-prefixed key, confirms + measures the object via S3 HEAD, dedups, and records metadata. presigned_put_url + size live on the StorageBackend trait (S3 = presign/HEAD, local = None/stat). Verified against MinIO (s3_presigned_put_is_uploadable, media_presign_commit_roundtrip). For large media a worker bypasses the server entirely. (The storage-s3 feature needs rustc ≥ 1.91, the latest aws-sdk MSRV.)
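The tenant-scoped dedup rule described in the status list can be modeled with a small in-memory index. This is a sketch of the lookup semantics only: DedupIndex and its methods are hypothetical, the checksum is passed in as a precomputed hex string (the standard library has no SHA-256), and the real implementation presumably queries the media table instead.

```rust
use std::collections::HashMap;

/// Hypothetical index of live (not soft-deleted) media, keyed by
/// (tenant_id, sha256 hex checksum) and pointing at the media id.
#[derive(Default)]
struct DedupIndex {
    live: HashMap<(String, String), String>,
}

impl DedupIndex {
    /// Records an upload. Returns `(media_id, deduped)`: the existing id and
    /// `true` when the same tenant already holds identical content, otherwise
    /// the new id and `false`.
    fn record(&mut self, tenant: &str, checksum: &str, new_id: &str) -> (String, bool) {
        let key = (tenant.to_string(), checksum.to_string());
        if let Some(existing) = self.live.get(&key) {
            return (existing.clone(), true);
        }
        self.live.insert(key, new_id.to_string());
        (new_id.to_string(), false)
    }

    /// Soft delete: the row leaves the live index, so a later upload of the
    /// same content is stored again instead of reviving the deleted id.
    fn soft_delete(&mut self, tenant: &str, checksum: &str) {
        self.live.remove(&(tenant.to_string(), checksum.to_string()));
    }
}
```

Putting the tenant in the key is what prevents one tenant from learning, via the deduped flag, that another tenant already holds the same bytes.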
Verifying the S3 backend locally (MinIO)
MinIO is a single static binary — no Docker. To run the storage-s3 tests:
# 1. Get + launch MinIO (localhost, ephemeral creds)
curl -sSL -o /tmp/minio https://dl.min.io/server/minio/release/linux-amd64/minio && chmod +x /tmp/minio
curl -sSL -o /tmp/mc https://dl.min.io/client/mc/release/linux-amd64/mc && chmod +x /tmp/mc
MINIO_ROOT_USER=atomotest MINIO_ROOT_PASSWORD=atomotest123 \
setsid bash -c '/tmp/minio server /tmp/minio-data --address 127.0.0.1:9000' &
# 2. Create the bucket
/tmp/mc alias set local http://127.0.0.1:9000 atomotest atomotest123
/tmp/mc mb --ignore-existing local/atomo-media
# 3. Run the feature-gated, #[ignore]d S3 tests
STORAGE_S3_BUCKET=atomo-media STORAGE_S3_ENDPOINT=http://127.0.0.1:9000 \
AWS_REGION=us-east-1 AWS_ACCESS_KEY_ID=atomotest AWS_SECRET_ACCESS_KEY=atomotest123 \
cargo test -p atomo_server --features storage-s3 --test media_s3 -- --ignored
CI runs the same way (MinIO service + these env vars), mirroring how the pgvector/AI tests are gated. Tip: stop MinIO with pkill -x minio (not pkill -f 'minio …', since -f matches your own command line).