File Uploads & Object Storage
S3, presigned URLs, multipart uploads, and how to handle files without your server becoming the bottleneck.
Why You Don't Want Files Going Through Your Server
The naïve way to handle uploads: the client sends the file to your API, and your API saves it to disk or forwards it to storage. This works for tiny uploads. It fails catastrophically at scale.
Problems with proxying files through your backend:
- Bandwidth doubles — the file goes client→server, then server→storage. You pay for the same bytes twice.
- Memory and CPU pressure — your API process holds the file (or streams it) while it uploads further. Concurrent uploads exhaust threads.
- Timeouts — a 100 MB upload over a slow connection might take minutes. Your API server now has a long-lived connection it can't kill cleanly.
- Scale ceiling — your API box becomes the bottleneck for upload bandwidth.
The right pattern: get the bytes from the client to the storage system DIRECTLY, with your API only signing the permission slip. This is what S3 presigned URLs (and equivalents on every cloud) are designed for.
Object Storage — What S3 Actually Is
S3 (Simple Storage Service) is Amazon's object store, and "S3" has become a generic shorthand for object storage everywhere — Cloudflare R2, MinIO, Backblaze B2, and others expose S3-compatible APIs, while Google Cloud Storage and Azure Blob Storage offer the same model through their own APIs (GCS also has an S3-interoperability mode).
Object storage isn't a filesystem. It's a key-value store optimized for files:
- Buckets — named containers (like top-level folders)
- Objects — files identified by a key (path-like string), with associated bytes and metadata
- No real folders — users/alice/avatar.jpg is just one key with slashes in it
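Because keys are flat, "browsing a folder" is really a prefix query. A minimal sketch with the AWS SDK v3 (bucket name and prefix are placeholders):

// List everything "inside" users/alice/: really just keys sharing a prefix
import { S3Client, ListObjectsV2Command } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });

const out = await s3.send(new ListObjectsV2Command({
  Bucket: 'my-bucket',     // placeholder
  Prefix: 'users/alice/',  // the "folder"
  Delimiter: '/',          // group deeper keys into CommonPrefixes
}));

console.log(out.Contents?.map((o) => o.Key));          // the "files" here
console.log(out.CommonPrefixes?.map((p) => p.Prefix)); // the "subfolders"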
Key properties:
- Practically unlimited capacity — petabytes are routine
- Per-object durability is extremely high — typically 99.999999999% (eleven nines)
- Cheap at rest — pennies per GB per month
- Egress costs money — you pay for bytes flowing OUT of the bucket (uploads in are typically free)
- Strongly consistent on S3 since late 2020 — older code and some other object stores still assume eventually consistent reads
What it's good for: user uploads, generated content (PDFs, exports), static site assets, media files, backups, data lakes.
What it's not for: data you query frequently with filters (use a database), small frequent updates (use a key-value store like Redis).
Presigned URLs — The Whole Trick
A presigned URL is a regular S3 URL with a cryptographic signature appended. The signature says "the holder of this URL is authorized to do exactly THIS operation on THIS object until THIS time." S3 verifies the signature on every request.
The flow for an upload:
┌──────────┐ 1. POST /upload-url ┌──────────┐
│ │ ──────────────────────────────► │ │
│ Client │ │ Your API │
│ (browser │ 2. ← presigned URL (10 min) │ │
│ or app) │ ◄────────────────────────────── │ │
│ │ └────┬─────┘
│ │ │
│ │ 3. PUT bytes directly │ (signs the URL,
│ │ ─────────────────────────────────► │ doesn't touch
│ │ │ the bytes)
│ │ 4. ← 200 OK │
│ │ ◄───────────────────────────────── │
└──────────┘ │
▼
┌──────────────────────┐
│ S3 / Object Store │
└──────────────────────┘
Your API:
1. Verifies the user is allowed to upload (auth, validation, file type check)
2. Generates a presigned URL — typically 5-15 minutes valid
3. Returns the URL to the client
The client:
4. Uploads bytes directly to S3 using the URL
5. (Optionally) tells your API "upload done, here's the key" so you can save the reference in your database
Your API never sees the bytes. The work it does is signing — milliseconds.
// Node.js example (Express + AWS SDK v3; assumes auth middleware sets req.user)
import { randomUUID } from 'node:crypto';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });

app.post('/upload-url', async (req, res) => {
  const { filename, contentType } = req.body;

  // 1. Validate
  const allowed = ['image/jpeg', 'image/png', 'application/pdf'];
  if (!allowed.includes(contentType)) {
    return res.status(400).json({ error: 'File type not allowed' });
  }

  // 2. Generate a unique key (don't trust the client filename —
  //    strip anything that isn't a safe character)
  const safeName = filename.replace(/[^\w.-]/g, '_');
  const key = `uploads/${req.user.id}/${randomUUID()}-${safeName}`;

  // 3. Sign a PUT for exactly this key and content type
  const command = new PutObjectCommand({
    Bucket: 'my-bucket',
    Key: key,
    ContentType: contentType,
  });
  const url = await getSignedUrl(s3, command, { expiresIn: 600 }); // 10 minutes

  res.json({ url, key });
});
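The matching client side, sketched for the browser (steps 3-5 of the flow; the /upload-complete endpoint is hypothetical, something your API would expose to record the key):

// Browser: request a signed URL, PUT the bytes, report completion
async function uploadFile(file) {
  // Steps 1-2: ask the API above for a presigned URL
  const res = await fetch('/upload-url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ filename: file.name, contentType: file.type }),
  });
  const { url, key } = await res.json();

  // Steps 3-4: send the bytes straight to S3. The Content-Type header
  // must match what was signed, or S3 rejects the signature.
  const put = await fetch(url, {
    method: 'PUT',
    headers: { 'Content-Type': file.type },
    body: file,
  });
  if (!put.ok) throw new Error(`Upload failed: ${put.status}`);

  // Step 5: tell the API so it can save the key (hypothetical endpoint)
  await fetch('/upload-complete', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ key }),
  });
}

Downloads use the same mechanism, just with GetObjectCommand in place of PutObjectCommand (reusing s3 and getSignedUrl from the server example):

// Signed download URL; an hour is plenty for a "click to download" flow
import { GetObjectCommand } from '@aws-sdk/client-s3';

const downloadUrl = await getSignedUrl(
  s3,
  new GetObjectCommand({ Bucket: 'my-bucket', Key: key }),
  { expiresIn: 3600 }
);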
Multipart Uploads for Large Files
A single PUT works fine for files up to a few hundred MB (S3 caps a single PUT at 5 GB, and AWS recommends multipart above roughly 100 MB). For bigger uploads, use multipart upload — the file is split into parts that upload in parallel and can resume after a failure.
Big file (1 GB)
│
Split into parts
│
┌───────────────┼───────────────┐
▼ ▼ ▼
Part 1 (5 MB) Part 2 (5 MB) ... Part 200 (5 MB)
│ │ │
▼ ▼ ▼
Presigned URL Presigned URL Presigned URL
│ │ │
▼ ▼ ▼
Upload in parallel
│
▼
CompleteMultipartUpload
(S3 stitches parts into one object)
Benefits:
• Parallel uploads — many parts in flight at once = faster
• Resumable — if part 73 fails, just retry part 73, not the whole file
• Memory friendly — client streams 5 MB at a time, not 1 GB
Most cloud SDKs handle this transparently with high-level upload helpers. You only deal with multipart manually when you need fine control (progress reporting, custom retry logic, browser uploads with explicit chunking).
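In Node, for example, the @aws-sdk/lib-storage helper wraps the whole multipart dance. A sketch, assuming a local file and the same bucket as above:

// High-level multipart helper: splits, parallelizes, retries, completes
import { createReadStream } from 'node:fs';
import { S3Client } from '@aws-sdk/client-s3';
import { Upload } from '@aws-sdk/lib-storage';

const upload = new Upload({
  client: new S3Client({ region: 'us-east-1' }),
  params: {
    Bucket: 'my-bucket',
    Key: 'exports/big-report.csv',              // placeholder key
    Body: createReadStream('./big-report.csv'), // streams, never all in memory
  },
  partSize: 8 * 1024 * 1024, // 8 MB parts (S3 minimum is 5 MB)
  queueSize: 4,              // 4 parts in flight at once
});

upload.on('httpUploadProgress', ({ loaded, total }) => {
  console.log(`${loaded}/${total} bytes`);
});

await upload.done(); // issues CompleteMultipartUpload under the hood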
Security Pitfalls
File uploads are a classic attack surface. Things to get right:
1. Never trust the client's content type. The browser sends image/png in the header, but the actual bytes might be a PHP script. Validate by reading the file's magic bytes server-side after upload, or use a content-type sniffer (the verification sketch after this list shows the magic-byte check).
2. Strip metadata. Photos contain EXIF data with GPS coordinates and device info. Strip it before serving to other users.
3. Generate unique keys server-side. Don't use the client's filename directly — they can include path traversal (../../../), collide with other users' files, or include malicious characters.
4. Limit upload size with conditions. A presigned POST policy can include a content-length-range condition that S3 itself enforces (a plain presigned PUT URL can't cap size this way). Without a limit, a malicious client can fill your bucket. See the presigned-POST sketch after this list.
5. Short URL expiration. 5-15 minutes for upload URLs. Hours-to-a-day for download URLs. Don't generate 7-day links unless you genuinely need them.
6. Scan for malware. For user-shared files, run an antivirus scan before serving them to other users. Lambda + ClamAV triggered by S3 events is the common pattern.
7. Private by default. Buckets should be private. Serve content via signed download URLs, or via a CDN with bucket-restricted access. Public buckets are a top breach source.
8. CORS configured tightly. Your bucket's CORS policy should only allow your own origins, not *. CORS doesn't stop non-browser clients from replaying a leaked signed URL, but a tight policy keeps other websites from driving uploads through a visitor's browser.
9. Re-validate after upload completes. The user uploaded a 4 MB image. Your API got a "done" callback. Verify the actual object in S3 matches expectations (size, type) before saving the reference to your database — the verification sketch below shows one way.
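Two sketches for the pitfalls above. First, pitfall 4: the enforced size limit comes from a presigned POST policy (the @aws-sdk/s3-presigned-post package), not from a plain presigned PUT; userId here stands in for whatever your auth layer provides:

// Presigned POST with an S3-enforced size range (1 byte to 5 MB)
import { randomUUID } from 'node:crypto';
import { S3Client } from '@aws-sdk/client-s3';
import { createPresignedPost } from '@aws-sdk/s3-presigned-post';

const s3 = new S3Client({ region: 'us-east-1' });

const { url, fields } = await createPresignedPost(s3, {
  Bucket: 'my-bucket',
  Key: `uploads/${userId}/${randomUUID()}`, // userId: from your auth layer
  Conditions: [
    ['content-length-range', 1, 5 * 1024 * 1024], // S3 rejects anything larger
    ['eq', '$Content-Type', 'image/jpeg'],
  ],
  Fields: { 'Content-Type': 'image/jpeg' },
  Expires: 600, // seconds
});
// The client submits a multipart/form-data POST to `url` containing
// `fields` plus the file as the final form field.

Second, pitfalls 1 and 9: once the client reports completion, check the object's real size and magic bytes before trusting it (signature table trimmed to the three types allowed earlier):

// Post-upload verification: HEAD for size, first bytes for the real type
import { HeadObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';

const MAGIC = {
  'image/jpeg': [0xff, 0xd8, 0xff],
  'image/png': [0x89, 0x50, 0x4e, 0x47],
  'application/pdf': [0x25, 0x50, 0x44, 0x46], // "%PDF"
};

async function verifyUpload(bucket, key, expectedType, maxBytes) {
  const head = await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key }));
  if (head.ContentLength > maxBytes) return false;

  // Fetch only the first 8 bytes, not the whole object
  const obj = await s3.send(new GetObjectCommand({
    Bucket: bucket, Key: key, Range: 'bytes=0-7',
  }));
  const bytes = await obj.Body.transformToByteArray();

  const sig = MAGIC[expectedType];
  return sig !== undefined && sig.every((b, i) => bytes[i] === b);
}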