Backend from First Principles / Module 30 — File Uploads & Object Storage

File Uploads & Object Storage

S3, presigned URLs, multipart uploads, and how to handle files without your server becoming the bottleneck.


Why You Don't Want Files Going Through Your Server

The naïve way to handle uploads: client sends the file to your API, your API saves it to disk or forwards it to storage. This works for tiny uploads. It scales catastrophically.

Problems with proxying files through your backend:

• Memory and connections: every in-flight upload occupies a worker and a connection for its full duration, often with the whole file buffered in RAM
• Double bandwidth: the bytes cross the network twice, client to API, then API to storage
• Slow clients: a user on a weak connection uploading 500 MB ties up server resources for minutes
• Scaling for the wrong thing: you end up sizing API servers for file throughput instead of request throughput

The right pattern: get the bytes from the client to the storage system DIRECTLY, with your API only signing the permission slip. This is what S3 presigned URLs (and equivalents on every cloud) are designed for.


Object Storage — What S3 Actually Is

S3 (Simple Storage Service) is Amazon's object store, and "S3" has become a generic term for object storage everywhere — Google Cloud Storage, Azure Blob Storage, Cloudflare R2, MinIO all expose S3-compatible APIs.

Object storage isn't a filesystem. It's a key-value store optimized for files:

Key properties:

• Flat namespace: keys like uploads/42/avatar.png look like paths, but "folders" are just key prefixes
• Whole-object writes: you replace an object, you never edit bytes in place
• HTTP API: every operation (PUT, GET, DELETE, list) is an HTTP request
• Effectively unlimited capacity, with very high durability (S3 advertises eleven nines)
• Cheap per GB relative to block storage or a database

What it's good for: user uploads, generated content (PDFs, exports), static site assets, media files, backups, data lakes.

What it's not for: data you query frequently with filters (use a database), small frequent updates (use a key-value store like Redis).
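To make the key-value framing concrete, here is a toy in-memory sketch (not a real client for any storage API, and all names are made up): keys are flat strings, values are opaque bytes plus a little metadata, and "listing a folder" is just a prefix scan.

```javascript
// Toy in-memory object store illustrating the key/value model.
// Keys are flat strings; "folders" are only a naming convention (prefixes).
class ToyObjectStore {
  constructor() { this.objects = new Map(); }

  // PUT replaces the whole object; there are no partial in-place edits.
  put(key, bytes, contentType) {
    this.objects.set(key, { bytes, contentType, lastModified: new Date() });
  }

  get(key) { return this.objects.get(key) ?? null; }

  // "Listing a folder" is really a prefix scan over keys.
  list(prefix) {
    return [...this.objects.keys()].filter((k) => k.startsWith(prefix));
  }
}

const store = new ToyObjectStore();
store.put('uploads/user-1/a.png', Buffer.from([0x89]), 'image/png');
store.put('uploads/user-2/b.pdf', Buffer.from([0x25]), 'application/pdf');
console.log(store.list('uploads/user-1/')); // [ 'uploads/user-1/a.png' ]
```

Real object stores add versioning, lifecycle rules, and access control on top, but the data model really is this flat.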


Presigned URLs — The Whole Trick

A presigned URL is a regular S3 URL with a cryptographic signature appended. The signature says "the holder of this URL is authorized to do exactly THIS operation on THIS object until THIS time." S3 verifies the signature on every request.

The flow for an upload:

```text
   ┌──────────┐    1. POST /upload-url            ┌──────────┐
   │          │  ──────────────────────────────►  │          │
   │  Client  │                                    │ Your API │
   │ (browser │    2. ← presigned URL (10 min)    │          │
   │  or app) │  ◄──────────────────────────────  │          │
   │          │                                    └────┬─────┘
   │          │                                         │
   │          │    3. PUT bytes directly                │ (signs the URL,
   │          │  ─────────────────────────────────►     │  doesn't touch
   │          │                                         │  the bytes)
   │          │    4. ← 200 OK                          │
   │          │  ◄─────────────────────────────────     │
   └──────────┘                                         │
                                                         ▼
                                    ┌──────────────────────┐
                                    │   S3 / Object Store  │
                                    └──────────────────────┘
```

Your API:
1. Verifies the user is allowed to upload (auth, validation, file type check)
2. Generates a presigned URL — typically 5-15 minutes valid
3. Returns the URL to the client

The client:
4. Uploads bytes directly to S3 using the URL
5. (Optionally) tells your API "upload done, here's the key" so you can save the reference in your database

Your API never sees the bytes. The work it does is signing — milliseconds.

```javascript
// Node.js example
import express from 'express';
import { randomUUID } from 'node:crypto';
import { extname } from 'node:path';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: 'us-east-1' });
const app = express();
app.use(express.json());

app.post('/upload-url', async (req, res) => {
  const { filename, contentType } = req.body;

  // 1. Validate
  const allowed = ['image/jpeg', 'image/png', 'application/pdf'];
  if (!allowed.includes(contentType)) {
    return res.status(400).json({ error: 'File type not allowed' });
  }

  // 2. Generate a unique key (don't trust the client's filename;
  //    keep only its extension)
  const key = `uploads/${req.user.id}/${randomUUID()}${extname(filename)}`;

  // 3. Sign
  const command = new PutObjectCommand({
    Bucket: 'my-bucket',
    Key: key,
    ContentType: contentType
  });
  const url = await getSignedUrl(s3, command, { expiresIn: 600 });

  res.json({ url, key });
});
```
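The client half of the flow can be sketched as below. The endpoint name and response shape match the server example above; `fetchImpl` is an injected parameter (an assumption of this sketch, not a standard API) so the function works with the browser's `fetch` and is testable.

```javascript
// Client half of the presigned-upload flow.
async function uploadFile(apiBase, file, contentType, fetchImpl = fetch) {
  // Steps 1-2: ask the API for a presigned URL.
  const res = await fetchImpl(`${apiBase}/upload-url`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ filename: file.name, contentType }),
  });
  if (!res.ok) throw new Error(`could not get upload URL: ${res.status}`);
  const { url, key } = await res.json();

  // Step 3: PUT the bytes straight to object storage; the API never sees them.
  const put = await fetchImpl(url, {
    method: 'PUT',
    headers: { 'Content-Type': contentType }, // must match what was signed
    body: file,
  });
  if (!put.ok) throw new Error(`upload failed: ${put.status}`);

  // Step 5: the caller reports this key back to the API so the reference
  // can be saved in the database.
  return key;
}
```

Note the Content-Type header on the PUT has to match the one that was signed, or S3 rejects the request with a signature mismatch.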

Multipart Uploads for Large Files

A single PUT handles objects up to 5 GB on S3, and works fine in practice up to a few hundred MB. For bigger uploads, use multipart upload: the file is split into parts (5 MB to 5 GB each, at most 10,000 per object) that upload in parallel and can resume after a failure.

```text
                  Big file (1 GB)
                        │
                  Split into parts
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   Part 1 (5 MB)   Part 2 (5 MB)   ... Part 200 (5 MB)
        │               │               │
        ▼               ▼               ▼
   Presigned URL   Presigned URL   Presigned URL
        │               │               │
        ▼               ▼               ▼
                  Upload in parallel
                        │
                        ▼
              CompleteMultipartUpload
              (S3 stitches parts into one object)
```

Benefits:
• Parallel uploads — many parts in flight at once = faster
• Resumable — if part 73 fails, just retry part 73, not the whole file
• Memory friendly — client streams 5 MB at a time, not 1 GB

Most cloud SDKs handle this transparently with high-level upload helpers. You only deal with multipart manually when you need fine control (progress reporting, custom retry logic, browser uploads with explicit chunking).
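The chunking arithmetic from the diagram is easy to sketch. `planParts` is a hypothetical helper, not an SDK function: it computes the byte range each part covers, and each range would then be sliced off the file and PUT to its own presigned part URL.

```javascript
// Sketch of the chunking step: compute the byte range each part covers.
// (S3's real minimum part size is 5 MiB for every part except the last;
// 5,000,000 is used here to match the diagram's round numbers.)
function planParts(totalBytes, partSize = 5_000_000) {
  const parts = [];
  for (let start = 0, n = 1; start < totalBytes; start += partSize, n++) {
    parts.push({ partNumber: n, start, end: Math.min(start + partSize, totalBytes) });
  }
  return parts;
}

// The 1 GB file from the diagram.
const plan = planParts(1_000_000_000);
console.log(plan.length);   // 200
console.log(plan.at(-1));   // { partNumber: 200, start: 995000000, end: 1000000000 }
```

Because each part is an independent request, retrying part 73 means re-uploading only that one 5 MB range, which is exactly what makes the scheme resumable.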


Security Pitfalls

File uploads are a classic attack surface. Things to get right:

1. Never trust the client's content type. The browser sends image/png in the header, but the actual bytes might be a PHP script. Validate by reading the file's magic bytes server-side after upload, or use a content-type sniffer.

2. Strip metadata. Photos contain EXIF data with GPS coordinates and device info. Strip it before serving to other users.

3. Generate unique keys server-side. Don't use the client's filename directly — they can include path traversal (../../../), collide with other users' files, or include malicious characters.

4. Limit upload size with conditions. Presigned URLs can include a content-length-range condition that S3 enforces. Without this, a malicious client can fill your bucket.

5. Short URL expiration. 5-15 minutes for upload URLs. Hours-to-a-day for download URLs. Don't generate 7-day links unless you genuinely need them.

6. Scan for malware. For user-shared files, run an antivirus scan before serving them to other users. Lambda + ClamAV triggered by S3 events is the common pattern.

7. Private by default. Buckets should be private. Serve content via signed download URLs, or via a CDN with bucket-restricted access. Public buckets are a top breach source.

8. CORS configured tightly. Your bucket's CORS policy should only allow your domains, not *. CORS doesn't restrain non-browser clients, but a tight policy stops other websites from using a leaked signed URL to upload or read from their visitors' browsers.

9. Re-validate after upload completes. The user uploaded a 4 MB image. Your API got a "done" callback. Verify the actual file in S3 matches expectations (size, type) before saving the reference to your database.

