Secure File Upload Validation

Secure upload validation is a layered workflow. Magic bytes are one important layer, but they do not replace allowlists, parser checks, safe storage, scanning, and least-privilege processing.

The goal is not to prove a file is harmless. The goal is to accept only the small set of formats your product genuinely needs, process them with the right parser, and contain failures when a file is malformed or hostile.

Start with an allowlist

Define allowed formats per upload endpoint. A profile image endpoint might allow only PNG, JPG, and WEBP. A document ingestion endpoint might allow PDF and DOCX. A general archive upload endpoint might allow ZIP, but only with size and extraction limits.

Avoid broad categories like "all images" or "all documents" unless you have a reason to parse every format in that category. A narrow allowlist is easier to test and easier to defend.

| Endpoint | Better allowlist | Risky allowlist |
| --- | --- | --- |
| Avatar upload | PNG, JPG, WEBP | Any image/* |
| Invoice upload | PDF | PDF, DOC, DOCX, XLSX, ZIP |
| Document import | PDF, DOCX | Any file under 50 MB |
| Theme upload | JSON | ZIP with arbitrary contents |
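
A per-endpoint allowlist like the one in the table can live in code as plain data. The following is a minimal sketch; the endpoint keys (`avatar`, `invoice`, `documentImport`, `theme`) and the helper name `isFormatAllowed` are hypothetical and would be replaced by whatever your routes are actually called.

```javascript
// Hypothetical endpoint keys and format labels, mirroring the "better allowlist" column.
const ENDPOINT_ALLOWLISTS = new Map([
  ["avatar", new Set(["png", "jpg", "webp"])],
  ["invoice", new Set(["pdf"])],
  ["documentImport", new Set(["pdf", "docx"])],
  ["theme", new Set(["json"])],
]);

// Returns true only when the endpoint is known and the format is listed for it.
// Unknown endpoints fall through to false, so the default is to reject.
function isFormatAllowed(endpoint, format) {
  const allowed = ENDPOINT_ALLOWLISTS.get(endpoint);
  return allowed !== undefined && allowed.has(format);
}
```

Keeping the allowlist as data makes it straightforward to write one test per endpoint and to review changes to it.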

Validate content bytes on the server

Read the header bytes and compare them against the known signatures for the formats you accept. Do this after the upload reaches your server or edge function, not only in browser JavaScript, because a client can bypass any check that runs in the browser.

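```javascript
// Each entry: [label, byte offset of the signature, expected signature bytes].
const ALLOWED = [
  ["png", 0, Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])], // \x89PNG\r\n\x1a\n
  ["jpg", 0, Buffer.from([0xff, 0xd8, 0xff])], // start-of-image marker, then the next marker byte
  ["pdf", 0, Buffer.from([0x25, 0x50, 0x44, 0x46, 0x2d])], // "%PDF-"
];

// `head` is a Buffer holding the first bytes of the upload.
// Returns the label of the first allowed format whose signature matches, or null.
export function detectAllowed(head) {
  for (const [label, offset, signature] of ALLOWED) {
    const end = offset + signature.length;
    // Guard against short inputs before comparing the slice.
    if (head.length >= end && head.subarray(offset, end).equals(signature)) {
      return label;
    }
  }
  return null;
}
```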

You can use the lookup tool while building the allowlist, then store the exact signatures in code. If your application needs a dynamic lookup, call /api/v1/identify?hex=... and compare the response against your own allowed extensions.
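
Two practical details are worth a sketch: reading only the header bytes from a stored upload, and formats whose signature is split across offsets. WEBP is one such format: it starts with `RIFF`, followed by a four-byte size field, followed by `WEBP` at offset 8. The helper names `readHead` and `isWebp` below are illustrative, and the example assumes Node.js with the upload already written to a temporary file.

```javascript
const fs = require("node:fs");

// Read at most `length` bytes from the start of a file without loading the rest.
function readHead(filePath, length = 16) {
  const fd = fs.openSync(filePath, "r");
  try {
    const head = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, head, 0, length, 0);
    // A short file yields fewer bytes than requested; return only what was read.
    return head.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// WEBP: "RIFF" at offset 0, a 4-byte size field, then "WEBP" at offset 8.
function isWebp(head) {
  return (
    head.length >= 12 &&
    head.subarray(0, 4).equals(Buffer.from("RIFF", "ascii")) &&
    head.subarray(8, 12).equals(Buffer.from("WEBP", "ascii"))
  );
}
```

Checking `RIFF` alone would not be enough, since WAV and AVI files use the same four leading bytes.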

Do not stop at the header for containers

Container formats share signatures. The ZIP header 50 4B 03 04 can represent a plain ZIP archive, DOCX, XLSX, PPTX, JAR, APK, EPUB, ODT, and many others. The header alone is not a safe final decision.

For container formats:

  • Enforce maximum compressed and uncompressed size.
  • Limit file count and directory depth.
  • Reject absolute paths and .. path traversal entries.
  • Inspect required internal files.
  • Avoid extracting into executable or web-served directories.
  • Treat nested archives as suspicious unless the feature explicitly requires them.
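
Most of these checks can run on entry metadata before anything is extracted. The sketch below assumes you already have a list of entries shaped like `{ name, compressedSize, uncompressedSize }`, obtained from whichever ZIP library you use; that shape, the function name `checkArchiveEntries`, and the limit values are all illustrative assumptions rather than recommendations.

```javascript
// Illustrative limits; tune them per endpoint.
const ARCHIVE_LIMITS = {
  maxEntries: 1000,
  maxDepth: 8,
  maxTotalUncompressed: 100 * 1024 * 1024, // 100 MiB
  maxRatio: 100, // uncompressed bytes per compressed byte
};

// Returns null when every check passes, otherwise a short reason string for the log.
// `entries` is assumed to be [{ name, compressedSize, uncompressedSize }].
function checkArchiveEntries(entries, limits = ARCHIVE_LIMITS) {
  if (entries.length > limits.maxEntries) return "too many entries";
  let totalCompressed = 0;
  let totalUncompressed = 0;
  for (const entry of entries) {
    // Normalize Windows separators so both styles go through the same checks.
    const name = entry.name.replace(/\\/g, "/");
    if (name.startsWith("/") || /^[A-Za-z]:/.test(name)) return "absolute path";
    const segments = name.split("/").filter((s) => s !== "");
    if (segments.includes("..")) return "path traversal";
    if (segments.length > limits.maxDepth) return "directory depth too large";
    if (/\.(zip|jar|7z|rar|tar|gz)$/i.test(name)) return "nested archive";
    totalCompressed += entry.compressedSize;
    totalUncompressed += entry.uncompressedSize;
    if (totalUncompressed > limits.maxTotalUncompressed) return "uncompressed size too large";
  }
  if (totalCompressed > 0 && totalUncompressed / totalCompressed > limits.maxRatio) {
    return "compression ratio too high";
  }
  return null;
}
```

The sizes recorded in an archive are declared by whoever built it, so they can be wrong. Treat this as a pre-check and also count the bytes actually written during extraction.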

For a DOCX allowlist, require the ZIP signature plus the expected Office package files. For an EPUB allowlist, require the mimetype file and expected metadata paths. For a JAR or APK, decide whether executable archives are appropriate at all.
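
Confirming a container by its internal structure can start as a simple membership check over the entry names. This is a sketch under assumptions: `entryNames` is the ordered list of member paths reported by your ZIP library, the function name `confirmContainer` is hypothetical, and the DOCX list assumes the conventional layout with the main part at `word/document.xml`.

```javascript
// Member paths that must be present before a ZIP is accepted as the named format.
const REQUIRED_MEMBERS = new Map([
  ["docx", ["[Content_Types].xml", "_rels/.rels", "word/document.xml"]],
  ["epub", ["mimetype", "META-INF/container.xml"]],
]);

// `entryNames` is assumed to be the archive's member paths in stored order.
function confirmContainer(format, entryNames) {
  const required = REQUIRED_MEMBERS.get(format);
  if (required === undefined) return false; // format not configured: reject
  const names = new Set(entryNames);
  if (!required.every((name) => names.has(name))) return false;
  // EPUB additionally expects `mimetype` to be the first entry in the archive.
  if (format === "epub" && entryNames[0] !== "mimetype") return false;
  return true;
}
```

A stricter EPUB check would also read the `mimetype` member and confirm that its content is `application/epub+zip`. Presence of these files is a necessary signal, not proof that the document is well formed, so the format's parser still needs to run in a constrained environment.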

Re-encode risky media when possible

Images are often safer after re-encoding. If the product only needs a rendered image, decode it with a maintained library and write a fresh PNG, JPG, or WEBP. This typically drops metadata, appended payloads, and unusual chunks that your application does not need. Confirm that the library is not configured to copy metadata into the output.

For documents, prefer server-side conversion or preview generation in a sandbox. Do not run office converters, image libraries, or archive extraction tools with broad filesystem access.

Store files as data, never as code

Safe storage rules matter as much as validation:

  • Generate server-side filenames instead of trusting client filenames.
  • Store uploads outside the application source tree.
  • Serve uploads from a separate hostname when possible.
  • Send Content-Disposition: attachment for formats that should not render inline.
  • Set X-Content-Type-Options: nosniff.
  • Do not execute, include, import, or template uploaded files.
  • Log the detected type, original name, size, hash, and validation result.
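
The naming and logging rules above can be sketched as one small function. It assumes Node.js and an upload already held in a `Buffer`; the function name `buildStoredUpload` and the field names of the log record are hypothetical.

```javascript
const nodeCrypto = require("node:crypto");

// Builds a server-generated storage name and a log record for one upload.
// `content` is assumed to be a Buffer; `detectedType` comes from server-side detection.
function buildStoredUpload({ content, originalName, detectedType, validationResult }) {
  // The stored name is derived only from server-side values, never from the client filename.
  const storedName = `${nodeCrypto.randomUUID()}.${detectedType}`;
  const sha256 = nodeCrypto.createHash("sha256").update(content).digest("hex");
  return {
    storedName,
    logRecord: {
      detectedType,
      // The client filename is recorded for audit only and truncated to bound log size.
      originalName: String(originalName).slice(0, 255),
      size: content.length,
      sha256,
      validationResult,
    },
  };
}
```

The hash gives incident responders a stable identifier for the file even after it is renamed, moved, or deleted.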

If a file later reaches another system, pass along the server-detected type rather than the original client-supplied MIME type.
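
The same principle applies when serving a file back to browsers: derive the response headers from the server-detected type. The sketch below is illustrative only. The function name `downloadHeaders`, the label-to-MIME mapping, and the choice of which formats may render inline are assumptions to adapt to your own policy.

```javascript
// Content types keyed by server-detected label (illustrative subset).
const CONTENT_TYPES = new Map([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["webp", "image/webp"],
  ["pdf", "application/pdf"],
]);

// Assumption: only these formats may render inline; everything else downloads.
const INLINE_SAFE = new Set(["png", "jpg", "webp"]);

function downloadHeaders(detectedType, downloadName) {
  // Unknown types fall back to a generic binary type and are never rendered inline.
  const contentType = CONTENT_TYPES.get(detectedType) ?? "application/octet-stream";
  const disposition = INLINE_SAFE.has(detectedType) ? "inline" : "attachment";
  // Keep only conservative characters in the suggested filename.
  const safeName = downloadName.replace(/[^A-Za-z0-9._-]/g, "_");
  return {
    "Content-Type": contentType,
    "Content-Disposition": `${disposition}; filename="${safeName}"`,
    "X-Content-Type-Options": "nosniff",
  };
}
```

For example, `downloadHeaders("pdf", "report.pdf")` yields an `attachment` disposition, while an unrecognized type is served as `application/octet-stream`.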

Handle failure modes deliberately

Reject a file when:

  • The extension is not in the endpoint allowlist.
  • The magic bytes do not match an allowed format.
  • The content matches an executable or script format, such as the MZ header shared by EXE and DLL files, a script that begins with #!, or a ZIP container that turns out to be a JAR.
  • A shared container cannot be confirmed by internal structure.
  • Parsing, re-encoding, or scanning fails.
  • The file is too large or expands too much when decompressed.

Return a user-facing error that names the supported formats without echoing risky internals. Keep the detailed reason in server logs.
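
One way to keep the user-facing message and the logged reason separate is to return both from a single decision function. This is a minimal sketch covering only the size, extension, and magic-byte checks; the function names `rejectUpload` and `decideUpload` and the shape of the result object are hypothetical, and `detectedType` is assumed to come from server-side signature detection (null when nothing matched).

```javascript
// Builds a rejection whose user message lists supported formats only.
// The detailed reason is kept in a separate field intended for server logs.
function rejectUpload(allowed, logReason) {
  const formats = [...allowed].map((f) => f.toUpperCase()).join(", ");
  return {
    ok: false,
    userMessage: `Unsupported file. Allowed formats: ${formats}.`,
    logReason,
  };
}

// `allowed` is a Set of lowercase format labels for this endpoint.
function decideUpload({ extension, detectedType, size, maxSize, allowed }) {
  // Cheap checks first, so oversized files are rejected before any parsing.
  if (size > maxSize) {
    return rejectUpload(allowed, `size ${size} exceeds limit ${maxSize}`);
  }
  if (!allowed.has(extension.toLowerCase())) {
    return rejectUpload(allowed, `extension "${extension}" is not allowlisted`);
  }
  if (detectedType === null || !allowed.has(detectedType)) {
    return rejectUpload(allowed, `magic bytes did not match an allowed format (got ${detectedType})`);
  }
  return { ok: true, detectedType };
}
```

Only `userMessage` should reach the client. `logReason` belongs in the server log next to the hash and detected type.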

A practical upload checklist

Use this checklist for every upload endpoint:

  1. Define a tight allowlist for that endpoint.
  2. Enforce request and file size limits before expensive parsing.
  3. Read header bytes and match signatures at documented offsets.
  4. Inspect shared containers before accepting them.
  5. Re-encode or transform media when the business flow allows it.
  6. Store with generated names outside executable paths.
  7. Serve with an explicit, server-chosen Content-Type and X-Content-Type-Options: nosniff.
  8. Scan asynchronously if your risk model requires antivirus or malware detection.
  9. Log detection results for audit and incident response.

Magic bytes are a useful first signal, not the whole security story. Use them to choose the right validation path, then keep every later step constrained.

Frequently Asked Questions

Is checking a file extension enough for upload security?

No. Extensions are user-controlled filename labels. Use them only as hints, then validate the content bytes and expected internal structure on the server.

Should uploads be blocklisted or allowlisted?

Use an allowlist. It is far easier to list the exact formats your application supports than to enumerate every dangerous or unexpected format.

What should I do with ZIP-based formats like DOCX or EPUB?

First verify the ZIP signature, then inspect required files inside the archive. A ZIP header alone does not prove the file is a DOCX, XLSX, EPUB, APK, or JAR.

Where should uploaded files be stored?

Store uploads outside executable paths, preferably on a separate origin or object store. Use generated filenames, send Content-Disposition: attachment for formats that should not render inline, and never include uploaded files as code.