Magic Bytes vs MIME Type vs File Extension

Published 2026-06-01

File type detection usually involves three signals: the extension, the MIME type, and the magic bytes. They answer different questions and have different trust levels.

Signal	Example	Comes from	Best use	Main weakness
Extension	`.pdf`	Filename	UX, routing, user expectations	User-controlled and easy to rename
MIME type	`application/pdf`	HTTP header, OS, database, detector	Content negotiation, storage metadata	Often inferred or supplied by the client
Magic bytes	`25 50 44 46 2D`	File content	Server-side detection and validation	Shared containers need deeper inspection

The right approach is not to pick one signal and rely on it everywhere. Use each signal for the job it is good at, and decide in advance which one wins when they disagree.

File extensions are labels

An extension is part of the filename. It is useful because people and operating systems understand it, but it is not proof of content.

Renaming report.exe to report.pdf changes the label only. The file still begins with the Windows executable signature 4D 5A (ASCII MZ), not the PDF signature 25 50 44 46 2D (ASCII %PDF-).

Use extensions for:

Displaying filenames.
Picking a default icon.
Choosing which allowlist rule the content is expected to satisfy.
Warning users when the name and content disagree.

Do not use extensions alone for upload security or automated processing.
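
As a minimal sketch of the "name and content disagree" warning, the snippet below compares the extension with the leading bytes. The `EXPECTED_PREFIX` mapping and the `extension_matches_content` helper are hypothetical names chosen for illustration, and the mapping covers only the signatures mentioned in this section.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from extension to the signature expected at offset 0.
EXPECTED_PREFIX = {
    ".pdf": bytes.fromhex("255044462D"),        # "%PDF-"
    ".exe": bytes.fromhex("4D5A"),              # "MZ"
}

def extension_matches_content(filename: str, head: bytes) -> bool:
    """Return True when the leading bytes match what the extension promises."""
    ext = PurePosixPath(filename).suffix.lower()
    prefix = EXPECTED_PREFIX.get(ext)
    return prefix is not None and head.startswith(prefix)

# report.exe renamed to report.pdf: the label changed, the bytes did not.
renamed_exe = b"MZ\x90\x00\x03\x00\x00\x00"
print(extension_matches_content("report.pdf", renamed_exe))   # False
print(extension_matches_content("report.pdf", b"%PDF-1.7\n"))  # True
```

A mismatch here is a reason to warn or reject. A match only means the label and the first bytes agree, not that the file is safe.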

MIME types are metadata

A MIME type is a text label such as image/png or application/pdf. It can come from an HTTP header, a database column, an operating system registry, a library, or a browser sniffing algorithm.

On upload, the most dangerous MIME value is the Content-Type of the file part in the multipart request, because the client controls it:

Bash

curl -F "file=@script.php;type=image/png" https://example.com/upload

That request tells the server the file is image/png, even though the bytes are PHP source. A browser may also infer a MIME type from the extension or from limited content sniffing. That can be useful for display, but it is not a validation boundary.

Use MIME types for:

HTTP Content-Type responses after you have validated or generated the file.
Database metadata that records your own detection result.
Matching downstream processors that expect MIME labels.

Do not treat client-supplied MIME values as proof.
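
To make the point concrete, here is a rough sketch that parses a hand-written multipart body equivalent to the curl request above, using Python's standard `email` package. The boundary string, the PHP payload, and the helper name `first_part` are assumptions made for illustration. A real server would normally rely on its framework's upload parser.

```python
from email.parser import BytesParser
from email.policy import default

PNG_MAGIC = bytes.fromhex("89504E470D0A1A0A")

# Hand-written request body: the part header claims image/png, the bytes are PHP.
RAW = (
    b"Content-Type: multipart/form-data; boundary=BOUNDARY\r\n"
    b"\r\n"
    b"--BOUNDARY\r\n"
    b'Content-Disposition: form-data; name="file"; filename="script.php"\r\n'
    b"Content-Type: image/png\r\n"
    b"\r\n"
    b"<?php echo 'hi'; ?>\r\n"
    b"--BOUNDARY--\r\n"
)

def first_part(raw: bytes):
    """Return (claimed MIME, payload bytes) for the first multipart part."""
    message = BytesParser(policy=default).parsebytes(raw)
    part = next(message.iter_parts())
    return part.get_content_type(), part.get_payload(decode=True)

claimed_mime, payload = first_part(RAW)
print(claimed_mime)                    # image/png (whatever the client typed)
print(payload.startswith(PNG_MAGIC))   # False: the content is not a PNG
```

The claimed value is just a string copied from the request, so it belongs in logs or hints, while the decision comes from the payload bytes.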

Magic bytes are content signals

Magic bytes are byte sequences inside the file. They are often at offset 0, but some formats store their identifying bytes later.

Examples:

Format	Magic bytes	Offset	MIME type
PDF	`25 50 44 46 2D`	0	`application/pdf`
PNG	`89 50 4E 47 0D 0A 1A 0A`	0	`image/png`
ZIP	`50 4B 03 04`	0	`application/zip`
MP4	`66 74 79 70`	4	`video/mp4`
DICOM	`44 49 43 4D`	128	`application/dicom`

Magic-byte matching is a stronger first check than the extension or MIME type because it reads the content itself. You can try it directly with the lookup tool or query the API:

Bash

curl "https://filesignature.org/api/v1/identify?hex=89%2050%204E%2047%200D%200A%201A%200A"
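
The same comparison can be done locally. The sketch below is one possible offset-aware matcher built from the table above. The names `SIGNATURES`, `HEADER_BYTES`, and `identify` are illustrative and are not part of any particular library.

```python
# (format, MIME type, offset, magic bytes), copied from the table above.
SIGNATURES = [
    ("PDF", "application/pdf", 0, bytes.fromhex("255044462D")),
    ("PNG", "image/png", 0, bytes.fromhex("89504E470D0A1A0A")),
    ("ZIP", "application/zip", 0, bytes.fromhex("504B0304")),
    ("MP4", "video/mp4", 4, bytes.fromhex("66747970")),
    ("DICOM", "application/dicom", 128, bytes.fromhex("4449434D")),
]

# Bytes needed to cover the deepest signature (DICOM: offset 128 + 4 bytes).
HEADER_BYTES = max(offset + len(magic) for _, _, offset, magic in SIGNATURES)

def identify(head: bytes) -> list[tuple[str, str]]:
    """Return every (format, MIME) whose magic bytes match at its offset."""
    return [
        (fmt, mime)
        for fmt, mime, offset, magic in SIGNATURES
        if head[offset:offset + len(magic)] == magic
    ]

print(HEADER_BYTES)                                   # 132
print(identify(bytes.fromhex("89504E470D0A1A0A")))    # [('PNG', 'image/png')]
print(identify(b"\x00\x00\x00\x18ftypisom"))          # [('MP4', 'video/mp4')]
```

Returning a list rather than a single value leaves room for headers that match more than one candidate.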

When the signals disagree

For upload validation, treat disagreement as a warning or rejection.

Extension	Client MIME	Magic bytes	Decision
`.png`	`image/png`	PNG signature	Accept if PNG is allowed and parsing succeeds
`.png`	`image/png`	EXE `4D 5A`	Reject
`.pdf`	`application/pdf`	ZIP `50 4B 03 04`	Reject or route to ZIP-family validation
`.docx`	DOCX MIME	ZIP `50 4B 03 04`	Inspect ZIP internals before accepting
no extension	missing	PDF signature	Accept only if PDF is in the allowlist

The magic bytes should usually decide the first parser to use. The extension and MIME type can then be checked for consistency.
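
One way to encode that precedence is sketched below: the detected format is checked against the allowlist first, then the extension and client MIME are checked for consistency. The `ALLOWED` table, the `decide` function, and the two return values are hypothetical, and the strict "reject on any mismatch" policy is only one of the options in the table above.

```python
from typing import Optional

# Hypothetical allowlist: detected format -> (consistent extensions, consistent MIME labels).
ALLOWED = {
    "PNG": ({".png"}, {"image/png"}),
    "PDF": ({".pdf"}, {"application/pdf"}),
}

def decide(extension: Optional[str], client_mime: Optional[str],
           detected: Optional[str]) -> str:
    """Let the detected format decide first, then check the labels for consistency."""
    if detected not in ALLOWED:
        return "reject"                      # unknown or disallowed content
    extensions, mimes = ALLOWED[detected]
    if extension is not None and extension.lower() not in extensions:
        return "reject"                      # name disagrees with content
    if client_mime is not None and client_mime.lower() not in mimes:
        return "reject"                      # client label disagrees with content
    return "accept_pending_parse"            # still needs a successful parse

print(decide(".png", "image/png", "PNG"))        # accept_pending_parse
print(decide(".png", "image/png", "EXE"))        # reject
print(decide(".pdf", "application/pdf", "ZIP"))  # reject
print(decide(None, None, "PDF"))                 # accept_pending_parse
```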

Shared signatures require second-level checks

Magic bytes can identify the container but not always the exact format. The ZIP signature is the classic example. DOCX, XLSX, PPTX, APK, JAR, EPUB, and many other formats all use ZIP containers.

For these files, do two checks:

Confirm the container signature.
Inspect required internal files or fields.

Examples:

DOCX requires Open Packaging Convention metadata (such as [Content_Types].xml) and a word/ directory.
XLSX requires an xl/ directory.
EPUB requires a mimetype file with application/epub+zip.
WEBP, which uses a RIFF container rather than ZIP, requires a RIFF header at offset 0 plus WEBP as the form type at offset 8.
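
The examples above could be sketched with the standard `zipfile` module as follows. The function names `refine_zip` and `is_webp` are illustrative, and the checks are deliberately shallow: they look only for the entries named in the list, not for a fully valid document.

```python
import io
import zipfile

def refine_zip(data: bytes) -> str:
    """Second-level check: narrow a ZIP container down to a more specific format."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
            if "mimetype" in names:
                if archive.read("mimetype").strip() == b"application/epub+zip":
                    return "EPUB"
            if "[Content_Types].xml" in names:       # OPC metadata
                if any(name.startswith("word/") for name in names):
                    return "DOCX"
                if any(name.startswith("xl/") for name in names):
                    return "XLSX"
    except zipfile.BadZipFile:
        return "INVALID"                             # signature matched, structure did not
    return "ZIP"

def is_webp(head: bytes) -> bool:
    """RIFF header at offset 0 plus the WEBP form type at offset 8."""
    return head[0:4] == b"RIFF" and head[8:12] == b"WEBP"
```

A generic "ZIP" result means the container is real but none of the listed formats matched, which an endpoint that expects DOCX should treat as a rejection.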

This is why a file identification API can return multiple matches for one header. It is reporting all candidates that match the bytes you supplied.

Recommended validation order

For server-side validation:

Normalize the filename and extension.
Read enough header bytes for every format in your allowlist. DICOM, for example, needs at least 132 bytes: offset 128 plus a 4-byte signature.
Compare magic bytes at the documented offsets.
For containers, inspect the internal structure.
Confirm the detected type is allowed for that endpoint.
Store your own detected type and MIME label.
Serve the file with a safe Content-Type only after validation.
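
The steps above can be sketched end to end as follows. This is a simplified illustration, not a complete validator: it assumes the upload is already in memory as `data`, covers only three signatures, and uses hypothetical names (`detect`, `refine_zip`, `validate_upload`) and a hypothetical result dictionary.

```python
import io
import posixpath
import zipfile
from typing import Optional

SIGNATURES = [  # (format, MIME type, offset, magic bytes)
    ("PDF", "application/pdf", 0, b"%PDF-"),
    ("PNG", "image/png", 0, b"\x89PNG\r\n\x1a\n"),
    ("ZIP", "application/zip", 0, b"PK\x03\x04"),
]
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

def refine_zip(data: bytes) -> Optional[tuple[str, str]]:
    # Step 4: for containers, inspect the internal structure.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return None
    if "[Content_Types].xml" in names and any(n.startswith("word/") for n in names):
        return "DOCX", DOCX_MIME
    return "ZIP", "application/zip"

def detect(data: bytes) -> Optional[tuple[str, str]]:
    # Steps 2-3: compare magic bytes at the documented offsets.
    for fmt, mime, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return refine_zip(data) if fmt == "ZIP" else (fmt, mime)
    return None

def validate_upload(filename: str, data: bytes, allowed: set[str]) -> dict:
    # Step 1: normalize the filename and extension (drop any client-side path).
    name = posixpath.basename(filename.replace("\\", "/")).strip()
    extension = posixpath.splitext(name)[1].lower()
    detected = detect(data)
    # Step 5: confirm the detected type is allowed for this endpoint.
    if detected is None or detected[0] not in allowed:
        return {"ok": False, "reason": "type not allowed"}
    fmt, mime = detected
    # Step 6: return our own detection result for storage; step 7 would serve
    # the file later using this stored MIME label, not the client's.
    return {"ok": True, "filename": name, "extension": extension,
            "format": fmt, "mime": mime}

print(validate_upload("C:\\temp\\Report.PDF", b"%PDF-1.7\n", {"PDF"})["mime"])  # application/pdf
print(validate_upload("photo.png", b"MZ\x90\x00", {"PNG"})["ok"])               # False
```

Passing the allowlist per call keeps the per-endpoint rule explicit, so an image endpoint and a document endpoint can share the same detector.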

The extension is for humans. The MIME type is for metadata and HTTP. The magic bytes are the content-level starting point for detection.

Frequently Asked Questions

Which is most reliable: magic bytes, MIME type, or extension?

Magic bytes are usually the strongest first signal because they are part of the file content. Extensions and MIME types are useful metadata, but they can be missing, stale, or user-controlled.

Can a MIME type be trusted on upload?

No. Multipart upload Content-Type values are supplied by the client. Use them as hints for UX or routing, but validate the content bytes on the server.

Are magic bytes enough for security?

No. They are a necessary first step for type detection, but they are not sufficient: shared containers, polyglot files, and malformed files require additional parsing, allowlists, and safe storage controls.

Why do databases list multiple MIME types for one extension?

Some formats have legacy, vendor, and standardized MIME labels. The bytes can identify the format while the MIME label varies by application or registry.