About FileSignature.org
Mission
FileSignature.org is the open file signature database. We document magic bytes, MIME types, byte offsets, and risk levels for 900+ file formats, providing developers, security researchers, and forensics professionals with a reliable, machine-readable reference for file type identification and validation.
Data Sources
Our signature database is aggregated from multiple authoritative sources to maximize coverage and accuracy:
- Apache Tika: The Apache Software Foundation's content detection library, used across enterprise systems for MIME type identification.
- Gary Kessler's File Signatures Table: One of the longest-running file signature references, maintained since 2002 and widely cited in digital forensics literature.
- Wikipedia: File format articles that document magic numbers, specifications, and historical context for major formats.
- Neil Harvey's format documentation: Supplementary signature references used in forensic analysis.
Methodology
Signatures are collected via an automated scraper pipeline that fetches, normalizes, and cross-references data from all sources. When multiple sources report different signatures for the same extension, we store all known variants with full source attribution. The primary signature (displayed most prominently) is selected based on source authority and frequency of citation.
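As a rough illustration of the variant handling described above, the sketch below groups signatures by value while keeping their source attribution, then picks a primary one. It is not the site's actual pipeline: the function names, the source names, the authority weights, and the rule "highest authority first, then citation count" are all assumptions made for the example.

```python
from collections import defaultdict


def collect_variants(records):
    """Group (signature, source) pairs so each variant keeps its attribution."""
    variants = defaultdict(set)
    for signature, source in records:
        variants[signature].add(source)
    return dict(variants)


def pick_primary(variants, authority):
    """Pick a primary signature from a {signature: {sources}} mapping.

    Assumed rule: prefer the variant backed by the most authoritative
    source, then the one cited by the most sources. The signature string
    itself is only a final tie-breaker to keep the result deterministic.
    """
    def score(item):
        signature, sources = item
        best_authority = max(authority.get(s, 0) for s in sources)
        return (best_authority, len(sources), signature)

    return max(variants.items(), key=score)[0]


# Made-up example data: hypothetical sources and weights, not real attributions.
records = [
    ("FF D8 FF E0", "source_a"),
    ("FF D8 FF E0", "source_b"),
    ("FF D8 FF E1", "source_b"),
    ("FF D8 FF", "source_c"),
]
authority = {"source_a": 2, "source_b": 2, "source_c": 1}

variants = collect_variants(records)
primary = pick_primary(variants, authority)
```

Here every variant stays in `variants` with its sources, and `primary` is only a display choice on top of that data.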
All hex signatures are normalized to uppercase, space-separated byte format (e.g., 25 50 44 46).
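A minimal sketch of this kind of normalization is shown below. The function name `normalize_hex` and the set of input styles it accepts (`0x` or `\x` prefixes, and arbitrary separators) are assumptions for illustration; the real pipeline may handle other notations.

```python
import re


def normalize_hex(raw: str) -> str:
    """Return a hex signature as uppercase, space-separated bytes."""
    # Remove optional 0x / \x prefixes, then drop every non-hex character
    # (spaces, commas, dashes, colons, ...).
    without_prefixes = re.sub(r"0x|\\x", "", raw, flags=re.IGNORECASE)
    digits = re.sub(r"[^0-9A-Fa-f]", "", without_prefixes)
    if not digits or len(digits) % 2:
        raise ValueError(f"not a whole number of bytes: {raw!r}")
    # Re-split into two-digit bytes and join with single spaces.
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return " ".join(pairs).upper()
```

For example, `normalize_hex("25504446")` and `normalize_hex("0x25, 0x50, 0x44, 0x46")` both return `25 50 44 46`.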
Validation code snippets are generated from the primary signature and tested against known-good sample files.
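A validation snippet of this kind might look like the sketch below. It is an illustration under assumptions rather than the code the site generates: the helper names `matches_signature` and `file_matches` are hypothetical, and the signature is expected in the normalized form described above.

```python
def matches_signature(data: bytes, signature_hex: str, offset: int = 0) -> bool:
    """Check whether `data` contains the signature bytes at `offset`."""
    # bytes.fromhex ignores the spaces between bytes.
    magic = bytes.fromhex(signature_hex)
    return data[offset:offset + len(magic)] == magic


def file_matches(path: str, signature_hex: str, offset: int = 0) -> bool:
    """Read only the bytes needed from `path` and compare them to the signature."""
    magic = bytes.fromhex(signature_hex)
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(len(magic)) == magic
```

For instance, `matches_signature(b"%PDF-1.7\n", "25 50 44 46")` returns `True`, since `%PDF` is the byte sequence 25 50 44 46. A match only shows that the leading bytes look right; it does not prove that the rest of the file is well-formed.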
AI Disclosure
Format descriptions on this site are generated by AI language models and reviewed for accuracy. Magic bytes, MIME types, byte offsets, and source attributions are never AI-generated — they come directly from the authoritative sources listed above.
Contribute
FileSignature.org is open source. Report issues, suggest new formats, or contribute code on GitHub.