About FileSignature.org

Mission

FileSignature.org is the open file signature database. We document magic bytes, MIME types, byte offsets, and risk levels for 900+ file formats, providing developers, security researchers, and forensics professionals with a reliable, machine-readable reference for file type identification and validation.

Data Sources

Our signature database is aggregated from multiple authoritative sources to maximize coverage and accuracy:

  • Apache Tika — The Apache Software Foundation's content detection library, used across enterprise systems for MIME type identification.
  • Gary Kessler's File Signatures Table — One of the longest-running file signature references, maintained since 2002 and widely cited in digital forensics literature.
  • Wikipedia — File format articles that document magic numbers, specifications, and historical context for major formats.
  • Neil Harvey's format documentation — Additional format documentation and signature references used in forensic analysis.

Methodology

Signatures are collected via an automated scraper pipeline that fetches, normalizes, and cross-references data from all sources. When multiple sources report different signatures for the same extension, we store all known variants with full source attribution. The primary signature (displayed most prominently) is selected based on source authority and frequency of citation.
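The variant-handling rule above (keep every variant with its attribution, then promote one as primary) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code. The authority weights, the source keys, and the ranking order (citation frequency first, then authority) are all assumptions, since this page does not publish them. The input in the usage lines is made up for illustration and does not reflect real database records.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical authority weights: the real ranking used by the pipeline is not
# documented here, so these numbers are placeholders for illustration only.
SOURCE_AUTHORITY = {"apache-tika": 3, "kessler": 3, "wikipedia": 2, "harvey": 1}


@dataclass(frozen=True)
class Variant:
    hex_sig: str   # normalized signature, e.g. "FF D8 FF"
    offset: int    # byte offset where the signature starts
    source: str    # which source reported it


def group_variants(variants):
    """Map each distinct (signature, offset) pair to the set of sources citing it."""
    grouped = defaultdict(set)
    for v in variants:
        grouped[(v.hex_sig, v.offset)].add(v.source)
    return grouped


def pick_primary(variants):
    """Return (signature, offset, sources) for the assumed primary variant.

    Assumed ranking: citation frequency first, then the highest authority weight
    among citing sources, then the signature string as a deterministic tie-break.
    """
    def score(item):
        (sig, _offset), sources = item
        authority = max(SOURCE_AUTHORITY.get(s, 0) for s in sources)
        return (len(sources), authority, sig)

    (sig, offset), sources = max(group_variants(variants).items(), key=score)
    return sig, offset, sorted(sources)


# Illustrative input: two sources agree on the generic JPEG prefix,
# one reports a longer JFIF-specific variant.
variants = [
    Variant("FF D8 FF", 0, "apache-tika"),
    Variant("FF D8 FF", 0, "kessler"),
    Variant("FF D8 FF E0", 0, "wikipedia"),
]
print(pick_primary(variants))  # ('FF D8 FF', 0, ['apache-tika', 'kessler'])
```

Because `group_variants` keeps every distinct (signature, offset) pair together with its sources, the non-primary variants stay available with full attribution. Only the choice of which one to display first depends on the ranking.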

All hex signatures are normalized to uppercase, space-separated byte format (e.g., 25 50 44 46, the ASCII bytes "%PDF" that open a PDF file). Validation code snippets are generated from the primary signature and tested against known-good sample files.
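A minimal sketch of this normalization, together with the kind of header check a generated validation snippet might perform. It assumes raw signatures can arrive with `0x` prefixes, dashes, colons, commas, or no separators at all. The names `normalize_hex` and `matches_signature` are hypothetical and are not the site's published snippets.

```python
import re


def normalize_hex(sig: str) -> str:
    """Normalize a hex signature to uppercase, space-separated bytes, e.g. '25 50 44 46'."""
    # Split on common separators, then drop an optional 0x prefix from each token.
    tokens = re.split(r"[\s,:\-]+", sig.strip())
    digits = "".join(t[2:] if t.lower().startswith("0x") else t for t in tokens)
    # Reject empty input, non-hex characters, or a dangling half byte.
    if len(digits) % 2 or not re.fullmatch(r"[0-9A-Fa-f]+", digits):
        raise ValueError(f"not a whole-byte hex signature: {sig!r}")
    digits = digits.upper()
    return " ".join(digits[i:i + 2] for i in range(0, len(digits), 2))


def matches_signature(data: bytes, hex_sig: str, offset: int = 0) -> bool:
    """True if `data` contains the signature's bytes starting at `offset`."""
    magic = bytes.fromhex(normalize_hex(hex_sig))  # fromhex skips the spaces
    return data[offset:offset + len(magic)] == magic


def file_matches(path: str, hex_sig: str, offset: int = 0) -> bool:
    """Read only the bytes needed from disk and compare them to the signature."""
    magic = bytes.fromhex(normalize_hex(hex_sig))
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(len(magic)) == magic


print(normalize_hex("0x25504446"))                      # 25 50 44 46
print(matches_signature(b"%PDF-1.7\n", "25 50 44 46"))  # True
```

Comparing at an explicit `offset` matters because not every format places its magic bytes at the start of the file. Reading only `len(magic)` bytes keeps the check cheap even for large files.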

AI Disclosure

Format descriptions on this site are generated by AI language models and reviewed for accuracy. Magic bytes, MIME types, byte offsets, and source attributions are never AI-generated — they come directly from the authoritative sources listed above.

Contribute

FileSignature.org is open source. Report issues, suggest new formats, or contribute code on GitHub.