Microsoft Office Open XML Format
application/vnd.openxmlformats-officedocument.wordprocessingml.document
Magic Bytes
Offset: 0
50 4B 03 04 (local file header, the usual case)
50 4B 05 06 (end of central directory, empty archive)
50 4B 07 08 (spanned archive)
Microsoft Office Open XML Format (DOCX) is an XML-based word processing format developed by Microsoft and standardized as ECMA-376 and ISO/IEC 29500. It is the default document format for Microsoft Word and is widely used for business reports, academic manuscripts, and professional correspondence. A DOCX file is a compressed ZIP container of XML files, which is why it begins with one of the ZIP signatures listed above; it replaced the legacy binary DOC format to provide improved data recovery and smaller file sizes.
Validation Code
How to validate .docx files in Python
Python
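def is_docx(file_path: str) -> bool:
    """Check if the file starts with one of the ZIP signatures used by DOCX."""
    # The three 4-byte signatures are alternatives, not one 12-byte sequence.
    signatures = (
        bytes([0x50, 0x4B, 0x03, 0x04]),
        bytes([0x50, 0x4B, 0x05, 0x06]),
        bytes([0x50, 0x4B, 0x07, 0x08]),
    )
    with open(file_path, "rb") as f:
        return f.read(4) in signatures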
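The magic-byte check only establishes that a file is a ZIP archive; other ZIP-based formats (XLSX, PPTX, JAR, plain .zip) begin with the same bytes. A stricter check can open the container and look for the parts a Word document normally contains. The following is a minimal sketch using Python's standard zipfile module. The function name looks_like_docx is hypothetical, and testing for [Content_Types].xml and word/document.xml is an assumption about typical Word output, not a full conformance test.

```python
import zipfile


def looks_like_docx(file_path: str) -> bool:
    """Return True if the file is a ZIP archive containing typical DOCX parts."""
    # Hypothetical helper: these two part names are what Word normally writes,
    # not a complete conformance check.
    required_parts = {"[Content_Types].xml", "word/document.xml"}
    try:
        with zipfile.ZipFile(file_path) as archive:
            return required_parts.issubset(archive.namelist())
    except (zipfile.BadZipFile, OSError):
        # Not a readable ZIP archive, or the file could not be opened.
        return False
```

Running the signature check first and this container check second keeps the common rejection path cheap, since only files that already look like ZIP archives need to be opened.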
How to validate .docx files in Node.js
Node.js
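function isDOCX(buffer: Buffer): boolean {
  // The three 4-byte ZIP signatures are alternatives, not one 12-byte sequence.
  const signatures = [[0x50, 0x4B, 0x03, 0x04], [0x50, 0x4B, 0x05, 0x06], [0x50, 0x4B, 0x07, 0x08]];
  const head = buffer.subarray(0, 4);
  return signatures.some((sig) => head.equals(Buffer.from(sig)));
}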
How to validate .docx files in Go
Go
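import "bytes"

func IsDOCX(data []byte) bool {
	// The three 4-byte ZIP signatures are alternatives, not one 12-byte sequence.
	for _, signature := range [][]byte{{0x50, 0x4B, 0x03, 0x04}, {0x50, 0x4B, 0x05, 0x06}, {0x50, 0x4B, 0x07, 0x08}} {
		if bytes.HasPrefix(data, signature) {
			return true
		}
	}
	return false
}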
API Endpoint
GET
/api/v1/docx
curl https://filesignature.org/api/v1/docx