HTML
text/html
Magic Bytes
Offset: 0
3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C
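The hex sequence above is the ASCII encoding of the string `<!DOCTYPE HTML`. As a quick illustration (the variable names here are chosen for this sketch and are not part of the reference), the bytes can be decoded in Python:

```python
# Decode the documented signature to show the ASCII text it represents.
SIGNATURE_HEX = "3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C"

# bytes.fromhex ignores the spaces between byte pairs.
signature = bytes.fromhex(SIGNATURE_HEX)
signature_text = signature.decode("ascii")

print(signature_text)  # <!DOCTYPE HTML
print(len(signature))  # 14
```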
HyperText Markup Language (HTML) is the standard markup language for documents displayed in web browsers. It is maintained by the WHATWG as a Living Standard, in collaboration with the W3C. HTML provides the structural foundation of the World Wide Web, organizing text, images, and interactive elements into pages. Although HTML files are plain text, they frequently embed executable scripts; consequently, rendering untrusted HTML without sanitization can expose users to Cross-Site Scripting (XSS) attacks, even though the format itself is generally classified as safe.
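To make the XSS risk concrete, here is a minimal sketch of escaping untrusted text before placing it in a page, using Python's standard `html.escape`. The function name `render_comment` is hypothetical. Escaping of this kind only covers text placed in element content or quoted attribute values; it is not a complete defense on its own.

```python
import html


def render_comment(untrusted_text: str) -> str:
    """Wrap user-supplied text in a paragraph, escaping HTML metacharacters.

    Hypothetical helper for illustration only.
    """
    # html.escape replaces &, < and >; with quote=True it also replaces " and '.
    safe_text = html.escape(untrusted_text, quote=True)
    return f"<p>{safe_text}</p>"


rendered = render_comment('<script>alert("xss")</script>')
print(rendered)  # <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

Because the angle brackets are converted to entities, a browser displays the payload as text instead of executing it.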
Validation Code
How to validate .html files in Python
Python
def is_html(file_path: str) -> bool:
    """Return True if the file starts with the 14-byte `<!DOCTYPE HTML` signature."""
    # ASCII bytes for "<!DOCTYPE HTML"; the comparison is case-sensitive.
    signature = bytes([0x3C, 0x21, 0x44, 0x4F, 0x43, 0x54, 0x59, 0x50, 0x45, 0x20, 0x48, 0x54, 0x4D, 0x4C])
    with open(file_path, "rb") as f:
        return f.read(len(signature)) == signature
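The check above matches only the exact uppercase bytes at offset 0. In practice, the doctype is case-insensitive, so many HTML files begin with a lowercase `<!doctype html>`, and some are preceded by a UTF-8 byte-order mark or leading whitespace. A minimal sketch of a more tolerant heuristic follows. The names `looks_like_html` and `DOCTYPE_PREFIX` are assumptions made for this example, not part of the reference.

```python
import codecs

# Lowercase form of the signature; input is lowercased before comparison.
DOCTYPE_PREFIX = b"<!doctype html"


def looks_like_html(data: bytes) -> bool:
    """Heuristic: True if data starts with an HTML doctype.

    Ignores a UTF-8 BOM, leading ASCII whitespace, and letter case.
    """
    # Drop a UTF-8 byte-order mark if one is present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Skip leading ASCII whitespace, then compare case-insensitively.
    head = data.lstrip(b" \t\r\n\x0c")[:len(DOCTYPE_PREFIX)]
    return head.lower() == DOCTYPE_PREFIX


print(looks_like_html(b"\xef\xbb\xbf\n<!doctype html><html></html>"))  # True
print(looks_like_html(b"%PDF-1.7"))  # False
```

This remains a heuristic: a document that omits the doctype and starts directly with `<html>` is still rejected, and a match says nothing about whether the markup is well formed.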
How to validate .html files in Node.js
Node.js (TypeScript)
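function isHTML(buffer: Buffer): boolean {
  // ASCII bytes for "<!DOCTYPE HTML"; the comparison is case-sensitive.
  const signature = Buffer.from([0x3C, 0x21, 0x44, 0x4F, 0x43, 0x54, 0x59, 0x50, 0x45, 0x20, 0x48, 0x54, 0x4D, 0x4C]);
  // subarray clamps to the buffer length, so short inputs simply fail the comparison.
  return buffer.subarray(0, signature.length).equals(signature);
}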
How to validate .html files in Go
Go
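import "bytes"

// IsHTML reports whether data starts with the 14-byte "<!DOCTYPE HTML" signature.
func IsHTML(data []byte) bool {
	signature := []byte{0x3C, 0x21, 0x44, 0x4F, 0x43, 0x54, 0x59, 0x50, 0x45, 0x20, 0x48, 0x54, 0x4D, 0x4C}
	// Guard against inputs shorter than the signature before slicing.
	if len(data) < len(signature) {
		return false
	}
	return bytes.Equal(data[:len(signature)], signature)
}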
API Endpoint
GET /api/v1/html
curl https://filesignature.org/api/v1/html