WARC
application/warc
Magic Bytes
Offset: 0
57 41 52 43 2F (ASCII "WARC/")
The Web ARChive (WARC) format is an ISO standard (ISO 28500) for storing web crawls, developed by the Internet Archive and maintained by the International Internet Preservation Consortium. It is the primary format that organizations such as the Internet Archive and national libraries use to preserve digital heritage, capturing full HTTP responses together with their associated metadata. A WARC file is a passive container for raw web data and is not itself executable, but content extracted from it may include active scripts captured from the live web and should be treated like any other untrusted web content.
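To make "associated metadata" concrete: each WARC record begins with a version line such as WARC/1.0, followed by named header fields (for example WARC-Type and Content-Length), a blank line, and then the record's content block. The following is a minimal sketch in Python, assuming an uncompressed record already held in memory. The function name parse_warc_header and the sample record are illustrative assumptions, not part of any library.

```python
def parse_warc_header(data: bytes) -> tuple[str, dict[str, str]]:
    """Parse the version line and named fields of the first WARC record.

    Hypothetical helper for illustration; assumes uncompressed input and
    does not handle folded (multi-line) header values.
    """
    # The header block ends at the first blank line (CRLF CRLF).
    head, sep, _ = data.partition(b"\r\n\r\n")
    if not sep or not head.startswith(b"WARC/"):
        raise ValueError("not a WARC record header")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return version, fields


# Made-up sample record, for illustration only.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: warcinfo\r\n"
    b"WARC-Date: 2024-01-01T00:00:00Z\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
    b"\r\n\r\n"
)
version, fields = parse_warc_header(sample)
print(version, fields["WARC-Type"], fields["Content-Length"])
```

For real archives, a dedicated WARC library is likely a better choice than hand-rolled parsing; this sketch only shows the shape of a record header.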
Validation Code
How to validate .warc files in Python
Python
def is_warc(file_path: str) -> bool:
    """Return True if the file starts with the WARC magic bytes (ASCII "WARC/")."""
    signature = bytes([0x57, 0x41, 0x52, 0x43, 0x2F])
    with open(file_path, "rb") as f:
        # A file shorter than 5 bytes yields a short read and compares unequal.
        return f.read(5) == signature
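The check above only matches uncompressed files. WARC files are often stored gzip-compressed (commonly with a .warc.gz extension), in which case the file begins with the gzip magic bytes 1F 8B rather than "WARC/". A minimal sketch that handles both cases using only the Python standard library follows; the function name is_warc_maybe_gzipped is a hypothetical helper, not something defined by the original page.

```python
import gzip
import zlib

WARC_MAGIC = b"WARC/"
GZIP_MAGIC = b"\x1f\x8b"


def is_warc_maybe_gzipped(file_path: str) -> bool:
    """Return True if the file, or its gzip-decompressed start, begins with "WARC/"."""
    with open(file_path, "rb") as f:
        head = f.read(len(WARC_MAGIC))
    if head == WARC_MAGIC:
        return True
    if not head.startswith(GZIP_MAGIC):
        return False
    try:
        # Decompress only the first few bytes of the gzip stream.
        with gzip.open(file_path, "rb") as gz:
            return gz.read(len(WARC_MAGIC)) == WARC_MAGIC
    except (OSError, EOFError, zlib.error):
        # Truncated or corrupt gzip data is treated as "not a WARC".
        return False
```

Like the plain magic-byte check, this only inspects the first bytes. It does not confirm that the rest of the file is well-formed.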
How to validate .warc files in Node.js
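Node.js (TypeScript)
// Returns true if the buffer starts with the WARC magic bytes ("WARC/").
function isWARC(buffer: Buffer): boolean {
  const signature = Buffer.from([0x57, 0x41, 0x52, 0x43, 0x2F]);
  // A buffer shorter than 5 bytes yields a shorter slice, so equals() is false.
  return buffer.subarray(0, 5).equals(signature);
}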
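How to validate .warc files in Go
Go
import "bytes"

// IsWARC reports whether data begins with the WARC magic bytes ("WARC/").
func IsWARC(data []byte) bool {
	signature := []byte{0x57, 0x41, 0x52, 0x43, 0x2F}
	if len(data) < 5 {
		return false
	}
	return bytes.Equal(data[:5], signature)
}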
API Endpoint
GET /api/v1/warc
curl https://filesignature.org/api/v1/warc