WARC

application/warc

Safe

Magic Bytes

Offset: 0
57 41 52 43 2F

The Web ARChive (WARC) format is an ISO standard (ISO 28500) for storing web crawls, developed by the Internet Archive and maintained by the International Internet Preservation Consortium. Organizations such as the Internet Archive and national libraries use it as their primary format for preserving digital heritage, because it captures full HTTP responses together with their associated metadata. As a container for raw web data, the format itself is considered safe, but content extracted from it may contain active scripts that were present on the live web.
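
To make the "responses plus metadata" idea concrete, here is a minimal sketch of reading the header of the first record in an uncompressed WARC file. It relies on the record layout from the WARC specification (a version line such as WARC/1.0, then "Name: value" fields, then a blank line). The function name parse_warc_header and the sample record are hypothetical and only for illustration; continuation lines and case-insensitive field names are not handled.

```python
def parse_warc_header(data: bytes) -> tuple[str, dict[str, str]]:
    """Parse the version line and named fields of the first WARC record.

    Assumes `data` starts at a record boundary and is not compressed.
    """
    # The record header ends at the first blank line (CRLF CRLF).
    header_bytes, sep, _rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-header marker found")

    lines = header_bytes.decode("utf-8").split("\r\n")
    version = lines[0]
    if not version.startswith("WARC/"):
        raise ValueError("missing WARC version line")

    fields: dict[str, str] = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if not colon:
            continue  # not a "Name: value" line; ignored in this sketch
        fields[name.strip()] = value.strip()
    return version, fields


# Illustrative record with an empty content block (made up for this example).
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
    b"\r\n\r\n"
)
version, fields = parse_warc_header(sample)
print(version, fields["WARC-Type"])  # WARC/1.0 response
```

For real archives an established WARC library is likely a better choice; this sketch only shows where the metadata lives.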

Extension

.warc

MIME Type

application/warc

Byte Offset

0

Risk Level

Safe

Validation Code

How to validate .warc files in Python

Python
def is_warc(file_path: str) -> bool:
    """Check whether the file starts with the WARC magic bytes (ASCII WARC/)."""
    signature = bytes([0x57, 0x41, 0x52, 0x43, 0x2F])
    with open(file_path, "rb") as f:
        return f.read(5) == signature
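
WARC files are often distributed gzip-compressed (commonly named .warc.gz), in which case the first bytes on disk are the gzip signature 1F 8B rather than WARC/. The following is a sketch of a check that also looks inside a gzip wrapper. The function name is_warc_maybe_gzipped is an assumption made for this example, and the check still only inspects the signature, not the validity of the whole archive.

```python
import gzip
import zlib

WARC_MAGIC = b"WARC/"
GZIP_MAGIC = b"\x1f\x8b"


def is_warc_maybe_gzipped(file_path: str) -> bool:
    """Check for the WARC signature, looking inside gzip data if needed."""
    with open(file_path, "rb") as f:
        head = f.read(len(WARC_MAGIC))
    if head == WARC_MAGIC:
        return True  # plain, uncompressed WARC
    if head.startswith(GZIP_MAGIC):
        try:
            # Decompress only the first few bytes of the stream.
            with gzip.open(file_path, "rb") as gz:
                return gz.read(len(WARC_MAGIC)) == WARC_MAGIC
        except (OSError, EOFError, zlib.error):
            return False  # gzip signature present, but data is not valid gzip
    return False
```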

How to validate .warc files in Node.js

Node.js
// Returns true when the buffer starts with the WARC signature (ASCII WARC/).
function isWARC(buffer) {
  const signature = Buffer.from([0x57, 0x41, 0x52, 0x43, 0x2F]);
  return buffer.subarray(0, 5).equals(signature);
}
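
How to validate .warc files in Go

Go
import "bytes"

// IsWARC reports whether data begins with the WARC signature (ASCII WARC/).
func IsWARC(data []byte) bool {
    signature := []byte{0x57, 0x41, 0x52, 0x43, 0x2F}
    if len(data) < 5 {
        return false
    }
    return bytes.Equal(data[:5], signature)
}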

API Endpoint

GET /api/v1/warc
curl https://filesignature.org/api/v1/warc
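
The same request can be made from Python with the standard library. This is only a sketch: the response format is not documented here, so the body is returned as raw text and no schema is assumed. The helper names signature_url and fetch_signature_info are hypothetical, and the idea that other formats follow the same /api/v1/<format> pattern is an assumption based on the single URL shown above.

```python
import urllib.request

# Base URL taken from the curl example; the per-format pattern is assumed.
BASE_URL = "https://filesignature.org/api/v1/"


def signature_url(format_name: str) -> str:
    """Build the lookup URL for a format name such as "warc"."""
    return BASE_URL + format_name.lower()


def fetch_signature_info(format_name: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text; no response schema is assumed."""
    with urllib.request.urlopen(signature_url(format_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example (performs a network request):
# print(fetch_signature_info("warc"))
```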

Related Formats