ORC
application/octet-stream
Magic Bytes
Offset: 0
4F 52 43
Optimized Row Columnar (ORC) is a columnar data storage format originally developed by Hortonworks for the Apache Hadoop ecosystem and now maintained by the Apache Software Foundation. It provides efficient compression and indexing for large-scale analytical processing in distributed frameworks such as Apache Hive, Presto, and Spark. ORC files are data files and carry no executable content, but readers should still validate schema and other file metadata before trusting it when decompressing, because malformed files can cause resource exhaustion or buffer overflows during automated data ingestion. A magic-byte match only shows that a file starts like an ORC file, not that the whole file is well formed.
Validation Code
How to validate .orc files in Python
Python
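def is_orc(file_path: str) -> bool:
    """Return True if the file starts with the ORC magic bytes."""
    signature = bytes([0x4F, 0x52, 0x43])  # ASCII "ORC"
    with open(file_path, "rb") as f:
        # A file shorter than 3 bytes yields a shorter read and compares unequal.
        return f.read(3) == signature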
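The header check above accepts any file that happens to begin with "ORC". A slightly stricter heuristic, sketched below, also looks at the file tail: in the ORC layout the final byte gives the length of a Protocol Buffers encoded postscript, and that postscript carries the same "ORC" magic string. The function name `looks_like_orc` is hypothetical, and the sketch does not parse the postscript; it only checks that the magic string appears inside it, so treat it as a pre-filter rather than full validation.

```python
import os

ORC_MAGIC = b"ORC"


def looks_like_orc(file_path: str) -> bool:
    """Heuristic check: ORC header magic plus a postscript containing the magic."""
    size = os.path.getsize(file_path)
    # Need at least the 3-byte header and the 1-byte postscript length.
    if size < len(ORC_MAGIC) + 1:
        return False
    with open(file_path, "rb") as f:
        if f.read(len(ORC_MAGIC)) != ORC_MAGIC:
            return False
        # The final byte holds the length of the serialized postscript.
        f.seek(-1, os.SEEK_END)
        ps_len = f.read(1)[0]
        # The postscript must fit between the header and the length byte.
        if ps_len == 0 or ps_len > size - len(ORC_MAGIC) - 1:
            return False
        f.seek(-(1 + ps_len), os.SEEK_END)
        postscript = f.read(ps_len)
    # Assumption: no protobuf parsing here, only a substring search.
    return ORC_MAGIC in postscript
```

Both checks read only a few bytes, so they are cheap enough to run before handing a file to a real ORC reader, which remains the only way to confirm that the file is actually readable.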
How to validate .orc files in Node.js
Node.js
function isORC(buffer: Buffer): boolean {
  const signature = Buffer.from([0x4F, 0x52, 0x43]);
  return buffer.subarray(0, 3).equals(signature);
}
How to validate .orc files in Go
Go
import "bytes"

// IsORC reports whether data starts with the ORC magic bytes.
func IsORC(data []byte) bool {
	signature := []byte{0x4F, 0x52, 0x43} // ASCII "ORC"
	if len(data) < 3 {
		return false
	}
	return bytes.Equal(data[:3], signature)
}
API Endpoint
GET /api/v1/orc
curl https://filesignature.org/api/v1/orc