PARQUET
application/x-parquet
Magic Bytes
Offset: 0
50 41 52 31
Apache Parquet is an open-source, column-oriented data storage format created by Twitter and Cloudera and maintained by the Apache Software Foundation. This specification is primarily utilized within the Hadoop ecosystem for large-scale data processing and analytics, providing distinct compression and encoding schemes. The format is considered safe as it contains no executable code; however, users should exercise caution to ensure metadata integrity when ingesting data from untrusted or external environments.
Validation Code
How to validate .parquet files in Python
Python
def is_parquet(file_path: str) -> bool:
"""Check if file is a valid PARQUET by magic bytes."""
signature = bytes([0x50, 0x41, 0x52, 0x31])
with open(file_path, "rb") as f:
return f.read(4) == signature
How to validate .parquet files in Node.js
Node.js
function isPARQUET(buffer: Buffer): boolean {
const signature = Buffer.from([0x50, 0x41, 0x52, 0x31]);
return buffer.subarray(0, 4).equals(signature);
}
Go
func IsPARQUET(data []byte) bool {
signature := []byte{0x50, 0x41, 0x52, 0x31}
if len(data) < 4 {
return false
}
return bytes.Equal(data[:4], signature)
}
API Endpoint
GET
/api/v1/parquet
curl https://filesignature.org/api/v1/parquet