Version: 1.6.1

Avro

Apache Avro is a data serialization system that provides rich data structures and a compact, fast, binary data format. Originally developed within the Apache Hadoop ecosystem, Avro is designed for schema evolution and language-neutral data exchange.

Binary Layout

| Section | Internal Name | Description | Possible Values / Format |
| --- | --- | --- | --- |
| File Header | magic | 4-byte magic number identifying Avro files | ASCII Obj followed by the byte 0x01 (hex: 4F 62 6A 01) |
| File Header | meta | Metadata map storing key-value pairs (e.g., schema, codec) | Map of string keys to byte values (e.g., "avro.schema" → JSON schema string) |
| File Header | sync | 16-byte random sync marker used between blocks | 16 random bytes (unique per file) |
| Data Block | blockCount | Number of records in the block | Long (variable-length zigzag encoding) |
| Data Block | blockSize | Size in bytes of the serialized records (after compression, if any) | Long (variable-length zigzag encoding) |
| Data Block | blockData | Serialized records (optionally compressed) | Binary-encoded data per schema |
| Data Block | sync | Sync marker repeated after each block | Same 16-byte value as in header |
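The layout above can be sketched as a small reader. This is a minimal illustration rather than a reference implementation: the helper names (`encode_long`, `read_long`, `read_meta`, `read_container`, `build_sample`) are hypothetical, the sample file is built in memory with no compression, and record decoding per schema is left out.

```python
import io
import os

MAGIC = b"Obj\x01"  # 4F 62 6A 01


def encode_long(n: int) -> bytes:
    """Zigzag-encode a signed 64-bit integer, then write it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(stream) -> int:
    """Read one varint from the stream and undo the zigzag mapping."""
    z, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated varint")
        z |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def read_bytes(stream) -> bytes:
    """Read a length-prefixed byte string (long length, then that many bytes)."""
    return stream.read(read_long(stream))


def read_meta(stream) -> dict:
    """Read the metadata map: blocks of key/value pairs ended by a zero count."""
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            return meta
        if count < 0:  # negative count: a byte size follows, which we skip
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)


def read_container(data: bytes):
    """Return (meta, sync, blocks); each block is (blockCount, blockData)."""
    stream = io.BytesIO(data)
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = read_meta(stream)
    sync = stream.read(16)
    blocks = []
    while stream.tell() < len(data):
        count = read_long(stream)           # blockCount
        payload = stream.read(read_long(stream))  # blockSize, then blockData
        if stream.read(16) != sync:
            raise ValueError("sync marker mismatch")
        blocks.append((count, payload))
    return meta, sync, blocks


def build_sample() -> bytes:
    """Build a tiny uncompressed file holding two records of schema "string"."""
    sync = os.urandom(16)
    schema = b'"string"'
    key = b"avro.schema"
    meta = (encode_long(1) + encode_long(len(key)) + key
            + encode_long(len(schema)) + schema + encode_long(0))
    records = encode_long(2) + b"hi" + encode_long(2) + b"yo"
    block = encode_long(2) + encode_long(len(records)) + records + sync
    return MAGIC + meta + sync + block
```

Calling `read_container(build_sample())` yields the metadata map, the 16-byte sync marker, and one block with a record count of 2. The block payload is still raw bytes; turning it into values requires walking the schema stored under `avro.schema`.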

Schema Types (Stored in Metadata)

| Type | Internal Name | Description | Example / Format |
| --- | --- | --- | --- |
| Primitive | null, boolean, int, long, float, double, bytes, string | Basic types | `"type": "string"` |
| Record | record | Named collection of fields | `{ "type": "record", "name": "Person", "fields": [...] }` |
| Enum | enum | Named set of symbols | `{ "type": "enum", "name": "Suit", "symbols": ["SPADES", "HEARTS"] }` |
| Array | array | Ordered list of items | `{ "type": "array", "items": "string" }` |
| Map | map | Key-value pairs with string keys | `{ "type": "map", "values": "int" }` |
| Union | JSON array | Multiple possible types | `[ "null", "string" ]` |
| Fixed | fixed | Fixed-size byte array | `{ "type": "fixed", "name": "md5", "size": 16 }` |
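To show how these types combine, here is a sketch of one record schema that uses each category from the table, plus a small classifier. The field names (`nickname`, `tags`, `scores`, `checksum`) and the `kind` helper are invented for illustration; only the type shapes come from the table.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# Hypothetical schema: a record whose fields cover each schema category.
PERSON_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "nickname", "type": ["null", "string"], "default": null},
    {"name": "suit", "type": {"type": "enum", "name": "Suit",
                              "symbols": ["SPADES", "HEARTS"]}},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "scores", "type": {"type": "map", "values": "int"}},
    {"name": "checksum", "type": {"type": "fixed", "name": "md5", "size": 16}}
  ]
}
""")


def kind(schema) -> str:
    """Classify a parsed schema node into one of the categories above."""
    if isinstance(schema, list):
        return "union"  # unions are written as JSON arrays
    if isinstance(schema, str):
        # A bare string is a primitive, or else a reference to a named type.
        return "primitive" if schema in PRIMITIVES else "named reference"
    if isinstance(schema, dict):
        declared = schema["type"]
        if isinstance(declared, str):
            return "primitive" if declared in PRIMITIVES else declared
        return kind(declared)  # nested form such as {"type": [...]}
    raise TypeError(f"unexpected schema node: {schema!r}")


field_kinds = {f["name"]: kind(f["type"]) for f in PERSON_SCHEMA["fields"]}
```

Here `field_kinds` maps each field name to its category, for example `"nickname"` to `"union"` and `"checksum"` to `"fixed"`.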

Metadata Keys (in meta)

| Key | Description | Example Value |
| --- | --- | --- |
| avro.schema | JSON-encoded schema | JSON string defining the schema |
| avro.codec | Compression codec used (optional) | "null" (default), "deflate", "snappy", "bzip2", "xz", "zstandard" |
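As a minimal sketch of how a reader might interpret these two keys, assume the metadata map has already been decoded into a Python dict of string keys to byte values. The function name `describe_meta` is hypothetical.

```python
import json


def describe_meta(meta: dict) -> tuple:
    """Return (parsed schema, codec name) from a decoded metadata map."""
    # avro.schema holds the schema as JSON text, stored as bytes.
    schema = json.loads(meta["avro.schema"].decode("utf-8"))
    # avro.codec is optional; a missing key means no compression ("null").
    codec = meta.get("avro.codec", b"null").decode("utf-8")
    return schema, codec
```

For example, `describe_meta({"avro.schema": b'"string"'})` falls back to the `"null"` codec because `avro.codec` is absent.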

Compression Codecs

Required Codecs

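| Codec | Description | Best For |
| --- | --- | --- |
| null | No compression applied | Small files or testing |
| deflate | DEFLATE compression (RFC 1951), written as a raw stream without a zlib header or checksum | General-purpose compression |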
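Because the deflate codec uses the raw RFC 1951 stream rather than the zlib container, a reader built on Python's `zlib` module has to request raw mode with a negative window size. The sketch below assumes that convention; the function names and the default compression level are illustrative.

```python
import zlib


def deflate_block(raw: bytes, level: int = 6) -> bytes:
    """Compress block data as a raw DEFLATE stream (no zlib header or checksum)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)  # -15 selects raw mode
    return compressor.compress(raw) + compressor.flush()


def inflate_block(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream back into the serialized records."""
    return zlib.decompress(data, -15)
```

Passing the same bytes to `zlib.decompress` without the `-15` argument would fail, since that call expects the zlib header the codec omits.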

Optional Codecs

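| Codec | Description | Best For |
| --- | --- | --- |
| snappy | Fast compression/decompression; each block is followed by a 4-byte CRC32 checksum of the uncompressed data | Real-time streaming applications |
| bzip2 | High compression ratio, slower than deflate | Storage-constrained environments |
| xz | Compression via the XZ library | High compression ratio when speed matters less |
| zstandard | Zstandard compression (developed at Facebook) with configurable levels | Good balance of speed and compression ratio |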
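The snappy codec's checksum can be handled with the standard library alone, even though the decompression itself needs a third-party Snappy binding (not shown here). The sketch below assumes the checksum is the last 4 bytes of the block, stored big-endian, and computed over the uncompressed data; the helper names are hypothetical.

```python
import struct
import zlib


def split_snappy_block(block: bytes) -> tuple:
    """Split a snappy-coded block into (compressed body, expected CRC32)."""
    if len(block) < 4:
        raise ValueError("block too short to hold a CRC32 trailer")
    (expected_crc,) = struct.unpack(">I", block[-4:])  # 4-byte big-endian trailer
    return block[:-4], expected_crc


def crc_matches(uncompressed: bytes, expected_crc: int) -> bool:
    """Check the CRC32 of the decompressed data against the trailer value."""
    return (zlib.crc32(uncompressed) & 0xFFFFFFFF) == expected_crc
```

A reader would call `split_snappy_block`, decompress the body with its Snappy library of choice, and then reject the block if `crc_matches` returns `False`.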