Deflate Compression

Question 1

What is Deflate? (Definition)

Answer

Deflate (sometimes spelled DEFLATE or Flate) is a lossless data compression algorithm that has become a universal standard.

Its principle combines two techniques: the LZ77 algorithm to identify and eliminate redundancies, followed by Huffman coding to represent the data more compactly.

Formally described in RFC 1951, it is the cornerstone of ubiquitous formats such as ZIP, PNG, and HTTP compression.

Question 2

How does compression with Deflate work?

Answer

Deflate compression is performed in three sequential steps:

— LZ77 analysis (sliding window): Examine the data with a search buffer to identify repeating character sequences. Replace these sequences with a short reference (a distance and a length) pointing to their first occurrence.

Example: To compress DECODEAVECDCODE, scan the sentence and find repetitions like CODE, which appears twice. LZ77 replaces it with a reference: (distance=9, length=4). The intermediate stream becomes: D,E,C,O,D,E,A,V,E,C,(10-4)

— Huffman coding: Compress the output of the LZ77 step (raw characters, lengths, and distances) a second time by assigning binary codes of varying lengths. The most frequent symbols receive the shortest codes.

Example: Count the frequency of symbols in this new stream. E, D, and C are the most frequent (E could become 01, D -> 100, etc.), and longer codes are associated with the others.

— Block structuring: Organize everything into a series of compressed data blocks. Each block has a header indicating the type of Huffman coding used (static, dynamic, or uncompressed).

Question 3

How does decompression with Deflate work?

Answer

Decompression is the reverse process of compression, and Deflate makes it very efficient:

— Read the block header to determine how the data is encoded (specifically, the Huffman table type).

— Reconstruct the Huffman trees (if the block uses dynamic tables) from the information stored in the stream.

— Read and decode the data stream: Use the Huffman trees to convert the bits into symbol sequences (raw characters, lengths, and distances).

— Apply LZ77 in reverse: For each reference (distance, length) read, copy the corresponding sequence from the already decompressed data (the output buffer) to reconstruct the original file.

Question 4

What are the three types of blocks in a Deflate stream?

Answer

Uncompressed block (BTYPE=00): The data is stored as is, with some padding to fit the bytes. Useful when a data segment is already compressed or random (and therefore incompressible).

Compressed block with static Huffman tables (BTYPE=01): Uses predefined Huffman trees optimized for generic text data. Saves space by not having to transmit the tables.

Compressed block with dynamic Huffman tables (BTYPE=10): Transmits Huffman trees optimized specifically for the block data just before the data. This is the most flexible and generally offers the best compression.

Question 5

What is the difference between Deflate, Zlib and GZIP?

Answer

It is crucial to distinguish the compression algorithm from its packaging:

— Deflate (RFC 1951): This is the raw compression algorithm, the data itself.

— Zlib (RFC 1950): This is a lightweight packaging format designed for in-memory data streams. It adds a small header and an Adler-32 checksum to verify the integrity of the compressed data.

— GZIP (RFC 1952): This is a more comprehensive file format, designed for compressing individual files. It adds a header containing the original file name and timestamp, and uses a more robust CRC-32 checksum to verify the integrity of the original data after decompression.

In summary: GZIP and Zlib both use the Deflate algorithm to compress the core data, but they package it differently depending on the intended use.

Question 6

How to recognize a Deflate stream? (Identification)

Answer

A raw deflate stream has no universal marker.

Therefore, it's difficult to find the beginning of the stream by searching for the structure of an initial deflate block (the first bits indicate BFINAL and BTYPE).

In practice, you rarely encounter a bare deflate stream. It's almost always encapsulated in a container such as a ZIP or PNG file, or in an HTTP stream.

The simplest approach is therefore to check the header of the container format (ZIP, PNG, etc.), which is easily identifiable.

Question 7

When was Deflate invented?

Answer

Deflate was designed in the early 1990s by Phil Katz to replace the patented algorithms used in early versions of PKZIP.

Its standardization in RFCs dates back to 1996, which largely contributed to its adoption in formats such as ZIP, PNG, HTTP (Content-Encoding: deflate), and others.

Deflate Compression

Deflate Decompressor

Metadata Parser

Deflate Compressor

Answers to Questions (FAQ)

What is Deflate? (Definition)

How does compression with Deflate work?

How does decompression with Deflate work?

What are the three types of blocks in a Deflate stream?

What is the difference between Deflate, Zlib and GZIP?

How to recognize a Deflate stream? (Identification)

When was Deflate invented?

Source code

Cite dCode

Need Help ?

Questions / Comments