Project Nayuki


Simple DEFLATE decompressor

Overview

This project is a clear implementation of an inflater for the DEFLATE compression format in less than 1000 lines of well-commented code, suitable as a reference for educational purposes. It is provided in Java, Python, C++, TypeScript, and is open source. The code can be used for study, and as a solid basis for modification and extension. Consequently, the codebase optimizes for readability and avoids fancy logic, and does not target the best speed/memory/performance.

The DEFLATE format is a compression standard specified in RFC 1951 in the year . The format has widespread use, and can be found in protocols and file formats such as ZIP, gzip, PNG, HTTP, and Git. The gzip container format is specified in RFC 1952 (plain text or PDF).

Source code

Browse the project’s source code at GitHub: https://github.com/nayuki/Simple-DEFLATE-decompressor

Or download a ZIP of all the files: https://github.com/nayuki/Simple-DEFLATE-decompressor/archive/master.zip

The project package includes these main items:

  • Decompressor, which provides a library function to decompress a raw DEFLATE data stream (having no headers or wrappers).

  • GzipDecompress, which is a command line program that decompresses a gzip file. It handles the gzip headers and footers, and uses the functionality Decompressor for the main data processing.

  • A very stripped-down copy of classes from my reference Huffman coding project.

  • Test vectors for the decompressor, using the JUnit testing framework.

Notes

  • DEFLATE is popular enough that essentially every important programming language has a library for it. For example, Java has java.util.zip.Deflater, Python has zlib, and zlib is available for C. While this project provides a compliant implementation, my code runs slowly so it should be used as an educational tool only.

  • My implementation dynamic allocates and deallocates memory for each dynamic Huffman block, for the sake of clarity. This is not strictly necessary because the memory usage for this feature is bounded and relatively small. A more optimized implementation would allocate those objects/arrays once per decompressor and reuse them across blocks.

More info