zstd v1.4.4 Release Notes

Release Date: 2019-11-05
  • This release includes some major performance improvements and new CLI features, which make it a recommended upgrade.

    Faster Decompression Speed

    Decompression speed has been substantially improved, thanks to @terrelln. Exact mileage obviously varies depending on files and scenarios, but the general expectation is a bump of about +10%. The benefit is considered applicable to all scenarios, and will be perceptible for most usages.

    Some benchmark figures for illustration:

    File          v1.4.3       v1.4.4
    silesia.tar   1440 MB/s    1600 MB/s
    enwik8        1225 MB/s    1390 MB/s
    calgary.tar   1360 MB/s    1530 MB/s
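
To relate these figures to the "about +10%" expectation, the relative gain can be computed per file. The sketch below is only a worked calculation over the three rows quoted above; the function names are illustrative. With these inputs the gains come out at roughly +11.1%, +13.5% and +12.5%.

```c
#include <stdio.h>

/* Relative gain of new_speed over old_speed, in percent. */
double speedup_percent(double old_speed, double new_speed)
{
    return (new_speed - old_speed) / old_speed * 100.0;
}

/* Print the gain for each row of the benchmark table (speeds in MB/s). */
void print_benchmark_gains(void)
{
    static const struct { const char *file; double v143, v144; } rows[] = {
        { "silesia.tar", 1440.0, 1600.0 },
        { "enwik8",      1225.0, 1390.0 },
        { "calgary.tar", 1360.0, 1530.0 },
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        printf("%-12s +%.1f%%\n", rows[i].file,
               speedup_percent(rows[i].v143, rows[i].v144));
    }
}
```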

    Faster Compression Speed when Re-Using Contexts

    In server workloads (characterized by very high compression volume of relatively small inputs), the allocation and initialization of zstd's internal data structures can become a significant part of the cost of compression. For this reason, zstd has long had an optimization, which we recommend for large-scale users: when you provide an already-used ZSTD_CCtx to a compression operation, zstd tries to re-use the existing data structures, if possible, rather than re-allocate and re-initialize them.

    Historically, this optimization could avoid re-allocation most of the time, but required an exact match of internal parameters to avoid re-initialization. In this release, @felixhandte removed the dependency on matching parameters, allowing the full context re-use optimization to be applied to effectively all compressions. Practical workloads on small data should expect a ~3% speed-up.

    In addition to improving average performance, this change also has some nice side-effects on the extremes of performance.

    • First, on the fast end, it is now easier to get optimal performance from zstd. In particular, it is no longer necessary to do careful tracking and matching of contexts to compressions based on detailed parameters (as discussed for example in #1796). Instead, straightforwardly reusing contexts is now optimal.
    • Second, this change ameliorates some rare, degenerate scenarios (e.g., high-volume streaming compression of small inputs with varying, high compression levels), in which it was possible for the allocation and initialization work to vastly overshadow the actual compression work. These cases are up to 40x faster, and now perform in line with similar happy cases.

    Dictionaries and Large Inputs

    In theory, using a dictionary should always be beneficial. However, due to some long-standing implementation limitations, it can actually be detrimental. Case in point: by default, dictionaries are prepared to compress small data (where they are most useful). When this prepared dictionary is used to compress large data, there is a mismatch between the prepared parameters (targeting small data) and the ideal parameters (that would target large data). This can cause dictionaries to counter-intuitively result in a lower compression ratio when compressing large inputs.

    Starting with v1.4.4, using a dictionary with a very large input will no longer be detrimental. Thanks to a patch from @senhuang42, whenever the library notices that input is sufficiently large (relative to dictionary size), the dictionary is re-processed, using the optimal parameters for large data, resulting in improved compression ratio.

    The capability is also exposed, and can be manually triggered using ZSTD_dictForceLoad.

    New commands

    zstd CLI extends its capabilities, providing new advanced commands, thanks to great contributions:

    • Files generated by zstd (compressed or decompressed) can now be automatically stored in a different directory than the source one, using the --output-dir-flat=DIR command, provided by @senhuang42.
    • It's possible to inform zstd about the size of data coming from stdin. @nmagerko proposed 2 new commands, allowing users to provide the exact stream size (--stream-size=#) or an approximate one (--size-hint=#). Both only make sense when compressing a data stream from a pipe (such as stdin), since for a real file, zstd obtains the exact source size from the file system. Providing a source size allows zstd to better adapt internal compression parameters to the input, resulting in better performance and compression ratio. Additionally, providing the precise size makes it possible to embed this information in the compressed frame header, which also allows decoder optimizations.
    • In situations where the same directory content gets regularly compressed, with the intention to only compress new files not yet compressed, it's necessary to filter the file list, to exclude already compressed files. This process is simplified with the --exclude-compressed command, provided by @shashank0791. As the name implies, it simply excludes all compressed files from the list to process.

    Single-File Decoder with Web Assembly

    Let's complete the picture with an impressive contribution from @cwoffenden. libzstd has long offered the capability to build only the decoder, in order to generate smaller binaries that can be more easily embedded into memory-constrained devices and applications.

    @cwoffenden built on this capability and offers a script creating a single-file decoder, as an amalgamated variant of the reference Zstandard decoder. The package is completed with a nice build script, which compiles the one-file decoder into WASM code, for embedding into web applications, and even tests it.

    As a capability example, check out the awesome WebGL demo provided by @cwoffenden in the /contrib/single_file_decoder/examples directory!

    Full List

    • perf: Improved decompression speed, by > 10%, by @terrelln
    • perf: Better compression speed when re-using a context, by @felixhandte
    • perf: Fix compression ratio when compressing large files with small dictionary, by @senhuang42
    • perf: zstd reference encoder can generate RLE blocks, by @bimbashrestha
    • perf: minor generic speed optimization, by @davidbolvansky
    • api: new ability to extract sequences from the parser for analysis, by @bimbashrestha
    • api: fixed decoding of magic-less frames, by @terrelln
    • api: fixed ZSTD_initCStream_advanced() performance with fast modes, reported by @QrczakMK
    • cli: Named pipes support, by @bimbashrestha
    • cli: short tar's extension support, by @stokito
    • cli: command --output-dir-flat=DIR, generates target files into requested directory, by @senhuang42
    • cli: commands --stream-size=# and --size-hint=#, by @nmagerko
    • cli: command --exclude-compressed, by @shashank0791
    • cli: faster -t test mode
    • cli: improved some error messages, by @vangyzen
    • cli: fix rare deadlock condition within dictionary builder, by @terrelln
    • build: single-file decoder with emscripten compilation script, by @cwoffenden
    • build: fixed zlibWrapper compilation on Visual Studio, reported by @bluenlive
    • build: fixed deprecation warning for certain gcc version, reported by @jasonma163
    • build: fix compilation on old gcc versions, by @cemeyer
    • build: improved installation directories for cmake script, by Dmitri Shubin
    • pack: modified pkgconfig, for better integration into openwrt, requested by @neheb
    • misc: Improved documentation: ZSTD_CLEVEL, DYNAMIC_BMI2, ZSTD_CDict, function deprecation, zstd format
    • misc: fixed educational decoder: accept larger literals section, and removed UNALIGNED() macro