zstd v1.4.5 Release Notes
Release Date: 2020-05-22

This is a fairly important release which includes performance improvements and new major CLI features. It also fixes a few corner cases, making it a recommended upgrade.
Faster Decompression Speed
Decompression speed has been improved again, thanks to great contributions from @terrelln.
As usual, exact mileage varies depending on files and compilers.
For x64 cpus, expect a speed bump of at least +5%, and up to +10% in favorable cases.
ARM cpus receive more benefit, with speed improvements starting around +15%, and up to +50% for certain SoCs and scenarios (ARM's situation is more complex due to larger differences in SoC designs).

For illustration, some benchmarks run on a modern x64 platform using `zstd -b` compiled with `gcc` v9.3.0:

|             | v1.4.4    | v1.4.5    |
|-------------|-----------|-----------|
| silesia.tar | 1568 MB/s | 1653 MB/s |
| enwik8      | 1374 MB/s | 1469 MB/s |
| calgary.tar | 1511 MB/s | 1610 MB/s |

Same platform, using `clang` v10.0.0 compiler:

|             | v1.4.4    | v1.4.5    |
|-------------|-----------|-----------|
| silesia.tar | 1439 MB/s | 1496 MB/s |
| enwik8      | 1232 MB/s | 1335 MB/s |
| calgary.tar | 1361 MB/s | 1457 MB/s |

Simplified integration
Presuming a project needs to integrate `libzstd`'s source code (as opposed to linking a pre-compiled library), the `/lib` source directory can be copy/pasted into the target project. Then the local build system must set up a few include directories. Some setups are automatically provided in prepared build scripts, such as `Makefile`, but any other 3rd party build system must do it on its own.
This integration is now simplified, thanks to @felixhandte, by making all dependencies within `/lib` relative, meaning it's only necessary to set up include directories for the `*.h` header files that are directly included into the target project (typically `zstd.h`). Even that task can be circumvented by copy/pasting the `*.h` files into already established include directories.

Alternatively, if you are a fan of the one-file integration strategy, @cwoffenden has extended his one-file decoder script into a full-featured one-file compression library. The script `create_single_file_library.sh` will generate a file `zstd.c`, which contains all selected elements from the library (by default, compression and decompression). It's then enough to import just `zstd.h` and the generated `zstd.c` into the target project to access all included capabilities.

--patch-from
Zstandard CLI is introducing a new command line option, `--patch-from`, which leverages existing compressors, dictionaries, and the long range match finder to deliver a high-speed engine for producing and applying patches to files.

`--patch-from` is based on dictionary compression. It will consider a previous version of a file as a dictionary, to better compress a new version of the same file. This operation preserves fast `zstd` speeds at lower compression levels. To this end, it also increases the previous maximum dictionary size limit from 32 MB to 2 GB, and automatically uses the long range match finder when needed (though it can also be manually overruled).
`--patch-from` can also be combined with multi-threading mode at a very minimal compression ratio loss.

Example usage:
```shell
# create the patch
zstd --patch-from=<oldfile> <newfile> -o <patchfile>

# apply the patch
zstd -d --patch-from=<oldfile> <patchfile> -o <newfile>
```
Benchmarks:

We compared `zstd` to `bsdiff`, a popular industry grade diff engine. Our test corpus consisted of tarballs of different versions of source code from popular GitHub repositories. Specifically:

```python
repos = {
    # ~31mb (small file)
    "zstd": {"url": "https://github.com/facebook/zstd", "dict-branch": "refs/tags/v1.4.2", "src-branch": "refs/tags/v1.4.3"},
    # ~273mb (medium file)
    "wordpress": {"url": "https://github.com/WordPress/WordPress", "dict-branch": "refs/tags/5.3.1", "src-branch": "refs/tags/5.3.2"},
    # ~1.66gb (large file)
    "llvm": {"url": "https://github.com/llvm/llvm-project", "dict-branch": "refs/tags/llvmorg-9.0.0", "src-branch": "refs/tags/llvmorg-9.0.1"},
}
```
`--patch-from` at level 19 (with chainLog=30 and targetLength=4kb) is comparable with `bsdiff` when comparing patch sizes.
`--patch-from` greatly outperforms `bsdiff` in speed, even on its slowest setting of level 19, boasting an average speedup of ~7X. `--patch-from` is >200X faster at level 1 and >100X faster at level 3 vs `bsdiff`, while still delivering patch sizes less than 0.5% of the original file size.
And of course, there is no change to the fast zstd decompression speed.
--filelist=
Finally, `--filelist=` is a new CLI capability, which makes it possible to pass a list of files to operate upon from a file, as opposed to listing all target files solely on the command line.
This makes it possible to prepare a list offline, save it into a file, and then provide the prepared list to `zstd`.
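A minimal sketch of that workflow (file names are illustrative):

```shell
# create a couple of files to compress (illustrative)
printf 'first file\n'  > part1.txt
printf 'second file\n' > part2.txt

# prepare the list offline: one filename per line
printf '%s\n' part1.txt part2.txt > files.txt

# hand the prepared list to zstd; each listed file is compressed
zstd -qf --filelist=files.txt

ls part1.txt.zst part2.txt.zst
```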
Another advantage is that this method circumvents command line size limitations, which can become a problem when operating on very large directories (such a situation can typically happen with shell expansion). In contrast, passing a very large list of filenames from within a file is free of such size limitations.

Full List
- perf: Improved decompression speed (x64 >+5%, ARM >+15%), by @terrelln
- perf: Automatically downsizes `ZSTD_DCtx` when too large for too long (#2069, by @bimbashrestha)
- perf: Improved fast compression speed on `aarch64` (#2040, ~+3%, by @caoyzh)
- perf: Small level 1 compression speed gains (depending on compiler)
- fix: Compression ratio regression on huge files (> 3 GB) using high levels (`--ultra`) and multithreading, by @terrelln
- api: `ZDICT_finalizeDictionary()` is promoted to stable (#2111)
- api: new experimental parameter `ZSTD_d_stableOutBuffer` (#2094)
- build: Generate a single-file `libzstd` library (#2065, by @cwoffenden)
- build: Relative includes no longer require `-I` flags for `zstd` lib subdirs (#2103, by @felixhandte)
- build: `zstd` now compiles cleanly under `-pedantic` (#2099)
- build: `zstd` now compiles with make-4.3
- build: Support `mingw` cross-compilation from Linux, by @Ericson2314
- build: Meson multi-thread build fix on Windows
- build: Some misc `icc` fixes backed by new CI tests on Travis
- cli: New `--patch-from` command, create and apply patches from files, by @bimbashrestha
- cli: `--filelist=`: Provide a list of files to operate upon from a file
- cli: `-b` can now benchmark multiple files in decompression mode
- cli: New `--no-content-size` command
- cli: New `--show-default-cparams` command
- misc: new diagnosis tool, `checked_flipped_bits`, in `contrib/`, by @felixhandte
- misc: Extend largeNbDicts benchmark to compression
- misc: experimental edit-distance match finder in `contrib/`
- doc: Improved beginner `CONTRIBUTING.md` docs
- doc: New issue templates for zstd
Previous changes from v1.4.4

This release includes some major performance improvements and new CLI features, which make it a recommended upgrade.
Faster Decompression Speed
Decompression speed has been substantially improved, thanks to @terrelln. Exact mileage obviously varies depending on files and scenarios, but the general expectation is a bump of about +10%. The benefit is considered applicable to all scenarios, and will be perceptible for most usages.
Some benchmark figures for illustration:
|             | v1.4.3    | v1.4.4    |
|-------------|-----------|-----------|
| silesia.tar | 1440 MB/s | 1600 MB/s |
| enwik8      | 1225 MB/s | 1390 MB/s |
| calgary.tar | 1360 MB/s | 1530 MB/s |

Faster Compression Speed when Re-Using Contexts
In server workloads (characterized by very high compression volume of relatively small inputs), the allocation and initialization of `zstd`'s internal data structures can become a significant part of the cost of compression. For this reason, `zstd` has long had an optimization, which we recommend for large-scale users: when you provide an already-used `ZSTD_CCtx` to a compression operation, `zstd` tries to re-use the existing data structures, if possible, rather than re-allocate and re-initialize them.

Historically, this optimization could avoid re-allocation most of the time, but required an exact match of internal parameters to avoid re-initialization. In this release, @felixhandte removed the dependency on matching parameters, allowing the full context re-use optimization to be applied to effectively all compressions. Practical workloads on small data should expect a ~3% speed-up.

In addition to improving average performance, this change also has some nice side-effects on the extremes of performance.
- First, on the fast end, it is now easier to get optimal performance from `zstd`. In particular, it is no longer necessary to do careful tracking and matching of contexts to compressions based on detailed parameters (as discussed for example in #1796). Instead, straightforwardly reusing contexts is now optimal.
- Second, this change ameliorates some rare, degenerate scenarios (e.g., high volume streaming compression of small inputs with varying, high compression levels), in which it was possible for the allocation and initialization work to vastly overshadow the actual compression work. These cases are up to 40x faster, and now perform in line with similar happy cases.
Dictionaries and Large Inputs
In theory, using a dictionary should always be beneficial. However, due to some long-standing implementation limitations, it can actually be detrimental. Case in point: by default, dictionaries are prepared to compress small data (where they are most useful). When this prepared dictionary is used to compress large data, there is a mismatch between the prepared parameters (targeting small data) and the ideal parameters (that would target large data). This can cause dictionaries to counter-intuitively result in a lower compression ratio when compressing large inputs.
Starting with v1.4.4, using a dictionary with a very large input will no longer be detrimental. Thanks to a patch from @senhuang42, whenever the library notices that input is sufficiently large (relative to dictionary size), the dictionary is re-processed, using the optimal parameters for large data, resulting in improved compression ratio.
The capability is also exposed, and can be manually triggered using `ZSTD_dictForceLoad`.

New commands
The `zstd` CLI extends its capabilities, providing new advanced commands, thanks to great contributions:

- `zstd` generated files (compressed or decompressed) can now be automatically stored into a different directory than the source one, using the `--output-dir-flat=DIR` command, provided by @senhuang42.
- It's possible to inform `zstd` about the size of data coming from `stdin`. @nmagerko proposed 2 new commands, allowing users to provide the exact stream size (`--stream-size=#`) or an approximate one (`--size-hint=#`). Both only make sense when compressing a data stream from a pipe (such as `stdin`), since for a real file, `zstd` obtains the exact source size from the file system. Providing a source size allows `zstd` to better adapt internal compression parameters to the input, resulting in better performance and compression ratio. Additionally, providing the precise size makes it possible to embed this information in the compressed frame header, which also enables decoder optimizations.
- In situations where the same directory content gets regularly compressed, with the intention to only compress new files not yet compressed, it's necessary to filter the file list to exclude already compressed files. This process is simplified with the command `--exclude-compressed`, provided by @shashank0791. As the name implies, it simply excludes all compressed files from the list to process.
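A hedged sketch of these commands in combination (all paths and sizes are illustrative, not from the release notes):

```shell
# illustrative input files
mkdir -p src out
printf 'some data\n' > src/a.txt
printf 'more data\n' > src/b.txt

# store compressed output in a separate, flat directory
zstd -qf --output-dir-flat=out src/a.txt src/b.txt

# announce the exact size of a piped stream (10 bytes here),
# so zstd can tune parameters and record the size in the frame header
printf '0123456789' | zstd -qf --stream-size=10 -o out/stream.zst

# skip inputs that are already compressed files
zstd -qf --exclude-compressed src/a.txt src/b.txt
```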
Single-File Decoder with Web Assembly

Let's complete the picture with an impressive contribution from @cwoffenden. `libzstd` has long offered the capability to build only the decoder, in order to generate smaller binaries that can be more easily embedded into memory-constrained devices and applications.

@cwoffenden built on this capability and offers a script creating a single-file decoder, as an amalgamated variant of the reference Zstandard decoder. The package is completed with a nice build script, which compiles the one-file decoder into `WASM` code for embedding into web applications, and even tests it. As a capability example, check out the awesome WebGL demo provided by @cwoffenden in the `/contrib/single_file_decoder/examples` directory!

Full List
- perf: Improved decompression speed, by > 10%, by @terrelln
- perf: Better compression speed when re-using a context, by @felixhandte
- perf: Fix compression ratio when compressing large files with small dictionary, by @senhuang42
- perf: `zstd` reference encoder can generate `RLE` blocks, by @bimbashrestha
- perf: minor generic speed optimization, by @davidbolvansky
- api: new ability to extract sequences from the parser for analysis, by @bimbashrestha
- api: fixed decoding of magic-less frames, by @terrelln
- api: fixed `ZSTD_initCStream_advanced()` performance with fast modes, reported by @QrczakMK
- cli: Named pipes support, by @bimbashrestha
- cli: short tar's extension support, by @stokito
- cli: command `--output-dir-flat=DIR`, generates target files into requested directory, by @senhuang42
- cli: commands `--stream-size=#` and `--size-hint=#`, by @nmagerko
- cli: command `--exclude-compressed`, by @shashank0791
- cli: faster `-t` test mode
- cli: improved some error messages, by @vangyzen
- cli: fix rare deadlock condition within dictionary builder, by @terrelln
- build: single-file decoder with emscripten compilation script, by @cwoffenden
- build: fixed `zlibWrapper` compilation on Visual Studio, reported by @bluenlive
- build: fixed deprecation warning for certain gcc version, reported by @jasonma163
- build: fix compilation on old gcc versions, by @cemeyer
- build: improved installation directories for cmake script, by Dmitri Shubin
- pack: modified `pkgconfig`, for better integration into openwrt, requested by @neheb
- misc: Improved documentation: `ZSTD_CLEVEL`, `DYNAMIC_BMI2`, `ZSTD_CDict`, function deprecation, zstd format
- misc: fixed educational decoder: accept larger literals section, and removed `UNALIGNED()` macro