LZ4 v1.9.0 Release Notes
Release Date: 2019-04-16
Warning : this version has a known bug in the decompression function which can make it read a few bytes beyond the input limit. Upgrading to v1.9.1 is recommended.
LZ4 v1.9.0 is a performance-focused release, which also offers minor API updates.
Decompression speed improvements
Dave Watson (@djwatson) carefully optimized the LZ4 decompression hot loop, offering substantial speed improvements on x86 and x64 platforms.
Here are some benchmarks run on a Core i7-9700K, with source compiled using gcc v8.2.0 on Ubuntu 18.10 "Cosmic Cuttlefish" (Linux 4.18.0-17-generic) :
| Version | v1.8.3 | v1.9.0 | Improvement |
| --- | --- | --- | --- |
| enwik8 | 4090 MB/s | 4560 MB/s | +12% |
| calgary.tar | 4320 MB/s | 4860 MB/s | +13% |
| silesia.tar | 4210 MB/s | 4970 MB/s | +18% |
Given that decompression speed has always been a strong point of lz4, the improvement is quite substantial.
The new decoding loop is automatically enabled on x64 and x86.
For other cpu types, since our testing capabilities are more limited, the new decoding loop is disabled by default. However, anyone can manually enable it by using the build macro LZ4_FAST_DEC_LOOP, which accepts values 0 or 1. The outcome will vary depending on the exact target and build chain. For example, in our limited tests with ARM platforms, we found that benefits vary strongly depending on cpu manufacturer, chip model, and compiler version, making it difficult to offer a "generic" statement. The ARM situation may prove extreme though, due to the proliferation of variants available. Other cpu types may prove easier to assess.
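The default-selection behavior described above can be sketched with ordinary preprocessor logic. The block below is an illustration written for these notes, not a copy of the library source: the exact set of architecture macros tested and the helper `fast_dec_loop_enabled()` are assumptions.

```c
#include <assert.h>

/* Illustrative default selection (an assumption, not the library's code):
 * honor a value supplied by the build, otherwise enable the new decoding
 * loop only on x86/x64 targets. */
#ifndef LZ4_FAST_DEC_LOOP
#  if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#    define LZ4_FAST_DEC_LOOP 1
#  else
#    define LZ4_FAST_DEC_LOOP 0
#  endif
#endif

/* Hypothetical helper reporting which setting this build ended up with. */
int fast_dec_loop_enabled(void)
{
    /* The macro is documented as accepting 0 or 1 only. */
    assert(LZ4_FAST_DEC_LOOP == 0 || LZ4_FAST_DEC_LOOP == 1);
    return LZ4_FAST_DEC_LOOP;
}
```

With gcc or clang, a build macro of this kind is typically set on the command line, for example by adding `-DLZ4_FAST_DEC_LOOP=1` to the compiler flags. Since results vary by target, it is worth benchmarking both settings before committing to one.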
API updates
_destSize() compression variants have been promoted to stable status.
These variants reverse the usual logic: instead of compressing a whole input into however much space it needs, they try to fit as much input data as possible into a fixed memory budget. This is used for example in WiredTiger and EroFS, which cram as much data as possible into the size of a physical sector, for improved storage density.
When compressing small inputs, the fixed cost of clearing the compression's internal data structures can become a significant fraction of the compression cost. In v1.8.2, new LZ4 entry points were introduced to perform this initialization at effectively zero cost. LZ4_resetStream_fast() and LZ4_resetStreamHC_fast() are now promoted into stable.
They are supplemented by new entry points, LZ4_initStream() and its corresponding HC variant, which must be used on any uninitialized memory segment that will be converted into an LZ4 state. After that, only reset*_fast() is needed to start a new compression job re-using the same context. This proves especially effective when compressing a lot of small data.
decompress*_fast() variants have been moved into the deprecated section.
While they offer slightly faster decompression speed (~+5%), they are also unprotected against malicious inputs, which makes them a security liability. There are some limited cases where this property could prove acceptable (perfectly controlled environment, same producer / consumer), but in most cases, the risk is not worth the benefit.
We want to discourage such usage as clearly as possible, by pushing the _fast() variants into the deprecation area.
For the time being, they will not yet generate deprecation warnings when invoked, to give existing applications time to move towards decompress*_safe(). But this is the next stage, and is likely to happen in a future release.
LZ4_resetStream() and LZ4_resetStreamHC() have also been moved into the deprecated section, to emphasize the preference for LZ4_resetStream_fast(). Their real equivalents are actually LZ4_initStream() and LZ4_initStreamHC(), which are more generic (they can accept any memory area to initialize) and safer (they check size and alignment). Also, the naming makes it clearer when to use initStream() and when to use reset*_fast().
Changes list
This release brings an assortment of small improvements and bug fixes, as detailed below :
- perf: large decompression speed improvement on x86/x64 (up to +20%) by @djwatson
- api : changed : _destSize() compression variants are promoted to stable API
- api : new :
- api : changed : LZ4_resetStream(HC) as recommended reset function, for better performance on small data
- cli : support custom block sizes, by @blezsan
- build: source code can be amalgamated, by Bing Xu
- build: added meson build, by @lzutao
- build: new build macros :
- install: MidnightBSD, by @laffer1
- install: msys2 on Windows 10, by @vtorri