RocksDB v6.6.0 Release Notes

Release Date: 2019-11-25 // over 4 years ago
  • ๐Ÿ› Bug Fixes

    • ๐Ÿ›  Fix data corruption caused by output of intra-L0 compaction on ingested file not being placed in correct order in L0.
    • ๐Ÿ›  Fix a data race between Version::GetColumnFamilyMetaData() and Compaction::MarkFilesBeingCompacted() for access to being_compacted (#6056). The current fix acquires the db mutex during Version::GetColumnFamilyMetaData(), which may cause regression.
    • Fix a bug in DBIter that is_blob_ state isn't updated when iterating backward using seek.
    • ๐Ÿ›  Fix a bug when format_version=3, partitioned filters, and prefix search are used in conjunction. The bug could result into Seek::(prefix) returning NotFound for an existing prefix.
    • ๐Ÿ‘€ Revert the feature "Merging iterator to avoid child iterator reseek for some cases (#5286)" since it might cause strong results when reseek happens with a different iterator upper bound.
    • ๐Ÿ›  Fix a bug causing a crash during ingest external file when background compaction cause severe error (file not found).
    • ๐Ÿ›  Fix a bug when partitioned filters and prefix search are used in conjunction, ::SeekForPrev could return invalid for an existing prefix. ::SeekForPrev might be called by the user, or internally on ::Prev, or within ::Seek if the return value involves Delete or a Merge operand.
    • ๐Ÿ›  Fix OnFlushCompleted fired before flush result persisted in MANIFEST when there's concurrent flush job. The bug exists since OnFlushCompleted was introduced in rocksdb 3.8.
    • ๐Ÿ›  Fixed an sst_dump crash on some plain table SST files.
    • ๐Ÿ›  Fixed a memory leak in some error cases of opening plain table SST files.
    • ๐Ÿ›  Fix a bug when a crash happens while calling WriteLevel0TableForRecovery for multiple column families, leading to a column family's log number greater than the first corrutped log number when the DB is being opened in PointInTime recovery mode during next recovery attempt (#5856).

    ๐Ÿ†• New Features

    • Universal compaction to support options.periodic_compaction_seconds. A full compaction will be triggered if any file is over the threshold.
    • ๐Ÿ“‡ GetLiveFilesMetaData and GetColumnFamilyMetaData now expose the file number of SST files as well as the oldest blob file referenced by each SST.
    • ๐Ÿ‘ A batched MultiGet API (DB::MultiGet()) that supports retrieving keys from multiple column families.
    • ๐Ÿš€ Full and partitioned filters in the block-based table use an improved Bloom filter implementation, enabled with format_version 5 (or above) because previous releases cannot read this filter. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single (full) filter. For example, the new Bloom filter has the same false positive rate at 9.55 bits per key as the old one at 10 bits per key, and a lower false positive rate at 16 bits per key than the old one at 100 bits per key.
    • ๐Ÿ— Added AVX2 instructions to USE_SSE builds to accelerate the new Bloom filter and XXH3-based hash function on compatible x86_64 platforms (Haswell and later, ~2014).
    • Support options.ttl or options.periodic_compaction_seconds with options.max_open_files = -1. File's oldest ancester time and file creation time will be written to manifest. If it is availalbe, this information will be used instead of creation_time and file_creation_time in table properties.
    • Setting options.ttl for universal compaction now has the same meaning as setting periodic_compaction_seconds.
    • ๐Ÿ“‡ SstFileMetaData also returns file creation time and oldest ancester time.
    • ๐Ÿ‘€ The sst_dump command line tool recompress command now displays how many blocks were compressed and how many were not, in particular how many were not compressed because the compression ratio was not met (12.5% threshold for GoodCompressionRatio), as seen in the number.block.not_compressed counter stat since version 6.0.0.
    • ๐Ÿ“‡ The block cache usage is now takes into account the overhead of metadata per each entry. This results into more accurate management of memory. A side-effect of this feature is that less items are fit into the block cache of the same size, which would result to higher cache miss rates. This can be remedied by increasing the block cache size or passing kDontChargeCacheMetadata to its constuctor to restore the old behavior.
    • When using BlobDB, a mapping is maintained and persisted in the MANIFEST between each SST file and the oldest non-TTL blob file it references.
    • 0๏ธโƒฃ db_bench now supports and by default issues non-TTL Puts to BlobDB. TTL Puts can be enabled by specifying a non-zero value for the blob_db_max_ttl_range command line parameter explicitly.
    • ๐Ÿ–จ sst_dump now supports printing BlobDB blob indexes in a human-readable format. This can be enabled by specifying the decode_blob_index flag on the command line.
    • A number of new information elements are now exposed through the EventListener interface. For flushes, the file numbers of the new SST file and the oldest blob file referenced by the SST are propagated. For compactions, the level, file number, and the oldest blob file referenced are passed to the client for each compaction input and output file.

    Public API Change

    • ๐Ÿš€ RocksDB release 4.1 or older will not be able to open DB generated by the new release. 4.2 was released on Feb 23, 2016.
    • ๐Ÿ’… TTL Compactions in Level compaction style now initiate successive cascading compactions on a key range so that it reaches the bottom level quickly on TTL expiry. creation_time table property for compaction output files is now set to the minimum of the creation times of all compaction inputs.
    • With FIFO compaction style, options.periodic_compaction_seconds will have the same meaning as options.ttl. Whichever stricter will be used. With the default options.periodic_compaction_seconds value with options.ttl's default of 0, RocksDB will give a default of 30 days.
    • Added an API GetCreationTimeOfOldestFile(uint64_t* creation_time) to get the file_creation_time of the oldest SST file in the DB.
    • FilterPolicy now exposes additional API to make it possible to choose filter configurations based on context, such as table level and compaction style. See LevelAndStyleCustomFilterPolicy in db_bloom_filter_test.cc. While most existing custom implementations of FilterPolicy should continue to work as before, those wrapping the return of NewBloomFilterPolicy will require overriding new function GetBuilderWithContext(), because calling GetFilterBitsBuilder() on the FilterPolicy returned by NewBloomFilterPolicy is no longer supported.
    • ๐Ÿ— An unlikely usage of FilterPolicy is no longer supported. Calling GetFilterBitsBuilder() on the FilterPolicy returned by NewBloomFilterPolicy will now cause an assertion violation in debug builds, because RocksDB has internally migrated to a more elaborate interface that is expected to evolve further. Custom implementations of FilterPolicy should work as before, except those wrapping the return of NewBloomFilterPolicy, which will require a new override of a protected function in FilterPolicy.
    • NewBloomFilterPolicy now takes bits_per_key as a double instead of an int. This permits finer control over the memory vs. accuracy trade-off in the new Bloom filter implementation and should not change source code compatibility.
    • The option BackupableDBOptions::max_valid_backups_to_open is now only used when opening BackupEngineReadOnly. When opening a read/write BackupEngine, anything but the default value logs a warning and is treated as the default. This change ensures that backup deletion has proper accounting of shared files to ensure they are deleted when no longer referenced by a backup.
    • Deprecate snap_refresh_nanos option.
    • โž• Added DisableManualCompaction/EnableManualCompaction to stop and resume manual compaction.
    • โž• Add TryCatchUpWithPrimary() to StackableDB in non-LITE mode.
    • โž• Add a new Env::LoadEnv() overloaded function to return a shared_ptr to Env.
    • Flush sets file name to "(nil)" for OnTableFileCreationCompleted() if the flush does not produce any L0. This can happen if the file is empty thus delete by RocksDB.

    0๏ธโƒฃ Default Option Changes

    • Changed the default value of periodic_compaction_seconds to UINT64_MAX - 1 which allows RocksDB to auto-tune periodic compaction scheduling. When using the default value, periodic compactions are now auto-enabled if a compaction filter is used. A value of 0 will turn off the feature completely.
    • ๐Ÿ”„ Changed the default value of ttl to UINT64_MAX - 1 which allows RocksDB to auto-tune ttl value. When using the default value, TTL will be auto-enabled to 30 days, when the feature is supported. To revert the old behavior, you can explicitly set it to 0.

    ๐ŸŽ Performance Improvements

    • For 64-bit hashing, RocksDB is standardizing on a slightly modified preview version of XXH3. This function is now used for many non-persisted hashes, along with fastrange64() in place of the modulus operator, and some benchmarks show a slight improvement.
    • ๐Ÿ‘€ Level iterator to invlidate the iterator more often in prefix seek and the level is filtered out by prefix bloom.