Changelog History

  • v6.9.0 Changes

    March 29, 2020

    Behavior changes

    • Since RocksDB 6.8, ttl-based FIFO compaction can drop a file whose oldest key becomes older than options.ttl while others have not. This fix reverts that behavior and makes ttl-based FIFO compaction use the file's flush time as the criterion. The fix requires max_open_files = -1 and compaction_options_fifo.allow_compaction = false to function properly.
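As a rough illustration of the flush-time criterion described above, the sketch below selects the files a TTL-based FIFO pass would drop. `FileInfo`, `flush_time`, and `PickExpiredFiles` are hypothetical names for this example only, not RocksDB internals, and treating a ttl of 0 as "disabled" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for per-file metadata.
struct FileInfo {
  std::string name;
  uint64_t flush_time;  // seconds since epoch when the file was flushed
};

// Returns the names of files whose flush time is more than `ttl` seconds
// before `now`. A ttl of 0 is treated as "TTL disabled" (assumption).
std::vector<std::string> PickExpiredFiles(const std::vector<FileInfo>& files,
                                          uint64_t now, uint64_t ttl) {
  std::vector<std::string> expired;
  if (ttl == 0 || now < ttl) return expired;
  const uint64_t cutoff = now - ttl;
  for (const FileInfo& f : files) {
    // Only the flush time is consulted, never the age of individual keys.
    if (f.flush_time < cutoff) expired.push_back(f.name);
  }
  return expired;
}
```

The point of the design is that a file is judged as a whole by when it was written, so a single old key cannot cause a file full of newer data to be dropped.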

    Public API Change

    • Fix spelling so that API now has correctly spelled transaction state name COMMITTED, while the old misspelled COMMITED is still available as an alias.
    • Updated default format_version in BlockBasedTableOptions from 2 to 4. SST files generated with the new default can be read by RocksDB versions 5.16 and newer, and use more efficient encoding of keys in index blocks.
    • A new parameter CreateBackupOptions is added to both BackupEngine::CreateNewBackup and BackupEngine::CreateNewBackupWithMetadata. You can decrease the CPU priority of BackupEngine's background threads by setting decrease_background_thread_cpu_priority and background_thread_cpu_priority in CreateBackupOptions.
    • Updated the public API of SST file checksum. Introduced the FileChecksumGenFactory to create a FileChecksumGenerator for each SST file, so that the FileChecksumGenerator is not shared and can be more general for checksum implementations. Changed the FileChecksumGenerator interface from Value, Extend, and GetChecksum to Update, Finalize, and GetChecksum. Finalize should be called only once, after all data is processed, to generate the final checksum. Temporary data should be maintained by the FileChecksumGenerator object itself, which finally returns the checksum string.
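The Update / Finalize / GetChecksum flow from the checksum entry above can be sketched with a toy generator. Everything here is a simplified assumption: `ChecksumGeneratorSketch`, `ByteSumGenerator`, and `ByteSumGeneratorFactory` are hypothetical names, the signatures are not copied from the RocksDB headers, and the byte-sum "checksum" is only a placeholder for a real algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Simplified stand-in for the generator interface; signatures are assumptions.
class ChecksumGeneratorSketch {
 public:
  virtual ~ChecksumGeneratorSketch() = default;
  virtual void Update(const char* data, size_t n) = 0;  // feed one chunk
  virtual void Finalize() = 0;                           // call exactly once
  virtual std::string GetChecksum() const = 0;           // valid after Finalize
};

// Toy byte-sum checksum that keeps its intermediate state internally.
class ByteSumGenerator : public ChecksumGeneratorSketch {
 public:
  void Update(const char* data, size_t n) override {
    assert(!finalized_);
    for (size_t i = 0; i < n; ++i) {
      sum_ += static_cast<unsigned char>(data[i]);
    }
  }
  void Finalize() override {
    assert(!finalized_);
    finalized_ = true;
    checksum_ = std::to_string(sum_);
  }
  std::string GetChecksum() const override {
    assert(finalized_);
    return checksum_;
  }

 private:
  uint32_t sum_ = 0;
  bool finalized_ = false;
  std::string checksum_;
};

// The factory hands out a fresh generator per file, so no state is shared.
class ByteSumGeneratorFactory {
 public:
  std::unique_ptr<ChecksumGeneratorSketch> Create() const {
    return std::unique_ptr<ChecksumGeneratorSketch>(new ByteSumGenerator());
  }
};
```

A caller would create one generator per file, call Update for each chunk written, call Finalize once at the end, and then read the result with GetChecksum.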

    Bug Fixes

    • Fix a bug where range tombstone blocks in ingested files were cached incorrectly during ingestion. If range tombstones were read from those incorrectly cached blocks, the keys they covered would be exposed.
    • Fix a data race that could, with small probability, cause a crash when calling DB::GetCreationTimeOfOldestFile(). The bug was introduced in the 6.6 release.
    • Fix a bug where the boolean value optimize_filters_for_hits was passed as the max threads argument when loading table handles after a flush or compaction. The value is corrected to 1. The bug should not cause user-visible problems.
    • Fix a bug which might crash the service when write buffer manager fails to insert the dummy handle to the block cache.

    Performance Improvements

    • In CompactRange, for levels starting from 0, if the level does not have any file with any key falling in the specified range, the level is skipped. So instead of always compacting from level 0, the compaction starts from the first level with keys in the specified range until the last such level.
    • Reduced memory copy when reading sst footer and blobdb in direct IO mode.
    • When restarting a database with a large number of sst files, a large amount of CPU time was spent on getting the logical block size of the sst files, which slowed down startup. This inefficiency is optimized away with an internal cache for the logical block sizes.
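The level-skipping rule for CompactRange described above can be sketched as a search for the first and last level containing a file that overlaps the requested range. `FileRange`, `Overlaps`, and `LevelsToCompact` are hypothetical names for this illustration, and plain string comparison stands in for the real key comparator.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-file key range [smallest, largest], inclusive.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// True if the file's key range intersects [begin, end].
bool Overlaps(const FileRange& f, const std::string& begin,
              const std::string& end) {
  return f.smallest <= end && begin <= f.largest;
}

// Returns {first, last} level indices that hold any file overlapping
// [begin, end], or {-1, -1} if no level does.
std::pair<int, int> LevelsToCompact(
    const std::vector<std::vector<FileRange>>& levels,
    const std::string& begin, const std::string& end) {
  int first = -1;
  int last = -1;
  for (int lvl = 0; lvl < static_cast<int>(levels.size()); ++lvl) {
    for (const FileRange& f : levels[lvl]) {
      if (Overlaps(f, begin, end)) {
        if (first < 0) first = lvl;
        last = lvl;
        break;  // one overlapping file is enough to include the level
      }
    }
  }
  return std::make_pair(first, last);
}
```

With this rule, a range that only touches data in deeper levels no longer forces work to start at level 0.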

    New Features

    • Basic support for user timestamp in iterator. Seek/SeekToFirst/Next and lower/upper bounds are supported. Reverse iteration is not supported. Merge is not considered.
    • When a file lock fails because the lock is held by the current process, return the acquiring time and thread ID in the error message.
    • Added a new option, best_efforts_recovery (default: false), to allow a database to open in a db dir with missing table files. During best efforts recovery, missing table files are ignored, and the database recovers to the most recent state without the missing table files. Cross-column-family consistency is not guaranteed even if WAL is enabled.
    • options.bottommost_compression, options.compression_opts and options.bottommost_compression_opts are now dynamically changeable.
  • v6.8.1 Changes

    April 25, 2020

    6.8.1 (03/30/2020)

    Behavior changes

    • Since RocksDB 6.8.0, ttl-based FIFO compaction can drop a file whose oldest key becomes older than options.ttl while others have not. This fix reverts that behavior and makes ttl-based FIFO compaction use the file's flush time as the criterion. The fix requires max_open_files = -1 and compaction_options_fifo.allow_compaction = false to function properly.

  • v6.8.0 Changes

    February 24, 2020

    Java API Changes

    • Major breaking changes to Java comparators, toward standardizing on ByteBuffer for performant, locale-neutral operations on keys (#6252).
    • Added overloads of common API methods using direct ByteBuffers for keys and values (#2283).

    Bug Fixes

    • Fix incorrect results while block-based table uses kHashSearch, together with Prev()/SeekForPrev().
    • Fix a bug that prevents opening a DB after two consecutive crashes with TransactionDB, where the first crash recovers from a corrupted WAL with kPointInTimeRecovery but the second cannot.
    • Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
    • Add DBOptions::skip_checking_sst_file_sizes_on_db_open. It disables potentially expensive checking of all sst file sizes in DB::Open().
    • BlobDB now ignores trivially moved files when updating the mapping between blob files and SSTs. This should mitigate issue #6338 where out of order flush/compaction notifications could trigger an assertion with the earlier code.
    • Fix a bug where batched MultiGet() ignored IO errors while reading data blocks, causing it to potentially continue looking for a key and return stale results.
    • WriteBatchWithIndex::DeleteRange returns Status::NotSupported. Previously it returned success even though reads on the batch did not account for range tombstones. The corresponding language bindings now cannot be used. In C, that includes rocksdb_writebatch_wi_delete_range, rocksdb_writebatch_wi_delete_range_cf, rocksdb_writebatch_wi_delete_rangev, and rocksdb_writebatch_wi_delete_rangev_cf. In Java, that includes WriteBatchWithIndex::deleteRange.
    • Assign new MANIFEST file number when caller tries to create a new MANIFEST by calling LogAndApply(..., new_descriptor_log=true). This bug can cause the MANIFEST to be overwritten during recovery if options.write_dbid_to_manifest = true and there are WAL file(s).

    Performance Improvements

    • Perform readahead when reading from option files. Inside DB, options.log_readahead_size will be used as the readahead size. In other cases, a default 512KB is used.

    Public API Change

    • The BlobDB garbage collector now emits the statistics BLOB_DB_GC_NUM_FILES (number of blob files obsoleted during GC), BLOB_DB_GC_NUM_NEW_FILES (number of new blob files generated during GC), BLOB_DB_GC_FAILURES (number of failed GC passes), BLOB_DB_GC_NUM_KEYS_RELOCATED (number of blobs relocated during GC), and BLOB_DB_GC_BYTES_RELOCATED (total size of blobs relocated during GC). On the other hand, the following statistics, which are not relevant for the new GC implementation, are now deprecated: BLOB_DB_GC_NUM_KEYS_OVERWRITTEN, BLOB_DB_GC_NUM_KEYS_EXPIRED, BLOB_DB_GC_BYTES_OVERWRITTEN, BLOB_DB_GC_BYTES_EXPIRED, and BLOB_DB_GC_MICROS.
    • Disable recycle_log_file_num when inconsistent recovery modes are requested: kPointInTimeRecovery and kAbsoluteConsistency.

    New Features

    • Added the checksum for each SST file generated by Flush or Compaction. Added sst_file_checksum_func to Options so that users can plug in their own SST file checksum function by overriding the FileChecksumFunc class. If the user does not set sst_file_checksum_func, SST file checksum calculation will not be enabled. The checksum information includes a uint32_t checksum value and a checksum function name (string). The checksum information is stored in FileMetadata in version store and also logged to MANIFEST. A new tool is added to LDB so that users can dump out a list of file checksum information from MANIFEST (stored in an unordered_map).
    • db_bench now supports value_size_distribution_type, value_size_min, value_size_max options for generating random variable-sized values. Added blob_db_compression_type option for BlobDB to enable blob compression.
    • Replace RocksDB namespace "rocksdb" with flag "ROCKSDB_NAMESPACE" which, if not defined, is defined as "rocksdb" in header file rocksdb_namespace.h.
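The ROCKSDB_NAMESPACE fallback described in the last entry above follows a common preprocessor pattern, sketched here in a standalone form. `NamespaceDemo` is a hypothetical function used only to show code living inside the namespace. The sketch does not include any RocksDB header.

```cpp
#include <cassert>
#include <string>

// Fallback definition: use "rocksdb" unless the build overrides the macro,
// for example with a compiler flag such as -DROCKSDB_NAMESPACE=my_rocksdb.
#ifndef ROCKSDB_NAMESPACE
#define ROCKSDB_NAMESPACE rocksdb
#endif

namespace ROCKSDB_NAMESPACE {
// Hypothetical helper, present only to demonstrate the namespace macro.
inline std::string NamespaceDemo() { return "inside ROCKSDB_NAMESPACE"; }
}  // namespace ROCKSDB_NAMESPACE
```

Code that spells the namespace through the macro keeps compiling whether or not a build renames it, which is what allows two differently named copies of the library to be linked into one binary.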
  • v6.7.3 Changes

    March 19, 2020

    6.7.3 (2020-03-18)

    Bug Fixes

    • Fix a data race that could, with small probability, cause a crash when calling DB::GetCreationTimeOfOldestFile(). The bug was introduced in the 6.6 release.

    6.7.2 (2020-02-24)

    Bug Fixes

    • Fixed a bug of IO Uring partial result handling introduced in 6.7.0.

    6.7.1 (2020-02-13)

    Bug Fixes

    • Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
    • Fix a bug where batched MultiGet() ignored IO errors while reading data blocks, causing it to potentially continue looking for a key and return stale results.

  • v6.7.0 Changes

    January 21, 2020

    Public API Change

    • Added a rocksdb::FileSystem class in include/rocksdb/file_system.h to encapsulate file creation/read/write operations, and an option DBOptions::file_system to allow a user to pass in an instance of rocksdb::FileSystem. If it is a non-null value, it takes precedence over DBOptions::env for file operations. A new API rocksdb::FileSystem::Default() returns a platform default object. The DBOptions::env option and Env::Default() API will continue to be used for threading and other OS related functions, and where DBOptions::file_system is not specified, for file operations. For storage developers who are accustomed to rocksdb::Env, the interface in rocksdb::FileSystem is new and will probably undergo some changes as more storage systems are ported to it from rocksdb::Env. As of now, no env other than Posix has been ported to the new interface.
    • A new rocksdb::NewSstFileManager() API that allows the caller to pass in separate Env and FileSystem objects.
    • Changed Java API for RocksDB.keyMayExist functions to use Holder<byte[]> instead of StringBuilder, so that retrieved values need not decode to Strings.
    • A new OptimisticTransactionDBOptions Option that allows users to configure occ validation policy. The default policy changes from kValidateSerial to kValidateParallel to reduce mutex contention.

    Bug Fixes

    • Fix a bug that can cause unnecessary bg thread to be scheduled (#6104).
    • Fix crash caused by concurrent CF iterations and drops (#6147).
    • Fix a race condition for cfd->log_number_ between manifest switch and memtable switch (PR 6249) when number of column families is greater than 1.
    • Fix a bug on fractional cascading index when multiple files at the same level contain the same smallest user key, and those user keys are for merge operands. In this case, Get() the exact key may miss some merge operands.
    • Declare kHashSearch index type feature-incompatible with index_block_restart_interval larger than 1.
    • Fixed an issue where the thread pools were not resized upon setting max_background_jobs dynamically through the SetDBOptions interface.
    • Fix a bug that can cause write threads to hang when a slowdown/stall happens and there is a mix of writers with WriteOptions::no_slowdown set/unset.
    • Fixed an issue where an incorrect "number of input records" value was used to compute the "records dropped" statistics for compactions.
    • Fix a regression bug that causes segfault when hash is used, max_open_files != -1 and total order seek is used and switched back.

    New Features

    • It is now possible to enable periodic compactions for the base DB when using BlobDB.
    • BlobDB now garbage collects non-TTL blobs when enable_garbage_collection is set to true in BlobDBOptions. Garbage collection is performed during compaction: any valid blobs located in the oldest N files (where N is the number of non-TTL blob files multiplied by the value of BlobDBOptions::garbage_collection_cutoff) encountered during compaction get relocated to new blob files, and old blob files are dropped once they are no longer needed. Note: we recommend enabling periodic compactions for the base DB when using this feature to deal with the case when some old blob files are kept alive by SSTs that otherwise do not get picked for compaction.
    • db_bench now supports the garbage_collection_cutoff option for BlobDB.
    • Introduce ReadOptions.auto_prefix_mode. When set to true, iterator will return the same result as total order seek, but may choose to use prefix seek internally based on seek key and iterator upper bound.
    • MultiGet() can use IO Uring to parallelize reads from the same SST file. This feature is disabled by default. It can be enabled with the environment variable ROCKSDB_USE_IO_URING.
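The BlobDB garbage collection entry above defines N as the number of non-TTL blob files multiplied by garbage_collection_cutoff. The arithmetic can be sketched as follows. `OldestFilesForGC` is a hypothetical helper, and two points are assumptions of the sketch rather than documented behavior: that a lower file number means an older file, and that the product is rounded down.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the file numbers of the oldest N non-TTL blob files, where
// N = floor(cutoff * number_of_files). Assumes lower number == older file.
std::vector<uint64_t> OldestFilesForGC(std::vector<uint64_t> blob_file_numbers,
                                       double cutoff) {
  assert(cutoff >= 0.0 && cutoff <= 1.0);
  std::sort(blob_file_numbers.begin(), blob_file_numbers.end());
  const size_t n = static_cast<size_t>(
      cutoff * static_cast<double>(blob_file_numbers.size()));
  blob_file_numbers.resize(n);  // keep only the oldest n entries
  return blob_file_numbers;
}
```

For example, with four blob files and a cutoff of 0.5, only blobs in the two oldest files would be candidates for relocation during compaction.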
  • v6.6.4 Changes

    January 31, 2020

    RocksDB Change Log

    6.6.4 (2020-01-31)

    Bug Fixes

    • Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
  • v6.6.3 Changes

    January 28, 2020

    RocksDB Change Log

    6.6.3 (2020-01-24)

    Bug Fixes

    • Fix a bug that can cause write threads to hang when a slowdown/stall happens and there is a mix of writers with WriteOptions::no_slowdown set/unset.

    6.6.2 (2020-01-13)

    Bug Fixes

    • Fixed a bug where non-L0 compaction input files were not considered to compute the creation_time of new compaction outputs.

    6.6.1 (2020-01-02)

    Bug Fixes

    • Fix a bug in WriteBatchWithIndex::MultiGetFromBatchAndDB, which is called by Transaction::MultiGet, that causes a crash due to stale pointer access when the number of keys is > 32.
    • Fixed two performance issues related to memtable history trimming. First, a new SuperVersion is now created only if some memtables were actually trimmed. Second, trimming is only scheduled if there is at least one flushed memtable that is kept in memory for the purposes of transaction conflict checking.
    • BlobDB no longer updates the SST to blob file mapping upon failed compactions.
    • Fix a bug in which a snapshot read through an iterator could be affected by a DeleteRange after the snapshot (#6062).
    • Fixed a bug where BlobDB was comparing the ColumnFamilyHandle pointers themselves instead of only the column family IDs when checking whether an API call uses the default column family or not.
    • Delete superversions in BackgroundCallPurge.
    • Fix use-after-free and double-deleting files in BackgroundCallPurge().

    6.6.0 (2019-11-25)

    Bug Fixes

    • Fix data corruption caused by output of intra-L0 compaction on ingested file not being placed in correct order in L0.
    • Fix a data race between Version::GetColumnFamilyMetaData() and Compaction::MarkFilesBeingCompacted() for access to being_compacted (#6056). The current fix acquires the db mutex during Version::GetColumnFamilyMetaData(), which may cause regression.
    • Fix a bug in DBIter where the is_blob_ state isn't updated when iterating backward using seek.
    • Fix a bug when format_version=3, partitioned filters, and prefix search are used in conjunction. The bug could result in Seek::(prefix) returning NotFound for an existing prefix.
    • Revert the feature "Merging iterator to avoid child iterator reseek for some cases (#5286)" since it might cause wrong results when reseek happens with a different iterator upper bound.
    • Fix a bug causing a crash during ingest external file when background compaction causes a severe error (file not found).
    • Fix a bug when partitioned filters and prefix search are used in conjunction, ::SeekForPrev could return invalid for an existing prefix. ::SeekForPrev might be called by the user, or internally on ::Prev, or within ::Seek if the return value involves Delete or a Merge operand.
    • Fix OnFlushCompleted being fired before the flush result is persisted in MANIFEST when there is a concurrent flush job. The bug has existed since OnFlushCompleted was introduced in rocksdb 3.8.
    • Fixed an sst_dump crash on some plain table SST files.
    • Fixed a memory leak in some error cases of opening plain table SST files.
    • Fix a bug when a crash happens while calling WriteLevel0TableForRecovery for multiple column families, leading to a column family's log number greater than the first corrupted log number when the DB is being opened in PointInTime recovery mode during next recovery attempt (#5856).

    New Features

    • Universal compaction to support options.periodic_compaction_seconds. A full compaction will be triggered if any file is over the threshold.
    • GetLiveFilesMetaData and GetColumnFamilyMetaData now expose the file number of SST files as well as the oldest blob file referenced by each SST.
    • A batched MultiGet API (DB::MultiGet()) that supports retrieving keys from multiple column families.
    • Full and partitioned filters in the block-based table use an improved Bloom filter implementation, enabled with format_version 5 (or above) because previous releases cannot read this filter. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single (full) filter. For example, the new Bloom filter has the same false positive rate at 9.55 bits per key as the old one at 10 bits per key, and a lower false positive rate at 16 bits per key than the old one at 100 bits per key.
    • Added AVX2 instructions to USE_SSE builds to accelerate the new Bloom filter and XXH3-based hash function on compatible x86_64 platforms (Haswell and later, ~2014).
    • Support options.ttl or options.periodic_compaction_seconds with options.max_open_files = -1. File's oldest ancestor time and file creation time will be written to manifest. If it is available, this information will be used instead of creation_time and file_creation_time in table properties.
    • Setting options.ttl for universal compaction now has the same meaning as setting periodic_compaction_seconds.
    • SstFileMetaData also returns file creation time and oldest ancestor time.
    • The sst_dump command line tool recompress command now displays how many blocks were compressed and how many were not, in particular how many were not compressed because the compression ratio was not met (12.5% threshold for GoodCompressionRatio), as seen in the number.block.not_compressed counter stat since version 6.0.0.
    • The block cache usage now takes into account the overhead of metadata per each entry. This results in more accurate management of memory. A side effect of this feature is that fewer items fit into a block cache of the same size, which would result in higher cache miss rates. This can be remedied by increasing the block cache size or passing kDontChargeCacheMetadata to its constructor to restore the old behavior.
    • When using BlobDB, a mapping is maintained and persisted in the MANIFEST between each SST file and the oldest non-TTL blob file it references.
    • db_bench now supports and by default issues non-TTL Puts to BlobDB. TTL Puts can be enabled by specifying a non-zero value for the blob_db_max_ttl_range command line parameter explicitly.
    • sst_dump now supports printing BlobDB blob indexes in a human-readable format. This can be enabled by specifying the decode_blob_index flag on the command line.
    • A number of new information elements are now exposed through the EventListener interface. For flushes, the file numbers of the new SST file and the oldest blob file referenced by the SST are propagated. For compactions, the level, file number, and the oldest blob file referenced are passed to the client for each compaction input and output file.

    Public API Change

    • RocksDB release 4.1 or older will not be able to open DB generated by the new release. 4.2 was released on Feb 23, 2016.
    • TTL Compactions in Level compaction style now initiate successive cascading compactions on a key range so that it reaches the bottom level quickly on TTL expiry. The creation_time table property for compaction output files is now set to the minimum of the creation times of all compaction inputs.
    • With FIFO compaction style, options.periodic_compaction_seconds will have the same meaning as options.ttl. Whichever is stricter will be used. With the default options.periodic_compaction_seconds value and options.ttl's default of 0, RocksDB will give a default of 30 days.
    • Added an API GetCreationTimeOfOldestFile(uint64_t* creation_time) to get the file_creation_time of the oldest SST file in the DB.
    • FilterPolicy now exposes additional API to make it possible to choose filter configurations based on context, such as table level and compaction style. See LevelAndStyleCustomFilterPolicy in db_bloom_filter_test.cc. While most existing custom implementations of FilterPolicy should continue to work as before, those wrapping the return of NewBloomFilterPolicy will require overriding new function GetBuilderWithContext(), because calling GetFilterBitsBuilder() on the FilterPolicy returned by NewBloomFilterPolicy is no longer supported.
    • An unlikely usage of FilterPolicy is no longer supported. Calling GetFilterBitsBuilder() on the FilterPolicy returned by NewBloomFilterPolicy will now cause an assertion violation in debug builds, because RocksDB has internally migrated to a more elaborate interface that is expected to evolve further. Custom implementations of FilterPolicy should work as before, except those wrapping the return of NewBloomFilterPolicy, which will require a new override of a protected function in FilterPolicy.
    • NewBloomFilterPolicy now takes bits_per_key as a double instead of an int. This permits finer control over the memory vs. accuracy trade-off in the new Bloom filter implementation and should not change source code compatibility.
    • The option BackupableDBOptions::max_valid_backups_to_open is now only used when opening BackupEngineReadOnly. When opening a read/write BackupEngine, anything but the default value logs a warning and is treated as the default. This change ensures that backup deletion has proper accounting of shared files to ensure they are deleted when no longer referenced by a backup.
    • Deprecate snap_refresh_nanos option.
    • Added DisableManualCompaction/EnableManualCompaction to stop and resume manual compaction.
    • Add TryCatchUpWithPrimary() to StackableDB in non-LITE mode.
    • Add a new Env::LoadEnv() overloaded function to return a shared_ptr to Env.
    • Flush sets file name to "(nil)" for OnTableFileCreationCompleted() if the flush does not produce any L0. This can happen if the file is empty and thus deleted by RocksDB.

    Default Option Changes

    • Changed the default value of periodic_compaction_seconds to UINT64_MAX - 1, which allows RocksDB to auto-tune periodic compaction scheduling. When using the default value, periodic compactions are now auto-enabled if a compaction filter is used. A value of 0 will turn off the feature completely.
    • Changed the default value of ttl to UINT64_MAX - 1, which allows RocksDB to auto-tune the ttl value. When using the default value, TTL will be auto-enabled to 30 days, when the feature is supported. To revert to the old behavior, you can explicitly set it to 0.
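The sentinel default for ttl described above can be read as a three-way rule: the special value UINT64_MAX - 1 means "let RocksDB choose", 0 means "off", and anything else is taken literally. The sketch below encodes only that reading. `EffectiveTtlSeconds`, `kDefaultSentinel`, and the `feature_supported` flag are hypothetical names, and the real option sanitization may involve further conditions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Sentinel meaning "not set by the user, auto-tune" (UINT64_MAX - 1).
constexpr uint64_t kDefaultSentinel = std::numeric_limits<uint64_t>::max() - 1;
constexpr uint64_t kThirtyDaysInSeconds = 30ull * 24 * 60 * 60;

// Resolves the configured ttl into the value that would take effect.
// `feature_supported` is a hypothetical flag standing in for "the feature
// is supported" in the entry above.
uint64_t EffectiveTtlSeconds(uint64_t configured_ttl, bool feature_supported) {
  if (configured_ttl == kDefaultSentinel) {
    // Default: auto-enable at 30 days when possible, otherwise stay off.
    return feature_supported ? kThirtyDaysInSeconds : 0;
  }
  // 0 disables TTL; any other value is used as given.
  return configured_ttl;
}
```

Using a sentinel rather than 0 as the default lets the library distinguish "the user never touched this option" from "the user explicitly turned it off".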

    Performance Improvements

    • For 64-bit hashing, RocksDB is standardizing on a slightly modified preview version of XXH3. This function is now used for many non-persisted hashes, along with fastrange64() in place of the modulus operator, and some benchmarks show a slight improvement.
    • Level iterator now invalidates the iterator more often in prefix seek when the level is filtered out by prefix bloom.
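The fastrange idea mentioned in the hashing entry above maps a 64-bit hash into [0, range) by taking the high 64 bits of the 128-bit product hash * range, which avoids the division behind the modulus operator. The following is a portable sketch of that technique, not RocksDB's own fastrange64() implementation. `MulHigh64` and `FastRange64Sketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// High 64 bits of the full 128-bit product a * b, computed from 32-bit
// halves so that no 128-bit integer type is required.
uint64_t MulHigh64(uint64_t a, uint64_t b) {
  const uint64_t a_lo = a & 0xFFFFFFFFull;
  const uint64_t a_hi = a >> 32;
  const uint64_t b_lo = b & 0xFFFFFFFFull;
  const uint64_t b_hi = b >> 32;
  const uint64_t lo_lo = a_lo * b_lo;
  const uint64_t hi_lo = a_hi * b_lo;
  const uint64_t lo_hi = a_lo * b_hi;
  const uint64_t hi_hi = a_hi * b_hi;
  // Sum the middle terms together with the carry out of the low word.
  const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFull) + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

// Maps `hash` to a value in [0, range). Equivalent to
// floor(hash * range / 2^64), so evenly spread hashes stay evenly spread.
uint64_t FastRange64Sketch(uint64_t hash, uint64_t range) {
  return MulHigh64(hash, range);
}
```

Unlike `hash % range`, the result depends mostly on the high bits of the hash, so this mapping assumes a hash function whose high bits are well mixed.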
  • v6.6.2 Changes

    January 13, 2020

    Bug Fixes

    • Fixed a bug where non-L0 compaction input files were not considered to compute the creation_time of new compaction outputs.
  • v6.6.1 Changes

    February 01, 2020

    Bug Fixes

    • Fix a bug in WriteBatchWithIndex::MultiGetFromBatchAndDB, which is called by Transaction::MultiGet, that is caused by stale pointer access when the number of keys is > 32.
    • Fixed two performance issues related to memtable history trimming. First, a new SuperVersion is now created only if some memtables were actually trimmed. Second, trimming is only scheduled if there is at least one flushed memtable that is kept in memory for the purposes of transaction conflict checking.
    • BlobDB no longer updates the SST to blob file mapping upon failed compactions.
    • Fix a bug in which a snapshot read through an iterator could be affected by a DeleteRange after the snapshot (#6062).
    • Fixed a bug where BlobDB was comparing the ColumnFamilyHandle pointers themselves instead of only the column family IDs when checking whether an API call uses the default column family or not.
    • Delete superversions in BackgroundCallPurge.
    • Fix use-after-free and double-deleting files in BackgroundCallPurge().
  • v6.6.0 Changes

    November 25, 2019

    Bug Fixes

    • Fix data corruption caused by the output of intra-L0 compaction on an ingested file not being placed in the correct order in L0.
    • Fix a data race between Version::GetColumnFamilyMetaData() and Compaction::MarkFilesBeingCompacted() for access to being_compacted (#6056). The current fix acquires the db mutex during Version::GetColumnFamilyMetaData(), which may cause a regression.
    • Fix a bug in DBIter where the is_blob_ state isn't updated when iterating backward using seek.
    • Fix a bug when format_version=3, partitioned filters, and prefix search are used in conjunction. The bug could result in ::Seek(prefix) returning NotFound for an existing prefix.
    • Revert the feature "Merging iterator to avoid child iterator reseek for some cases (#5286)" since it might cause wrong results when reseek happens with a different iterator upper bound.
    • Fix a bug causing a crash during ingest external file when background compaction causes a severe error (file not found).
    • Fix a bug where, when partitioned filters and prefix search are used in conjunction, ::SeekForPrev could return invalid for an existing prefix. ::SeekForPrev might be called by the user, or internally on ::Prev, or within ::Seek if the return value involves Delete or a Merge operand.
    • Fix OnFlushCompleted being fired before the flush result is persisted in MANIFEST when there is a concurrent flush job. The bug has existed since OnFlushCompleted was introduced in RocksDB 3.8.
    • Fixed an sst_dump crash on some plain table SST files.
    • Fixed a memory leak in some error cases of opening plain table SST files.
    • Fix a bug when a crash happens while calling WriteLevel0TableForRecovery for multiple column families, leading to a column family's log number being greater than the first corrupted log number when the DB is being opened in PointInTime recovery mode during the next recovery attempt (#5856).

    New Features

    • Universal compaction now supports options.periodic_compaction_seconds. A full compaction will be triggered if any file is over the threshold.
    • GetLiveFilesMetaData and GetColumnFamilyMetaData now expose the file number of SST files as well as the oldest blob file referenced by each SST.
    • A batched MultiGet API (DB::MultiGet()) that supports retrieving keys from multiple column families.
    • Full and partitioned filters in the block-based table use an improved Bloom filter implementation, enabled with format_version 5 (or above) because previous releases cannot read this filter. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single (full) filter. For example, the new Bloom filter has the same false positive rate at 9.55 bits per key as the old one at 10 bits per key, and a lower false positive rate at 16 bits per key than the old one at 100 bits per key.
    • Added AVX2 instructions to USE_SSE builds to accelerate the new Bloom filter and XXH3-based hash function on compatible x86_64 platforms (Haswell and later, ~2014).
    • Support options.ttl or options.periodic_compaction_seconds with options.max_open_files = -1. A file's oldest ancestor time and file creation time will be written to the manifest. If available, this information will be used instead of creation_time and file_creation_time in table properties.
    • Setting options.ttl for universal compaction now has the same meaning as setting periodic_compaction_seconds.
    • SstFileMetaData also returns file creation time and oldest ancestor time.
    • The sst_dump command line tool recompress command now displays how many blocks were compressed and how many were not, in particular how many were not compressed because the compression ratio was not met (12.5% threshold for GoodCompressionRatio), as seen in the number.block.not_compressed counter stat since version 6.0.0.
    • The block cache usage now takes into account the per-entry metadata overhead. This results in more accurate memory management. A side effect of this feature is that fewer items fit into a block cache of the same size, which can result in higher cache miss rates. This can be remedied by increasing the block cache size or by passing kDontChargeCacheMetadata to its constructor to restore the old behavior.
    • When using BlobDB, a mapping is maintained and persisted in the MANIFEST between each SST file and the oldest non-TTL blob file it references.
    • db_bench now supports and by default issues non-TTL Puts to BlobDB. TTL Puts can be enabled by specifying a non-zero value for the blob_db_max_ttl_range command line parameter explicitly.
    • sst_dump now supports printing BlobDB blob indexes in a human-readable format. This can be enabled by specifying the decode_blob_index flag on the command line.
    • A number of new information elements are now exposed through the EventListener interface. For flushes, the file numbers of the new SST file and the oldest blob file referenced by the SST are propagated. For compactions, the level, file number, and the oldest blob file referenced are passed to the client for each compaction input and output file.
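The 12.5% threshold mentioned for the sst_dump recompress output can be read as a simple size comparison. The sketch below is an assumption about what such a check looks like, not a copy of RocksDB's code; the names MeetsCompressionRatioSketch and CheckCompressionRatioSketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not RocksDB's API): a block counts as "compressed"
// only if compression saves more than one eighth (12.5%) of its raw size.
inline bool MeetsCompressionRatioSketch(std::size_t compressed_size,
                                        std::size_t raw_size) {
  return compressed_size < raw_size - raw_size / 8;
}

// Self-check with a 1000-byte raw block: the cutoff is 1000 - 125 = 875.
inline void CheckCompressionRatioSketch() {
  assert(MeetsCompressionRatioSketch(874, 1000));    // saves 12.6%
  assert(!MeetsCompressionRatioSketch(875, 1000));   // saves exactly 12.5%
  assert(!MeetsCompressionRatioSketch(1000, 1000));  // no savings at all
}
```

Under this reading, a block that fails the check would be stored uncompressed and counted among the blocks that were not compressed.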

    Public API Change

    • RocksDB release 4.1 or older will not be able to open a DB generated by the new release. 4.2 was released on Feb 23, 2016.
    • TTL Compactions in Level compaction style now initiate successive cascading compactions on a key range so that it reaches the bottom level quickly on TTL expiry. The creation_time table property for compaction output files is now set to the minimum of the creation times of all compaction inputs.
    • With FIFO compaction style, options.periodic_compaction_seconds will have the same meaning as options.ttl. Whichever is stricter will be used. With the default options.periodic_compaction_seconds value and options.ttl's default of 0, RocksDB will use a default of 30 days.
    • Added an API GetCreationTimeOfOldestFile(uint64_t* creation_time) to get the file_creation_time of the oldest SST file in the DB.
    • FilterPolicy now exposes additional API to make it possible to choose filter configurations based on context, such as table level and compaction style. See LevelAndStyleCustomFilterPolicy in db_bloom_filter_test.cc. While most existing custom implementations of FilterPolicy should continue to work as before, those wrapping the return of NewBloomFilterPolicy will require overriding new function GetBuilderWithContext(), because calling GetFilterBitsBuilder() on the FilterPolicy returned by NewBloomFilterPolicy is no longer supported.
    • An unlikely usage of FilterPolicy is no longer supported. Calling GetFilterBitsBuilder() on the FilterPolicy returned by NewBloomFilterPolicy will now cause an assertion violation in debug builds, because RocksDB has internally migrated to a more elaborate interface that is expected to evolve further. Custom implementations of FilterPolicy should work as before, except those wrapping the return of NewBloomFilterPolicy, which will require a new override of a protected function in FilterPolicy.
    • NewBloomFilterPolicy now takes bits_per_key as a double instead of an int. This permits finer control over the memory vs. accuracy trade-off in the new Bloom filter implementation and should not change source code compatibility.
    • The option BackupableDBOptions::max_valid_backups_to_open is now only used when opening BackupEngineReadOnly. When opening a read/write BackupEngine, anything but the default value logs a warning and is treated as the default. This change ensures that backup deletion has proper accounting of shared files to ensure they are deleted when no longer referenced by a backup.
    • Deprecate snap_refresh_nanos option.
    • Added DisableManualCompaction/EnableManualCompaction to stop and resume manual compaction.
    • Add TryCatchUpWithPrimary() to StackableDB in non-LITE mode.
    • Add a new Env::LoadEnv() overloaded function to return a shared_ptr to Env.
    • Flush sets file name to "(nil)" for OnTableFileCreationCompleted() if the flush does not produce any L0. This can happen if the file is empty and is thus deleted by RocksDB.

    Default Option Changes

    • Changed the default value of periodic_compaction_seconds to UINT64_MAX - 1 which allows RocksDB to auto-tune periodic compaction scheduling. When using the default value, periodic compactions are now auto-enabled if a compaction filter is used. A value of 0 will turn off the feature completely.
    • Changed the default value of ttl to UINT64_MAX - 1 which allows RocksDB to auto-tune ttl value. When using the default value, TTL will be auto-enabled to 30 days, when the feature is supported. To revert to the old behavior, you can explicitly set it to 0.
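The sentinel-style default described above can be sketched as a small resolution step. This is an illustration only: ResolveTtlSketch, kAutoTuneSentinel, kThirtyDaysSeconds, and the ttl_supported flag are hypothetical names, and treating an unsupported configuration as 0 (disabled) is an assumption, since the notes only describe the supported case.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel meaning "let the library choose" (the new default value).
constexpr uint64_t kAutoTuneSentinel = UINT64_MAX - 1;
// 30 days expressed in seconds.
constexpr uint64_t kThirtyDaysSeconds = 30ULL * 24 * 60 * 60;

// Hypothetical helper: turns the configured ttl into the effective one.
inline uint64_t ResolveTtlSketch(uint64_t configured_ttl, bool ttl_supported) {
  if (configured_ttl == kAutoTuneSentinel) {
    // Assumption: when the feature is unsupported, TTL stays disabled.
    return ttl_supported ? kThirtyDaysSeconds : 0;
  }
  return configured_ttl;  // an explicit value, including 0, is kept as-is
}

// Self-check of the cases described above.
inline void CheckResolveTtlSketch() {
  assert(ResolveTtlSketch(kAutoTuneSentinel, true) == kThirtyDaysSeconds);
  assert(ResolveTtlSketch(kAutoTuneSentinel, false) == 0);
  assert(ResolveTtlSketch(0, true) == 0);  // explicit 0 restores old behavior
}
```

Using a sentinel rather than 0 as the default keeps an explicit 0 available as an unambiguous "off" setting.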

    Performance Improvements

    • For 64-bit hashing, RocksDB is standardizing on a slightly modified preview version of XXH3. This function is now used for many non-persisted hashes, along with fastrange64() in place of the modulus operator, and some benchmarks show a slight improvement.
    • The level iterator now invalidates itself more often during prefix seek when the level is filtered out by the prefix Bloom filter.