Changelog History

  • v5.16.0 Changes

    August 21, 2018

    Public API Change

    • The merge operands are passed to MergeOperator::ShouldMerge in the reverse order relative to how they were merged (passed to FullMerge or FullMergeV2), for performance reasons.
    • GetAllKeyVersions() now takes an extra argument, max_num_ikeys.
    • Using ZSTD dictionary trainer (i.e., setting CompressionOptions::zstd_max_train_bytes to a nonzero value) now requires ZSTD version 1.1.3 or later.

    New Features

    • Change the format of index blocks by delta encoding the index values, which are the block handles. This saves the encoding of BlockHandle::offset for the non-head index entries in each restart interval. The feature is backward compatible but not forward compatible. It is disabled by default unless format_version 4 or above is used.
    • Add a new tool: trace_analyzer. trace_analyzer analyzes the trace file generated by the trace_replay API. It can convert the binary trace file to a human-readable text file, output statistics for the analyzed query types (such as access statistics and size statistics), combine the trace with a dumped whole-key-space file for analysis, and analyze query correlations, among other things. Currently supported query types are: Get, Put, Delete, SingleDelete, DeleteRange, Merge, Iterator (Seek, SeekForPrev only).
    • Add hash index support to data blocks, which helps reduce the CPU utilization of point-lookup operations. This feature is backward compatible with data blocks created without the hash index. It is disabled by default unless BlockBasedTableOptions::data_block_index_type is set to kDataBlockBinaryAndHash.
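
The delta encoding of index values described in this release can be pictured with a small, self-contained sketch. It is not RocksDB's actual on-disk format: the Handle and EncodedValue structs, the kTrailerSize constant, and both function names are assumptions made for illustration. The sketch relies on data blocks being laid out back to back, so only the first (head) entry of each restart interval needs a full offset, and later entries can store just the change in size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a block handle: where a block starts and its length.
struct Handle {
  uint64_t offset;
  uint64_t size;
};

// Assumed number of bytes written after every block (illustrative value only).
constexpr uint64_t kTrailerSize = 5;

// One encoded index value. Head entries carry offset and size;
// all other entries carry only a size delta relative to the previous entry.
struct EncodedValue {
  bool is_head;
  uint64_t offset;     // meaningful only when is_head is true
  int64_t size_field;  // full size for head entries, size delta otherwise
};

std::vector<EncodedValue> EncodeHandles(const std::vector<Handle>& handles,
                                        size_t restart_interval) {
  assert(restart_interval > 0);
  std::vector<EncodedValue> out;
  for (size_t i = 0; i < handles.size(); ++i) {
    const int64_t size = static_cast<int64_t>(handles[i].size);
    if (i % restart_interval == 0) {
      out.push_back({true, handles[i].offset, size});
    } else {
      const Handle& prev = handles[i - 1];
      // The scheme only works if blocks are contiguous on disk.
      assert(handles[i].offset == prev.offset + prev.size + kTrailerSize);
      out.push_back({false, 0, size - static_cast<int64_t>(prev.size)});
    }
  }
  return out;
}

std::vector<Handle> DecodeHandles(const std::vector<EncodedValue>& encoded) {
  std::vector<Handle> out;
  for (const EncodedValue& e : encoded) {
    if (e.is_head) {
      out.push_back({e.offset, static_cast<uint64_t>(e.size_field)});
    } else {
      assert(!out.empty());
      const Handle prev = out.back();  // copy: push_back may reallocate
      const uint64_t size =
          static_cast<uint64_t>(static_cast<int64_t>(prev.size) + e.size_field);
      // The offset is rebuilt from the previous entry instead of being stored.
      out.push_back({prev.offset + prev.size + kTrailerSize, size});
    }
  }
  return out;
}
```

Because a reader that predates this layout cannot rebuild the omitted offsets, a scheme like this is readable by newer code but not by older code, which matches the compatibility note above.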

    Bug Fixes

    • Fix a bug in misreporting the estimated partition index size in properties block.
  • v5.15.0 Changes

    July 17, 2018

    Public API Change

    • Remove managed iterator. ReadOptions.managed is no longer effective.
    • For bottommost_compression, a compatible CompressionOptions is added via bottommost_compression_opts. To stay backward compatible, a new boolean, enabled, is added to CompressionOptions. compression_opts is always used, regardless of the value of enabled. bottommost_compression_opts is used only when the user sets enabled=true; otherwise, compression_opts is used for bottommost_compression by default.
    • With LRUCache, when high_pri_pool_ratio > 0, the midpoint insertion strategy is enabled: low-pri items are put at the tail of the low-pri list (the midpoint) when they are first inserted into the cache. This makes cache entries that never get hit age out faster, improving cache efficiency when a large background scan is present.
    • For users of Statistics objects created via CreateDBStatistics(), the format of the string returned by its ToString() method has changed.
    • The "rocksdb.num.entries" table property no longer counts range deletion tombstones as entries.
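
The fallback rule between compression_opts and bottommost_compression_opts can be sketched as a single selection function. The structs below are trimmed-down, hypothetical mirrors of the real option types, and EffectiveCompressionOpts is an illustrative name, not a RocksDB function.

```cpp
// Hypothetical, reduced mirror of CompressionOptions: only the fields needed here.
struct CompressionOpts {
  int level = 0;
  bool enabled = false;
};

// Hypothetical container for the two option sets discussed in the changelog.
struct Opts {
  CompressionOpts compression_opts;
  CompressionOpts bottommost_compression_opts;
};

// Returns the option set that would apply to a file at the given position.
const CompressionOpts& EffectiveCompressionOpts(const Opts& o,
                                                bool is_bottommost) {
  // The bottommost-specific options apply only when explicitly enabled.
  if (is_bottommost && o.bottommost_compression_opts.enabled) {
    return o.bottommost_compression_opts;
  }
  // compression_opts is used regardless of its own `enabled` flag.
  return o.compression_opts;
}
```

Under this reading, existing configurations that never set enabled keep their previous behavior, which is the backward-compatibility goal stated above.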

    New Features

    • Change the format of index blocks by storing the keys in their raw form rather than converting them to InternalKey. This saves 8 bytes per index key. The feature is backward compatible but not forward compatible. It is disabled by default unless format_version 3 or above is used.
    • Avoid memcpy when reading mmap files with OpenReadOnly and max_open_files==-1.
    • Support dynamically changing ColumnFamilyOptions::ttl via SetOptions().
    • Add a new table property, "rocksdb.num.range-deletions", which counts the number of range deletion tombstones in the table.
    • Improve the performance of iterators doing long range scans by using readahead, when using direct IO.
    • pin_top_level_index_and_filter (default true) in BlockBasedTableOptions can be used in combination with cache_index_and_filter_blocks to prefetch and pin the top-level index of partitioned index and filter blocks in cache. It has no impact when cache_index_and_filter_blocks is false.
    • Write properties meta-block at the end of block-based table to save read-ahead IO.

    Bug Fixes

    • Fix deadlock with enable_pipelined_write=true and max_successive_merges > 0
    • Check conflict at output level in CompactFiles.
    • Fix corruption in non-iterator reads when mmap is used for file reads.
    • Fix a bug with prefix search in partitioned filters where a shared prefix would be ignored in the later partitions. The bug could report an existent key as missing. It could be triggered if prefix_extractor is set and partitioned filters are enabled.
    • Change default value of bytes_max_delete_chunk to 0 in NewSstFileManager() as it doesn't work well with checkpoints.
    • Fix a bug caused by not copying the block trailer with compressed SST file, direct IO, prefetcher and no compressed block cache.
    • Fix writes getting stuck indefinitely when enable_pipelined_write=true. The issue has existed since pipelined write was introduced in 5.5.0.
  • v5.14.0 Changes

    May 16, 2018

    Public API Change

    • Add a BlockBasedTableOption to align uncompressed data blocks on the smaller of block size or page size boundary, to reduce flash reads by avoiding reads spanning 4K pages.
    • The background thread naming convention changed (on supporting platforms) to "rocksdb:<thread pool priority><thread number>", e.g., "rocksdb:low0".
    • Add a new ticker stat rocksdb.number.multiget.keys.found to count the number of keys successfully read in MultiGet calls.
    • Touch-up to write-related counters in PerfContext. New counters added: write_scheduling_flushes_compactions_time, write_thread_wait_nanos. Counters whose behavior was fixed or modified: write_memtable_time, write_pre_and_post_process_time, write_delay_time.
    • Posix Env's NewRandomRWFile() will fail if the file doesn't exist.
    • Now, DBOptions::use_direct_io_for_flush_and_compaction only applies to background writes, and DBOptions::use_direct_reads applies to both user reads and background reads. This conforms with Linux's open(2) manpage, which advises against simultaneously reading a file in buffered and direct modes, due to possibly undefined behavior and degraded performance.
    • Iterator::Valid() always returns false if !status().ok(). So, now when doing a Seek() followed by some Next()s, there's no need to check status() after every operation.
    • Iterator::Seek()/SeekForPrev()/SeekToFirst()/SeekToLast() always resets status().
    • Introduced CompressionOptions::kDefaultCompressionLevel, which is a generic way to tell RocksDB to use the compression library's default level. It is now the default value for CompressionOptions::level. Previously the level defaulted to -1, which gave poor compression ratios in ZSTD.

    New Features

    • Introduce TTL for level compaction so that all files older than ttl go through the compaction process to get rid of old data.
    • TransactionDBOptions::write_policy can be configured to enable WritePrepared 2PC transactions. Read more about them in the wiki.
    • Add DB properties "rocksdb.block-cache-capacity", "rocksdb.block-cache-usage", "rocksdb.block-cache-pinned-usage" to show block cache usage.
    • Add Env::LowerThreadPoolCPUPriority(Priority) method, which lowers the CPU priority of background (esp. compaction) threads to minimize interference with foreground tasks.
    • Fsync parent directory after deleting a file in delete scheduler.
    • In level-based compaction, if a bottom-pri thread pool was set up via Env::SetBackgroundThreads(), compactions to the bottom level will be delegated to that thread pool.
    • prefix_extractor has been moved from ImmutableCFOptions to MutableCFOptions, meaning it can be dynamically changed without a DB restart.

    Bug Fixes

    • Fsync after writing global seq number to the ingestion file in ExternalSstFileIngestionJob.
    • Fix WAL corruption caused by race condition between user write thread and FlushWAL when two_write_queue is not set.
    • Fix BackupableDBOptions::max_valid_backups_to_open to not delete backup files when refcount cannot be accurately determined.
    • Fix memory leak when pin_l0_filter_and_index_blocks_in_cache is used with partitioned filters
    • Disable rollback of merge operands in WritePrepared transactions to work around an issue in MyRocks. It can be re-enabled by setting TransactionDBOptions::rollback_merge_operands to true.
    • Fix wrong results by ReverseBytewiseComparator::FindShortSuccessor().

    Java API Changes

    • Add BlockBasedTableConfig.setBlockCache to allow sharing a block cache across DB instances.
    • Added SstFileManager to the Java API to allow managing SST files across DB instances.
  • v5.13.0 Changes

    March 20, 2018

    Public API Change

    • RocksDBOptionsParser::Parse()'s ignore_unknown_options argument will only be effective if the option file shows it is generated using a higher version of RocksDB than the current version.
    • Remove CompactionEventListener.

    New Features

    • SstFileManager can now cancel compactions if they will result in max space errors. SstFileManager users can also use SetCompactionBufferSize to specify how much space must be left over during a compaction for auxiliary file functions such as logging and flushing.
    • Avoid unnecessarily flushing in CompactRange() when the range specified by the user does not overlap unflushed memtables.
    • If ColumnFamilyOptions::max_subcompactions is set greater than one, we now parallelize large manual level-based compactions.
    • Add "rocksdb.live-sst-files-size" DB property to return the total bytes of all SST files belonging to the latest LSM tree.
    • NewSstFileManager now takes an argument, bytes_max_delete_chunk, with a default of 64MB. With this argument, a file larger than 64MB will be ftruncated multiple times based on this size.
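
The chunked deletion described for bytes_max_delete_chunk can be illustrated with a short sketch that computes the sizes a file would be truncated to before its final removal. TruncationSteps is a hypothetical helper, not part of RocksDB, and treating a chunk size of 0 as "no chunking" is an assumption of this sketch.

```cpp
#include <cstdint>
#include <vector>

// Returns the successive sizes a file is truncated to, one chunk at a time,
// until what remains is no larger than one chunk. The caller would then
// delete the remainder in a single step.
std::vector<uint64_t> TruncationSteps(uint64_t file_size, uint64_t chunk) {
  std::vector<uint64_t> steps;
  if (chunk == 0) {
    return steps;  // assumed to mean chunking is disabled
  }
  while (file_size > chunk) {
    file_size -= chunk;           // cut one chunk off the end
    steps.push_back(file_size);   // size passed to the truncate call
  }
  return steps;
}
```

For example, with a 64MB chunk a 200MB file would be truncated to 136MB, 72MB, and 8MB before the remaining 8MB is removed, while a 10MB file would be deleted directly.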

    Bug Fixes

    • Fix a leak in prepared_section_completed_ where the zeroed entries would not be removed from the map.
    • Fix WAL corruption caused by race condition between user write thread and backup/checkpoint thread.
  • v5.12.0 Changes

    February 14, 2018

    Public API Change

    • Iterator::SeekForPrev is now a pure virtual method. This prevents users who implement the Iterator interface from failing to implement SeekForPrev by mistake.
    • Add include_end option to make the range end exclusive when include_end == false in DeleteFilesInRange().
    • Add CompactRangeOptions::allow_write_stall, which makes CompactRange start working immediately, even if it causes user writes to stall. The default value is false, meaning we add delay to CompactRange calls until stalling can be avoided when possible. Note this delay is not present in previous RocksDB versions.
    • Creating checkpoint with empty directory now returns Status::InvalidArgument; previously, it returned Status::IOError.
    • Add a BlockBasedTableOption to turn off index block compression.
    • Close() method now returns a status when closing a db.
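
The effect of include_end in DeleteFilesInRange() can be sketched as a containment check on a file's key span. This sketch assumes that only files lying entirely inside the requested range are candidates for deletion and that keys compare bytewise; FileRange and FileFullyInRange are hypothetical names used for illustration only.

```cpp
#include <string>

// Hypothetical summary of the key span covered by one SST file.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// Returns true when the file lies entirely inside [begin, end] if include_end
// is true, or inside [begin, end) if include_end is false.
bool FileFullyInRange(const FileRange& f, const std::string& begin,
                      const std::string& end, bool include_end) {
  if (f.smallest < begin) {
    return false;  // file starts before the range
  }
  return include_end ? f.largest <= end : f.largest < end;
}
```

With an exclusive end, a file whose largest key equals the end key is kept, which is the behavior the new option makes selectable.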

    New Features

    • Improve the performance of iterators doing long range scans by using readahead.
    • Add new function DeleteFilesInRanges() to delete files in multiple ranges at once for better performance.
    • FreeBSD build support for RocksDB and RocksJava.
    • Improved performance of long range scans with readahead.
    • Updated to and now continuously tested in Visual Studio 2017.

    Bug Fixes

    • Fix DisableFileDeletions() followed by GetSortedWalFiles() to not return obsolete WAL files that PurgeObsoleteFiles() is going to delete.
    • Handle the error returned from WriteBuffer() during WAL file close and DB close.
    • Fix advance reservation of arena block addresses.
    • Fix handling of empty string as checkpoint directory.
  • v5.11.0 Changes

    January 08, 2018

    Public API Change

    • Add autoTune and getBytesPerSecond() to RocksJava RateLimiter.

    New Features

    • Add a new histogram stat called rocksdb.db.flush.micros for memtable flush.
    • Add "--use_txn" option to use transactional API in db_stress.
    • Disable onboard cache for compaction output on the Windows platform.
    • Improve the performance of iterators doing long range scans by using readahead.

    Bug Fixes

    • Fix a stack-use-after-scope bug in ForwardIterator.
    • Fix builds on platforms including Linux, Windows, and PowerPC.
    • Fix buffer overrun in backup engine for DBs with a huge number of files.
    • Fix a mislabel bug for bottom-pri compaction threads.
    • Fix DB::Flush() continuing to wait after the flush finishes under certain conditions.
  • v5.10.0 Changes

    December 11, 2017

    Public API Change

    • When running make with environment variable USE_SSE set and PORTABLE unset, the build will use all machine features available locally. Previously this combination only compiled SSE-related features.

    New Features

    • Provide lifetime hints when writing files on Linux. This reduces hardware write-amp on storage devices supporting multiple streams.
    • Add a DB stat, NUMBER_ITER_SKIP, which returns how many internal keys were skipped during iterations (e.g., due to being tombstones or duplicate versions of a key).
    • Add PerfContext counters, key_lock_wait_count and key_lock_wait_time, which measure the number of times transactions wait on key locks and total amount of time waiting.

    Bug Fixes

    • Fix IOError on WAL write not propagating to write group followers.
    • Make iterator invalid on merge error.
    • Fix performance issue in IngestExternalFile() affecting databases with a large number of SST files.
    • Fix possible corruption to LSM structure when DeleteFilesInRange() deletes a subset of files spanned by a DeleteRange() marker.
  • v5.9.0 Changes

    November 01, 2017

    Public API Change

    • BackupableDBOptions::max_valid_backups_to_open == 0 now means no backups will be opened during BackupEngine initialization. Previously this condition disabled limiting backups opened.
    • DBOptions::preserve_deletes is a new option that lets users specify that the DB should not drop tombstones for regular deletes if they have a sequence number larger than what was set by the new API call DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum). Disabled by default.
    • The API call DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum) was added. Users who wish to preserve deletes are expected to call this function periodically to advance the cutoff seqnum (all deletes made before this seqnum can be dropped by the DB). It is the user's responsibility to advance the seqnum so that tombstones are kept for the desired period of time, yet are eventually processed in time and don't take up too much space.
    • ReadOptions::iter_start_seqnum was added. If set to a value > 0, users will see two changes in iterator behavior: 1) only keys written with a sequence number larger than this parameter are returned, and 2) the Slice returned by iter->key() now points to memory that holds a user-oriented representation of the internal key, rather than the user key. A new struct, FullKey, was added to represent internal keys, along with a new helper function ParseFullKey(const Slice& internal_key, FullKey* result).
    • Deprecate the trash_dir param in NewSstFileManager. Deleted files are now renamed to .trash instead of being moved to the trash directory.
    • Allow setting a custom trash/DB size ratio limit in the SstFileManager, after which files that are to be scheduled for deletion are deleted immediately, regardless of any delete ratelimit.
    • Return an error on write if write_options.sync = true and write_options.disableWAL = true, to warn the user of inconsistent options. Previously, in this case the write skipped the WAL and the sync option was not respected.
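
The new check on inconsistent write options amounts to rejecting one specific combination before the write proceeds. The sketch below is a stand-alone illustration: WriteOpts mirrors only the two fields named above, and ValidateWriteOpts and its error text are hypothetical, not RocksDB's actual function or message.

```cpp
#include <string>

// Hypothetical, reduced mirror of the two write options involved.
struct WriteOpts {
  bool sync = false;
  bool disableWAL = false;
};

// Returns an empty string when the options are consistent,
// otherwise a description of the problem.
std::string ValidateWriteOpts(const WriteOpts& o) {
  // A synced write has nothing to sync if the WAL is skipped.
  if (o.sync && o.disableWAL) {
    return "sync writes require the WAL to be enabled";
  }
  return "";
}
```

Failing fast here surfaces the misconfiguration to the caller instead of silently dropping the durability request, which is what happened before this change.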

    New Features

    • CRC32C is now using the 3-way pipelined SSE algorithm crc32c_3way on supported platforms to improve performance. The system will choose to use this algorithm on supported platforms automatically whenever possible. If PCLMULQDQ is not supported it will fall back to the old Fast_CRC32 algorithm.
    • DBOptions::writable_file_max_buffer_size can now be changed dynamically.
    • DBOptions::bytes_per_sync, DBOptions::compaction_readahead_size, and DBOptions::wal_bytes_per_sync can now be changed dynamically. Changing DBOptions::wal_bytes_per_sync will flush all memtables and switch to a new WAL file.
    • Support dynamic adjustment of rate limit according to demand for background I/O. It can be enabled by passing true to the auto_tuned parameter in NewGenericRateLimiter(). The value passed as rate_bytes_per_sec will still be respected as an upper-bound.
    • Support dynamically changing ColumnFamilyOptions::compaction_options_fifo.
    • Introduce EventListener::OnStallConditionsChanged() callback. Users can implement it to be notified when user writes are stalled, stopped, or resumed.
    • Add a new db property "rocksdb.estimate-oldest-key-time" to return oldest data timestamp. The property is available only for FIFO compaction with compaction_options_fifo.allow_compaction = false.
    • Upon snapshot release, recompact bottommost files containing deleted/overwritten keys that previously could not be dropped due to the snapshot. This alleviates space-amp caused by long-held snapshots.
    • Support lower bound on iterators specified via ReadOptions::iterate_lower_bound.
    • Support for differential snapshots (via an iterator emitting the sequence of key-values representing the difference between DB state at two different sequence numbers). Supports preserving and emitting puts and regular deletes; does not support SingleDeletes, MergeOperator, Blobs, or Range Deletes.

    Bug Fixes

    • Fix a potential data inconsistency issue during point-in-time recovery. DB::Open() will abort if column family inconsistency is found during PIT recovery.
    • Fix possible metadata corruption in databases using DeleteRange().
  • v5.8.0 Changes

    August 30, 2017

    Public API Change

    • Users of Statistics::getHistogramString() will see fewer histogram buckets and different bucket endpoints.
    • Slice::compare and BytewiseComparator Compare no longer accept Slices containing nullptr.
    • Transaction::Get and Transaction::GetForUpdate variants with PinnableSlice added.

    New Features

    • Add Iterator::Refresh(), which allows users to update the iterator state so that they can avoid some initialization costs of recreating iterators.
    • Replace dynamic_cast<> (except in unit tests) so people can choose to build with RTTI off. With make, release mode is by default built with -fno-rtti and debug mode is built without it. Users can override it by setting USE_RTTI=0 or 1.
    • Universal compactions including the bottom level can be executed in a dedicated thread pool. This alleviates head-of-line blocking in the compaction queue, which causes write stalls, particularly in multi-instance use cases. Users can enable this feature via Env::SetBackgroundThreads(N, Env::Priority::BOTTOM), where N > 0.
    • Allow merge operator to be called even with a single merge operand during compactions, by appropriately overriding MergeOperator::AllowSingleOperand.
    • Add DB::VerifyChecksum(), which verifies the checksums in all SST files in a running DB.
    • Block-based table support for disabling checksums by setting BlockBasedTableOptions::checksum = kNoChecksum.

    Bug Fixes

    • Fix wrong latencies in rocksdb.db.get.micros, rocksdb.db.write.micros, and rocksdb.sst.read.micros.
    • Fix incorrect dropping of deletions during intra-L0 compaction.
    • Fix transient reappearance of keys covered by range deletions when memtable prefix bloom filter is enabled.
    • Fix potentially wrong file smallest key when range deletions separated by snapshot are written together.
  • v5.7.0 Changes

    July 13, 2017

    Public API Change

    • DB property "rocksdb.sstables" now prints keys in hex form.

    New Features

    • Measure estimated number of reads per file. The information can be accessed through DB::GetColumnFamilyMetaData or "rocksdb.sstables" DB property.
    • RateLimiter support for throttling background reads, or throttling the sum of background reads and writes. This can give more predictable I/O usage when compaction reads more data than it writes, e.g., due to lots of deletions.
    • [Experimental] FIFO compaction with TTL support. It can be enabled by setting CompactionOptionsFIFO.ttl > 0.
    • Introduce EventListener::OnBackgroundError() callback. Users can implement it to be notified of errors causing the DB to enter read-only mode, and optionally override them.
    • Partitioned indexes and filters are no longer experimental. To enable partitioned indexes, set index_type to kTwoLevelIndexSearch; to further enable partitioned filters, set partition_filters to true. To configure the partition size, set metadata_block_size.

    Bug Fixes

    • Fix discarding empty compaction output files when DeleteRange() is used together with subcompactions.