RocksDB v6.13 release notes (2020-12-09)

Release Date: 2020-12-09

🐛 Bug fixes
- 🛠 Fix a performance regression introduced in 6.4 that performs an upper bound check on every Next(), even when the keys are in a data block that lies entirely within the upper bound.
- 🛠 Fix a possible corruption to the LSM state (overlapping files within a level) when a CompactRange() for refitting levels (CompactRangeOptions::change_level == true) and another manual compaction are executed in parallel.
- 🌲 Sanitize recycle_log_file_num to zero when the user attempts to enable it in combination with WALRecoveryMode::kTolerateCorruptedTailRecords. Previously the two features were allowed together, which compromised the user's configured crash-recovery guarantees.
- 🛠 Fix a bug where level refitting in CompactRange() might race with an automatic compaction that moves data into the target level of the refitting. The bug has been there for years.
- Fixed a bug in version 6.12 in which BackupEngine::CreateNewBackup could fail intermittently with non-OK status when backing up a read-write DB configured with a DBOptions::file_checksum_gen_factory.
- 🛠 Fix useless no-op compactions scheduled upon snapshot release when options.disable_auto_compactions = true.
- Fix a bug where, when max_write_buffer_size_to_maintain is set, destruction of immutable flushed memtables is delayed until the next super version is installed. A memtable is not added to the delete list because the super version still holds a reference to it, and the super version does not switch because the delete list is empty. As a result, memory usage keeps increasing beyond write_buffer_size + max_write_buffer_size_to_maintain.
- Avoid converting MERGES to PUTS when allow_ingest_behind is true.
- 🛠 Fix compression dictionary sampling together with SstFileWriter. Previously, the dictionary would be trained/finalized immediately with zero samples. Now, the whole SstFileWriter file is buffered in memory and then sampled.
- Fix a bug with avoid_unnecessary_blocking_io=1 and creating backups (BackupEngine::CreateNewBackup) or checkpoints (Checkpoint::Create). With this setting and WAL enabled, these operations could randomly fail with non-OK status.
- 🛠 Fix a bug in which bottommost compaction continues to advance the underlying InternalIterator to skip tombstones even after shutdown.
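
The recycle_log_file_num sanitization described in the list above boils down to a small rule applied when options are validated. The following is a minimal toy model in Python, not RocksDB code: ToyDBOptions and sanitize are hypothetical names, and only the one combination named in this note is modeled. Other recovery modes are outside the scope of this note and are simply passed through unchanged here.

```python
from dataclasses import dataclass, replace

# Mode names are copied from WALRecoveryMode as plain strings. Everything else
# (ToyDBOptions, sanitize) is a hypothetical stand-in, not RocksDB code.
K_TOLERATE_CORRUPTED_TAIL_RECORDS = "kTolerateCorruptedTailRecords"
K_SKIP_ANY_CORRUPTED_RECORDS = "kSkipAnyCorruptedRecords"


@dataclass(frozen=True)
class ToyDBOptions:
    recycle_log_file_num: int = 0
    wal_recovery_mode: str = K_SKIP_ANY_CORRUPTED_RECORDS


def sanitize(opts: ToyDBOptions) -> ToyDBOptions:
    """Return a copy with WAL recycling disabled if it conflicts with the mode."""
    if (opts.recycle_log_file_num > 0
            and opts.wal_recovery_mode == K_TOLERATE_CORRUPTED_TAIL_RECORDS):
        return replace(opts, recycle_log_file_num=0)
    return opts


requested = ToyDBOptions(recycle_log_file_num=4,
                         wal_recovery_mode=K_TOLERATE_CORRUPTED_TAIL_RECORDS)
effective = sanitize(requested)
print(effective.recycle_log_file_num)  # prints 0
```

The user's requested value is left untouched and a sanitized copy is returned, which makes it easy to compare what was asked for with what takes effect.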
🆕 New Features
- A new field std::string requested_checksum_func_name is added to FileChecksumGenContext, which enables the checksum factory to create generators for a suite of different functions.
- ✂ Added a new subcommand, ldb unsafe_remove_sst_file, which removes a lost or corrupt SST file from a DB's metadata. This command involves data loss and must not be used on a live DB.
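
To illustrate why a requested function name lets one factory serve a suite of checksum functions, here is a hedged Python sketch. ToyChecksumFactory and create_generator are hypothetical names, the two functions offered (CRC32 via zlib, SHA-256 via hashlib) are chosen only because they ship with Python, and the handling of empty or unknown names is an assumption of this sketch rather than documented RocksDB behavior.

```python
import hashlib
import zlib
from typing import Callable, Dict, Optional


class ToyChecksumFactory:
    """Toy factory that picks a checksum function by its requested name."""

    def __init__(self) -> None:
        self._funcs: Dict[str, Callable[[bytes], str]] = {
            "crc32": lambda data: format(zlib.crc32(data), "08x"),
            "sha256": lambda data: hashlib.sha256(data).hexdigest(),
        }
        self._default = "crc32"

    def create_generator(
        self, requested_checksum_func_name: str = ""
    ) -> Optional[Callable[[bytes], str]]:
        # Assumption of this sketch: an empty request means "use the default",
        # and an unknown name yields None instead of raising.
        name = requested_checksum_func_name or self._default
        return self._funcs.get(name)


factory = ToyChecksumFactory()
generator = factory.create_generator("sha256")
print(generator(b"table file contents"))
```

Without the requested name, a factory could only ever hand out one kind of generator; with it, the caller can ask for the function that matches a checksum recorded earlier.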
🐎 Performance Improvements
- ⬇️ Reduce thread number for multiple DB instances by re-using one global thread for statistics dumping and persisting.
- Reduce write-amp in heavy write bursts in kCompactionStyleLevel compaction style with level_compaction_dynamic_level_bytes set.
- BackupEngine incremental backups no longer read DB table files that are already saved to a shared part of the backup directory, unless share_files_with_checksum is used with kLegacyCrc32cAndFileSize naming (discouraged).
  - For share_files_with_checksum, we are confident there is no regression (vs. pre-6.12) in detecting DB or backup corruption at backup creation time, mostly because the old design did not leverage this extra checksum computation for detecting inconsistencies at backup creation time.
  - For share_table_files without "checksum" (not recommended), there is a regression in detecting fundamentally unsafe use of the option, greatly mitigated by file size checking (under "Behavior Changes"). Almost no reason to use share_files_with_checksum=false should remain.
  - DB::VerifyChecksum and BackupEngine::VerifyBackup with checksum checking are still able to catch corruptions that CreateNewBackup does not.
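
The incremental-backup rule above can be read as a small decision function. The Python sketch below is a toy model under that reading: must_read_source_file is a hypothetical helper, and the naming schemes appear only as strings (kLegacyCrc32cAndFileSize from this note, kUseDbSessionId as the contrasting scheme).

```python
LEGACY_NAMING = "kLegacyCrc32cAndFileSize"


def must_read_source_file(already_in_shared_dir: bool,
                          share_files_with_checksum: bool,
                          naming: str) -> bool:
    """Toy decision: does an incremental backup need to read this DB table file?"""
    if not already_in_shared_dir:
        # A file that is not backed up yet has to be copied, hence read.
        return True
    # The file is already in the shared part of the backup directory. Per the
    # note, only share_files_with_checksum combined with the legacy naming
    # still reads it (presumably because the checksum is part of the name).
    return share_files_with_checksum and naming == LEGACY_NAMING


print(must_read_source_file(True, True, "kUseDbSessionId"))  # prints False
print(must_read_source_file(True, True, LEGACY_NAMING))      # prints True
```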
Public API Change
- ⚡️ Expose kTypeDeleteWithTimestamp in EntryType and update GetEntryType() accordingly.
- Added file_checksum and file_checksum_func_name to TableFileCreationInfo, which can pass the table file checksum information through the OnTableFileCreated callback during flush and compaction.
- 🗄 A warning is added to DB::DeleteFile() API describing its known problems and deprecation plan.
- ➕ Add a new stats level, i.e. StatsLevel::kExceptTickers (PR7329) to exclude tickers even if application passes a non-null Statistics object.
- ➕ Added a new status code IOStatus::IOFenced() for the Env/FileSystem to indicate that writes from this instance are fenced off. Like any other background error, this error is returned to the user in Put/Merge/Delete/Flush calls and can be checked using Status::IsIOFenced().
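
As an illustration of how a fencing error could surface through write calls, here is a toy Python model. ToyStatus, ToyCode, and ToyDB are hypothetical names, and the "first background error sticks" behavior is an assumption of this sketch; only the idea that the error is returned from write calls and can be tested with an is-fenced predicate comes from the note above.

```python
from enum import Enum


class ToyCode(Enum):
    OK = "OK"
    IO_FENCED = "IOFenced"


class ToyStatus:
    def __init__(self, code: ToyCode = ToyCode.OK) -> None:
        self.code = code

    def ok(self) -> bool:
        return self.code is ToyCode.OK

    def is_io_fenced(self) -> bool:
        return self.code is ToyCode.IO_FENCED


class ToyDB:
    def __init__(self) -> None:
        self._bg_error = ToyStatus()
        self.data = {}

    def report_background_error(self, status: ToyStatus) -> None:
        # Assumption: the first background error is kept until cleared.
        if self._bg_error.ok():
            self._bg_error = status

    def put(self, key: str, value: str) -> ToyStatus:
        # A pending background error is returned instead of writing.
        if not self._bg_error.ok():
            return self._bg_error
        self.data[key] = value
        return ToyStatus()


db = ToyDB()
print(db.put("a", "1").ok())                  # prints True
db.report_background_error(ToyStatus(ToyCode.IO_FENCED))
print(db.put("b", "2").is_io_fenced())        # prints True
```

A caller that sees the fenced status knows that another instance has taken over and that retrying the write from this instance is pointless.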
Behavior Changes
- 🐎 The default return status of the file abstraction FSRandomAccessFile.Prefetch() is changed from OK to NotSupported. If a user's derived file class doesn't implement prefetch, RocksDB will create an internal prefetch buffer to improve read performance.
- When a retryable IO error happens during Flush (manifest write errors excluded) and WAL is disabled, it was originally mapped to kHardError. Now, it is mapped to a soft error, so the DB will not stall writes unless the memtable is full. At the same time, when auto resume is triggered to recover from the retryable IO error during Flush, SwitchMemtable is not called, to avoid generating too many small immutable memtables. If WAL is enabled, there are no behavior changes.
- When considering whether a table file is already backed up in a shared part of backup directory, BackupEngine would already query the sizes of source (DB) and pre-existing destination (backup) files. BackupEngine now uses these file sizes to detect corruption, as at least one of (a) old backup, (b) backup in progress, or (c) current DB is corrupt if there's a size mismatch.
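
The Prefetch() change above follows a common pattern: a base class reports "not supported" by default, and the caller falls back to its own buffering. The Python sketch below models that pattern only. ToyRandomAccessFile and ToyReader are hypothetical names, and the buffering strategy is a simplification, not RocksDB's internal prefetch buffer.

```python
OK = "OK"
NOT_SUPPORTED = "NotSupported"


class ToyRandomAccessFile:
    """Toy file whose prefetch() is not implemented by default."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.read_calls = 0

    def read(self, offset: int, n: int) -> bytes:
        self.read_calls += 1
        return self._data[offset:offset + n]

    def prefetch(self, offset: int, n: int) -> str:
        # Default: signal that this file has no prefetch of its own.
        return NOT_SUPPORTED


class ToyReader:
    """Toy reader that buffers internally when the file cannot prefetch."""

    def __init__(self, file: ToyRandomAccessFile) -> None:
        self._file = file
        self._buf_offset = 0
        self._buf = b""

    def prefetch(self, offset: int, n: int) -> None:
        if self._file.prefetch(offset, n) == NOT_SUPPORTED:
            # Fall back to an internal buffer filled with one large read.
            self._buf_offset = offset
            self._buf = self._file.read(offset, n)

    def read(self, offset: int, n: int) -> bytes:
        start = offset - self._buf_offset
        if start >= 0 and start + n <= len(self._buf):
            return self._buf[start:start + n]  # served from the buffer
        return self._file.read(offset, n)      # buffer miss


f = ToyRandomAccessFile(b"0123456789")
r = ToyReader(f)
r.prefetch(2, 6)
print(r.read(3, 2), f.read_calls)  # prints b'34' 1
```

With the old default of OK, a reader like this could not tell whether the file had really prefetched anything, so it had no signal to start buffering on its own.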
Others
- Errors in prefetching partitioned index blocks are no longer swallowed. They fail the query and return the IOError to users.