RocksDB v6.13 Release Notes
Release Date: 2020-12-09
Bug Fixes
- Fix a performance regression introduced in 6.4 that performed an upper-bound check on every Next() call, even when the keys are in a data block that lies entirely within the upper bound.
- Fix a possible corruption of the LSM state (overlapping files within a level) when a CompactRange() for refitting levels (CompactRangeOptions::change_level == true) and another manual compaction are executed in parallel.
- Sanitize recycle_log_file_num to zero when the user attempts to enable it in combination with WALRecoveryMode::kTolerateCorruptedTailRecords. Previously the two features were allowed together, which compromised the user's configured crash-recovery guarantees.
- Fix a bug where a level refitting in CompactRange() might race with an automatic compaction that puts data into the target level of the refitting. The bug had been present for years.
- Fixed a bug in version 6.12 in which BackupEngine::CreateNewBackup could fail intermittently with non-OK status when backing up a read-write DB configured with a DBOptions::file_checksum_gen_factory.
- Fix useless no-op compactions scheduled upon snapshot release when options.disable_auto_compactions = true.
- Fix a bug where, when max_write_buffer_size_to_maintain is set, destruction of immutable flushed memtables is delayed until the next super version is installed. A memtable is not added to the delete list because the super version holds a reference to it, and the super version does not switch because the delete list is empty. As a result, memory usage keeps increasing beyond write_buffer_size + max_write_buffer_size_to_maintain.
- Avoid converting MERGES to PUTS when allow_ingest_behind is true.
- Fix compression dictionary sampling together with SstFileWriter. Previously, the dictionary would be trained/finalized immediately with zero samples. Now, the whole SstFileWriter file is buffered in memory and then sampled.
- Fix a bug with avoid_unnecessary_blocking_io=1 and creating backups (BackupEngine::CreateNewBackup) or checkpoints (Checkpoint::Create). With this setting and WAL enabled, these operations could randomly fail with a non-OK status.
- Fix a bug in which bottommost compaction continued to advance the underlying InternalIterator to skip tombstones even after shutdown.
New Features
- A new field, std::string requested_checksum_func_name, is added to FileChecksumGenContext, which enables the checksum factory to create generators for a suite of different functions.
- Added a new subcommand, ldb unsafe_remove_sst_file, which removes a lost or corrupt SST file from a DB's metadata. This command involves data loss and must not be used on a live DB.
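The point of requested_checksum_func_name is that a single factory can serve several checksum functions and pick a generator by name. The Python sketch below models that dispatch only; it is not the RocksDB C++ API. The class names, the function names "Crc32Model" and "Sha256Model", the fallback to a default when the name is empty, and returning None for an unknown name are all assumptions of this sketch. Note that zlib.crc32 computes plain CRC-32, not the CRC32c used by RocksDB's built-in checksum generator.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Dict, Optional, Type, Union


@dataclass
class ChecksumGenContextModel:
    # Hypothetical mirror of FileChecksumGenContext, including the new field.
    file_name: str
    requested_checksum_func_name: str = ""


class Crc32Model:
    """Plain CRC-32 via zlib (RocksDB's built-in generator uses CRC32c instead)."""

    NAME = "Crc32Model"

    def __init__(self) -> None:
        self._crc = 0

    def update(self, data: bytes) -> None:
        # Feed the running value back in so updates can be split across calls.
        self._crc = zlib.crc32(data, self._crc)

    def finalize(self) -> str:
        return f"{self._crc:08x}"


class Sha256Model:
    NAME = "Sha256Model"

    def __init__(self) -> None:
        self._h = hashlib.sha256()

    def update(self, data: bytes) -> None:
        self._h.update(data)

    def finalize(self) -> str:
        return self._h.hexdigest()


Generator = Union[Crc32Model, Sha256Model]


class MultiFuncChecksumFactory:
    """One factory serving several checksum functions, selected by name."""

    def __init__(self, default_name: str = Crc32Model.NAME) -> None:
        self._registry: Dict[str, Type] = {
            Crc32Model.NAME: Crc32Model,
            Sha256Model.NAME: Sha256Model,
        }
        self._default_name = default_name

    def create_generator(self, ctx: ChecksumGenContextModel) -> Optional[Generator]:
        # Empty name: fall back to the default (an assumption of this sketch).
        name = ctx.requested_checksum_func_name or self._default_name
        gen_cls = self._registry.get(name)
        # Unknown name: this model returns None rather than guessing.
        return gen_cls() if gen_cls is not None else None
```

Keeping the registry inside the factory means adding a checksum function is one dictionary entry, while callers only choose a name through the context.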
Performance Improvements
- Reduce the number of threads used by multiple DB instances by reusing one global thread for statistics dumping and persisting.
- Reduce write amplification during heavy write bursts in the kCompactionStyleLevel compaction style with level_compaction_dynamic_level_bytes set.
- BackupEngine incremental backups no longer read DB table files that are already saved to a shared part of the backup directory, unless share_files_with_checksum is used with kLegacyCrc32cAndFileSize naming (discouraged).
  - For share_files_with_checksum, we are confident there is no regression (vs. pre-6.12) in detecting DB or backup corruption at backup creation time, mostly because the old design did not leverage this extra checksum computation for detecting inconsistencies at backup creation time.
  - For share_table_files without "checksum" (not recommended), there is a regression in detecting fundamentally unsafe use of the option, greatly mitigated by file size checking (under "Behavior Changes"). There should be almost no remaining reason to use share_files_with_checksum=false.
  - DB::VerifyChecksum and BackupEngine::VerifyBackup with checksum checking are still able to catch corruptions that CreateNewBackup does not.
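The thread-count reduction replaces one statistics thread per DB with a single process-wide scheduler. The Python sketch below illustrates that design only: SharedStatsScheduler and its methods are hypothetical, each "dump" is just a callback, and the 600-second period is an arbitrary choice for the model. RocksDB's actual C++ implementation differs.

```python
import threading
from typing import Callable, Dict, Optional


class SharedStatsScheduler:
    """Toy model: one background thread shared by every DB instance in the process."""

    _instance: Optional["SharedStatsScheduler"] = None
    _instance_lock = threading.Lock()

    def __init__(self, period_sec: float) -> None:
        self._period_sec = period_sec
        self._tasks: Dict[str, Callable[[], None]] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    @classmethod
    def default(cls, period_sec: float = 600.0) -> "SharedStatsScheduler":
        # Lazily create the process-wide singleton.
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls(period_sec)
            return cls._instance

    def register(self, db_name: str, dump_fn: Callable[[], None]) -> None:
        with self._lock:
            self._tasks[db_name] = dump_fn
            # Start the one shared worker on first registration only.
            if self._thread is None:
                self._stop.clear()
                self._thread = threading.Thread(target=self._loop, daemon=True)
                self._thread.start()

    def unregister(self, db_name: str) -> None:
        with self._lock:
            self._tasks.pop(db_name, None)

    def run_once(self) -> None:
        # Snapshot the tasks so callbacks run without holding the lock.
        with self._lock:
            tasks = list(self._tasks.values())
        for fn in tasks:
            fn()

    def _loop(self) -> None:
        # wait() returns True once shutdown() sets the event, ending the loop.
        while not self._stop.wait(self._period_sec):
            self.run_once()

    def thread_count(self) -> int:
        with self._lock:
            return 0 if self._thread is None else 1

    def shutdown(self) -> None:
        self._stop.set()
        with self._lock:
            thread, self._thread = self._thread, None
        if thread is not None:
            thread.join()
```

However many DB instances register, thread_count() stays at 1, which is the effect the note describes.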
Public API Change
- Expose kTypeDeleteWithTimestamp in EntryType and update GetEntryType() accordingly.
- Added file_checksum and file_checksum_func_name to TableFileCreationInfo, which can pass the table file checksum information through the OnTableFileCreated callback during flush and compaction.
- A warning is added to the DB::DeleteFile() API describing its known problems and deprecation plan.
- Add a new stats level, StatsLevel::kExceptTickers (PR7329), to exclude tickers even if the application passes a non-null Statistics object.
- Added a new status code IOStatus::IOFenced() for the Env/FileSystem to indicate that writes from this instance are fenced off. Like any other background error, this error is returned to the user in Put/Merge/Delete/Flush calls and can be checked using Status::IsIOFenced().
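From the application's side, a fenced instance shows up as a sticky error on later writes, checked with Status::IsIOFenced(). The Python toy model below illustrates that flow. StatusModel, ToyDB and their methods are hypothetical; modeling "fenced" as an IO error with a dedicated subcode is an assumption, as is keeping only the first background error. Merge is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict


class Code(Enum):
    OK = auto()
    IO_ERROR = auto()


class SubCode(Enum):
    NONE = auto()
    IO_FENCED = auto()


@dataclass(frozen=True)
class StatusModel:
    # Assumption: a fenced error is an IO error carrying a "fenced" subcode.
    code: Code = Code.OK
    subcode: SubCode = SubCode.NONE

    def ok(self) -> bool:
        return self.code is Code.OK

    def is_io_fenced(self) -> bool:
        # Mirrors the role of Status::IsIOFenced() in this model.
        return self.code is Code.IO_ERROR and self.subcode is SubCode.IO_FENCED


OK_STATUS = StatusModel()
IO_FENCED = StatusModel(Code.IO_ERROR, SubCode.IO_FENCED)


class ToyDB:
    """Once a background error is recorded, every write call returns it."""

    def __init__(self) -> None:
        self.data: Dict[bytes, bytes] = {}
        self.bg_error = OK_STATUS

    def on_background_error(self, status: StatusModel) -> None:
        # This model keeps the first background error it sees.
        if self.bg_error.ok():
            self.bg_error = status

    def put(self, key: bytes, value: bytes) -> StatusModel:
        if not self.bg_error.ok():
            return self.bg_error
        self.data[key] = value
        return OK_STATUS

    def delete(self, key: bytes) -> StatusModel:
        if not self.bg_error.ok():
            return self.bg_error
        self.data.pop(key, None)
        return OK_STATUS

    def flush(self) -> StatusModel:
        return self.bg_error
```

After db.on_background_error(IO_FENCED), db.put(...) returns a status whose is_io_fenced() is true, and the write is not applied.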
Behavior Changes
- The default return status of the file abstraction FSRandomAccessFile.Prefetch() is changed from OK to NotSupported. If a user-derived file class does not implement prefetch, RocksDB will create an internal prefetch buffer to improve read performance.
- When a retryable IO error happens during Flush (manifest write errors excluded) and WAL is disabled, it was originally mapped to kHardError. Now it is mapped to a soft error, so the DB will not stall writes unless the memtable is full. In addition, when auto resume is triggered to recover from a retryable IO error during Flush, SwitchMemtable is not called, to avoid generating too many small immutable memtables. If WAL is enabled, there is no behavior change.
- When considering whether a table file is already backed up in a shared part of the backup directory, BackupEngine already queried the sizes of the source (DB) file and the pre-existing destination (backup) file. BackupEngine now uses these file sizes to detect corruption: if the sizes differ, at least one of (a) the old backup, (b) the backup in progress, or (c) the current DB is corrupt.
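The size check amounts to a three-way decision for each table file. A minimal Python sketch, assuming both sizes have already been queried; reuse_shared_file and CorruptionError are hypothetical names, and RocksDB itself reports a Status rather than raising an exception.

```python
from typing import Optional


class CorruptionError(Exception):
    """Raised when a DB file and its already-backed-up copy disagree in size."""


def reuse_shared_file(name: str, db_size: int, backup_size: Optional[int]) -> bool:
    """Decide whether an existing shared backup file can be reused.

    Returns False if the file is not yet in the shared directory (it must be
    copied), True if a same-sized copy already exists, and raises on a mismatch.
    """
    if backup_size is None:
        return False
    if backup_size != db_size:
        # A mismatch means the old backup, the backup in progress, or the DB is corrupt.
        raise CorruptionError(
            f"{name}: DB size {db_size} != backup size {backup_size}")
    return True
```

The check costs nothing extra because, as the note says, both sizes were already being queried.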
Others
- Errors in prefetching partitioned index blocks are no longer swallowed. They fail the query and return the IOError to users.