RocksDB v6.12.6 Release Notes

Release Date: 2020-10-13
    6.12.6 (2020-10-13)

    πŸ› Bug Fixes

    • Fix false positive flush/compaction Status::Corruption failure when paranoid_file_checks == true and range tombstones were written to the compaction output files.

    6.12.5 (2020-10-12)

    πŸ› Bug Fixes

    • Since 6.12, memtable lookup should report unrecognized value_type as corruption (#7121).
    • Fixed a bug in the following combination of features: indexes with user keys (format_version >= 3), indexes are partitioned (index_type == kTwoLevelIndexSearch), and some index partitions are pinned in memory (BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache). The bug could cause keys to be truncated when read from the index, leading to wrong read results or other unexpected behavior.
    • Fixed a bug when indexes are partitioned (index_type == kTwoLevelIndexSearch), some index partitions are pinned in memory (BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache), and partition reads could be mixed between block cache and directly from the file (e.g., with enable_index_compression == 1 and mmap_read == 1, partitions that were stored uncompressed due to poor compression ratio would be read directly from the file via mmap, while partitions that were stored compressed would be read from block cache). The bug could cause index partitions to be mistakenly considered empty during reads, leading to wrong read results.

    6.12.4 (2020-09-18)

    Public API Change

    • Reworked BackupableDBOptions::share_files_with_checksum_naming (new in 6.12) with some minor improvements and to better support those who were extracting file sizes from backup file names.

    6.12.3 (2020-09-16)

    πŸ› Bug fixes

    • Fixed a bug in size-amp-triggered and periodic-triggered universal compaction, where the compression settings for the first input level were used rather than the compression settings for the output (bottom) level.

    6.12.2 (2020-09-14)

    Public API Change

    • BlobDB now exposes the start of the expiration range of TTL blob files via the GetLiveFilesMetaData API.

    6.12.1 (2020-08-20)

    πŸ› Bug fixes

    • BackupEngine::CreateNewBackup could fail intermittently with non-OK status when backing up a read-write DB configured with a DBOptions::file_checksum_gen_factory. This issue has been worked around such that CreateNewBackup should succeed, but (until fully fixed) BackupEngine might not see all checksums available in the DB.

    6.12 (2020-07-28)

    Public API Change

    • Encryption file classes now exposed for inheritance in env_encryption.h
    • File I/O listener is extended to cover more I/O operations. Now class EventListener in listener.h contains new callback functions: OnFileFlushFinish(), OnFileSyncFinish(), OnFileRangeSyncFinish(), OnFileTruncateFinish(), and OnFileCloseFinish().
    • FileOperationInfo now reports duration measured by std::chrono::steady_clock and start_ts measured by std::chrono::system_clock instead of start and finish timestamps measured by system_clock. Note that system_clock is called before steady_clock in program order when an operation starts.
    • DB::GetDbSessionId(std::string& session_id) is added. session_id stores a unique identifier that gets reset every time the DB is opened. This DB session ID should be unique among all open DB instances on all hosts, and should be unique among re-openings of the same or other DBs. This identifier is recorded in the LOG file on the line starting with "DB Session ID:".
    • DB::OpenForReadOnly() now returns Status::NotFound when the specified DB directory does not exist. Previously the error returned depended on the underlying Env. This change is available in all 6.11 releases as well.
    • A parameter verify_with_checksum is added to BackupEngine::VerifyBackup, which is false by default. If it is true, BackupEngine::VerifyBackup verifies checksums and file sizes of backup files. Pass false for verify_with_checksum to maintain the previous behavior and performance of BackupEngine::VerifyBackup, by only verifying sizes of backup files.
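
The FileOperationInfo timing change above can be illustrated with a minimal sketch. OpTiming and TimeOperation are hypothetical names introduced here for illustration and are not part of the RocksDB API; the sketch only mirrors the stated order (system_clock sampled first, then steady_clock) and the duration-based reporting.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical stand-in for the timing fields described in the note;
// this is not the RocksDB FileOperationInfo struct.
struct OpTiming {
  std::chrono::system_clock::time_point start_ts;  // wall-clock start
  std::chrono::nanoseconds duration;               // monotonic elapsed time
};

// Runs `op` and records its timing. system_clock is sampled before
// steady_clock, matching the program order stated in the release note.
inline OpTiming TimeOperation(const std::function<void()>& op) {
  assert(static_cast<bool>(op));  // an empty std::function cannot be called
  const auto wall_start = std::chrono::system_clock::now();
  const auto mono_start = std::chrono::steady_clock::now();
  op();
  const auto mono_end = std::chrono::steady_clock::now();
  return OpTiming{wall_start,
                  std::chrono::duration_cast<std::chrono::nanoseconds>(
                      mono_end - mono_start)};
}
```

Measuring the duration with a monotonic clock means the reported value is not affected by wall-clock adjustments during the operation, while start_ts remains comparable to other wall-clock timestamps.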

    Behavior Changes

    • Best-efforts recovery ignores CURRENT file completely. If CURRENT file is missing during recovery, best-efforts recovery still proceeds with MANIFEST file(s).
    • In best-efforts recovery, an error that was not Corruption or IOError::kNotFound or IOError::kPathNotFound was overwritten silently. This is fixed by checking all non-OK cases and returning early.
    • When file_checksum_gen_factory is set to GetFileChecksumGenCrc32cFactory(), BackupEngine will compare the crc32c checksums of table files computed when creating a backup to the expected checksums stored in the DB manifest, and will fail CreateNewBackup() on mismatch (corruption). If the file_checksum_gen_factory is not set or set to any other customized factory, there is no checksum verification to detect if SST files in a DB are corrupt when read, copied, and independently checksummed by BackupEngine.
    • When a DB sets stats_dump_period_sec > 0, either as the initial value for DB open or as a dynamic option change, the first stats dump is staggered in the following X seconds, where X is an integer in [0, stats_dump_period_sec). Subsequent stats dumps are still spaced stats_dump_period_sec seconds apart.
    • When the paranoid_file_checks option is true, a hash of all keys and values is generated when the SST file is written, and then the values are read back in to validate the file. A corruption is signaled if the two hashes do not match.
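
The paranoid_file_checks behavior above (hash on write, hash on read-back, compare) can be modeled with a small self-contained sketch. Everything here is an assumption made for illustration: the in-memory vector stands in for an SST file, FNV-1a stands in for whatever hash RocksDB actually uses, and WriteWithHash and ReadBackMatches are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// FNV-1a over one string. The length is mixed in first so that
// ("ab", "c") and ("a", "bc") do not hash to the same value.
inline std::uint64_t HashString(std::uint64_t h, const std::string& s) {
  const std::uint64_t len = s.size();
  for (int i = 0; i < 8; ++i) {
    h ^= (len >> (8 * i)) & 0xFF;
    h *= kFnvPrime;
  }
  for (unsigned char c : s) {
    h ^= c;
    h *= kFnvPrime;
  }
  return h;
}

inline std::uint64_t HashEntry(std::uint64_t h, const KeyValue& kv) {
  return HashString(HashString(h, kv.first), kv.second);
}

// Appends entries to `file` (a stand-in for an SST file) and returns the
// hash accumulated over all keys and values while writing.
inline std::uint64_t WriteWithHash(const std::vector<KeyValue>& entries,
                                   std::vector<KeyValue>* file) {
  assert(file != nullptr);
  std::uint64_t h = kFnvOffset;
  for (const auto& kv : entries) {
    file->push_back(kv);
    h = HashEntry(h, kv);
  }
  return h;
}

// Re-reads the file and reports whether its hash matches the write-time
// hash. A mismatch corresponds to the corruption signal in the note.
inline bool ReadBackMatches(const std::vector<KeyValue>& file,
                            std::uint64_t write_hash) {
  std::uint64_t h = kFnvOffset;
  for (const auto& kv : file) h = HashEntry(h, kv);
  return h == write_hash;
}
```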

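The staggered stats dump schedule described under Behavior Changes (first dump after X seconds with X in [0, stats_dump_period_sec), later dumps one full period apart) can be sketched as follows. StatsDumpTimes is a hypothetical helper, and drawing X uniformly at random is an assumption; the notes only state the range of X.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Returns the times, in seconds after the option takes effect, of the first
// `count` stats dumps. Assumption: the initial offset X is uniform in
// [0, period_sec); the release note does not say how X is chosen.
inline std::vector<std::uint64_t> StatsDumpTimes(std::uint64_t period_sec,
                                                 std::size_t count,
                                                 std::mt19937_64& rng) {
  assert(period_sec > 0);  // staggering only applies when the period is > 0
  std::uniform_int_distribution<std::uint64_t> offset(0, period_sec - 1);
  std::vector<std::uint64_t> times;
  std::uint64_t t = offset(rng);  // X in [0, period_sec)
  for (std::size_t i = 0; i < count; ++i) {
    times.push_back(t);
    t += period_sec;  // subsequent dumps stay one full period apart
  }
  return times;
}
```

Staggering the first dump spreads the dumps of many DB instances opened at the same moment across the period instead of having them all fire together.
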
    πŸ› Bug fixes

    • Compressed block cache was automatically disabled with read-only DBs by mistake. This is now fixed: compressed block cache is in effect for read-only DBs too.
    • Fix a bug that produced a wrong iterator result if another thread finished an update and a DB flush between two statements.
    • πŸ‘€ Disable file deletion after MANIFEST write/sync failure until db re-open or Resume() so that subsequent re-open will not see MANIFEST referencing deleted SSTs.
    • Fix a bug in PartitionedIndexBuilder when index_type == kTwoLevelIndexSearch: FlushPolicy is now updated to point to the internal-key partitioner when the index partition changes from user-key mode to internal-key mode.
    • πŸ‘‰ Make compaction report InternalKey corruption while iterating over the input.
    • Fix a bug that could make MultiGet slow because it might read more data than requested; correctness was not affected. The bug was introduced in the 6.10 release.
    • Fail recovery and report once hitting a physical log record checksum mismatch while reading MANIFEST. RocksDB should not continue processing the MANIFEST any further.

    New Features

    • DB identity (db_id) and DB session identity (db_session_id) are added to table properties and stored in SST files. SST files generated from SstFileWriter and Repairer have DB identity "SST Writer" and "DB Repairer", respectively. Their DB session IDs are generated in the same way as DB::GetDbSessionId. The session ID for SstFileWriter (resp., Repairer) resets every time SstFileWriter::Open (resp., Repairer::Run) is called.
    • Added experimental option BlockBasedTableOptions::optimize_filters_for_memory for reducing allocated memory size of Bloom filters (~10% savings with Jemalloc) while preserving the same general accuracy. To have an effect, the option requires format_version=5 and malloc_usable_size. Enabling this option is forward and backward compatible with existing format_version=5.
    • BackupableDBOptions::share_files_with_checksum_naming is added with new default behavior for naming backup files with share_files_with_checksum, to address performance and backup integrity issues. See API comments for details.
    • Added an auto resume function to automatically recover the DB from background retryable IO errors. When a retryable IO error happens during flush or WAL write, the error is mapped to a hard error and the DB will be in read mode. When a retryable IO error happens during compaction, the error is mapped to a soft error and the DB stays in write/read mode. The auto resume function creates a thread for the DB that calls DB->ResumeImpl() to try to recover from a retryable IO error hit during flush or WAL write. Compaction reschedules itself if a retryable IO error happens. Auto resume may itself hit another retryable IO error during recovery, in which case the recovery fails. Retrying auto resume may solve the issue, so max_bgerror_resume_count decides how many resume cycles are tried in total. If it is <= 0, auto resume from retryable IO errors is disabled. The default is INT_MAX, which leads to infinite auto resume. bgerror_resume_retry_interval decides the time interval between two auto resumes.
    • Option max_subcompactions can be set dynamically using DB::SetDBOptions().
    • Added experimental ColumnFamilyOptions::sst_partitioner_factory to define the partitioning of SST files. This helps compaction to split the files on interesting boundaries (key prefixes) to make propagation of SST files less write amplifying (covering the whole key space).
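
The idea behind splitting SST files on key-prefix boundaries can be sketched with a standalone predicate. ShouldCutBetween is a hypothetical helper written for illustration; it is not the RocksDB SstPartitioner interface, and the fixed-length prefix rule is an assumption about what an "interesting boundary" might be.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the RocksDB SstPartitioner interface.
// Returns true when two consecutive keys differ in their first `prefix_len`
// bytes, meaning an output file boundary would be placed between them.
// Keys shorter than `prefix_len` are compared in full.
inline bool ShouldCutBetween(const std::string& prev_key,
                             const std::string& next_key,
                             std::size_t prefix_len) {
  assert(prefix_len > 0);
  return prev_key.compare(0, prefix_len, next_key, 0, prefix_len) != 0;
}
```

With a prefix length of 5, keys "user1:a" and "user1:b" stay in the same file, while "user1:z" and "user2:a" are separated, so each output file covers whole prefixes only.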

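The retry policy described for auto resume (max_bgerror_resume_count resume cycles in total, separated by bgerror_resume_retry_interval, disabled when the count is <= 0) can be modeled roughly as below. AutoResume and try_resume are hypothetical names; this is a sketch of the policy as stated in the notes, not RocksDB's implementation, and the interval unit is left to the caller because the notes do not state it.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical model of the retry policy; not RocksDB's implementation.
// `try_resume` stands in for one resume attempt and returns true on success.
// Returns the number of attempts used on success, or -1 when auto resume is
// disabled (count <= 0) or every attempt failed.
inline int AutoResume(int max_bgerror_resume_count,
                      std::chrono::microseconds retry_interval,
                      const std::function<bool()>& try_resume) {
  assert(static_cast<bool>(try_resume));
  if (max_bgerror_resume_count <= 0) return -1;  // auto resume disabled
  int attempt = 0;
  while (attempt < max_bgerror_resume_count) {
    ++attempt;  // never exceeds the limit, so INT_MAX cannot overflow
    if (try_resume()) return attempt;
    if (attempt < max_bgerror_resume_count) {
      std::this_thread::sleep_for(retry_interval);  // wait between cycles
    }
  }
  return -1;  // all resume cycles hit another error
}
```

A count of INT_MAX in this model behaves like the documented default: the loop keeps retrying for practical purposes indefinitely until an attempt succeeds.
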
    Performance Improvements

    • Eliminate key copies for internal comparisons while accessing ingested block-based tables.
    • Reduce key comparisons during random access in all block-based tables.
    • BackupEngine avoids unnecessary repeated checksum computation for backing up a table file to the shared_checksum directory when using share_files_with_checksum_naming = kUseDbSessionId (new default), except on SST files generated before this version of RocksDB, which fall back on using kLegacyCrc32cAndFileSize.