RocksDB v6.25.0 Release Notes

Release Date: 2021-09-20
    Bug Fixes

    • Allow a secondary instance to refresh its iterator. The read sequence number is now assigned after referencing the SuperVersion.
    • Fixed a bug in which a secondary instance's last_sequence could go backward, causing reads on the secondary to miss recent updates from the primary.
    • Fixed a bug that could lead to duplicate DB ID or DB session ID in POSIX environments without /proc/sys/kernel/random/uuid.
    • Fix a race in DumpStats() with column family destruction due to not taking a Ref on each entry while iterating the ColumnFamilySet.
    • Fix a race in item ref counting in LRUCache when promoting an item from the SecondaryCache.
    • Fix a race in BackupEngine if RateLimiter is reconfigured during concurrent Restore operations.
    • Fix a bug on POSIX in which failure to create a lock file (e.g. out of space) can prevent future LockFile attempts in the same process on the same file from succeeding.
    • Fix a bug in which backup_rate_limiter and restore_rate_limiter in BackupEngine did not limit read rates.
    • Fix the implementation of prepopulate_block_cache = kFlushOnly to only apply to flushes rather than to all generated files.
    • Fix WAL data corruption when using DBOptions.manual_wal_flush(true) and WriteOptions.sync(true) together. The WAL sync should run with log_write_mutex_ held.
    • Add checks for validity of the IO uring completion queue entries, and fail the BlockBasedTableReader MultiGet sub-batch if there's an invalid completion.
    • Add an interface RocksDbIOUringEnable() that, if defined by the user, will allow them to enable/disable the use of IO uring by RocksDB.
    • Fix a bug in which, when direct I/O is used and MultiRead() returns a short result, RandomAccessFileReader::MultiRead() still returns a full-size buffer that holds the short result together with leftover data from the original buffer. This bug is unlikely to cause incorrect results because (1) the FileSystem layer is expected to retry on a short result, so a short result is only possible when requesting more bytes than remain at the end of the file, which RocksDB doesn't do when using MultiRead(); and (2) the checksum is unlikely to match.
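
For the RocksDbIOUringEnable() hook mentioned in the list above, a user-side definition might look roughly like the sketch below. It assumes the hook is a C-linkage function that takes no arguments and returns bool, and that RocksDB picks it up when the application defines it; check the POSIX file system sources of your RocksDB version before relying on this. The g_allow_io_uring flag and the SetIoUringAllowed() helper are hypothetical names introduced only for illustration.

```cpp
#include <atomic>

// Hypothetical application-level switch (not part of RocksDB).
static std::atomic<bool> g_allow_io_uring{true};

// Hypothetical helper the application could call from its own config code.
void SetIoUringAllowed(bool allowed) {
  g_allow_io_uring.store(allowed, std::memory_order_relaxed);
}

// Assumed signature: C linkage, no arguments, returns bool.
// Returning false asks RocksDB not to use IO uring.
extern "C" bool RocksDbIOUringEnable() {
  return g_allow_io_uring.load(std::memory_order_relaxed);
}
```

Routing the decision through a flag keeps the hook itself trivial while letting the application change the answer at startup.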

    New Features

    • RemoteCompaction's interface now includes db_name, db_id, and session_id, which help the user uniquely identify a compaction job across DB instances and sessions.
    • Added a ticker statistic, "rocksdb.verify_checksum.read.bytes", reporting how many bytes were read from file to serve VerifyChecksum() and VerifyFileChecksums() queries.
    • Added ticker statistics, "rocksdb.backup.read.bytes" and "rocksdb.backup.write.bytes", reporting how many bytes were read and written during backup.
    • Added properties for BlobDB: rocksdb.num-blob-files, rocksdb.blob-stats, rocksdb.total-blob-file-size, and rocksdb.live-blob-file-size. The existing property rocksdb.estimate-live-data-size was also extended to include live bytes residing in blob files.
    • Added two new RateLimiter IOPriorities: Env::IO_USER and Env::IO_MID. Env::IO_USER takes priority over all other RateLimiter IOPriorities and is not subject to the fair scheduling constraint.
    • SstFileWriter now supports Puts and Deletes with user-defined timestamps. Note that the ingestion logic itself is not timestamp-aware yet.
    • Allow a single write batch to include keys from multiple column families whose timestamps' formats can differ. For example, some column families may disable timestamp, while others enable timestamp.
    • Add compaction priority information in RemoteCompaction, which can be used to schedule high-priority jobs first.
    • Added new callback APIs OnBlobFileCreationStarted, OnBlobFileCreated, and OnBlobFileDeleted in the EventListener class of listener.h. They notify listeners during creation/deletion of individual blob files in Integrated BlobDB. RocksDB also logs the blob file creation finished event and the deletion event in the LOG file.
    • Batch blob read requests for DB::MultiGet using MultiRead.
    • Add support for fallback to local compaction: the user can return CompactionServiceJobStatus::kUseLocal to instruct RocksDB to run the compaction locally instead of waiting for the remote compaction result.
    • Add the built-in rate limiter's implementation of RateLimiter::GetTotalPendingRequest(int64_t* total_pending_requests, const Env::IOPriority pri), which reports the total number of requests that are pending for bytes in the rate limiter.
    • Charge memory usage during data buffering, from which training samples are gathered for dictionary compression, to block cache. Unbuffering data can now be triggered if the block cache becomes full and strict_capacity_limit=true for the block cache, in addition to existing conditions that can trigger unbuffering.
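
The local-compaction fallback described in the list above can be pictured with the following sketch. It does not use RocksDB headers: JobStatus is a stand-in for CompactionServiceJobStatus (only kUseLocal is named in these notes; the other enumerators are assumptions), and RemoteWorkers, DecideStart(), and WhereCompactionRuns() are hypothetical names. The policy shown, falling back when no remote worker is idle or reachable, is just one possible choice.

```cpp
#include <string>

// Stand-in for rocksdb::CompactionServiceJobStatus so the sketch compiles
// without RocksDB. kSuccess and kFailure are assumed enumerators.
enum class JobStatus { kSuccess, kFailure, kUseLocal };

// Hypothetical view of a remote compaction worker pool.
struct RemoteWorkers {
  int idle_workers = 0;
  bool reachable = true;
};

// Hypothetical policy: hand the job to a remote worker when one is idle and
// reachable; otherwise ask the DB to compact locally instead of waiting.
JobStatus DecideStart(const RemoteWorkers& workers) {
  if (!workers.reachable || workers.idle_workers == 0) {
    return JobStatus::kUseLocal;
  }
  return JobStatus::kSuccess;
}

// Models how a caller would act on the returned status.
std::string WhereCompactionRuns(JobStatus status) {
  switch (status) {
    case JobStatus::kUseLocal:
      return "local";
    case JobStatus::kSuccess:
      return "remote";
    case JobStatus::kFailure:
      return "failed";
  }
  return "failed";
}
```

In a real compaction service the same decision would be made inside the user's service implementation, which returns the status to RocksDB.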

    Public API change

    • Remove obsolete implementation details FullKey and ParseFullKey from public API.
    • Change SstFileMetaData::size from size_t to uint64_t.
    • Made Statistics extend the Customizable class and added a CreateFromString method. Implementations of Statistics need to be registered with the ObjectRegistry and to implement a Name() method in order to be created via this method.
    • Extended FlushJobInfo and CompactionJobInfo in listener.h to provide information about the blob files generated by a flush/compaction and garbage collected during compaction in Integrated BlobDB. Added struct members blob_file_addition_infos and blob_file_garbage_infos that contain this information.
    • Extended parameter output_file_names of CompactFiles API to also include paths of the blob files generated by the compaction in Integrated BlobDB.
    • Most BackupEngine functions now return IOStatus instead of Status. Most existing code should be compatible with this change but some calls might need to be updated.
    • Add a new field level_at_creation in TablePropertiesCollectorFactory::Context to capture the level at which the SST file (i.e., table) whose properties are being collected is created.
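
The notes do not say why SstFileMetaData::size moved from size_t to uint64_t. One plausible reason, stated here as an assumption, is that size_t is only 32 bits wide on 32-bit builds and cannot represent files of 4 GiB or more. The sketch below models that truncation with fixed-width types; StoreAs32Bit() and kFiveGiB are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Models storing a file size in a 32-bit size_t, as on a 32-bit build.
// The conversion keeps only the low 32 bits of the value.
constexpr std::uint32_t StoreAs32Bit(std::uint64_t file_size) {
  return static_cast<std::uint32_t>(file_size);
}

// A file size that does not fit in 32 bits.
constexpr std::uint64_t kFiveGiB = 5ULL * 1024 * 1024 * 1024;

// 5 GiB modulo 4 GiB leaves 1 GiB, so the stored size would be wrong.
static_assert(StoreAs32Bit(kFiveGiB) == 1ULL * 1024 * 1024 * 1024,
              "a 32-bit size wraps around for files of 4 GiB or more");
```

Code that copies the field into a size_t variable may need an explicit cast or a wider type after this change.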

    Miscellaneous

    • Add a paranoid check so that, if the FileSystem layer doesn't fill the buffer but reports success, the checksum is unlikely to match even if the buffer contains a previous block. The modified byte is not useful anyway, so this isn't expected to change any behavior when the FileSystem satisfies its contract.
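
The idea behind this paranoid check can be illustrated with a toy model. This is not RocksDB's code: ToyChecksum() is a simple additive checksum standing in for the real block checksum, BrokenRead() is a hypothetical reader that reports success without touching the buffer, and which byte gets modified is an arbitrary choice made for the sketch.

```cpp
#include <cstdint>
#include <vector>

// Toy additive checksum standing in for the real block checksum.
std::uint32_t ToyChecksum(const std::vector<unsigned char>& data) {
  std::uint32_t sum = 0;
  for (unsigned char byte : data) sum += byte;
  return sum;
}

// Hypothetical reader that violates its contract: it reports success
// without writing anything into the buffer.
bool BrokenRead(std::vector<unsigned char>* /*buf*/) { return true; }

// Returns true if the buffer passes checksum verification after the read.
bool ReadAndVerify(std::vector<unsigned char>* buf, std::uint32_t expected,
                   bool paranoid) {
  if (paranoid && !buf->empty()) {
    // Modify one byte before reading so stale contents cannot verify.
    buf->back() ^= 0xFF;
  }
  if (!BrokenRead(buf)) return false;
  return ToyChecksum(*buf) == expected;
}
```

Without the paranoid step, a buffer that still holds a previously read block would pass verification against that block's checksum; with it, the stale data is rejected. A reader that honors its contract overwrites the modified byte, so behavior is unchanged.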