ClickHouse v22.2 Release Notes
Release Date: 2022-02-17
Upgrade Notes
- Applying data skipping indexes for queries with FINAL may produce incorrect results. In this release, data skipping indexes are disabled by default for queries with FINAL (a new setting, use_skip_indexes_if_final, is introduced and disabled by default). #34243 (Azat Khuzhin).
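A minimal sketch of how the new setting might be used per query. The table, column and index names are hypothetical and chosen only for illustration; only use_skip_indexes_if_final comes from the note above. Re-enabling the setting brings back the risk of incorrect results that motivated the change, so it is shown as an explicit opt-in for a single query.

```sql
-- Hypothetical table with a data skipping index (names are illustrative).
CREATE TABLE events_sketch
(
    key UInt64,
    value UInt32,
    updated DateTime,
    INDEX value_idx value TYPE minmax GRANULARITY 4
)
ENGINE = ReplacingMergeTree(updated)
ORDER BY key;

-- Default behaviour in this release: the skip index is not applied,
-- because the query uses FINAL.
SELECT key, value
FROM events_sketch FINAL
WHERE value = 42;

-- Explicit opt-in for one query only, accepting the risk described above.
SELECT key, value
FROM events_sketch FINAL
WHERE value = 42
SETTINGS use_skip_indexes_if_final = 1;
```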
New Feature
- Projections are production ready. Set allow_experimental_projection_optimization by default and deprecate this setting. #34456 (Nikolai Kochetov).
- An option to create new files on insert for File/S3/HDFS engines. Allow overwriting a file in HDFS. Throw an exception on an attempt to overwrite a file in S3 by default. Throw an exception on an attempt to append data to a file in formats that have a suffix (and thus don't support appends, like Parquet, ORC). Closes #31640 Closes #31622 Closes #23862 Closes #15022 Closes #16674. #33302 (Kruglov Pavel).
- Add a setting that allows a user to provide their own deduplication semantic in MergeTree/ReplicatedMergeTree. If provided, it's used instead of the data digest to generate the block ID. So, for example, by providing a unique value for the setting in each INSERT statement, the user can avoid the same inserted data being deduplicated. This closes: #7461. #32304 (Igor Nikonov).
- Add support of DEFAULT keyword for INSERT statements. Closes #6331. #33141 (Andrii Buriachevskyi).
- EPHEMERAL column specifier is added to CREATE TABLE query. Closes #9436. #34424 (yakov-olkhovskiy).
- Support IF EXISTS clause for TTL expr TO [DISK|VOLUME] [IF EXISTS] 'xxx' feature. Parts will be moved to a disk or volume only if it exists on the replica, so MOVE TTL rules will be able to behave differently on replicas according to the existing storage policies. Resolves #34455. #34504 (Anton Popov).
- Allow setting a default table engine and creating tables without specifying ENGINE. #34187 (Ilya Yatsishin).
- Add table function format(format_name, data). #34125 (Kruglov Pavel).
- Detect format in clickhouse-local by file name even in the case when it is passed to stdin. #33829 (Kruglov Pavel).
- Add schema inference for values table function. Closes #33811. #34017 (Kruglov Pavel).
- Dynamic reload of server TLS certificates on config reload. Closes #15764. #15765 (johnskopis). #31257 (Filatenkov Artur).
- Now ReplicatedMergeTree can recover data when some of its disks are broken. #13544 (Amos Bird).
- Fault-tolerant connections in clickhouse-client: clickhouse-client ... --host host1 --host host2 --port port2 --host host3 --port port --host host4. #34490 (Kruglov Pavel). #33824 (Filippov Denis).
- Add DEGREES and RADIANS functions for MySQL compatibility. #33769 (Bharat Nallan).
- Add h3ToCenterChild function. #33313 (Bharat Nallan). Add new h3 miscellaneous functions: edgeLengthKm, exactEdgeLengthKm, exactEdgeLengthM, exactEdgeLengthRads, numHexagons. #33621 (Bharat Nallan).
- Add function bitSlice to extract bit subsequences from String/FixedString. #33360 (RogerYK).
- Implemented meanZTest aggregate function. #33354 (achimbab).
- Add confidence intervals to T-tests aggregate functions. #33260 (achimbab).
- Add function addressToLineWithInlines. Close #26211. #33467 (SuperDJY).
- Added #! and # as a recognised start of a single line comment. Closes #34138. #34230 (Aaron Katz).
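Three of the SQL-level features listed above (the EPHEMERAL column specifier, the DEFAULT keyword in INSERT, and the format table function) can be sketched as follows. The table name, column names and sample data are hypothetical. The explicit default expression after EPHEMERAL is an assumption made so that the statement does not depend on whether the expression is optional in a given version.

```sql
-- Hypothetical table; names and default values are illustrative only.
CREATE TABLE ephemeral_sketch
(
    id UInt64,
    payload String EPHEMERAL '',        -- accepted on INSERT, not stored
    hexed String DEFAULT hex(payload),  -- derived from the ephemeral column
    note String DEFAULT 'n/a'
)
ENGINE = MergeTree
ORDER BY id;

-- The ephemeral column has to be named explicitly in the column list.
-- DEFAULT asks the server to use the column's default expression.
INSERT INTO ephemeral_sketch (id, payload, note) VALUES (1, 'abc', DEFAULT);

-- payload is not stored, so it is not part of the table's stored columns.
SELECT id, hexed, note FROM ephemeral_sketch;

-- format(format_name, data): parse inline data in a given format.
SELECT *
FROM format(JSONEachRow, $$
{"a": "Hello", "b": 111}
{"a": "World", "b": 123}
$$);
```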
Experimental Feature
- Functions for text classification: language and charset detection. See #23271. #33314 (Nikolay Degterinsky).
- Add memory overcommit to MemoryTracker. Added guaranteed settings for memory limits which represent soft memory limits. When the hard memory limit is reached, MemoryTracker tries to cancel the most overcommitted query. New setting memory_usage_overcommit_max_wait_microseconds specifies how long queries may wait for another query to stop. Closes #28375. #31182 (Dmitry Novik).
- Enable stream to table join in WindowView. #33729 (vxider).
- Support SET, YEAR, TIME and GEOMETRY data types in MaterializedMySQL (experimental feature). Fixes #18091, #21536, #26361. #33429 (zzsmdfj).
- Fix various issues when projection is enabled by default. Each issue is described in a separate commit. This is for #33678. This fixes #34273. #34305 (Amos Bird).
Performance Improvement
- Support optimize_read_in_order if prefix of sorting key is already sorted. E.g. if a table has sorting key ORDER BY (a, b) and a query has WHERE a = const ORDER BY b clauses, reading in order of the sorting key is now applied instead of a full sort. #32748 (Anton Popov).
- Improve performance of partitioned insert into table functions URL, S3, File, HDFS. Closes #34348. #34510 (Maksim Kita).
- Multiple performance improvements of clickhouse-keeper. #34484 #34587 (zhanglistar).
- FlatDictionary: improve performance of dictionary data load. #33871 (Maksim Kita).
- Improve performance of mapPopulateSeries function. Closes #33944. #34318 (Maksim Kita).
- _file and _path virtual columns (in file-like table engines) are made LowCardinality - it will make queries for multiple files faster. Closes #34300. #34317 (flynn).
- Speed up loading of data parts. It was not parallelized before: the setting part_loading_threads did not have effect. See #4699. #34310 (alexey-milovidov).
- Improve performance of LineAsString format. This closes #34303. #34306 (alexey-milovidov).
- Optimize quantilesExact{Low,High} to use nth_element instead of sort. #34287 (Danila Kutenin).
- Slightly improve performance of Regexp format. #34202 (alexey-milovidov).
- Minor improvement for analysis of scalar subqueries. #34128 (Federico Rodriguez).
- Make ORDER BY tuple almost as fast as ORDER BY columns. We have special optimizations for multiple column ORDER BY: https://github.com/ClickHouse/ClickHouse/pull/10831 . It's beneficial to also apply to tuple columns. #34060 (Amos Bird).
- Rework and reintroduce the scalar subqueries cache to Materialized Views execution. #33958 (Raúl Marín).
- Slightly improve performance of ORDER BY by adding x86-64 AVX-512 support for memcmpSmall functions to accelerate memory comparison. It works only if you compile ClickHouse by yourself. #33706 (hanqf-git).
- Improve range_hashed dictionary performance if for key there are a lot of intervals. Fixes #23821. #33516 (Maksim Kita).
- For inserts and merges into S3, write files in parallel whenever possible (TODO: check if it's merged). #33291 (Nikolai Kochetov).
- Improve clickhouse-keeper performance and fix several memory leaks in NuRaft library. #33329 (alesapin).
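The optimize_read_in_order item above describes a query shape that is easier to see in SQL. The sketch below uses a hypothetical table whose sorting key is (a, b); the table and column names are assumptions. The exact pipeline printed by EXPLAIN differs between versions, so no expected output is shown.

```sql
-- Hypothetical table with sorting key (a, b).
CREATE TABLE read_in_order_sketch
(
    a UInt32,
    b UInt32,
    payload String
)
ENGINE = MergeTree
ORDER BY (a, b);

-- The prefix column a is fixed to a constant, so rows that match the filter
-- are already ordered by b inside each part. With optimize_read_in_order
-- the server can read in sorting-key order instead of doing a full sort.
SELECT b, payload
FROM read_in_order_sketch
WHERE a = 1
ORDER BY b
LIMIT 10
SETTINGS optimize_read_in_order = 1;

-- Inspect the pipeline to check whether a full sort step is still present.
EXPLAIN PIPELINE
SELECT b, payload
FROM read_in_order_sketch
WHERE a = 1
ORDER BY b
LIMIT 10
SETTINGS optimize_read_in_order = 1;
```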
Improvement
- Support asynchronous inserts in clickhouse-client for queries with inlined data. #34267 (Anton Popov).
- Functions dictGet, dictHas implicitly cast key argument to dictionary key structure, if they are different. #33672 (Maksim Kita).
- Improvements for range_hashed dictionaries. Improve performance of load time if there are multiple attributes. Allow to create a dictionary without attributes. Added option convert_null_range_bound_to_open to specify the strategy when intervals start and end have Nullable type; by default it is true. Closes #29791. Allow to specify Float, Decimal, DateTime64, Int128, Int256, UInt128, UInt256 as range types. RangeHashedDictionary added support for range values that extend Int64 type. Closes #28322. Added option range_lookup_strategy to specify range lookup type min, max; by default it is min. Closes #21647. Fixed allocated bytes calculations. Fixed type name in system.dictionaries in case of ComplexKeyHashedDictionary. #33927 (Maksim Kita).
- flat, hashed, hashed_array dictionaries now support creating with empty attributes, with support of reading the keys and using dictHas. Fixes #33820. #33918 (Maksim Kita).
- Added support for DateTime64 data type in dictionaries. #33914 (Maksim Kita).
- Allow to write s3(url, access_key_id, secret_access_key) (autodetect of data format and table structure, but with explicit credentials). #34503 (Kruglov Pavel).
- Added sending of the output format back to client like it's done in HTTP protocol as suggested in #34362. Closes #34362. #34499 (Vitaly Baranov).
- Send ProfileEvents statistics in case of INSERT SELECT query (to display query metrics in clickhouse-client for this type of queries). #34498 (Dmitry Novik).
- Recognize .jsonl extension for JSONEachRow format. #34496 (Kruglov Pavel).
- Improve schema inference in clickhouse-local. Allow to write just clickhouse-local -q "select * from table" < data.format. #34495 (Kruglov Pavel).
- Privileges CREATE/ALTER/DROP ROW POLICY now can be granted on a table or on database.* as well as globally *.*. #34489 (Vitaly Baranov).
- Allow to export arbitrarily large files to s3. Add two new settings: s3_upload_part_size_multiply_factor and s3_upload_part_size_multiply_parts_count_threshold. Now each time s3_upload_part_size_multiply_parts_count_threshold parts are uploaded to S3 from a single query, s3_min_upload_part_size is multiplied by s3_upload_part_size_multiply_factor. Fixes #34244. #34422 (alesapin).
- Allow to skip not found (404) URLs for globs when using URL storage / table function. Also closes #34359. #34392 (Kseniia Sumarokova).
- Default input and output formats for clickhouse-local that can be overridden by --input-format and --output-format. Close #30631. #34352 (李扬).
- Add options max_query_size and max_parser_depth for clickhouse-format. Closes #30528. #34349 (李扬).
- Better handling of pre-inputs before client start. This is for #34308. #34336 (Amos Bird).
- REGEXP_MATCHES and REGEXP_REPLACE function aliases for compatibility with PostgreSQL. Close #30885. #34334 (李扬).
- Some servers expect a User-Agent header in their HTTP requests. A User-Agent header entry has been added to HTTP requests of the form: User-Agent: ClickHouse/VERSION_STRING. #34330 (Saad Ur Rahman).
- Cancel merges before acquiring table lock for TRUNCATE query to avoid DEADLOCK_AVOIDED error in some cases. Fixes #34302. #34304 (tavplubix).
- Change severity of the "Cancelled merging parts" message in logs, because it's not an error. This closes #34148. #34232 (alexey-milovidov).
- Add ability to compose PostgreSQL-style cast operator :: with expressions using [] and . operators (array and tuple indexing). #34229 (Nikolay Degterinsky).
- Recognize YYYYMMDD-hhmmss format in parseDateTimeBestEffort function. This closes #34206. #34208 (alexey-milovidov).
- Allow carriage return in the middle of the line while parsing by Regexp format. This closes #34200. #34205 (alexey-milovidov).
- Allow to parse dictionary's PRIMARY KEY as PRIMARY KEY (id, value); previously supported only PRIMARY KEY id, value. Closes #34135. #34141 (Maksim Kita).
- An optional argument for splitByChar to limit the number of resulting elements. Closes #34081. #34140 (李扬).
- Improving the experience of multiple line editing for clickhouse-client. This is a follow-up of #31123. #34114 (Amos Bird).
- Add UUID support in MsgPack input/output format. #34065 (Kruglov Pavel).
- Tracing context (for OpenTelemetry) is now propagated from GRPC client metadata (this change is relevant for GRPC client-server protocol). #34064 (andremarianiello).
- Supports all types of SYSTEM queries with ON CLUSTER clause. #34005 (小路).
- Improve memory accounting for queries that are using less than max_untracked_memory. #34001 (Azat Khuzhin).
- Fixed UTF-8 string case-insensitive search when lowercase and uppercase characters are represented by different number of bytes. Example is ẞ and ß. This closes #7334. #33992 (Harry Lee).
- Detect format and schema from stdin in clickhouse-local. #33960 (Kruglov Pavel).
- Correctly handle the case of misconfiguration when multiple disks are using the same path on the filesystem. #29072. #33905 (zhongyuankai).
- Try every resolved IP address while getting S3 proxy. S3 proxies are rarely used, mostly in Yandex Cloud. #33862 (Nikolai Kochetov).
- Support EXPLAIN AST CREATE FUNCTION query: EXPLAIN AST CREATE FUNCTION mycast AS (n) -> cast(n as String) will return EXPLAIN AST CREATE FUNCTION mycast AS n -> CAST(n, 'String'). #33819 (李扬).
- Added support for cast from Map(Key, Value) to Array(Tuple(Key, Value)). #33794 (Maksim Kita).
- Add some improvements and fixes for Bool data type. Fixes #33244. #33737 (Kruglov Pavel).
- Parse and store OpenTelemetry trace-id in big-endian order. #33723 (Frank Chen).
- Improvement for fromUnixTimestamp64 family functions. They now accept any integer value that can be converted to Int64. This closes: #14648. #33505 (Andrey Zvonov).
- Reimplement _shard_num from constants (see #7624) with shardNum() function (see #27020), to avoid possible issues (like those that had been found in #16947). #33392 (Azat Khuzhin).
- Enable binary arithmetic (plus, minus, multiply, division, least, greatest) between Decimal and Float. #33355 (flynn).
- Respect cgroups limits in max_threads autodetection. #33342 (JaySon).
- Add new clickhouse-keeper setting min_session_timeout_ms. Now clickhouse-keeper will determine client session timeout according to min_session_timeout_ms and session_timeout_ms settings. #33288 (JackyWoo).
- Added UUID data type support for functions hex and bin. #32170 (Frank Chen).
- Fix reading of subcolumns with dots in their names. In particular fixed reading of Nested columns, if their element names contain dots (e.g. Nested(`keys.name` String, `keys.id` UInt64, values UInt64)). #34228 (Anton Popov).
- Fixes parallel_view_processing = 0 not working when inserting into a table using VALUES. Fixes view_duration_ms in the query_views_log not being set correctly for materialized views. #34067 (Raúl Marín).
- Fix parsing tables structure from ZooKeeper: now metadata from ZooKeeper is compared with local metadata in canonical form. It helps when canonical function names can change between ClickHouse versions. #33933 (sunny).
- Properly escape some characters for interaction with LDAP. #33401 (IlyaTsoi).
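Several of the function-level improvements listed above can be tried with one-line queries. This is a sketch only: the literal values are made up for illustration, and results are deliberately not shown because formatting and some edge-case semantics may vary by version.

```sql
-- parseDateTimeBestEffort now recognizes the YYYYMMDD-hhmmss layout.
SELECT parseDateTimeBestEffort('20220217-123456');

-- splitByChar with the optional argument that limits the number of elements.
SELECT splitByChar(',', 'a,b,c,d', 2);

-- Cast from Map(Key, Value) to Array(Tuple(Key, Value)).
SELECT CAST(map('a', 1, 'b', 2), 'Array(Tuple(String, UInt8))');

-- hex and bin accept UUID values.
SELECT
    hex(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0')),
    bin(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0'));

-- Binary arithmetic between Decimal and Float.
SELECT toDecimal32(1.5, 2) + toFloat64(0.25);

-- PostgreSQL-compatible aliases for regular expression functions.
SELECT
    REGEXP_MATCHES('ClickHouse 22.2', '[0-9]+'),
    REGEXP_REPLACE('ClickHouse 22.2', '[0-9]+', 'N');
```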
Build/Testing/Packaging Improvement
- Remove unbundled build support. #33690 (Azat Khuzhin).
- Ensure that tests don't depend on the result of non-stable sorting of equal elements. Added equal items ranges randomization in debug after sort to prevent issues when we rely on equal items sort order. #34393 (Maksim Kita).
- Add verbosity to a style check. #34289 (Mikhail f. Shiryaev).
- Remove clickhouse-test debian package because it's obsolete. #33948 (Ilya Yatsishin).
- Multiple improvements for build system to remove the possibility of occasionally using packages from the OS and to enforce hermetic builds. #33695 (Amos Bird).
Bug Fix (user-visible misbehaviour in official stable or prestable release)
- Fixed the assertion in case of using allow_experimental_parallel_reading_from_replicas with max_parallel_replicas equal to 1. This fixes #34525. #34613 (Nikita Mikhaylov).
- Fix rare bug while reading of empty arrays, which could lead to "Data compressed with different methods" error. It can reproduce if you have mostly empty arrays, but not always, and reading is performed in backward direction with ORDER BY ... DESC. This error is extremely unlikely to happen. #34327 (Anton Popov).
- Fix wrong result of round/roundBankers if integer values of small types are rounded. Closes #33267. #34562 (李扬).
- Sometimes query cancellation did not work immediately when we were reading multiple files from s3 or HDFS. Fixes #34301 Relates to #34397. #34539 (Dmitry Novik).
- Fix exception "Chunk should have AggregatedChunkInfo in MergingAggregatedTransform" (in case of optimize_aggregation_in_order = 1 and distributed_aggregation_memory_efficient = 0). Fixes #34526. #34532 (Anton Popov).
- Fix comparison between integers and floats in index analysis. Previously it could lead to skipping some granules for reading by mistake. Fixes #34493. #34528 (Anton Popov).
- Fix compression support in URL engine. #34524 (Frank Chen).
- Fix possible error 'file_size: Operation not supported' in files' schema autodetection. #34479 (Kruglov Pavel).
- Fixes possible race with table deletion. #34416 (Kseniia Sumarokova).
- Fix possible error "Cannot convert column Function to mask" in short circuit function evaluation. Closes #34171. #34415 (Kruglov Pavel).
- Fix potential crash when doing schema inference from url source. Closes #34147. #34405 (Kruglov Pavel).
- For UDFs access permissions were checked for database level instead of global level as it should be. Closes #34281. #34404 (Maksim Kita).
- Fix wrong engine syntax in result of SHOW CREATE DATABASE query for databases with engine Memory. This closes #34335. #34345 (alexey-milovidov).
- Fixed a couple of extremely rare race conditions that might lead to broken state of replication queue and "intersecting parts" error. #34297 (tavplubix).
- Fix progress bar width. It was incorrectly rounded to integer number of characters. #34275 (alexey-milovidov).
- Fix current_user/current_address client information fields for inter-server communication (before this patch current_user/current_address were preserved from the previous query). #34263 (Azat Khuzhin).
- Fix memory leak in case of some Exception during query processing with optimize_aggregation_in_order=1. #34234 (Azat Khuzhin).
- Fix metric Query, which shows the number of executing queries. In last several releases it was always 0. #34224 (Anton Popov).
- Fix schema inference for table function s3. #34186 (Kruglov Pavel).
- Fix rare and benign race condition in HDFS, S3 and URL storage engines which can lead to additional connections. #34172 (alesapin).
- Fix bug which can rarely lead to error "Cannot read all data" while reading LowCardinality columns of MergeTree table engines family which stores data on remote file system like S3 (virtual filesystem over s3 is an experimental feature that is not ready for production). #34139 (alesapin).
- Fix inserts to distributed tables in case of a change of native protocol. The last change was in the version 22.1, so there may be some failures of inserts to distributed tables after upgrade to that version. #34132 (Anton Popov).
- Fix possible data race in File table engine that was introduced in #33960. Closes #34111. #34113 (Kruglov Pavel).
- Fixed minor race condition that might cause "intersecting parts" error in extremely rare cases after ZooKeeper connection loss. #34096 (tavplubix).
- Fix asynchronous inserts with Native format. #34068 (Anton Popov).
- Fix bug which led to inability for server to start when both replicated access storage and keeper (embedded in clickhouse-server) are used. Introduced two settings for keeper socket timeout instead of settings from default user: keeper_server.socket_receive_timeout_sec and keeper_server.socket_send_timeout_sec. Fixes #33973. #33988 (alesapin).
- Fix segfault while parsing ORC file with corrupted footer. Closes #33797. #33984 (Kruglov Pavel).
- Fix parsing IPv6 from query parameter (prepared statements) and fix IPv6 to string conversion. Closes #33928. #33971 (Kruglov Pavel).
- Fix crash while reading of nested tuples. Fixes #33838. #33956 (Anton Popov).
- Fix usage of functions array and tuple with literal arguments in distributed queries. Previously it could lead to "Not found columns" exception. #33938 (Anton Popov).
- Aggregate function combinator -If did not correctly process Nullable filter argument. This closes #27073. #33920 (alexey-milovidov).
- Fix potential race condition when doing remote disk read (virtual filesystem over s3 is an experimental feature that is not ready for production). #33912 (Amos Bird).
- Fix crash if SQL UDF is created with lambda with non identifier arguments. Closes #33866. #33868 (Maksim Kita).
- Fix usage of sparse columns (which can be enabled by experimental setting ratio_of_defaults_for_sparse_serialization). #33849 (Anton Popov).
- Fixed "replica is not readonly" logical error on SYSTEM RESTORE REPLICA query when replica is actually readonly. Fixes #33806. #33847 (tavplubix).
- Fix memory leak in clickhouse-keeper in case compression is used (default). #33840 (Azat Khuzhin).
- Fix index analysis with no common types available. #33833 (Amos Bird).
- Fix schema inference for JSONEachRow and JSONCompactEachRow. #33830 (Kruglov Pavel).
- Fix usage of external dictionaries with redis source and large number of keys. #33804 (Anton Popov).
- Fix bug in client that led to 'Connection reset by peer' in server. Closes #33309. #33790 (Kruglov Pavel).
- Fix parsing query INSERT INTO ... VALUES SETTINGS ... (...), ... #33776 (Kruglov Pavel).
- Fix bug of check table when creating data part with wide format and projection. #33774 (李扬).
- Fix tiny race between count() and INSERT/merges/... in MergeTree (it is possible to return incorrect number of rows for SELECT with optimize_trivial_count_query). #33753 (Azat Khuzhin).
- Throw exception when directory listing request has failed in storage HDFS. #33724 (LiuNeng).
- Fix mutation when table contains projections. This fixes #33010. This fixes #33275. #33679 (Amos Bird).
- Correctly determine current database if CREATE TEMPORARY TABLE AS SELECT is queried inside a named HTTP session. This is a very rare use case. This closes #8340. #33676 (alexey-milovidov).
- Allow some queries with sorting, LIMIT BY, ARRAY JOIN and lambda functions. This closes #7462. #33675 (alexey-milovidov).
- Fix bug in "zero copy replication" (a feature that is under development and should not be used in production) which led to data duplication in case of TTL move. Fixes #33643. #33642 (alesapin).
- Fix "Chunk should have AggregatedChunkInfo in GroupingAggregatedTransform" (in case of optimize_aggregation_in_order = 1). #33637 (Azat Khuzhin).
- Fix error "Bad cast from type ... to DB::DataTypeArray" which may happen when table has Nested column with dots in name, and default value is generated for it (e.g. during insert, when column is not listed). Continuation of #28762. #33588 (Alexey Pavlenko).
- Export into lz4 files has been fixed. Closes #31421. #31862 (Kruglov Pavel).
- Fix potential crash if group_by_overflow_mode was set to any (approximate GROUP BY) and aggregation was performed by single column of type LowCardinality. #34506 (DR).
- Fix inserting to temporary tables via gRPC client-server protocol. Fixes #34347, issue #2. #34364 (Vitaly Baranov).
- Fix issue #19429. #34225 (Vitaly Baranov).
- Fix issue #18206. #33977 (Vitaly Baranov).
- This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). #33574 (Vitaly Baranov).