Changelog History
Page 1
-
v22.3-lts Changes
March 17, 2022
Backward Incompatible Change
- Make `arrayCompact` function behave as other higher-order functions: perform compaction not of lambda function results but on the original array. If you're using nontrivial lambda functions in `arrayCompact`, you may restore the old behaviour by wrapping `arrayCompact` arguments into `arrayMap`. Closes #34010 #18535 #14778. #34795 (Alexandre Snarskii).
- Change implementation-specific behavior on overflow of function `toDateTime`. It will be saturated to the nearest min/max supported instant of datetime instead of wrapping around. This change is highlighted as "backward incompatible" because someone may unintentionally rely on the old behavior. #32898 (HaiBo Li).
- Make functions `cast(value, 'IPv4')`, `cast(value, 'IPv6')` behave the same as the `toIPv4`, `toIPv6` functions. Changed behavior of the functions `toIPv4`, `toIPv6` on an incorrect IP address: an exception is now raised if an invalid IP address is passed, whereas previously these functions returned a default value. Added functions `IPv4StringToNumOrDefault`, `IPv4StringToNumOrNull`, `IPv6StringToNumOrDefault`, `IPv6StringToNumOrNull`, `toIPv4OrDefault`, `toIPv4OrNull`, `toIPv6OrDefault`, `toIPv6OrNull`. Functions `IPv4StringToNumOrDefault`, `toIPv4OrDefault`, `toIPv6OrDefault` should be used if previous logic relied on `IPv4StringToNum`, `toIPv4`, `toIPv6` returning a default value for an invalid address. Added setting `cast_ipv4_ipv6_default_on_conversion_error`; if this setting is enabled, IP address conversion functions behave as before. Closes #22825. Closes #5799. Closes #35156. #35240 (Maksim Kita).
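The `toDateTime` overflow change above can be pictured with a small model. This is only a sketch in Python, not ClickHouse code: it assumes `DateTime` is held as an unsigned 32-bit count of seconds since the Unix epoch, and the helper names `wraparound` and `saturate` are hypothetical.

```python
# Illustrative model only. Assumption: values are stored as unsigned 32-bit
# seconds since the Unix epoch. Helper names are hypothetical.
UINT32_MAX = 2**32 - 1


def wraparound(seconds: int) -> int:
    """Old-style overflow: the value silently wraps modulo 2**32."""
    return seconds % (UINT32_MAX + 1)


def saturate(seconds: int) -> int:
    """New-style overflow: the value is clamped to the nearest supported bound."""
    return max(0, min(seconds, UINT32_MAX))


too_large = UINT32_MAX + 10
print(wraparound(too_large))  # 9
print(saturate(too_large))    # 4294967295
print(saturate(-5))           # 0
```

Code that relied on wrapped values would see a different result after the upgrade, which is why the change is listed as backward incompatible.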
New Feature
- Support for caching data locally for remote filesystems. It can be enabled for `s3` disks. Closes #28961. #33717 (Kseniia Sumarokova). In the meantime, we enabled the test suite on the s3 filesystem and no known issues remain, so it is starting to be production ready.
- Add new table function `hive`. It can be used as follows: `hive('<hive metastore url>', '<hive database>', '<hive table name>', '<columns definition>', '<partition columns>')`, for example `SELECT * FROM hive('thrift://hivetest:9083', 'test', 'demo', 'id Nullable(String), score Nullable(Int32), day Nullable(String)', 'day')`. #34946 (lgbo).
- Support authentication of users connected via SSL by their X.509 certificate. #31484 (eungenue).
- Support schema inference for inserting into table functions `file`/`hdfs`/`s3`/`url`. #34732 (Kruglov Pavel).
- Now you can read the `system.zookeeper` table without restrictions on path or using a `like` expression. These reads can generate quite a heavy load on ZooKeeper, so to enable this ability you have to enable the setting `allow_unrestricted_reads_from_keeper`. #34609 (Sergei Trifonov).
- Display CPU and memory metrics in clickhouse-local. Close #34545. #34605 (李扬).
- Implement `startsWith` and `endsWith` functions for arrays, closes #33982. #34368 (usurai).
- Add three functions for the Map data type: 1. `mapReplace(map1, map2)` - replaces values for keys in map1 with the values of the corresponding keys in map2; adds keys from map2 that don't exist in map1. 2. `mapFilter` 3. `mapMap`. `mapFilter` and `mapMap` are higher-order functions accepting two arguments: the first argument is a lambda function with a k, v pair as arguments, the second argument is a column of type Map. #33698 (hexiaoting).
- Allow getting default user and password for clickhouse-client from the `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD` environment variables. Close #34538. #34947 (DR).
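The semantics described for the three Map functions can be sketched with Python dicts standing in for Map values. This is a model, not ClickHouse code: the helper names are hypothetical, and it assumes the lambda passed to `mapMap` returns a (key, value) pair, which the entry above does not spell out.

```python
# Python dicts stand in for ClickHouse Map values. Helper names are
# hypothetical and only mirror the behaviour described in the changelog entry.

def map_replace(map1: dict, map2: dict) -> dict:
    """Values from map2 win for shared keys; keys only in map2 are added."""
    return {**map1, **map2}


def map_filter(predicate, m: dict) -> dict:
    """Keep the entries for which predicate(k, v) is true."""
    return {k: v for k, v in m.items() if predicate(k, v)}


def map_map(func, m: dict) -> dict:
    """Apply func(k, v) to every entry. Assumption: func returns a (key, value) pair."""
    return dict(func(k, v) for k, v in m.items())


print(map_replace({"a": 1, "b": 2}, {"b": 20, "c": 30}))  # {'a': 1, 'b': 20, 'c': 30}
print(map_filter(lambda k, v: v > 1, {"a": 1, "b": 2}))   # {'b': 2}
print(map_map(lambda k, v: (k, v * 10), {"a": 1}))        # {'a': 10}
```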
Experimental Feature
- New data type `Object(<schema_format>)`, which supports storing of semi-structured data (for now JSON only). Data is written to such types as a string. Then all paths are extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. `data.key1.key2`, or with the cast operator: `data.key1.key2::Int64`.
- Add `database_replicated_allow_only_replicated_engine` setting. When enabled, it is only allowed to create `Replicated` tables or tables with stateless engines in `Replicated` databases. #35214 (Nikolai Kochetov). Note that the `Replicated` database is still an experimental feature.
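The path extraction described for the `Object` type can be illustrated with a short Python sketch. It only shows how nested JSON keys map to dotted column names such as `data.key1.key2`; the function name `extract_paths` and the sample row are hypothetical, and type selection for the resulting columns is not modelled.

```python
import json


def extract_paths(value, prefix=""):
    """Flatten nested JSON objects into a {dotted_path: leaf_value} mapping."""
    if isinstance(value, dict):
        paths = {}
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths.update(extract_paths(child, child_prefix))
        return paths
    return {prefix: value}


# Hypothetical row written into a column named `data`.
row = json.loads('{"key1": {"key2": 42, "key3": "x"}, "id": 7}')
columns = extract_paths(row, "data")
print(columns)  # {'data.key1.key2': 42, 'data.key1.key3': 'x', 'data.id': 7}
```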
Performance Improvement
- Improve performance of insertion into `MergeTree` tables by optimizing sorting. Up to 2x improvement is observed on realistic benchmarks. #34750 (Maksim Kita).
- Columns pruning when reading Parquet, ORC and Arrow files from URL and S3. Closes #34163. #34849 (Kseniia Sumarokova).
- Columns pruning when reading Parquet, ORC and Arrow files from Hive. #34954 (lgbo).
- A bunch of performance optimizations from a performance superhero. Improve performance of processing queries with large `IN` section. Improve performance of `direct` dictionary if its source is `ClickHouse`. Improve performance of `detectCharset`, `detectLanguageUnknown` functions. #34888 (Maksim Kita).
- Improve performance of `any` aggregate function by using more batching. #34760 (Raúl Marín).
- Multiple improvements for performance of `clickhouse-keeper`: less locking #35010 (zhanglistar), lower memory usage by streaming reading and writing of snapshot instead of full copy #34584 (zhanglistar), optimizing compaction of log store in the RAFT implementation #34534 (zhanglistar), versioning of the internal data structure #34486 (zhanglistar).
Improvement
- Allow asynchronous inserts to table functions. Fixes #34864. #34866 (Anton Popov).
- Implicit type casting of the key argument for functions `dictGetHierarchy`, `dictIsIn`, `dictGetChildren`, `dictGetDescendants`. Closes #34970. #35027 (Maksim Kita).
- `EXPLAIN AST` query can output AST in form of a graph in Graphviz format: `EXPLAIN AST graph = 1 SELECT * FROM system.parts`. #35173 (李扬).
- When large files were written with `s3` table function or table engine, the content type on the files was mistakenly set to `application/xml` due to a bug in the AWS SDK. This closes #33964. #34433 (Alexey Milovidov).
- Change restrictive row policies a bit to make them an easier alternative to permissive policies in easy cases. If for a particular table only restrictive policies exist (without permissive policies) users will be able to see some rows. Also `SHOW CREATE ROW POLICY` will always show `AS permissive` or `AS restrictive` in row policy's definition. #34596 (Vitaly Baranov).
- Improve schema inference with globs in File/S3/HDFS/URL engines. Try to use the next path for schema inference in case of error. #34465 (Kruglov Pavel).
- Play UI now correctly detects the preferred light/dark theme from the OS. #35068 (peledni).
- Added `date_time_input_format = 'best_effort_us'`. Closes #34799. #34982 (WenYao).
- New settings `allow_plaintext_password` and `allow_no_password` are added to the server configuration; they turn on/off authentication types that can be potentially insecure in some environments. They are allowed by default. #34738 (Heena Bansal).
- Support for `DateTime64` data type in `Arrow` format, closes #8280 and closes #28574. #34561 (李扬).
- Reload `remote_url_allow_hosts` (filtering of outgoing connections) on config update. #35294 (Nikolai Kochetov).
- Support `--testmode` parameter for `clickhouse-local`. This parameter enables interpretation of test hints that we use in functional tests. #35264 (Kseniia Sumarokova).
- Add `distributed_depth` to query log. It is like a more detailed variant of `is_initial_query`. #35207 (李扬).
- Respect `remote_url_allow_hosts` for `MySQL` and `PostgreSQL` table functions. #35191 (Heena Bansal).
- Added `disk_name` field to `system.part_log`. #35178 (Artyom Yurkov).
- Do not retry non-retriable errors when querying remote URLs. Closes #35161. #35172 (Kseniia Sumarokova).
- Support distributed INSERT SELECT queries (the setting `parallel_distributed_insert_select`) with the table function `view()`. #35132 (Azat Khuzhin).
- More precise memory tracking during `INSERT` into `Buffer` with `AggregateFunction`. #35072 (Azat Khuzhin).
- Avoid division by zero in Query Profiler if Linux kernel has a bug. Closes #34787. #35032 (Alexey Milovidov).
- Add more sanity checks for keeper configuration: now mixing of localhost and non-local servers is not allowed, also add checks for same value of internal raft port and keeper client port. #35004 (alesapin).
- Previously, if the user changed the settings of the system tables, there were tons of logs and ClickHouse renamed the tables every minute. This fixes #34929. #34949 (Nikita Mikhaylov).
- Use connection pool for Hive metastore client. #34940 (lgbo).
- Ignore per-column `TTL` in `CREATE TABLE AS` if new table engine does not support it (i.e. if the engine is not of `MergeTree` family). #34938 (Azat Khuzhin).
- Allow `LowCardinality` strings for `ngrambf_v1`/`tokenbf_v1` indexes. Closes #21865. #34911 (Lars Hiller Eidnes).
- Allow opening empty sqlite db if the file doesn't exist. Closes #33367. #34907 (Kseniia Sumarokova).
- Implement memory statistics for FreeBSD - this is required for `max_server_memory_usage` to work correctly. #34902 (Alexandre Snarskii).
- In previous versions the progress bar in clickhouse-client could jump forward near 50% for no reason. This closes #34324. #34801 (Alexey Milovidov).
- Now `ALTER TABLE DROP COLUMN columnX` queries for `MergeTree` table engines will work instantly when `columnX` is an `ALIAS` column. Fixes #34660. #34786 (alesapin).
- Show hints when user mistyped the name of a data skipping index. Closes #29698. #34764 (flynn).
- Support `remote()`/`cluster()` table functions for `parallel_distributed_insert_select`. #34728 (Azat Khuzhin).
- Do not reset logging that is configured via `--log-file`/`--errorlog-file` command line options in case of empty configuration in the config file. #34718 (Amos Bird).
- Extract schema only once on table creation and prevent reading from local files/external sources to extract schema on each server startup. #34684 (Kruglov Pavel).
- Allow specifying argument names for executable UDFs. This is necessary for formats where argument name is part of serialization, like `Native`, `JSONEachRow`. Closes #34604. #34653 (Maksim Kita).
- `MaterializedMySQL` (experimental feature) now supports `materialized_mysql_tables_list` (a comma-separated list of MySQL database tables, which will be replicated by the MaterializedMySQL database engine. Default value: empty list, which means all the tables will be replicated), mentioned at #32977. #34487 (zzsmdfj).
- Improve OpenTelemetry span logs for INSERT operation on distributed table. #34480 (Frank Chen).
- Make the znode `ctime` and `mtime` consistent between servers in ClickHouse Keeper. #33441 (小路).
Build/Testing/Packaging Improvement
- Package repository is migrated to JFrog Artifactory (Mikhail f. Shiryaev).
- Randomize some settings in functional tests, so more possible combinations of settings will be tested. This is yet another fuzzing method to ensure better test coverage. This closes #32268. #34092 (Kruglov Pavel).
- Drop PVS-Studio from our CI. #34680 (Mikhail f. Shiryaev).
- Add an ability to build stripped binaries with CMake. In previous versions it was performed by dh-tools. #35196 (alesapin).
- Smaller "fat-free" `clickhouse-keeper` build. #35031 (alesapin).
- Use @robot-clickhouse as an author and committer for PRs like https://github.com/ClickHouse/ClickHouse/pull/34685. #34793 (Mikhail f. Shiryaev).
- Limit DWARF version for debug info by 4 max, because our internal stack symbolizer cannot parse DWARF version 5. This makes sense if you compile ClickHouse with clang-15. #34777 (Alexey Milovidov).
- Remove `clickhouse-test` debian package as unneeded complication. CI uses tests from the repository, and standalone testing via deb package is no longer supported. #34606 (Ilya Yatsishin).
Bug Fix (user-visible misbehaviour in official stable or prestable release)
- A fix for HDFS integration: When the inner buffer size is too small, NEED_MORE_INPUT in `HadoopSnappyDecoder` will run multiple times (>=3) for one compressed block. This makes the input data be copied into the wrong place in `HadoopSnappyDecoder::buffer`. #35116 (lgbo).
- Ignore obsolete grants in ATTACH GRANT statements. This PR fixes #34815. #34855 (Vitaly Baranov).
- Fix segfault in Postgres database when getting create table query if database was created using named collections. Closes #35312. #35313 (Kseniia Sumarokova).
- Fix partial merge join duplicate rows bug, close #31009. #35311 (Vladimir C).
- Fix possible `Assertion 'position() != working_buffer.end()' failed` while using bzip2 compression with small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35300 (Kruglov Pavel). While using lz4 compression with a small `max_read_buffer_size` setting value. #35296 (Kruglov Pavel). While using lzma compression with small `max_read_buffer_size` setting value. #35295 (Kruglov Pavel). While using `brotli` compression with a small `max_read_buffer_size` setting value. The bug was found in https://github.com/ClickHouse/ClickHouse/pull/35047. #35281 (Kruglov Pavel).
- Fix possible segfault in `JSONEachRow` schema inference. #35291 (Kruglov Pavel).
- Fix `CHECK TABLE` query in case when sparse columns are enabled in table. #35274 (Anton Popov).
- Avoid std::terminate in case of exception in reading from remote VFS. #35257 (Azat Khuzhin).
- Fix reading port from config, close #34776. #35193 (Vladimir C).
- Fix error in query with `WITH TOTALS` in case if `HAVING` returned empty result. This fixes #33711. #35186 (Amos Bird).
- Fix a corner case of `replaceRegexpAll`, close #35117. #35182 (Vladimir C).
- Schema inference didn't work properly in case of `INSERT INTO FUNCTION s3(...) FROM ...`: it tried to read the schema from the s3 file instead of from the select query. #35176 (Kruglov Pavel).
- Fix MaterializedPostgreSQL (experimental feature) `table overrides` for partition by, etc. Closes #35048. #35162 (Kseniia Sumarokova).
- Fix MaterializedPostgreSQL (experimental feature) adding new table to replication (ATTACH TABLE) after manually removing (DETACH TABLE). Closes #33800. Closes #34922. Closes #34315. #35158 (Kseniia Sumarokova).
- Fix partition pruning error when non-monotonic function is used with IN operator. This fixes #35136. #35146 (Amos Bird).
- Fixed slightly incorrect translation of YAML configs to XML. #35135 (Miel Donkers).
- Fix `optimize_skip_unused_shards_rewrite_in` for signed columns and negative values. #35134 (Azat Khuzhin).
- The `update_lag` external dictionary configuration option was unusable, showing the error message "Unexpected key `update_lag` in dictionary source configuration". #35089 (Jason Chu).
- Avoid possible deadlock on server shutdown. #35081 (Azat Khuzhin).
- Fix missing alias after function is optimized to a subcolumn when setting `optimize_functions_to_subcolumns` is enabled. Closes #33798. #35079 (qieqieplus).
- Fix reading from `system.asynchronous_inserts` table if there exists asynchronous insert into table function. #35050 (Anton Popov).
- Fix possible exception `Reading for MergeTree family tables must be done with last position boundary` (relevant to operation on remote VFS). Closes #34979. #35001 (Kseniia Sumarokova).
- Fix unexpected result when using -State type aggregate function in window frame. #34999 (metahys).
- Fix possible segfault in FileLog (experimental feature). Closes #30749. #34996 (Kseniia Sumarokova).
- Fix possible rare error `Cannot push block to port which already has data`. #34993 (Nikolai Kochetov).
- Fix wrong schema inference for unquoted dates in CSV. Closes #34768. #34961 (Kruglov Pavel).
- Integration with Hive: Fix unexpected result when using `in` in `where` in hive query. #34945 (lgbo).
- Avoid busy polling in ClickHouse Keeper while searching for changelog files to delete. #34931 (Azat Khuzhin).
- Fix DateTime64 conversion from PostgreSQL. Closes #33364. #34910 (Kseniia Sumarokova).
- Fix possible "Part directory doesn't exist" during `INSERT` into MergeTree table backed by VFS over s3. #34876 (Azat Khuzhin).
- Support DDLs like CREATE USER to be executed on cross replicated cluster. #34860 (Jianmei Zhang).
- Fix bugs for multiple columns group by in `WindowView` (experimental feature). #34859 (vxider).
- Fix possible failures in S2 functions when queries contain const columns. #34745 (Bharat Nallan).
- Fix bug for H3 funcs containing const columns which cause queries to fail. #34743 (Bharat Nallan).
- Fix `No such file or directory` with enabled `fsync_part_directory` and vertical merge. #34739 (Azat Khuzhin).
- Fix serialization/printing for system queries `RELOAD MODEL`, `RELOAD FUNCTION`, `RESTART DISK` when used `ON CLUSTER`. Closes #34514. #34696 (Maksim Kita).
- Fix `allow_experimental_projection_optimization` with `enable_global_with_statement` (before it may lead to `Stack size too large` error in case of multiple expressions in `WITH` clause, and also it executed scalar subqueries again and again, so now it will be more optimal). #34650 (Azat Khuzhin).
- Stop selecting a part for mutation when another replica has already updated the transaction log for `ReplicatedMergeTree` engine. #34633 (Jianmei Zhang).
- Fix incorrect result of trivial count query when part movement feature is used #34089. #34385 (nvartolomei).
- Fix inconsistency of `max_query_size` limitation in distributed subqueries. #34078 (Chao Ma).
-
v22.2 Changes
February 17, 2022
Upgrade Notes
- Applying data skipping indexes for queries with FINAL may produce incorrect result. In this release we disabled data skipping indexes by default for queries with FINAL (a new setting `use_skip_indexes_if_final` is introduced and disabled by default). #34243 (Azat Khuzhin).
New Feature
- Projections are production ready. Set `allow_experimental_projection_optimization` by default and deprecate this setting. #34456 (Nikolai Kochetov).
- An option to create new files on insert for `File`/`S3`/`HDFS` engines. Allow to overwrite a file in `HDFS`. Throw an exception on an attempt to overwrite a file in `S3` by default. Throw an exception on an attempt to append data to a file in formats that have a suffix (and thus don't support appends, like `Parquet`, `ORC`). Closes #31640 Closes #31622 Closes #23862 Closes #15022 Closes #16674. #33302 (Kruglov Pavel).
- Add a setting that allows a user to provide their own deduplication semantic in `MergeTree`/`ReplicatedMergeTree`. If provided, it's used instead of data digest to generate block ID. So, for example, by providing a unique value for the setting in each INSERT statement, the user can avoid the same inserted data being deduplicated. This closes: #7461. #32304 (Igor Nikonov).
- Add support of `DEFAULT` keyword for INSERT statements. Closes #6331. #33141 (Andrii Buriachevskyi).
- `EPHEMERAL` column specifier is added to `CREATE TABLE` query. Closes #9436. #34424 (yakov-olkhovskiy).
- Support `IF EXISTS` clause for `TTL expr TO [DISK|VOLUME] [IF EXISTS] 'xxx'` feature. Parts will be moved to disk or volume only if it exists on replica, so `MOVE TTL` rules will be able to behave differently on replicas according to the existing storage policies. Resolves #34455. #34504 (Anton Popov).
- Allow setting a default table engine and creating tables without specifying ENGINE. #34187 (Ilya Yatsishin).
- Add table function `format(format_name, data)`. #34125 (Kruglov Pavel).
- Detect format in `clickhouse-local` by file name even in the case when it is passed to stdin. #33829 (Kruglov Pavel).
- Add schema inference for `values` table function. Closes #33811. #34017 (Kruglov Pavel).
- Dynamic reload of server TLS certificates on config reload. Closes #15764. #15765 (johnskopis). #31257 (Filatenkov Artur).
- Now ReplicatedMergeTree can recover data when some of its disks are broken. #13544 (Amos Bird).
- Fault-tolerant connections in clickhouse-client: `clickhouse-client ... --host host1 --host host2 --port port2 --host host3 --port port --host host4`. #34490 (Kruglov Pavel). #33824 (Filippov Denis).
- Add `DEGREES` and `RADIANS` functions for MySQL compatibility. #33769 (Bharat Nallan).
- Add `h3ToCenterChild` function. #33313 (Bharat Nallan). Add new h3 miscellaneous functions: `edgeLengthKm`, `exactEdgeLengthKm`, `exactEdgeLengthM`, `exactEdgeLengthRads`, `numHexagons`. #33621 (Bharat Nallan).
- Add function `bitSlice` to extract bit subsequences from String/FixedString. #33360 (RogerYK).
- Implemented `meanZTest` aggregate function. #33354 (achimbab).
- Add confidence intervals to T-tests aggregate functions. #33260 (achimbab).
- Add function `addressToLineWithInlines`. Close #26211. #33467 (SuperDJY).
- Added `#!` and `#` as a recognised start of a single line comment. Closes #34138. #34230 (Aaron Katz).
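The user-provided deduplication semantic from the list above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the ClickHouse implementation: SHA-256 is a stand-in for whatever digest the server uses, and `block_id`, `insert` and the token strings are hypothetical names.

```python
import hashlib


def block_id(data: bytes, token=None) -> str:
    """Derive the block ID from the user token when given, else from the data digest."""
    source = token.encode() if token is not None else data
    return hashlib.sha256(source).hexdigest()  # SHA-256 is an assumption of this sketch


def insert(seen: set, data: bytes, token=None) -> bool:
    """Return True if the block is stored, False if it is deduplicated."""
    bid = block_id(data, token)
    if bid in seen:
        return False
    seen.add(bid)
    return True


seen = set()
print(insert(seen, b"rows"))                   # True: first insert is stored
print(insert(seen, b"rows"))                   # False: same data digest, deduplicated
print(insert(seen, b"rows", token="batch-1"))  # True: unique token, stored again
print(insert(seen, b"other", token="batch-1")) # False: same token, deduplicated
```

The last call shows the flip side of the feature in this model: once a token is supplied, it alone decides whether two inserts count as duplicates.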
Experimental Feature
- Functions for text classification: language and charset detection. See #23271. #33314 (Nikolay Degterinsky).
- Add memory overcommit to `MemoryTracker`. Added `guaranteed` settings for memory limits which represent soft memory limits. In case when hard memory limit is reached, `MemoryTracker` tries to cancel the most overcommitted query. New setting `memory_usage_overcommit_max_wait_microseconds` specifies how long queries may wait for another query to stop. Closes #28375. #31182 (Dmitry Novik).
- Enable stream to table join in WindowView. #33729 (vxider).
- Support `SET`, `YEAR`, `TIME` and `GEOMETRY` data types in `MaterializedMySQL` (experimental feature). Fixes #18091, #21536, #26361. #33429 (zzsmdfj).
- Fix various issues when projection is enabled by default. Each issue is described in separate commit. This is for #33678. This fixes #34273. #34305 (Amos Bird).
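The memory overcommit entry above does not define "most overcommitted", so the following Python sketch makes an explicit assumption: the victim is the query with the largest ratio of used memory to its guaranteed (soft) limit. The `Query` class, its fields and `pick_query_to_cancel` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    used_bytes: int
    guaranteed_bytes: int  # soft limit; must be > 0 in this sketch


def pick_query_to_cancel(queries):
    """Return the query with the largest used/guaranteed ratio, or None.

    Assumption: only queries above their soft limit are candidates.
    """
    over = [q for q in queries if q.used_bytes > q.guaranteed_bytes]
    if not over:
        return None
    return max(over, key=lambda q: q.used_bytes / q.guaranteed_bytes)


running = [
    Query("a", used_bytes=300, guaranteed_bytes=100),  # ratio 3.0
    Query("b", used_bytes=500, guaranteed_bytes=400),  # ratio 1.25
    Query("c", used_bytes=50, guaranteed_bytes=100),   # within its soft limit
]
victim = pick_query_to_cancel(running)
print(victim.name)  # a
```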
Performance Improvement
- Support `optimize_read_in_order` if prefix of sorting key is already sorted. E.g. if we have sorting key `ORDER BY (a, b)` in table and query with `WHERE a = const ORDER BY b` clauses, now reading in order of the sorting key will be applied instead of a full sort. #32748 (Anton Popov).
- Improve performance of partitioned insert into table functions `URL`, `S3`, `File`, `HDFS`. Closes #34348. #34510 (Maksim Kita).
- Multiple performance improvements of clickhouse-keeper. #34484 #34587 (zhanglistar).
- `FlatDictionary` improve performance of dictionary data load. #33871 (Maksim Kita).
- Improve performance of `mapPopulateSeries` function. Closes #33944. #34318 (Maksim Kita).
- `_file` and `_path` virtual columns (in file-like table engines) are made `LowCardinality` - it will make queries for multiple files faster. Closes #34300. #34317 (flynn).
- Speed up loading of data parts. It was not parallelized before: the setting `part_loading_threads` did not have effect. See #4699. #34310 (alexey-milovidov).
- Improve performance of `LineAsString` format. This closes #34303. #34306 (alexey-milovidov).
- Optimize `quantilesExact{Low,High}` to use `nth_element` instead of `sort`. #34287 (Danila Kutenin).
- Slightly improve performance of `Regexp` format. #34202 (alexey-milovidov).
- Minor improvement for analysis of scalar subqueries. #34128 (Federico Rodriguez).
- Make ORDER BY tuple almost as fast as ORDER BY columns. We have special optimizations for multiple column ORDER BY: https://github.com/ClickHouse/ClickHouse/pull/10831 . It's beneficial to also apply to tuple columns. #34060 (Amos Bird).
- Rework and reintroduce the scalar subqueries cache to Materialized Views execution. #33958 (Raúl Marín).
- Slightly improve performance of `ORDER BY` by adding x86-64 AVX-512 support for `memcmpSmall` functions to accelerate memory comparison. It works only if you compile ClickHouse by yourself. #33706 (hanqf-git).
- Improve `range_hashed` dictionary performance if for key there are a lot of intervals. Fixes #23821. #33516 (Maksim Kita).
- For inserts and merges into S3, write files in parallel whenever possible (TODO: check if it's merged). #33291 (Nikolai Kochetov).
- Improve `clickhouse-keeper` performance and fix several memory leaks in NuRaft library. #33329 (alesapin).
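The `optimize_read_in_order` entry in this section rests on a simple property: in data sorted by `(a, b)`, the rows that share one value of `a` are already ordered by `b`. A minimal Python sketch of that property follows; the sample rows and the function name are hypothetical.

```python
# Rows stored sorted by the key (a, b), as a table with ORDER BY (a, b) would be.
rows = sorted([(2, 5), (1, 9), (1, 3), (2, 1), (1, 7), (3, 4)])


def read_where_a_equals(rows, const):
    """Scan in storage order and keep rows with a == const; no sort is performed."""
    return [row for row in rows if row[0] == const]


selected = read_where_a_equals(rows, 1)
b_values = [b for _, b in selected]
print(b_values)  # [3, 7, 9]
```

Because the filtered rows come out ordered by `b`, a query of the form `WHERE a = const ORDER BY b` can stream them in storage order and skip the separate sort step.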
Improvement
- Support asynchronous inserts in `clickhouse-client` for queries with inlined data. #34267 (Anton Popov).
- Functions `dictGet`, `dictHas` implicitly cast key argument to dictionary key structure, if they are different. #33672 (Maksim Kita).
- Improvements for `range_hashed` dictionaries. Improve performance of load time if there are multiple attributes. Allow to create a dictionary without attributes. Added option `convert_null_range_bound_to_open` to specify the strategy when intervals `start` and `end` have `Nullable` type; by default it is `true`. Closes #29791. Allow to specify `Float`, `Decimal`, `DateTime64`, `Int128`, `Int256`, `UInt128`, `UInt256` as range types. `RangeHashedDictionary` added support for range values that extend `Int64` type. Closes #28322. Added option `range_lookup_strategy` to specify range lookup type `min`, `max`; by default it is `min`. Closes #21647. Fixed allocated bytes calculations. Fixed type name in `system.dictionaries` in case of `ComplexKeyHashedDictionary`. #33927 (Maksim Kita).
- `flat`, `hashed`, `hashed_array` dictionaries now support creating with empty attributes, with support of reading the keys and using `dictHas`. Fixes #33820. #33918 (Maksim Kita).
- Added support for `DateTime64` data type in dictionaries. #33914 (Maksim Kita).
- Allow to write `s3(url, access_key_id, secret_access_key)` (autodetect of data format and table structure, but with explicit credentials). #34503 (Kruglov Pavel).
- Added sending of the output format back to client like it's done in HTTP protocol as suggested in #34362. Closes #34362. #34499 (Vitaly Baranov).
- Send ProfileEvents statistics in case of INSERT SELECT query (to display query metrics in `clickhouse-client` for this type of queries). #34498 (Dmitry Novik).
- Recognize `.jsonl` extension for JSONEachRow format. #34496 (Kruglov Pavel).
- Improve schema inference in clickhouse-local. Allow to write just `clickhouse-local -q "select * from table" < data.format`. #34495 (Kruglov Pavel).
- Privileges CREATE/ALTER/DROP ROW POLICY now can be granted on a table or on `database.*` as well as globally `*.*`. #34489 (Vitaly Baranov).
- Allow to export arbitrary large files to `s3`. Add two new settings: `s3_upload_part_size_multiply_factor` and `s3_upload_part_size_multiply_parts_count_threshold`. Now each time `s3_upload_part_size_multiply_parts_count_threshold` parts are uploaded to S3 from a single query, `s3_min_upload_part_size` is multiplied by `s3_upload_part_size_multiply_factor`. Fixes #34244. #34422 (alesapin).
- Allow to skip not found (404) URLs for globs when using URL storage / table function. Also closes #34359. #34392 (Kseniia Sumarokova).
- Default input and output formats for `clickhouse-local` that can be overridden by --input-format and --output-format. Close #30631. #34352 (李扬).
- Add options for `clickhouse-format`, which closes #30528: `max_query_size`, `max_parser_depth`. #34349 (李扬).
- Better handling of pre-inputs before client start. This is for #34308. #34336 (Amos Bird).
- `REGEXP_MATCHES` and `REGEXP_REPLACE` function aliases for compatibility with PostgreSQL. Close #30885. #34334 (李扬).
- Some servers expect a User-Agent header in their HTTP requests. A `User-Agent` header entry has been added to HTTP requests of the form: User-Agent: ClickHouse/VERSION_STRING. #34330 (Saad Ur Rahman).
- Cancel merges before acquiring table lock for `TRUNCATE` query to avoid `DEADLOCK_AVOIDED` error in some cases. Fixes #34302. #34304 (tavplubix).
- Change severity of the "Cancelled merging parts" message in logs, because it's not an error. This closes #34148. #34232 (alexey-milovidov).
- Add ability to compose PostgreSQL-style cast operator `::` with expressions using `[]` and `.` operators (array and tuple indexing). #34229 (Nikolay Degterinsky).
- Recognize `YYYYMMDD-hhmmss` format in `parseDateTimeBestEffort` function. This closes #34206. #34208 (alexey-milovidov).
- Allow carriage return in the middle of the line while parsing by `Regexp` format. This closes #34200. #34205 (alexey-milovidov).
- Allow to parse dictionary's `PRIMARY KEY` as `PRIMARY KEY (id, value)`; previously supported only `PRIMARY KEY id, value`. Closes #34135. #34141 (Maksim Kita).
- An optional argument for `splitByChar` to limit the number of resulting elements. close #34081. #34140 (李扬).
- Improving the experience of multiple line editing for clickhouse-client. This is a follow-up of #31123. #34114 (Amos Bird).
- Add `UUID` support in `MsgPack` input/output format. #34065 (Kruglov Pavel).
- Tracing context (for OpenTelemetry) is now propagated from GRPC client metadata (this change is relevant for GRPC client-server protocol). #34064 (andremarianiello).
- Supports all types of `SYSTEM` queries with `ON CLUSTER` clause. #34005 (小路).
- Improve memory accounting for queries that are using less than `max_untracked_memory`. #34001 (Azat Khuzhin).
- Fixed UTF-8 string case-insensitive search when lowercase and uppercase characters are represented by different number of bytes. Example is `ẞ` and `ß`. This closes #7334. #33992 (Harry Lee).
- Detect format and schema from stdin in `clickhouse-local`. #33960 (Kruglov Pavel).
- Correctly handle the case of misconfiguration when multiple disks are using the same path on the filesystem. #29072. #33905 (zhongyuankai).
- Try every resolved IP address while getting S3 proxy. S3 proxies are rarely used, mostly in Yandex Cloud. #33862 (Nikolai Kochetov).
- Support EXPLAIN AST CREATE FUNCTION query: `EXPLAIN AST CREATE FUNCTION mycast AS (n) -> cast(n as String)` will return `EXPLAIN AST CREATE FUNCTION mycast AS n -> CAST(n, 'String')`. #33819 (李扬).
- Added support for cast from `Map(Key, Value)` to `Array(Tuple(Key, Value))`. #33794 (Maksim Kita).
- Add some improvements and fixes for `Bool` data type. Fixes #33244. #33737 (Kruglov Pavel).
- Parse and store OpenTelemetry trace-id in big-endian order. #33723 (Frank Chen).
- Improvement for `fromUnixTimestamp64` family functions. They now accept any integer value that can be converted to `Int64`. This closes: #14648. #33505 (Andrey Zvonov).
- Reimplement `_shard_num` from constants (see #7624) with `shardNum()` function (see #27020), to avoid possible issues (like those that had been found in #16947). #33392 (Azat Khuzhin).
- Enable binary arithmetic (plus, minus, multiply, division, least, greatest) between Decimal and Float. #33355 (flynn).
- Respect cgroups limits in max_threads autodetection. #33342 (JaySon).
- Add new clickhouse-keeper setting `min_session_timeout_ms`. Now clickhouse-keeper will determine client session timeout according to `min_session_timeout_ms` and `session_timeout_ms` settings. #33288 (JackyWoo).
- Added `UUID` data type support for functions `hex` and `bin`. #32170 (Frank Chen).
- Fix reading of subcolumns with dots in their names. In particular fixed reading of `Nested` columns, if their element names contain dots (e.g. Nested(`keys.name` String, `keys.id` UInt64, values UInt64)). #34228 (Anton Popov).
- Fixes `parallel_view_processing = 0` not working when inserting into a table using `VALUES`. Fixes `view_duration_ms` in the `query_views_log` not being set correctly for materialized views. #34067 (Raúl Marín).
- Fix parsing tables structure from ZooKeeper: now metadata from ZooKeeper compared with local metadata in canonical form. It helps when canonical function names can change between ClickHouse versions. #33933 (sunny).
- Properly escape some characters for interaction with LDAP. #33401 (IlyaTsoi).
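The part-size growth rule behind `s3_upload_part_size_multiply_factor` and `s3_upload_part_size_multiply_parts_count_threshold`, described earlier in this list, can be sketched as follows. This is one reading of the changelog sentence, not the server's code: it assumes the size is multiplied once after every `threshold` uploaded parts, and the function names and numeric values are hypothetical rather than ClickHouse defaults.

```python
def part_size(part_index: int, min_part_size: int, factor: int, threshold: int) -> int:
    """Size of the part with 0-based index `part_index` under the assumed growth rule."""
    return min_part_size * factor ** (part_index // threshold)


def max_upload_bytes(max_parts: int, min_part_size: int, factor: int, threshold: int) -> int:
    """Total bytes that fit into `max_parts` parts when sizes grow as above."""
    return sum(part_size(i, min_part_size, factor, threshold) for i in range(max_parts))


# Hypothetical values chosen only to make the pattern visible.
sizes = [part_size(i, min_part_size=16, factor=2, threshold=3) for i in range(7)]
print(sizes)                          # [16, 16, 16, 32, 32, 32, 64]
print(max_upload_bytes(6, 16, 2, 3))  # 144
```

With a fixed part size the largest exportable file grows only linearly with the number of parts; letting the size grow geometrically is what makes arbitrarily large exports fit within a bounded part count.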
Build/Testing/Packaging Improvement
- Remove unbundled build support. #33690 (Azat Khuzhin).
- Ensure that tests don't depend on the result of non-stable sorting of equal elements. Added equal items ranges randomization in debug after sort to prevent issues when we rely on equal items sort order. #34393 (Maksim Kita).
- Add verbosity to a style check. #34289 (Mikhail f. Shiryaev).
- Remove `clickhouse-test` debian package because it's obsolete. #33948 (Ilya Yatsishin).
- Multiple improvements for build system to remove the possibility of occasionally using packages from the OS and to enforce hermetic builds. #33695 (Amos Bird).
Bug Fix (user-visible misbehaviour in official stable or prestable release)
- Fixed the assertion in case of using `allow_experimental_parallel_reading_from_replicas` with `max_parallel_replicas` equals to 1. This fixes #34525. #34613 (Nikita Mikhaylov).
- Fix rare bug while reading of empty arrays, which could lead to `Data compressed with different methods` error. It can reproduce if you have mostly empty arrays, but not always. And reading is performed in backward direction with ORDER BY ... DESC. This error is extremely unlikely to happen. #34327 (Anton Popov).
- Fix wrong result of `round`/`roundBankers` if integer values of small types are rounded. Closes #33267. #34562 (李扬).
- Sometimes query cancellation did not work immediately when we were reading multiple files from s3 or HDFS. Fixes #34301 Relates to #34397. #34539 (Dmitry Novik).
- Fix exception `Chunk should have AggregatedChunkInfo in MergingAggregatedTransform` (in case of `optimize_aggregation_in_order = 1` and `distributed_aggregation_memory_efficient = 0`). Fixes #34526. #34532 (Anton Popov).
- Fix comparison between integers and floats in index analysis. Previously it could lead to skipping some granules for reading by mistake. Fixes #34493. #34528 (Anton Popov).
- Fix compression support in URL engine. #34524 (Frank Chen).
- Fix possible error 'file_size: Operation not supported' in files' schema autodetection. #34479 (Kruglov Pavel).
- Fixes possible race with table deletion. #34416 (Kseniia Sumarokova).
- Fix possible error `Cannot convert column Function to mask` in short circuit function evaluation. Closes #34171. #34415 (Kruglov Pavel).
- Fix potential crash when doing schema inference from url source. Closes #34147. #34405 (Kruglov Pavel).
- For UDFs access permissions were checked for database level instead of global level as it should be. Closes #34281. #34404 (Maksim Kita).
- Fix wrong engine syntax in result of `SHOW CREATE DATABASE` query for databases with engine `Memory`. This closes #34335. #34345 (alexey-milovidov).
- Fixed a couple of extremely rare race conditions that might lead to broken state of replication queue and "intersecting parts" error. #34297 (tavplubix).
. This closes #34335. #34345 (alexey-milovidov). - ๐ Fixed a couple of extremely rare race conditions that might lead to broken state of replication queue and "intersecting parts" error. #34297 (tavplubix).
- ๐ Fix progress bar width. It was incorrectly rounded to integer number of characters. #34275 (alexey-milovidov).
- ๐ Fix current_user/current_address client information fields for inter-server communication (before this patch current_user/current_address will be preserved from the previous query). #34263 (Azat Khuzhin).
- Fix memory leak in case of some Exception during query processing with optimize_aggregation_in_order=1. #34234 (Azat Khuzhin).
- Fix metric Query, which shows the number of executing queries. In the last several releases it was always 0. #34224 (Anton Popov).
- Fix schema inference for table function s3. #34186 (Kruglov Pavel).
- Fix rare and benign race condition in HDFS, S3 and URL storage engines which can lead to additional connections. #34172 (alesapin).
- Fix bug which can rarely lead to error "Cannot read all data" while reading LowCardinality columns of MergeTree table engines family which stores data on remote file system like S3 (virtual filesystem over s3 is an experimental feature that is not ready for production). #34139 (alesapin).
- Fix inserts to distributed tables in case of a change of native protocol. The last change was in the version 22.1, so there may be some failures of inserts to distributed tables after upgrade to that version. #34132 (Anton Popov).
- Fix possible data race in File table engine that was introduced in #33960. Closes #34111. #34113 (Kruglov Pavel).
- Fixed minor race condition that might cause "intersecting parts" error in extremely rare cases after ZooKeeper connection loss. #34096 (tavplubix).
- Fix asynchronous inserts with Native format. #34068 (Anton Popov).
- Fix bug which led to inability for server to start when both replicated access storage and keeper (embedded in clickhouse-server) are used. Introduced two settings for keeper socket timeout instead of settings from default user: keeper_server.socket_receive_timeout_sec and keeper_server.socket_send_timeout_sec. Fixes #33973. #33988 (alesapin).
- Fix segfault while parsing ORC file with corrupted footer. Closes #33797. #33984 (Kruglov Pavel).
- Fix parsing IPv6 from query parameter (prepared statements) and fix IPv6 to string conversion. Closes #33928. #33971 (Kruglov Pavel).
- Fix crash while reading of nested tuples. Fixes #33838. #33956 (Anton Popov).
- Fix usage of functions array and tuple with literal arguments in distributed queries. Previously it could lead to a "Not found columns" exception. #33938 (Anton Popov).
- Aggregate function combinator -If did not correctly process Nullable filter argument. This closes #27073. #33920 (alexey-milovidov).
- Fix potential race condition when doing remote disk read (virtual filesystem over s3 is an experimental feature that is not ready for production). #33912 (Amos Bird).
- Fix crash if SQL UDF is created with lambda with non identifier arguments. Closes #33866. #33868 (Maksim Kita).
- Fix usage of sparse columns (which can be enabled by experimental setting ratio_of_defaults_for_sparse_serialization). #33849 (Anton Popov).
- Fixed "replica is not readonly" logical error on SYSTEM RESTORE REPLICA query when replica is actually readonly. Fixes #33806. #33847 (tavplubix).
- Fix memory leak in clickhouse-keeper in case compression is used (default). #33840 (Azat Khuzhin).
- Fix index analysis with no common types available. #33833 (Amos Bird).
- Fix schema inference for JSONEachRow and JSONCompactEachRow. #33830 (Kruglov Pavel).
- Fix usage of external dictionaries with redis source and large number of keys. #33804 (Anton Popov).
- Fix bug in client that led to 'Connection reset by peer' in server. Closes #33309. #33790 (Kruglov Pavel).
- Fix parsing query INSERT INTO ... VALUES SETTINGS ... (...), ... #33776 (Kruglov Pavel).
- Fix bug of check table when creating data part with wide format and projection. #33774 (李扬).
- Fix tiny race between count() and INSERT/merges/... in MergeTree (it is possible to return incorrect number of rows for SELECT with optimize_trivial_count_query). #33753 (Azat Khuzhin).
- Throw exception when directory listing request has failed in storage HDFS. #33724 (LiuNeng).
- Fix mutation when table contains projections. This fixes #33010. This fixes #33275. #33679 (Amos Bird).
- Correctly determine current database if CREATE TEMPORARY TABLE AS SELECT is queried inside a named HTTP session. This is a very rare use case. This closes #8340. #33676 (alexey-milovidov).
- Allow some queries with sorting, LIMIT BY, ARRAY JOIN and lambda functions. This closes #7462. #33675 (alexey-milovidov).
- Fix bug in "zero copy replication" (a feature that is under development and should not be used in production) which led to data duplication in case of TTL move. Fixes #33643. #33642 (alesapin).
- Fix "Chunk should have AggregatedChunkInfo in GroupingAggregatedTransform" (in case of optimize_aggregation_in_order = 1). #33637 (Azat Khuzhin).
- Fix error "Bad cast from type ... to DB::DataTypeArray" which may happen when table has Nested column with dots in name, and default value is generated for it (e.g. during insert, when column is not listed). Continuation of #28762. #33588 (Alexey Pavlenko).
- Export into lz4 files has been fixed. Closes #31421. #31862 (Kruglov Pavel).
- Fix potential crash if group_by_overflow_mode was set to any (approximate GROUP BY) and aggregation was performed by single column of type LowCardinality. #34506 (DR).
- Fix inserting to temporary tables via gRPC client-server protocol. Fixes #34347, issue #2. #34364 (Vitaly Baranov).
- Fix issue #19429. #34225 (Vitaly Baranov).
- Fix issue #18206. #33977 (Vitaly Baranov).
- This PR allows using multiple LDAP storages in the same list of user directories. It worked earlier but was broken because LDAP tests are disabled (they are part of the testflows tests). #33574 (Vitaly Baranov).
- Applying data skipping indexes for queries with FINAL may produce incorrect result. In this release we disabled data skipping indexes by default for queries with FINAL (a new setting
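
One of the fixes in this list concerns comparison between integers and floats in index analysis. The changelog does not describe the root cause, so the following Python sketch only illustrates the general hazard: converting a large integer to a 64-bit float before comparing can round it and change the outcome. The function names are hypothetical and this is not ClickHouse code.

```python
from fractions import Fraction


def lossy_greater(i: int, f: float) -> bool:
    # Converts the integer to a float first, which can round it.
    return float(i) > f


def exact_greater(i: int, f: float) -> bool:
    # Compares exact rational values, so no rounding takes place.
    return Fraction(i) > Fraction(f)


key = 2**53 + 1        # not representable as a 64-bit float
bound = float(2**53)

print(lossy_greater(key, bound))  # the rounded key compares equal to the bound
print(exact_greater(key, bound))  # the exact key is strictly greater
```

If a range check of this kind decides which granules to read, the lossy variant can exclude a range that actually contains matching rows.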
-
v22.1 Changes
January 18, 2022
Upgrade Notes
- The functions left and right were previously implemented in the parser and are now full-featured. Distributed queries with left or right functions without aliases may throw an exception if the cluster contains different versions of clickhouse-server. If you are upgrading your cluster and encounter this error, you should finish upgrading your cluster to ensure all nodes have the same version. You can also add aliases (AS something) to the columns in your queries to avoid this issue. #33407 (alexey-milovidov).
- Resource usage by scalar subqueries is fully accounted since this version. With this change, rows read in scalar subqueries are now reported in the query_log. If the scalar subquery is cached (repeated or called for several rows) the rows read are only counted once. This change allows KILLing queries and reporting progress while they are executing scalar subqueries. #32271 (Raúl Marín).
New Feature
- Implement data schema inference for input formats. Allow to skip structure (or write just auto) in table functions file, url, s3, hdfs and in parameters of clickhouse-local. Allow to skip structure in create query for table engines File, HDFS, S3, URL, Merge, Buffer, Distributed and ReplicatedMergeTree (if we add new replicas). #32455 (Kruglov Pavel).
- Detect format by file extension in file/hdfs/s3/url table functions and HDFS/S3/URL table engines and also for SELECT INTO OUTFILE and INSERT FROM INFILE. #33565 (Kruglov Pavel). Close #30918. #33443 (OnePiece).
- A tool for collecting diagnostics data if you need support. #33175 (Alexander Burmak).
- Automatic cluster discovery via Zoo/Keeper. It allows to add replicas to the cluster without changing configuration on every server. #31442 (vdimir).
- Implement hive table engine to access apache hive from clickhouse. This implements: #29245. #31104 (taiyang-li).
- Add aggregate functions cramersV, cramersVBiasCorrected, theilsU and contingency. These functions calculate dependency (measure of association) between categorical values. All these functions use cross-tab (histogram on pairs) for implementation. You can imagine it like a correlation coefficient but for any discrete values (not necessarily numbers). #33366 (alexey-milovidov). Initial implementation by Vanyok-All-is-OK and antikvist.
- Added table function hdfsCluster which allows processing files from HDFS in parallel from many nodes in a specified cluster, similarly to s3Cluster. #32400 (Zhichang Yu).
- Adding support for disks backed by Azure Blob Storage, in a similar way it has been done for disks backed by AWS S3. #31505 (Jakub Kuklis).
- Allow COMMENT in CREATE VIEW (for all VIEW kinds). #31062 (Vasily Nemkov).
- Dynamically reinitialize listening ports and protocols when configuration changes. #30549 (Kevin Michel).
- Added left, right, leftUTF8, rightUTF8 functions. Fix error in implementation of substringUTF8 function with negative offset (offset from the end of string). #33407 (alexey-milovidov).
- Add new functions for H3 coordinate system: h3HexAreaKm2, h3CellAreaM2, h3CellAreaRads2. #33479 (Bharat Nallan).
- Add MONTHNAME function. #33436 (usurai).
- Added function arrayLast. Closes #33390. #33415 Added function arrayLastIndex. #33465 (Maksim Kita).
- Add function decodeURLFormComponent slightly different to decodeURLComponent. Close #10298. #33451 (SuperDJY).
- Allow to split GraphiteMergeTree rollup rules for plain/tagged metrics (optional rule_type field). #33494 (Michail Safronov).
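
The decodeURLFormComponent item in this list only says that it is "slightly different" from decodeURLComponent. Assuming the difference is the form-encoding convention, where a plain `+` is decoded as a space, the two behaviours can be sketched with Python's standard library. This is an analogy using `urllib.parse`, not ClickHouse code.

```python
from urllib.parse import unquote, unquote_plus

encoded = "a+b%20c%2Bd"

# Component-style decoding: percent escapes are decoded, "+" stays a plus.
as_component = unquote(encoded)

# Form-style decoding: "+" is treated as a space before percent escapes are decoded.
as_form = unquote_plus(encoded)

print(repr(as_component))
print(repr(as_form))
```

The escaped plus (`%2B`) survives as a literal `+` in both cases; only the unescaped `+` is treated differently.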
Performance Improvement
- Support moving conditions to PREWHERE (setting optimize_move_to_prewhere) for tables of Merge engine if all of its underlying tables support PREWHERE. #33300 (Anton Popov).
- More efficient handling of globs for URL storage. Now you can easily query million URLs in parallel with retries. Closes #32866. #32907 (Kseniia Sumarokova).
- Avoid exponential backtracking in parser. This closes #20158. #33481 (alexey-milovidov).
- Abuse of untuple function was leading to exponential complexity of query analysis (found by fuzzer). This closes #33297. #33445 (alexey-milovidov).
- Reduce allocated memory for dictionaries with string attributes. #33466 (Maksim Kita).
- Slight performance improvement of reinterpret function. #32587 (alexey-milovidov).
- Non significant change. In extremely rare cases when data part is lost on every replica, after merging of some data parts, the subsequent queries may skip fewer partitions during partition pruning. This hardly affects anything. #32220 (Azat Khuzhin).
- Improve clickhouse-keeper writing performance by optimizing the size calculation logic. #32366 (zhanglistar).
- Optimize single part projection materialization. This closes #31669. #31885 (Amos Bird).
- Improve query performance of system tables. #33312 (OnePiece).
- Optimize selecting of MergeTree parts that can be moved between volumes. #33225 (OnePiece).
- Fix sparse_hashed dict performance with sequential keys (wrong hash function). #32536 (Azat Khuzhin).
Experimental Feature
- Parallel reading from multiple replicas within a shard during distributed query without using sample key. To enable this, set allow_experimental_parallel_reading_from_replicas = 1 and max_parallel_replicas to any number. This closes #26748. #29279 (Nikita Mikhaylov).
- Implemented sparse serialization. It can reduce usage of disk space and improve performance of some queries for columns which contain a lot of default (zero) values. It can be enabled by setting ratio_for_sparse_serialization. Sparse serialization will be chosen dynamically for a column if its ratio of the number of default values to the number of all values is above that threshold. Serialization (default or sparse) will be fixed for every column in a part, but may vary between parts. #22535 (Anton Popov).
- Add "TABLE OVERRIDE" feature for customizing MaterializedMySQL table schemas. #32325 (Stig Bakken).
- Add EXPLAIN TABLE OVERRIDE query. #32836 (Stig Bakken).
- Support TABLE OVERRIDE clause for MaterializedPostgreSQL. RFC: #31480. #32749 (Kseniia Sumarokova).
- Change ZooKeeper path for zero-copy marks for shared data. Note that "zero-copy replication" is a non-production feature (in early stages of development) that you shouldn't use anyway. But if you have used it, keep this change in mind. #32061 (ianton-ru).
- Events clause support for WINDOW VIEW watch query. #32607 (vxider).
- Fix ACL with explicit digit hash in clickhouse-keeper: now the behavior is consistent with ZooKeeper and the generated digest is always accepted. #33249 (小路). #33246.
- Fix unexpected projection removal when detaching parts. #32067 (Amos Bird).
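
The sparse serialization item in this list describes a threshold rule: a column is stored sparsely when its share of default values is above a configured ratio. A minimal Python sketch of that rule follows. The offsets-plus-values layout is an assumption made for illustration, since the changelog does not describe the on-disk format, and all function names are hypothetical.

```python
from typing import List, Tuple


def choose_serialization(values: List[int], ratio_threshold: float) -> str:
    """Pick 'sparse' when the share of default (zero) values is above the threshold."""
    if not values:
        return "default"
    defaults = sum(1 for v in values if v == 0)
    return "sparse" if defaults / len(values) > ratio_threshold else "default"


def to_sparse(values: List[int]) -> Tuple[List[int], List[int]]:
    """Keep only the positions and values of non-default entries (assumed layout)."""
    offsets = [i for i, v in enumerate(values) if v != 0]
    return offsets, [values[i] for i in offsets]


def from_sparse(offsets: List[int], non_defaults: List[int], size: int) -> List[int]:
    """Rebuild the full column, filling every other position with the default value."""
    column = [0] * size
    for i, v in zip(offsets, non_defaults):
        column[i] = v
    return column


column = [0, 0, 7, 0, 0, 0, 3, 0]
kind = choose_serialization(column, ratio_threshold=0.5)
offsets, non_defaults = to_sparse(column)
print(kind, offsets, non_defaults)
```

Because the choice depends on the data in each part, two parts of the same table can legitimately end up with different serializations for the same column.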
Improvement
- Now date time conversion functions that generate time before 1970-01-01 00:00:00 will be saturated to zero instead of overflowing. #29953 (Amos Bird). It also fixes a bug in index analysis if date truncation function would yield result before the Unix epoch.
- Always display resource usage (total CPU usage, total RAM usage and max RAM usage per host) in client. #33271 (alexey-milovidov).
- Improve Bool type serialization and deserialization, check the range of values. #32984 (Kruglov Pavel).
- If an invalid setting is defined using the SET query or using the query parameters in the HTTP request, error message will contain suggestions that are similar to the invalid setting string (if any exists). #32946 (Antonio Andelic).
- Support hints for mistyped setting names for clickhouse-client and clickhouse-local. Closes #32237. #32841 (凌涛).
- Allow to use virtual columns in Materialized Views. Close #11210. #33482 (OnePiece).
- Add config to disable IPv6 in clickhouse-keeper if needed. This closes #33381. #33450 (Wu Xueyang).
- Add more info to system.build_options about current git revision. #33431 (taiyang-li).
- clickhouse-local: track memory under --max_memory_usage_in_client option. #33341 (Azat Khuzhin).
- Allow negative intervals in function intervalLengthSum. Their length will be added as well. This closes #33323. #33335 (alexey-milovidov).
- LineAsString can be used as output format. This closes #30919. #33331 (Sergei Trifonov).
- Support <secure/> in cluster configuration, as an alternative form of <secure>1</secure>. Close #33270. #33330 (SuperDJY).
- Pressing Ctrl+C twice will terminate clickhouse-benchmark immediately without waiting for in-flight queries. This closes #32586. #33303 (alexey-milovidov).
- Support Unix timestamp with milliseconds in parseDateTimeBestEffort function. #33276 (Ben).
- Allow to cancel query while reading data from external table in the formats Arrow/Parquet/ORC: it failed to be cancelled in case of big files and setting input_format_allow_seeks as false. Closes #29678. #33238 (Kseniia Sumarokova).
- If table engine supports SETTINGS clause, allow to pass the settings as key-value or via config. Add this support for MySQL. #33231 (Kseniia Sumarokova).
- Correctly prevent Nullable primary keys if necessary. This is for #32780. #33218 (Amos Bird).
- Add retry for PostgreSQL connections in case nothing has been fetched yet. Closes #33199. #33209 (Kseniia Sumarokova).
- Validate config keys for external dictionaries. #33095. #33130 (Kseniia Sumarokova).
- Send profile info inside clickhouse-local. Closes #33093. #33097 (Kseniia Sumarokova).
- Short circuit evaluation: support for function throwIf. Closes #32969. #32973 (Maksim Kita).
- (This only happens in unofficial builds). Fixed segfault when inserting data into compressed Decimal, String, FixedString and Array columns. This closes #32939. #32940 (N. Kolotov).
- Added support for specifying subquery as SQL user defined function. Example: CREATE FUNCTION test AS () -> (SELECT 1). Closes #30755. #32758 (Maksim Kita).
- Improve gRPC compression support for #28671. #32747 (Vitaly Baranov).
- Flush all In-Memory data parts when WAL is not enabled while shutting down the server or detaching a table. #32742 (nauta).
- Allow to control connection timeouts for MySQL (previously was supported only for dictionary source). Closes #16669. Previously default connect_timeout was rather small, now it is configurable. #32734 (Kseniia Sumarokova).
- Support authSource option for storage MongoDB. Closes #32594. #32702 (Kseniia Sumarokova).
- Support Date32 type in generateRandom table function. #32643 (nauta).
- Add settings max_concurrent_select_queries and max_concurrent_insert_queries to control concurrent queries by query kind. Close #3575. #32609 (SuperDJY).
- Improve handling nested structures with missing columns while reading data in Protobuf format. Follow-up to https://github.com/ClickHouse/ClickHouse/pull/31988. #32531 (Vitaly Baranov).
- Allow empty credentials for MongoDB engine. Closes #26267. #32460 (Kseniia Sumarokova).
- Disable some optimizations for window functions that may lead to exceptions. Closes #31535. Closes #31620. #32453 (Kseniia Sumarokova).
- Allows to connect to MongoDB 5.0. Closes #31483. #32416 (Kseniia Sumarokova).
- Enable comparison between Decimal and Float. Closes #22626. #31966 (flynn).
- Added settings command_read_timeout, command_write_timeout for StorageExecutable, StorageExecutablePool, ExecutableDictionary, ExecutablePoolDictionary, ExecutableUserDefinedFunctions. Setting command_read_timeout controls timeout for reading data from command stdout in milliseconds. Setting command_write_timeout controls timeout for writing data to command stdin in milliseconds. Added settings command_termination_timeout for ExecutableUserDefinedFunction, ExecutableDictionary, StorageExecutable. Added setting execute_direct for ExecutableUserDefinedFunction, by default true. Added setting execute_direct for ExecutableDictionary, ExecutablePoolDictionary, by default false. #30957 (Maksim Kita).
- Bitmap aggregate functions will give correct result for out of range argument instead of wraparound. #33127 (DR).
- Fix parsing incorrect queries with FROM INFILE statement. #33521 (Kruglov Pavel).
- Don't allow to write into S3 if path contains globs. #33142 (Kruglov Pavel).
- --echo option was not used by clickhouse-client in batch mode with single query. #32843 (N. Kolotov).
- Use --database option for clickhouse-local. #32797 (Kseniia Sumarokova).
- Fix surprisingly bad code in SQL ordinary function file. Now it supports symlinks. #32640 (alexey-milovidov).
- Updating modification_time for data part in system.parts after part movement #32964. #32965 (save-my-heart).
- Potential issue, cannot be exploited: integer overflow may happen in array resize. #33024 (varadarajkumar).
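
Two items in this list add suggestions for mistyped setting names. The changelog does not say how similarity is measured, so the sketch below swaps in Python's `difflib` as a stand-in technique. The list of known settings is a small sample and `suggest_settings` is a hypothetical helper.

```python
import difflib
from typing import List, Sequence

# A few real setting names used as sample data; the real list is much longer.
KNOWN_SETTINGS = ("max_threads", "max_memory_usage", "max_block_size", "async_insert")


def suggest_settings(name: str, known: Sequence[str] = KNOWN_SETTINGS, limit: int = 3) -> List[str]:
    """Return known setting names that look similar to a mistyped one."""
    # cutoff=0.6 is difflib's default similarity threshold, stated here for clarity.
    return difflib.get_close_matches(name, known, n=limit, cutoff=0.6)


print(suggest_settings("max_treads"))
print(suggest_settings("zzz"))
```

An empty result corresponds to the "if any exists" case in the item above: when nothing is close enough, the error message carries no suggestion.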
Build/Testing/Packaging Improvement
- Add packages, functional tests and Docker builds for AArch64 (ARM) version of ClickHouse. #32911 (Mikhail f. Shiryaev). #32415
- Prepare ClickHouse to be built with musl-libc. It is not enabled by default. #33134 (alexey-milovidov).
- Make installation script working on FreeBSD. This closes #33384. #33418 (alexey-milovidov).
- Add actionlint for GitHub Actions workflows and verify workflow files via act --list to check the correct workflow syntax. #33612 (Mikhail f. Shiryaev).
- Add more tests for the nullable primary key feature. Add more tests with different types and merge tree kinds, plus randomly generated data. #33228 (Amos Bird).
- Add a simple tool to visualize flaky tests in web browser. #33185 (alexey-milovidov).
- Enable hermetic build for shared builds. This is mainly for developers. #32968 (Amos Bird).
- Update libc++ and libc++abi to the latest. #32484 (Raúl Marín).
- Added integration test for external .NET client (ClickHouse.Client). #23230 (Oleg V. Kozlyuk).
- Inject git information into clickhouse binary file. So we can get source code revision easily from clickhouse binary file. #33124 (taiyang-li).
- Remove obsolete code from ConfigProcessor. Yandex specific code is not used anymore. The code contained one minor defect. This defect was reported by Mallik Hassan in #33032. This closes #33032. #33026 (alexey-milovidov).
Bug Fix (user-visible misbehavior in official stable or prestable release)
- Several fixes for format parsing. This is relevant if clickhouse-server is open for write access to an adversary. Specifically crafted input data for Native format may lead to reading uninitialized memory or crash. #33050 (Heena Bansal). Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee). Fix null pointer dereference in LowCardinality data when deserializing LowCardinality data in the Native format. #33021 (Harry Lee).
- ClickHouse Keeper handler will correctly remove operation when response sent. #32988 (JackyWoo).
- Potential off-by-one miscalculation of quotas: quota limit was not reached, but the limit was exceeded. This fixes #31174. #31656 (sunny).
- Fixed CASTing from String to IPv4 or IPv6 and back. Fixed error message in case of failed conversion. #29224 (Dmitry Novik) #27914 (Vasily Nemkov).
- Fixed an exception like "Unknown aggregate function nothing" during an execution on a remote server. This fixes #16689. #26074 (hexiaoting).
- Fix wrong database for JOIN without explicit database in distributed queries (Fixes: #10471). #33611 (Azat Khuzhin).
- Fix segfault in Apache Avro format that appears after the second insert into file. #33566 (Kruglov Pavel).
- Fix segfault in Apache Arrow format if schema contains Dictionary type. Closes #33507. #33529 (Kruglov Pavel).
- Out of band offset and limit settings may be applied incorrectly for views. Close #33289 #33518 (hexiaoting).
- Fix an exception "Block structure mismatch" which may happen during insertion into table with default nested LowCardinality column. Fixes #33028. #33504 (Nikolai Kochetov).
- Fix dictionary expressions for range_hashed range min and range max attributes when created using DDL. Closes #30809. #33478 (Maksim Kita).
- Fix possible use-after-free for INSERT into Materialized View with concurrent DROP (Azat Khuzhin).
- Do not try to read past EOF (to work around a bug in the Linux kernel). This bug can be reproduced on kernels (3.14..5.9), and requires index_granularity_bytes=0 (i.e. turn off adaptive index granularity). #33372 (Azat Khuzhin).
- The commands SYSTEM SUSPEND and SYSTEM ... THREAD FUZZER missed access control. It is fixed. Author: Kevin Michel. #33333 (alexey-milovidov).
- Fix when COMMENT for dictionaries does not appear in system.tables, system.dictionaries. Allow to modify the comment for Dictionary engine. Closes #33251. #33261 (Maksim Kita).
- Add asynchronous inserts (with enabled setting async_insert) to query log. Previously such queries didn't appear in the query log. #33239 (Anton Popov).
- Fix sending WHERE 1 = 0 expressions for external databases query. Closes #33152. #33214 (Kseniia Sumarokova).
- Fix DDL validation for MaterializedPostgreSQL. Fix setting materialized_postgresql_allow_automatic_update. Closes #29535. #33200 (Kseniia Sumarokova). Make sure unused replication slots are always removed. Found in #26952. #33187 (Kseniia Sumarokova). Fix MaterializedPostgreSQL detach/attach (removing / adding to replication) tables with non-default schema. Found in #29535. #33179 (Kseniia Sumarokova). Fix DROP MaterializedPostgreSQL database. #33468 (Kseniia Sumarokova).
- The metric StorageBufferBytes sometimes was miscalculated. #33159 (xuyatian).
- Fix error "Invalid version for SerializationLowCardinality key column" in case of reading from LowCardinality column with local_filesystem_read_prefetch or remote_filesystem_read_prefetch enabled. #33046 (Nikolai Kochetov).
- Fix s3 table function reading empty file. Closes #33008. #33037 (Kseniia Sumarokova).
- Fix Context leak in case of cancel_http_readonly_queries_on_client_close (i.e. leaking of external tables that had been uploaded to the server and other resources). #32982 (Azat Khuzhin).
- Fix wrong tuple output in CSV format in case of custom csv delimiter. #32981 (Kruglov Pavel).
- Fix HDFS URL check that didn't allow using HA namenode address. Bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/31042. #32976 (Kruglov Pavel).
- Fix throwing exception like positional argument out of bounds for non-positional arguments. Closes #31173#event-5789668239. #32961 (Kseniia Sumarokova).
- Fix UB in case of unexpected EOF during filling a set from HTTP query (i.e. if the client interrupted in the middle, i.e. timeout 0.15s curl -Ss -F 's=@t.csv;' 'http://127.0.0.1:8123/?s_structure=key+Int&query=SELECT+dummy+IN+s' and with large enough t.csv). #32955 (Azat Khuzhin).
- Fix a regression in replaceRegexpAll function. The function worked incorrectly when matched substring was empty. This closes #32777. This closes #30245. #32945 (alexey-milovidov).
- Fix ORC format stripe reading. #32929 (kreuzerkrieg).
- topKWeightedState failed for some input types. #32487. #32914 (vdimir).
- Fix exception "Single chunk is expected from view inner query (LOGICAL_ERROR)" in materialized view. Fixes #31419. #32862 (Nikolai Kochetov).
- Fix optimization with lazy seek for async reads from remote filesystems. Closes #32803. #32835 (Kseniia Sumarokova).
- MergeTree table engine might silently skip some mutations if there are too many running mutations or in case of high memory consumption, it's fixed. Fixes #17882. #32814 (tavplubix).
- Avoid reusing the scalar subquery cache when processing MV blocks. This fixes a bug when the scalar query references the source table, but it means that all scalar subqueries in the MV definition will be calculated for each block. #32811 (Raúl Marín).
- Server might fail to start if database with MySQL engine cannot connect to MySQL server, it's fixed. Fixes #14441. #32802 (tavplubix).
- Fix crash when using fuzzBits function, close #32737. #32755 (SuperDJY).
- Fix error "Column is not under aggregate function" in case of MV with GROUP BY (list of columns) (which is parsed as GROUP BY tuple(...)) over Kafka/RabbitMQ. Fixes #32668 and #32744. #32751 (Nikolai Kochetov).
- Fix ALTER TABLE ... MATERIALIZE TTL query with TTL ... DELETE WHERE ... and TTL ... GROUP BY ... modes. #32695 (Anton Popov).
- Fix optimize_read_in_order optimization in case when table engine is Distributed or Merge and its underlying MergeTree tables have monotonous function in prefix of sorting key. #32670 (Anton Popov).
- Fix LOGICAL_ERROR exception when the target of a materialized view is a JOIN or a SET table. #32669 (Raúl Marín).
- Inserting into S3 with multipart upload to Google Cloud Storage may trigger abort. #32504. #32649 (vdimir).
- Fix possible exception at RabbitMQ storage startup by delaying channel creation. #32584 (Kseniia Sumarokova).
- Fix table lifetime (i.e. possible use-after-free) in case of parallel DROP TABLE and INSERT. #32572 (Azat Khuzhin).
- Fix async inserts with formats CustomSeparated, Template, Regexp, MsgPack and JSONAsString. Previously the async inserts with these formats didn't read any data. #32530 (Kruglov Pavel).
- Fix groupBitmapAnd function on distributed table. #32529 (minhthucdao).
- Fix crash in JOIN found by fuzzer, close #32458. #32508 (vdimir).
- Proper handling of the case with Apache Arrow column duplication. #32507 (Dmitriy Mokhnatkin).
- Fix issue with ambiguous query formatting in distributed queries that led to errors when some table columns were named ALL or DISTINCT. This closes #32391. #32490 (alexey-milovidov).
- Fix failures in queries that are trying to use skipping indices, which are not materialized yet. Fixes #32292 and #30343. #32359 (Anton Popov).
- Fix broken select query when there are more than 2 row policies on the same column, starting from the second query in the same session. #31606. #32291 (SuperDJY).
- Fix fractional unix timestamp conversion to DateTime64, fractional part was reversed for negative unix timestamps (before 1970-01-01). #32240 (Ben).
- Some entries of replication queue might hang for temporary_directories_lifetime (1 day by default) with "Directory tmp_merge_<part_name>" or "Part ... (state Deleting) already exists, but it will be deleted soon" or similar error. It's fixed. Fixes #29616. #32201 (tavplubix).
- Fix parsing of APPLY lambda column transformer which could lead to client/server crash. #32138 (Kruglov Pavel).
- Fix base64Encode adding trailing bytes on small strings. #31797 (Kevin Michel).
- Fix possible crash (or incorrect result) in case of LowCardinality arguments of window function. Fixes #31114. #31888 (Nikolai Kochetov).
- Fix hang up with command DROP TABLE system.query_log sync. #33293 (zhanghuajie).
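
One fix in this list concerns fractional Unix timestamps before 1970-01-01, where the fractional part was reversed. The changelog gives no implementation details, so the following Python sketch only shows why the sign needs care when a timestamp is split into whole seconds and a fraction. Both helper names are hypothetical, and `naive_split` is an invented example of the mistake, not the actual ClickHouse code.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_ticks(ticks: int, scale: int = 3) -> Tuple[int, int]:
    """Split ticks (units of 10**-scale seconds) into whole seconds and a fraction."""
    # Floor division keeps the fraction in [0, 10**scale), also for negative input.
    return divmod(ticks, 10**scale)


def naive_split(ticks: int, scale: int = 3) -> Tuple[int, int]:
    """Truncate toward zero and take the fraction of the absolute value (wrong for negatives)."""
    unit = 10**scale
    return int(ticks / unit), abs(ticks) % unit


def to_datetime(ticks: int, scale: int = 3) -> datetime:
    """Convert ticks to a UTC datetime; assumes scale <= 6 (microsecond resolution)."""
    seconds, fraction = split_ticks(ticks, scale)
    return EPOCH + timedelta(seconds=seconds, microseconds=fraction * 10 ** (6 - scale))


ticks = -1500  # 1.5 seconds before the epoch, at millisecond scale
print(split_ticks(ticks), naive_split(ticks))
print(to_datetime(ticks).isoformat())
```

With floor division, -1.5 s becomes -2 s plus 0.5 s, which is the instant 1.5 seconds before the epoch. The truncating variant yields -1 s plus 0.5 s, which is only half a second before it.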
-
v21.12 Changes
December 15, 2021
Backward Incompatible Change
- A fix for a feature that previously had unwanted behaviour. Do not allow direct select for Kafka/RabbitMQ/FileLog. Can be enabled by setting stream_like_engine_allow_direct_select. Direct select will not be allowed even if enabled by the setting in case there is an attached materialized view. For Kafka and RabbitMQ, direct select, if allowed, will not commit messages by default. To enable commits with direct select, the user must use the storage-level setting kafka{rabbitmq}_commit_on_select=1 (default 0). #31053 (Kseniia Sumarokova).
- A slight change in behaviour of a new function. Return unquoted string in JSON_VALUE. Closes #27965. #31008 (Kseniia Sumarokova).
- Setting rename. Add custom null representation support for TSV/CSV input formats. Fix deserializing Nullable(String) in TSV/CSV/JSONCompactStringsEachRow/JSONStringsEachRow input formats. Rename output_format_csv_null_representation and output_format_tsv_null_representation to format_csv_null_representation and format_tsv_null_representation accordingly. #30497 (Kruglov Pavel).
- Further deprecation of already unused code. This is relevant only for users of ClickHouse versions older than 20.6. A "leader election" mechanism is removed from ReplicatedMergeTree, because multiple leaders are supported since 20.6. If you are upgrading from an older version and some replica with an old version is a leader, then server will fail to start after upgrade. Stop replicas with old version to make new version start. After that it will not be possible to downgrade to version older than 20.6. #32140 (tavplubix).
New Feature
- Implemented more of the ZooKeeper Four Letter Words commands in clickhouse-keeper: https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands. #28981 (JackyWoo). Now `clickhouse-keeper` is feature complete.
- Support for `Bool` data type. #31072 (kevin wan).
- Support for `PARTITION BY` in File, URL, HDFS storages and with `INSERT INTO` table function. Closes #30273. #30690 (Kseniia Sumarokova).
- Added `CONSTRAINT ... ASSUME ...` (without checking during `INSERT`). Added query transformation to CNF (https://github.com/ClickHouse/ClickHouse/issues/11749) for more convenient optimization. Added simple query rewriting using constraints (only simple matching now, will be improved to support <,=,>... later). Added ability to replace heavy columns with light columns if it's possible. #18787 (Nikita Vasilev).
- Basic access authentication for http/url functions. #31648 (michael1589).
- Support `INTERVAL` type in `STEP` clause for `WITH FILL` modifier. #30927 (Anton Popov).
- Add support for parallel reading from multiple files and support globs in `FROM INFILE` clause. #30135 (Filatenkov Artur).
- Add support for `Identifier` table and database query parameters. Closes #27226. #28668 (Nikolay Degterinsky).
- TLDR: Major improvements of completeness and consistency of text formats. Refactor formats `TSV`, `TSVRaw`, `CSV` and `JSONCompactEachRow`, `JSONCompactStringsEachRow`, remove code duplication, add base interface for formats with `-WithNames` and `-WithNamesAndTypes` suffixes. Add formats `CSVWithNamesAndTypes`, `TSVRawWithNames`, `TSVRawWithNamesAndTypes`, `JSONCompactEachRowWithNames`, `JSONCompactStringsEachRowWithNames`, `RowBinaryWithNames`. Support parallel parsing for formats `TSVWithNamesAndTypes`, `TSVRaw(WithNames/WithNamesAndTypes)`, `CSVWithNamesAndTypes`, `JSONCompactEachRow(WithNames/WithNamesAndTypes)`, `JSONCompactStringsEachRow(WithNames/WithNamesAndTypes)`. Support columns mapping and types checking for `RowBinaryWithNamesAndTypes` format. Add setting `input_format_with_types_use_header` which specifies whether to check that types written in `WithNamesAndTypes` format match the table structure. Add setting `input_format_csv_empty_as_default` and use it in CSV format instead of `input_format_defaults_for_omitted_fields` (because this setting should not control `csv_empty_as_default`). Fix usage of setting `input_format_defaults_for_omitted_fields` (it was used only as `csv_empty_as_default`, but it should control calculation of default expressions for omitted fields). Fix Nullable input/output in `TSVRaw` format, make this format fully compatible with inserting into TSV. Fix inserting NULLs in `LowCardinality(Nullable)` when `input_format_null_as_default` is enabled (previously default values were inserted instead of actual NULLs). Fix strings deserialization in `JSONStringsEachRow` / `JSONCompactStringsEachRow` formats (strings were parsed just until first '\n' or '\t'). Add ability to use `Raw` escaping rule in Template input format. Add diagnostic info for JSONCompactEachRow(WithNames/WithNamesAndTypes) input format. Fix bug with parallel parsing of `-WithNames` formats in case when setting `min_chunk_bytes_for_parallel_parsing` is less than bytes in a single row. #30178 (Kruglov Pavel). Allow to print/parse names and types of columns in `CustomSeparated` input/output format. Add formats `CustomSeparatedWithNames/WithNamesAndTypes` similar to `TSVWithNames/WithNamesAndTypes`. #31434 (Kruglov Pavel).
- Aliyun OSS Storage support. #31286 (cfcz48).
- Expose all settings of the global thread pool in the configuration file. #31285 (Tomáš Hromada).
- Introduced window functions `exponentialTimeDecayedSum`, `exponentialTimeDecayedMax`, `exponentialTimeDecayedCount` and `exponentialTimeDecayedAvg`, which are more efficient than `exponentialMovingAverage` for bigger windows. More use cases are covered as well. #29799 (Vladimir Chebotarev).
- Add option to compress logs before writing them to a file using LZ4. Closes #23860. #29219 (Nikolay Degterinsky).
- Support `JOIN ON 1 = 1`, which has CROSS JOIN semantics. This closes #25578. #25894 (Vladimir C).
- Add Map combinator for `Map` type. Rename old `sum-, min-, max- Map` for mapped arrays to `sum-, min-, max- MappedArrays`. #24539 (Ildus Kurbangaliev).
- Make reading from HTTP retriable. Closes #29696. #29894 (Kseniia Sumarokova).
Experimental Feature
- `WINDOW VIEW` to enable stream processing in ClickHouse. #8331 (vxider).
- Drop support for using Ordinary databases with `MaterializedMySQL`. #31292 (Stig Bakken).
- Implement the commands BACKUP and RESTORE for the Log family. This feature is under development. #30688 (Vitaly Baranov).
Performance Improvement
- Reduce memory usage when reading with `s3` / `url` / `hdfs` formats `Parquet`, `ORC`, `Arrow` (controlled by setting `input_format_allow_seeks`, enabled by default). Also add setting `remote_read_min_bytes_for_seek` to control seeks. Closes #10461. Closes #16857. #30936 (Kseniia Sumarokova).
- Add optimizations for constant conditions in JOIN ON, ref #26928. #27021 (Vladimir C).
- Support parallel formatting for all text formats, except `JSONEachRowWithProgress` and `PrettyCompactMonoBlock`. #31489 (Kruglov Pavel).
- Speed up count over nullable columns. #31806 (Raúl Marín).
- Speed up `avg` and `sumCount` aggregate functions. #31694 (Raúl Marín).
- Improve performance of JSON and XML output formats. #31673 (alexey-milovidov).
- Improve performance of syncing data to block device. This closes #31181. #31229 (zhanglistar).
- Fix query performance issue in `LiveView` tables. Fixes #30831. #31006 (vzakaznikov).
- Speed up query parsing. #31949 (Raúl Marín).
- Allow to split `GraphiteMergeTree` rollup rules for plain/tagged metrics (optional `rule_type` field). #25122 (Michail Safronov).
- Remove excessive `DESC TABLE` requests for `remote()` (in case of `remote('127.1', system.one)`, i.e. an identifier as the db.table instead of a string, there was an excessive `DESC TABLE` request). #32019 (Azat Khuzhin).
- Optimize function `tupleElement` to reading of a subcolumn with enabled setting `optimize_functions_to_subcolumns`. #31261 (Anton Popov).
- Optimize function `mapContains` to reading of subcolumn `key` with enabled setting `optimize_functions_to_subcolumns`. #31218 (Anton Popov).
- Add settings `merge_tree_min_rows_for_concurrent_read_for_remote_filesystem` and `merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem`. #30970 (Kseniia Sumarokova).
- Skip mutations of different partitions in `StorageMergeTree`. #21326 (Vladimir Chebotarev).
Improvement
- Do not allow to drop a table or dictionary if some tables or dictionaries depend on it. #30977 (tavplubix).
- Allow versioning of aggregate function states. Now we can introduce backward compatible changes in serialization format of aggregate function states. Closes #12552. #24820 (Kseniia Sumarokova).
- Support PostgreSQL style `ALTER MODIFY COLUMN` syntax. #32003 (SuperDJY).
- Added `update_field` support for `RangeHashedDictionary`, `ComplexKeyRangeHashedDictionary`. #32185 (Maksim Kita).
- The `murmurHash3_128` and `sipHash128` functions now accept an arbitrary number of arguments. This closes #28774. #28965 (小路).
- Support default expression for `HDFS` storage and optimize fetching when source is column oriented. #32256 (李扬).
- Improve the operation name of an opentelemetry span. #32234 (Frank Chen).
- Use `Content-Type: application/x-ndjson` (http://ndjson.org/) for output format `JSONEachRow`. #32223 (Dmitriy Dorofeev).
- Improve skipping unknown fields with quoted escaping rule in Template/CustomSeparated formats. Previously you could skip only quoted strings, now you can skip values of any type. #32204 (Kruglov Pavel).
- Now `clickhouse-keeper` refuses to start or apply configuration changes when they contain duplicated IDs or endpoints. Fixes #31339. #32121 (alesapin).
- Set Content-Type in HTTP packets issued from URL engine. #32113 (Frank Chen).
- Return Content-Type as 'application/json' for `JSONEachRow` format if `output_format_json_array_of_rows` is enabled. #32112 (Frank Chen).
- Allow to parse `+` before `Float32` / `Float64` values. #32079 (Kruglov Pavel).
- Allow a user configured `hdfs_replication` parameter for `DiskHDFS` and `StorageHDFS`. Closes #32039. #32049 (leosunli).
- Added ClickHouse `exception` and `exception_code` fields to opentelemetry span log. #32040 (Frank Chen).
- Improve opentelemetry span log duration: it was zero at the query level if there was a query exception. #32038 (Frank Chen).
- Fix the issue that `LowCardinality` of `Int256` cannot be created. #31832 (alexey-milovidov).
- Recreate `system.*_log` tables in case of different engine/partition_by. #31824 (Azat Khuzhin).
- `MaterializedMySQL`: Fix issue with table named 'table'. #31781 (Håvard Kvålen).
- ClickHouse dictionary source: support predefined connections. Closes #31705. #31749 (Kseniia Sumarokova).
- Allow to use predefined connections configuration for Kafka and RabbitMQ engines (the same way as for other integration table engines). #31691 (Kseniia Sumarokova).
- Always re-render prompt while navigating history in clickhouse-client. This will improve usability of manipulating very long queries that don't fit on screen. #31675 (alexey-milovidov) (author: Amos Bird).
- Add key bindings for navigating through history (instead of lines/history). #31641 (Azat Khuzhin).
- Improve the `max_execution_time` checks. Fixed some cases when timeout checks did not happen and a query could run too long. #31636 (Raúl Marín).
- Better exception message when `users.xml` cannot be loaded due to bad password hash. This closes #24126. #31557 (Vitaly Baranov).
- Use shard and replica name from `Replicated` database arguments when expanding macros in `ReplicatedMergeTree` arguments if these macros are not defined in config. Closes #31471. #31488 (tavplubix).
- Better analysis for `min/max/count` projection. Now, with enabled `allow_experimental_projection_optimization`, virtual `min/max/count` projection can be used together with columns from partition key. #31474 (Amos Bird).
- Add `--pager` support for `clickhouse-local`. #31457 (Azat Khuzhin).
- Fix waiting of the editor during interactive query edition (`waitpid()` returns -1 on `SIGWINCH` and `EDITOR` and `clickhouse-local` / `clickhouse-client` works concurrently). #31456 (Azat Khuzhin).
- Throw an exception if there is some garbage after field in `JSONCompactStrings(EachRow)` format. #31455 (Kruglov Pavel).
- Default value of `http_send_timeout` and `http_receive_timeout` settings changed from 1800 (30 minutes) to 180 (3 minutes). #31450 (tavplubix).
- `MaterializedMySQL` now handles `CREATE TABLE ... LIKE ...` DDL queries. #31410 (Stig Bakken).
- Return artificial create query when executing `show create table` on system tables. #31391 (SuperDJY).
- Previously progress was shown only for `numbers` table function. Now it is also shown for `numbers_mt`. #31318 (Kseniia Sumarokova).
- Initial user's roles are used now to find row policies, see #31080. #31262 (Vitaly Baranov).
- If some obsolete setting is changed, show a warning in `system.warnings`. #31252 (tavplubix).
- Improved backoff for background cleanup tasks in `MergeTree`. Settings `merge_tree_clear_old_temporary_directories_interval_seconds` and `merge_tree_clear_old_parts_interval_seconds` moved from users settings to merge tree settings. #31180 (tavplubix).
- Now every replica will send to client only incremental information about profile events counters. #31155 (Dmitry Novik). This makes `--hardware_utilization` option in `clickhouse-client` usable.
- Enable multiline editing in clickhouse-client by default. This addresses #31121. #31123 (Amos Bird).
- Function name normalization for `ALTER` queries. This helps avoid metadata mismatch between creating table with indices/projections and adding indices/projections via alter commands. This is a follow-up PR of https://github.com/ClickHouse/ClickHouse/pull/20174. Marked as an improvement as there are no bug reports and the scenario is rather rare. #31095 (Amos Bird).
- Support `IF EXISTS` modifier for `RENAME DATABASE` / `TABLE` / `DICTIONARY` query. If this directive is used, one will not get an error if the DATABASE/TABLE/DICTIONARY to be renamed doesn't exist. #31081 (victorgao).
- Cancel vertical merges when partition is dropped. This is a follow-up of https://github.com/ClickHouse/ClickHouse/pull/25684 and https://github.com/ClickHouse/ClickHouse/pull/30996. #31057 (Amos Bird).
- The local session inside a ClickHouse dictionary source won't send its events to the session log anymore. This fixes a possible deadlock (tsan alert) on shutdown. Also this PR fixes flaky `test_dictionaries_dependency_xml/`. #31013 (Vitaly Baranov).
- Less locking in ALTER command. #31010 (Amos Bird).
- Fix `--verbose` option in clickhouse-local interactive mode and allow logging into file. #30881 (Kseniia Sumarokova).
- Added `\l`, `\d`, `\c` commands in `clickhouse-client` like in MySQL and PostgreSQL. #30876 (Pavel Medvedev).
- For clickhouse-local or clickhouse-client: if there is `--interactive` option with `--query` or `--queries-file`, then first execute them like in non-interactive mode and then start interactive mode. #30851 (Kseniia Sumarokova).
- Fix possible "The local set of parts of X doesn't look like the set of parts in ZooKeeper" error (if DROP fails during removing znodes from zookeeper). #30826 (Azat Khuzhin).
- Avro format works against Kafka. Setting `output_format_avro_rows_in_file` added. #30351 (Ilya Golshtein).
- Allow to specify one or any number of PostgreSQL schemas for one `MaterializedPostgreSQL` database. Closes #28901. Closes #29324. #28933 (Kseniia Sumarokova).
- Replaced default ports for clickhouse-keeper internal communication from 44444 to 9234. Fixes #30879. #31799 (alesapin).
- Implement function transform with Decimal arguments. #31839 (李帅).
- Fix abort in debug server and `DB::Exception: std::out_of_range: basic_string` error in release server in case of bad hdfs url by adding additional check of hdfs url structure. #31042 (Kruglov Pavel).
- Fix possible assert in `hdfs` table function/engine, add test. #31036 (Kruglov Pavel).
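The `application/x-ndjson` entry in the list above refers to newline-delimited JSON: one JSON object per line. A minimal Python sketch of producing and consuming such a body follows; the helper names `to_ndjson` and `from_ndjson` are hypothetical, and the exact whitespace ClickHouse emits for `JSONEachRow` is not reproduced here.

```python
import json


def to_ndjson(rows):
    """Serialize an iterable of dicts as newline-delimited JSON."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def from_ndjson(text):
    """Parse newline-delimited JSON back into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


body = to_ndjson([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
print(body, end="")
print(from_ndjson(body))
```

Because every line is a complete JSON document, a client can process the stream row by row without waiting for the whole response.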
Bug Fixes
- Fix group by / order by / limit by aliases with positional arguments enabled. Closes #31173. #31741 (Kseniia Sumarokova).
- Fix usage of `Buffer` table engine with type `Map`. Fixes #30546. #31742 (Anton Popov).
- Fix reading from `MergeTree` tables with enabled `use_uncompressed_cache`. #31826 (Anton Popov).
- Fixed the behavior when mutations that have nothing to do are stuck (with enabled setting `empty_result_for_aggregation_by_empty_set`). #32358 (Nikita Mikhaylov).
- Fix skipping columns while writing protobuf. This PR fixes #31160, see the comment #31160#issuecomment-980595318. #31988 (Vitaly Baranov).
- Fix bug when removing unneeded columns in subquery. If there is an aggregation function in a query without GROUP BY, do not remove it even if it is unneeded. #32289 (dongyifeng).
- Quota limit was not reached, but the limit was exceeded. This PR fixes #31174. #31337 (sunny).
- Fix SHOW GRANTS when partial revokes are used. This PR fixes #31138. #31249 (Vitaly Baranov).
- Memory amount was incorrectly estimated when ClickHouse is run in containers with cgroup limits. #31157 (Pavel Medvedev).
- Fix `ALTER ... MATERIALIZE COLUMN ...` queries in case when data type of default expression is not equal to the data type of column. #32348 (Anton Popov).
- Fixed crash with SIGFPE in aggregate function `avgWeighted` with `Decimal` argument. Fixes #32053. #32303 (tavplubix).
- Server might fail to start with `Cannot attach 1 tables due to cyclic dependencies` error if `Dictionary` table looks at XML-dictionary with the same name, it's fixed. Fixes #31315. #32288 (tavplubix).
- Fix parsing error while NaN deserializing for `Nullable(Float)` for `Quoted` escaping rule. #32190 (Kruglov Pavel).
- XML dictionaries: identifiers, used in table create query, can be qualified to `default_database` during upgrade to newer version. Closes #31963. #32187 (Maksim Kita).
- Number of active replicas might be determined incorrectly when inserting with quorum if setting `replicated_can_become_leader` is disabled on some replicas. It's fixed. #32157 (tavplubix).
- Dictionaries: fix cases when `{condition}` does not work for custom database queries. #32117 (Maksim Kita).
- Fix `CAST` from `Nullable` with `cast_keep_nullable` (`PARAMETER_OUT_OF_BOUND` error before for i.e. `toUInt32OrDefault(toNullable(toUInt32(1)))`). #32080 (Azat Khuzhin).
- Fix CREATE TABLE of Join Storage in some obscure cases. Close #31680. #32066 (SuperDJY).
- Fixed `Directory ... already exists and is not empty` error when detaching part. #32063 (tavplubix).
- `MaterializedMySQL` (experimental feature): Fix misinterpretation of `DECIMAL` data from MySQL. #31990 (Håvard Kvålen).
- `FileLog` (experimental feature) engine unnecessarily created a metadata directory when table creation failed. Fix #31962. #31967 (flynn).
- Some `GET_PART` entry might hang in replication queue if part is lost on all replicas and there are no other parts in the same partition. It's fixed in cases when partition key contains only columns of integer types or `Date[Time]`. Fixes #31485. #31887 (tavplubix).
- Fix functions `empty` and `notEmpty` with arguments of `UUID` type. Fixes #31819. #31883 (Anton Popov).
- Change configuration path from `keeper_server.session_timeout_ms` to `keeper_server.coordination_settings.session_timeout_ms` when constructing a `KeeperTCPHandler`. Same with `operation_timeout`. #31859 (JackyWoo).
- Fix invalid cast of Nullable type when nullable primary key is used. (Nullable primary key is a discouraged feature - please do not use). This fixes #31075. #31823 (Amos Bird).
- Fix crash in recursive UDF in SQL. Closes #30856. #31820 (Maksim Kita).
- Fix crash when function `dictGet` with type is used for dictionary attribute when type is `Nullable`. Fixes #30980. #31800 (Maksim Kita).
- Fix crash with empty result of ODBC query (with some ODBC drivers). Closes #31465. #31766 (Kseniia Sumarokova).
- Fix disabling query profiler (in case of `query_profiler_real_time_period_ns>0` / `query_profiler_cpu_time_period_ns>0` the query profiler could stay enabled even after the query finished). #31740 (Azat Khuzhin).
- Fixed rare segfault on concurrent `ATTACH PARTITION` queries. #31738 (tavplubix).
- Fix race in JSONEachRowWithProgress output format when data and lines with progress are mixed in output. #31736 (Kruglov Pavel).
- Fixed `there are no such cluster here` error on execution of `ON CLUSTER` query if specified cluster name is name of `Replicated` database. #31723 (tavplubix).
- Fix exception on some of the applications of `decrypt` function on Nullable columns. This closes #31662. This closes #31426. #31707 (alexey-milovidov).
- Fixed function ngrams when string contains UTF-8 characters. #31706 (yandd).
- Settings `input_format_allow_errors_num` and `input_format_allow_errors_ratio` did not work for parsing of domain types, such as `IPv4`, it's fixed. Fixes #31686. #31697 (tavplubix).
- Fixed null pointer exception in `MATERIALIZE COLUMN`. #31679 (Nikolai Kochetov).
- `RENAME TABLE` query worked incorrectly on attempt to rename a DDL dictionary in `Ordinary` database, it's fixed. #31638 (tavplubix).
- Implement `sparkbar` aggregate function as it was intended, see: #26175#issuecomment-960353867, comment. #31624 (小路).
- Fix invalid generated JSON when only column names contain invalid UTF-8 sequences. #31534 (Kevin Michel).
- Disable `partial_merge_join_left_table_buffer_bytes` until the bug in this optimization is fixed (see #31009). Remove redundant option `partial_merge_join_optimizations`. #31528 (Vladimir C).
- Fix progress for short `INSERT SELECT` queries. #31510 (Azat Khuzhin).
- Fix wrong behavior with group by and positional arguments. Closes #31280#issuecomment-968696186. #31420 (Kseniia Sumarokova).
- Resolve `nullptr` in STS credentials provider for S3. #31409 (Vladimir Chebotarev).
- Remove `notLike` function from index analysis, because it was wrong. #31169 (sundyli).
- Fix bug in Keeper which can lead to inability to start when some coordination logs were lost and we have a fresher snapshot than our latest log. #31150 (alesapin).
- Rewrite right distributed table in local join. Solves #25809. #31105 (abel-cheng).
- Fix `Merge` table with aliases and where (it did not work before at all). Closes #28802. #31044 (Kseniia Sumarokova).
- Fix JSON_VALUE/JSON_QUERY with quoted identifiers. This allows to have spaces in json path. Closes #30971. #31003 (Kseniia Sumarokova).
- Using `formatRow` function with not row-oriented formats led to segfault. Don't allow to use this function with such formats (because it doesn't make sense). #31001 (Kruglov Pavel).
- Fix bug which broke select queries if they happened after dropping materialized view. Found in #30691. #30997 (Kseniia Sumarokova).
- Skip `max_partition_size_to_drop` check in case of ATTACH PARTITION ... FROM and MOVE PARTITION ... #30995 (Amr Alaa).
- Fix some corner cases with `INTERSECT` and `EXCEPT` operators. Closes #30803. #30965 (Kseniia Sumarokova).
Build/Testing/Packaging Improvement
- Fix incorrect filtering result on non-x86 builds. This closes #31417. This closes #31524. #31574 (alexey-milovidov).
- Make ClickHouse build fully reproducible (byte identical on different machines). This closes #22113. #31899 (alexey-milovidov). Remove filesystem path to the build directory from binaries to enable reproducible builds. This is needed for #22113. #31838 (alexey-milovidov).
- Use our own CMakeLists for `zlib-ng`, `cassandra`, `mariadb-connector-c` and `xz`, `re2`, `sentry`, `gsasl`, `arrow`, `protobuf`. This is needed for #20151. Part of #9226. A small step towards removal of annoying trash from the build system. #30599 (alexey-milovidov).
- Hermetic builds: use fixed version of libc and make sure that no source or binary files from the host OS are used during build. This closes #27133. This closes #21435. This closes #30462. #30011 (alexey-milovidov).
- Adding function `getFuzzerData()` to easily fuzz particular functions. This closes #23227. #27526 (Alexey Boykov).
- More correct setting up capabilities inside Docker. #31802 (Constantine Peresypkin).
- Enable clang `-fstrict-vtable-pointers`, `-fwhole-program-vtables` compile options. #20151 (Maksim Kita).
- Avoid downloading toolchain tarballs for cross-compiling for FreeBSD. #31672 (alexey-milovidov).
- Initial support for risc-v. See development/build-cross-riscv for quirks and build command that was tested. #31309 (Vladimir Smirnov).
- Support compile in arm machine with parameter "-DENABLE_TESTS=OFF". #31007 (zhanghuajie).
-
v21.11 Changes
November 09, 2021
Backward Incompatible Change
- Change order of json_path and json arguments in SQL/JSON functions (to be consistent with the standard). Closes #30449. #30474 (Kseniia Sumarokova).
- Remove `MergeTree` table setting `write_final_mark`. It will always be `true`. #30455 (Kseniia Sumarokova). No actions required, all tables are compatible with the new version.
- Function `bayesAB` is removed. Please help to return this function back, refreshed. This closes #26233. #29934 (alexey-milovidov).
- This is relevant only if you already started using the experimental `clickhouse-keeper` support. Now ClickHouse Keeper snapshots are compressed with `ZSTD` codec by default instead of custom ClickHouse LZ4 block compression. This behavior can be turned off with `compress_snapshots_with_zstd_format` coordination setting (must be equal on all quorum replicas). Backward incompatibility is quite rare and may happen only when a new node sends a snapshot (happens in case of recovery) to an old node which is unable to read snapshots in ZSTD format. #29417 (alesapin).
New Feature
- New asynchronous INSERT mode allows to accumulate inserted data and store it in a single batch in background. On the client it can be enabled by setting `async_insert` for `INSERT` queries with data inlined in the query or in a separate buffer (e.g. for `INSERT` queries via HTTP protocol). If `wait_for_async_insert` is true (the default), the client will wait until the data is flushed to the table. On the server side it is controlled by the settings `async_insert_threads`, `async_insert_max_data_size` and `async_insert_busy_timeout_ms`. Implements #18282. #27537 (Anton Popov). #20557 (Ivan). Notes on performance: with asynchronous inserts you can do up to around 10 000 individual INSERT queries per second, so it is still recommended to insert in batches if you want to achieve performance up to millions of inserted rows per second.
- Add interactive mode for `clickhouse-local`. So, you can just run `clickhouse-local` to get a command line ClickHouse interface without connecting to a server and process data from files and external data sources. Also merge the code of `clickhouse-client` and `clickhouse-local` together. Closes #7203. Closes #25516. Closes #22401. #26231 (Kseniia Sumarokova).
- Added support for executable (scriptable) user defined functions. These are UDFs that can be written in any programming language. #28803 (Maksim Kita).
- Allow predefined connections to external data sources. This allows to avoid specifying credentials or addresses while using external data sources, they can be referenced by names instead. Closes #28367. #28577 (Kseniia Sumarokova).
- Added `INFORMATION_SCHEMA` database with `SCHEMATA`, `TABLES`, `VIEWS` and `COLUMNS` views to the corresponding tables in `system` database. Closes #9770. #28691 (tavplubix).
- Support `EXISTS (subquery)`. Closes #6852. #29731 (Kseniia Sumarokova).
- Session logging for audit. Logging all successful and failed login and logout events to a new `system.session_log` table. #22415 (Vasily Nemkov) (Vitaly Baranov).
- Support multidimensional cosine distance and euclidean distance functions; L1, L2, Lp, Linf distances and norms. Scalar product on tuples and various arithmetic operators on tuples. This fully closes #4509 and even more. #27933 (Alexey Boykov).
- Add support for compression and decompression for `INTO OUTFILE` and `FROM INFILE` (with autodetect or with additional optional parameter). #27135 (Filatenkov Artur).
- Add CORS (Cross Origin Resource Sharing) support with HTTP `OPTIONS` request. It means that Grafana will now work with serverless requests without kludges. Closes #18693. #29155 (Filatenkov Artur).
- Queries with JOIN ON now support disjunctions (OR). #21320 (Ilya Golshtein).
- Added function `tokens`, which allows splitting a string into tokens using non-alphanumeric ASCII characters as separators. #29981 (Maksim Kita). Added function `ngrams` to extract ngrams from text. Closes #29699. #29738 (Maksim Kita).
- Add functions for Unicode normalization: `normalizeUTF8NFC`, `normalizeUTF8NFD`, `normalizeUTF8NFKC`, `normalizeUTF8NFKD`. #28633 (darkkeks).
FileLog
table engine. It's likeKafka
orRabbitMQ
engine but for append-only and rotated logs in local filesystem. Closes #6953. #25969 (flynn) (Kseniia Sumarokova). - โ Add
CapnProto
output format, refactorCapnProto
input format. #29291 (Kruglov Pavel). - ๐ Allow to write number in query as binary literal. Example
SELECT 0b001;
. #29304 (Maksim Kita). - โ Added
hashed_array
dictionary type. It saves memory when using dictionaries with multiple attributes. Closes #30236. #30242 (Maksim Kita). - โ Added
JSONExtractKeys
function. #30056 (Vitaly). - โ Add a function
getOSKernelVersion
- it returns a string with OS kernel version. #29755 (Memo). - โ Added
MD4
andSHA384
functions. MD4 is an obsolete and insecure hash function, it can be used only in rare cases when MD4 is already being used in some legacy system and you need to get exactly the same result. #29602 (Nikita Tikhomirov). - HSTS can be enabled for Clickhouse HTTP server by setting
hsts_max_age
in configuration file with a positive number. #29516 (ๅๆถ). - ๐ Huawei OBS Storage support. Closes #24294. #29511 (kevin wan).
- ๐ New function
mapContainsKeyLike
to get the map that key matches a simple regular expression. #29471 (ๅๆถ). New functionmapExtractKeyLike
to get the map only kept elements matched specified pattern. #30793 (ๅๆถ). - Implemented
ALTER TABLE x MODIFY COMMENT
. #29264 (Vasily Nemkov). - โ Adds H3 inspection functions that are missing from ClickHouse but are available via the H3 api: https://h3geo.org/docs/api/inspection. #29209 (Bharat Nallan).
- ๐ Allow non-replicated ALTER TABLE FETCH and ATTACH in Replicated databases. #29202 (Kevin Michel).
- Added a setting `output_format_csv_null_representation`: This is the same as `output_format_tsv_null_representation` but is for CSV output. #29123 (PHO).
- Added function `zookeeperSessionUptime()` which returns uptime of current ZooKeeper session in seconds. #28983 (tavplubix).
- Implements the `h3ToGeoBoundary` function. #28952 (Ivan Veselov).
- Add aggregate function `exponentialMovingAverage` that can be used as window function. This closes #27511. #28914 (alexey-milovidov).
- Allow to include subcolumns of table columns into `DESCRIBE` query result (can be enabled by setting `describe_include_subcolumns`). #28905 (Anton Popov).
- `Executable`, `ExecutablePool` added option `send_chunk_header`. If this option is true then chunk rows_count with line break will be sent to client before chunk. #28833 (Maksim Kita).
- `tokenbf_v1` and `ngrambf_v1` indexes support Map with keys of String or FixedString type. This enhances data skipping in queries with a map key filter. Example: `CREATE TABLE map_tokenbf (row_id UInt32, map Map(String, String), INDEX map_tokenbf map TYPE ngrambf_v1(4,256,2,0) GRANULARITY 1) ENGINE = MergeTree() ORDER BY row_id`. With the table above, the query `SELECT * FROM map_tokenbf WHERE map['K'] = 'V'` will skip the granules that don't contain key `K`. Of course, how many rows are skipped depends on the `granularity` and `index_granularity` you set. #28511 (凌涛).
- Send profile events from server to client. New packet type `ProfileEvents` was introduced. Closes #26177. #28364 (Dmitry Novik).
- Bit shift operations for `FixedString` and `String` data types. This closes #27763. #28325 (小路).
- Support adding / deleting tables to replication from PostgreSQL dynamically in database engine MaterializedPostgreSQL. Support alter for database settings. Closes #27573. #28301 (Kseniia Sumarokova).
- Added function accurateCastOrDefault(x, T). Closes #21330. Authors @taiyang-li. #23028 (Maksim Kita).
- Add functions `toUUIDOrDefault`, `toUInt8/16/32/64/256OrDefault`, `toInt8/16/32/64/128/256OrDefault`, which allow the user to define a default value (not null) for the case when string parsing fails. #21330 (taiyang-li).
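The asynchronous INSERT entry in the list above describes accumulating many small inserts and writing them as one batch. The Python sketch below illustrates that general idea only: a buffer that flushes when it grows past a size limit or when its oldest entry is too old. The class `AsyncInsertBuffer` and its parameters are hypothetical; the parameter names are borrowed from the settings `async_insert_max_data_size` and `async_insert_busy_timeout_ms`, whose exact server-side semantics are not reproduced here.

```python
class AsyncInsertBuffer:
    """Accumulate small payloads and flush them as a single batch."""

    def __init__(self, flush, max_data_size, busy_timeout_ms, clock):
        self._flush = flush              # callback receiving a list of payloads
        self._max_data_size = max_data_size
        self._busy_timeout_ms = busy_timeout_ms
        self._clock = clock              # returns current time in milliseconds
        self._rows = []
        self._size = 0
        self._first_at = None

    def insert(self, payload: bytes):
        if not self._rows:
            self._first_at = self._clock()   # age is measured from the first row
        self._rows.append(payload)
        self._size += len(payload)
        self.poll()

    def poll(self):
        """Flush if the batch is large enough or has waited long enough."""
        if not self._rows:
            return
        too_big = self._size >= self._max_data_size
        too_old = self._clock() - self._first_at >= self._busy_timeout_ms
        if too_big or too_old:
            batch, self._rows = self._rows, []
            self._size = 0
            self._flush(batch)


# Usage with a fake clock so the behaviour is deterministic.
now = [0]
batches = []
buf = AsyncInsertBuffer(batches.append, max_data_size=10,
                        busy_timeout_ms=200, clock=lambda: now[0])
buf.insert(b"aaaa")
buf.insert(b"bbbb")
buf.insert(b"cc")      # total size reaches 10 bytes, batch is flushed
buf.insert(b"d")
now[0] = 250           # more than 200 ms later
buf.poll()             # timeout flush
print(batches)
```

A real server would call the equivalent of `poll` from a background thread; the injected clock here just keeps the example testable.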
Performance Improvement
- ๐ Background merges can be preempted by each other and they are scheduled with appropriate priorities. Now long running merges won't prevent short merges to proceed. This is needed for a better scheduling and controlling of merges execution. It reduces the chances to get "too many parts" error. #22381. #25165 (Nikita Mikhaylov). Added an ability to execute more merges and mutations than the number of threads in background pool. Merges and mutations will be executed step by step according to their sizes (lower is more prioritized). The ratio of the number of tasks to threads to execute is controlled by a setting
background_merges_mutations_concurrency_ratio
, 2 by default. #29140 (Nikita Mikhaylov). - ๐ Allow to use asynchronous reads for remote filesystems. Lower the number of seeks while reading from remote filesystems. It improves performance tremendously and makes the experimental
web
ands3
disks to work faster than EBS under certain conditions. #29205 (Kseniia Sumarokova). In the meantime, theweb
disk type (static dataset hosted on a web server) is graduated from being experimental to be production ready. - Queries with
INTO OUTFILE
inclickhouse-client
will use multiple threads. Fix the issue with flickering progress-bar when usingINTO OUTFILE
. This closes #30873. This closes #30872. #30886 (alexey-milovidov). - โฌ๏ธ Reduce amount of redundant compressed data read from disk for some types
SELECT
queries (only forMergeTree
engines family). #30111 (alesapin). - โ Remove some redundant
seek
calls while reading compressed blocks in MergeTree table engines family. #29766 (alesapin). - ๐ Make
url
table function to process multiple URLs in parallel. This closes #29670 and closes #29671. #29673 (alexey-milovidov). - Improve performance of aggregation in order of primary key (with enabled setting
optimize_aggregation_in_order
). #30266 (Anton Popov). - Now clickhouse is using DNS cache while communicating with external S3. #29999 (alesapin).
- Add support for pushdown of IS NULL/IS NOT NULL to external databases (e.g. MySQL). #29463 (Azat Khuzhin). Transform isNull/isNotNull to IS NULL/IS NOT NULL (for external databases, e.g. MySQL). #29446 (Azat Khuzhin).
- SELECT queries from Dictionary tables will use multiple threads. #30500 (Maksim Kita).
- Improve performance for filtering (WHERE operation) of Decimal columns. #30431 (Jun Jin).
- Replace branchy code in the filter operation with a better-performing implementation based on popcnt/ctz. #29881 (Jun Jin).
- Improve filter bytemask generator (used for WHERE operator) function all in one with SSE/AVX2/AVX512 instructions. Note that by default ClickHouse is only using SSE, so it's only relevant for custom builds. #30014 (jasperzhu). #30670 (jasperzhu).
- Improve the performance of SUM aggregate function of Nullable floating point numbers. #28906 (Raúl Marín).
- Speed up part loading process when multiple disks are in use. The idea is similar to https://github.com/ClickHouse/ClickHouse/pull/16423. A production environment shows improvement: 24 min -> 16 min. #28363 (Amos Bird).
- Reduce default settings for S3 multipart upload part size to lower memory usage. #28679 (ianton-ru).
- Speed up bitmapAnd function. #28332 (dddounaiking).
- Removed sub-optimal mutation notifications in StorageMergeTree when merges are still going. #27552 (Vladimir Chebotarev).
- Attempt to improve performance of string comparison. #28767 (alexey-milovidov).
- Primary key index and partition filter can work in tuple. #29281 (凌涛).
- If a query has multiple quantile aggregate functions with the same arguments but different level parameters, they will be fused together and executed in one pass if the setting optimize_syntax_fuse_functions is enabled. #26657 (hexiaoting).
- Now min-max aggregation over the first expression of primary key is optimized by projection. This is for #329. #29918 (Amos Bird).
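The size-based, step-by-step scheduling of merges described at the top of this section can be illustrated with a small simulation. This is a sketch under stated assumptions, not the server's scheduler: the function name, the task tuples and the round-based execution model are hypothetical, and only the ideas of "smaller tasks first" and "tasks admitted = threads × ratio" come from the changelog entry.

```python
import heapq


def run_background_tasks(tasks, threads, concurrency_ratio=2):
    """Simulate step-by-step execution of merges, smallest task first.

    tasks: list of (name, size_in_steps) in arrival order. At most
    threads * concurrency_ratio tasks are admitted at once; in each round
    every thread executes one step of one admitted task.
    Returns task names in completion order.
    """
    pending = list(tasks)   # not yet admitted
    active = []             # heap of (size, arrival_order, name, remaining_steps)
    finished = []
    capacity = threads * concurrency_ratio
    order = 0
    while pending or active:
        # Admit new tasks while there are free slots.
        while pending and len(active) < capacity:
            name, size = pending.pop(0)
            heapq.heappush(active, (size, order, name, size))
            order += 1
        # Each thread picks the smallest admitted task and runs one step of it.
        batch = [heapq.heappop(active) for _ in range(min(threads, len(active)))]
        for size, seq, name, remaining in batch:
            remaining -= 1
            if remaining == 0:
                finished.append(name)
            else:
                heapq.heappush(active, (size, seq, name, remaining))
    return finished


jobs = [("big_merge", 6), ("small_merge", 1), ("medium_merge", 3)]
with_ratio_2 = run_background_tasks(jobs, threads=1, concurrency_ratio=2)
with_ratio_1 = run_background_tasks(jobs, threads=1, concurrency_ratio=1)
```

With a ratio of 2 the single thread can interleave two tasks, so the small merge that arrived after the big one finishes first. With a ratio of 1 the big merge blocks everything behind it.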
Experimental Feature
- Add ability to change nodes configuration (in .xml file) for ClickHouse Keeper. #30372 (alesapin).
- Add sparkbar aggregate function. This closes #26175. #27481 (小路). Note: there is one flaw in this function, the behaviour will be changed in future releases.
Improvement
- Allow user to change log levels without restart. #29586 (Nikolay Degterinsky).
- Multiple improvements for SQL UDF. Queries for manipulation of SQL User Defined Functions now support ON CLUSTER clause. Example: CREATE FUNCTION test_function ON CLUSTER 'cluster' AS x -> x + 1; Closes #30666. #30734 (Maksim Kita). Support CREATE OR REPLACE, CREATE IF NOT EXISTS syntaxes. #30454 (Maksim Kita). Added DROP IF EXISTS support. Example: DROP FUNCTION IF EXISTS test_function. #30437 (Maksim Kita). Support lambdas. Example: CREATE FUNCTION lambda_function AS x -> arrayMap(element -> element * 2, x); #30435 (Maksim Kita). Support SQL user defined functions for clickhouse-local. #30179 (Maksim Kita).
- Enable per-query memory profiler (set to memory_profiler_step = 4MiB) globally. #29455 (Azat Khuzhin).
- Added columns data_compressed_bytes, data_uncompressed_bytes, marks_bytes into system.data_skipping_indices. Added columns secondary_indices_compressed_bytes, secondary_indices_uncompressed_bytes, secondary_indices_marks_bytes into system.parts. Closes #29697. #29896 (Maksim Kita).
- Add table alias to system.tables and database alias to system.databases #29677. #29882 (kevin wan).
- Correctly resolve interdependencies between tables on server startup. Closes #8004, closes #15170. #28373 (tavplubix).
- Avoid error "Division by zero" when denominator is Nullable in functions divide, intDiv and modulo. Closes #22621. #28352 (Kruglov Pavel).
- Allow to parse values of Date data type in text formats as YYYYMMDD in addition to YYYY-MM-DD. This closes #30870. #30871 (alexey-milovidov).
- Web UI: render bars in table cells. #29792 (alexey-milovidov).
- Users can now create dictionaries with comments: CREATE DICTIONARY ... COMMENT 'value' ... #29899 (Vasily Nemkov). Users can now set comments on a database in the CREATE DATABASE statement ... #29429 (Vasily Nemkov).
- Introduce compiled_expression_cache_elements_size setting. If you will ever want to use this setting, you will already know what it does. #30667 (Maksim Kita).
- clickhouse-format now supports option --query. In previous versions you had to pass the query to stdin. #29325 (凌涛).
- Support ALTER TABLE for tables in Memory databases. Memory databases are used in clickhouse-local. #30866 (tavplubix).
- Arrays of all serializable types are now supported by arrayStringConcat. #30840 (Nickita Taranov).
- ClickHouse now takes docker/cgroups limitations into account when getting the system memory amount. See #25662. #30574 (Pavel Medvedev).
- Fetched table structure for PostgreSQL database is more reliable now. #30477 (Kseniia Sumarokova).
- Full support of positional arguments in GROUP BY and ORDER BY. #30433 (Kseniia Sumarokova).
- Allow extracting non-string element as string using JSONExtractString. This is for pull/25452#issuecomment-927123287. #30426 (Amos Bird).
- Added an ability to use FINAL clause in SELECT queries from GraphiteMergeTree. #30360 (Nikita Mikhaylov).
- Minor improvements in replica cloning and enqueuing fetch for broken parts, that should avoid extremely rare hanging of GET_PART entries in replication queue. #30346 (tavplubix).
- Allow symlinks to files in user_files directory for file table function. #30309 (Kseniia Sumarokova).
- Fixed comparison of Date32 with Date, DateTime, DateTime64 and String. #30219 (liang.huang).
- Allow to remove SAMPLE BY expression from MergeTree tables (ALTER TABLE <table> REMOVE SAMPLE BY). #30180 (Anton Popov).
- Now Keeper (as part of clickhouse-server) will start asynchronously if it can connect to some other node. #30170 (alesapin).
- Now clickhouse-client supports native multi-line editing. #30143 (Amos Bird).
- polygon dictionaries (reverse geocoding): added support for reading the dictionary content with SELECT query method if setting store_polygon_key_column = true. Closes #30090. #30142 (Maksim Kita).
- Add ClickHouse logo to Play UI. #29674 (alexey-milovidov).
- Better exception message while reading column from Arrow-supported formats like Arrow, ArrowStream, Parquet and ORC. This closes #29926. #29927 (alexey-milovidov).
- Fix data-race between flush and startup in Buffer tables. This can appear in tests. #29930 (Azat Khuzhin).
- Fix lock-order-inversion between DROP TABLE for DatabaseMemory and LiveView. Live View is an experimental feature. Memory database is used in clickhouse-local. #29929 (Azat Khuzhin).
- Fix lock-order-inversion between periodic dictionary reload and config reload. #29928 (Azat Khuzhin).
- Update zoneinfo files to 2021c. #29925 (alexey-milovidov).
- Add ability to configure retries and delays between them for clickhouse-copier. #29921 (Azat Khuzhin).
- Add shutdown_wait_unfinished_queries server setting to allow waiting for running queries up to shutdown_wait_unfinished time. This is for #24451. #29914 (Amos Bird).
- Add ability to trace peak memory usage (with new trace_type in system.trace_log - MemoryPeak). #29858 (Azat Khuzhin).
- PostgreSQL foreign tables: Added partitioned table prefix 'p' for the query for fetching replica identity index. #29828 (Shoh Jahon).
- Apply max_untracked_memory/memory_profiler_step/memory_profiler_sample_probability during mutate/merge to profile memory usage during merges. #29681 (Azat Khuzhin).
- Query obfuscator: clickhouse-format --obfuscate now works with more types of queries. #29672 (alexey-milovidov).
- Fixed the issue: clickhouse-format --obfuscate cannot process queries with embedded dictionaries (functions regionTo...). #29667 (alexey-milovidov).
- Fix incorrect Nullable processing of JSON functions. This fixes #29615. Marked as an improvement because https://github.com/ClickHouse/ClickHouse/pull/28012 is not released. #29659 (Amos Bird).
- Increase listen_backlog by default (to match default in newer linux kernel). #29643 (Azat Khuzhin).
- Reload dictionaries, models, user defined executable functions if the server's config dictionaries_config, models_config, user_defined_executable_functions_config changes. Closes #28142. #29529 (Maksim Kita).
- Get rid of pointless restriction on projection name. Now projection name can start with tmp_. #29520 (Amos Bird).
- Fixed There is no query or query context has expired error in mutations with nested subqueries. Do not allow subqueries in mutation if table is replicated and allow_nondeterministic_mutations setting is disabled. #29495 (tavplubix).
- Apply config changes to max_concurrent_queries during runtime (no need to restart). #29414 (Raúl Marín).
- Added setting use_skip_indexes. #29405 (Maksim Kita).
- Add support for FREEZEing in-memory parts (for backups). #29376 (Mo Xuan).
- Pass through initial query_id for clickhouse-benchmark (previously if you ran a remote query via clickhouse-benchmark, queries on shards were not linked to the initial query via initial_query_id). #29364 (Azat Khuzhin).
- Skip indexes tokenbf_v1 and ngrambf_v1: added support for Array data type with key of String or FixedString type. #29280 (Maksim Kita). Skip indexes tokenbf_v1 and ngrambf_v1: added support for Map data type with key of String or FixedString type. Author @lingtaolf. #29220 (Maksim Kita).
- Function has: added support for Map data type. #29267 (Maksim Kita).
- Add compress_logs setting for clickhouse-keeper which allows compressing clickhouse-keeper logs (for replicated state machine) in ZSTD. Implements: #26977. #29223 (alesapin).
- Add a setting external_table_strict_query - it will force passing the whole WHERE expression in queries to foreign databases even if it is incompatible. #29206 (Azat Khuzhin).
- Disable projections when ARRAY JOIN is used. In previous versions projection analysis may break aliases in array join. #29139 (Amos Bird).
- Support more types in MsgPack input/output format. #29077 (Kruglov Pavel).
- Allow to input and output LowCardinality columns in ORC input/output format. #29062 (Kruglov Pavel).
- Select from system.distributed_ddl_queue might show incorrect values, it's fixed. #29061 (tavplubix).
- Correct behaviour with unknown methods for HTTP connection. Solves #29050. #29057 (Filatenkov Artur).
- clickhouse-keeper: Fix bug in clickhouse-keeper-converter which can lead to some data loss while restoring from ZooKeeper logs (not snapshot). #29030 (小路). Fix bug in clickhouse-keeper-converter which can lead to incorrect ZooKeeper log deserialization. #29071 (小路).
- Apply settings from CREATE ... AS SELECT queries (fixes: #28810). #28962 (Azat Khuzhin).
- Respect default database setting for ALTER TABLE ... ON CLUSTER ... REPLACE/MOVE PARTITION FROM/TO ... #28955 (anneji-dev).
- gRPC protocol: Allow change server-side compression from client. #28953 (Vitaly Baranov).
- Skip "no data" exception when reading thermal sensors for asynchronous metrics. This closes #28852. #28882 (alexey-milovidov).
- Fixed logical race condition that might cause Dictionary not found error for existing dictionary in rare cases. #28853 (tavplubix).
- Relax nested function for If-combinator check (but forbid nested identical combinators). #28828 (Azat Khuzhin).
- Fix possible uncaught exception during server termination. #28761 (Azat Khuzhin).
- Forbid cleaning of tmp directories that can be used by an active mutation/merge if mutation/merge is extraordinarily long. #28760 (Azat Khuzhin).
- Allow optimization optimize_arithmetic_operations_in_aggregate_functions = 1 when alias is used. #28746 (Amos Bird).
- Implement detach_not_byte_identical_parts setting for ReplicatedMergeTree, that will detach instead of remove not byte-identical parts (after merge/mutate). #28708 (Azat Khuzhin).
- Implement max_suspicious_broken_parts_bytes setting for MergeTree (to limit total size of all broken parts, default is 1GiB). #28707 (Azat Khuzhin).
- Enable expanding macros in RabbitMQ table settings. #28683 (Vitaly Baranov).
- Restore the possibility to read data of a table using the Log engine in multiple threads. #28125 (Vitaly Baranov).
- Fix misbehavior of NULL column handling in JSON functions. This fixes #27930. #28012 (Amos Bird).
- Allow to set the size of Mark/Uncompressed cache for skip indices separately from columns. #27961 (Amos Bird).
- Allow to mix JOIN with USING with other JOIN types. #23881 (darkkeks).
- Update aws-sdk submodule for throttling in Yandex Cloud S3. #30646 (ianton-ru).
- Fix releasing query ID and session ID at the end of query processing while handling gRPC call. #29954 (Vitaly Baranov).
- Fix shutdown of AccessControlManager to fix flaky test. #29951 (Vitaly Baranov).
- Fix failed assertion in reading from HDFS. Update libhdfs3 library to be able to run in tests in debug. Closes #29251. Closes #27814. #29276 (Kseniia Sumarokova).
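The relaxed Date parsing listed in this section (YYYYMMDD accepted in addition to YYYY-MM-DD) amounts to trying a second layout. A minimal sketch of that rule follows; the function name is hypothetical and this is not ClickHouse's parser, which may treat edge cases differently.

```python
from datetime import date, datetime


def parse_date(text):
    """Parse a Date value written either as YYYY-MM-DD or as YYYYMMDD."""
    # Eight plain digits are read as the compact layout.
    if len(text) == 8 and text.isdigit():
        return datetime.strptime(text, "%Y%m%d").date()
    # Anything else must use the dashed layout, otherwise ValueError is raised.
    return datetime.strptime(text, "%Y-%m-%d").date()


compact = parse_date("20211109")
dashed = parse_date("2021-11-09")
```

Both calls yield the same date value, so either form can appear in the same text input.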
Build/Testing/Packaging Improvement
- Add support for FreeBSD builds for Aarch64 machines. #29952 (MikaelUrankar).
- Recursive submodules are no longer needed for ClickHouse. #30315 (alexey-milovidov).
- ClickHouse can be statically built with Musl. This is added as experiment, it does not support building odbc-bridge, library-bridge, integration with CatBoost and some libraries. #30248 (alexey-milovidov).
- Enable Protobuf, Arrow, ORC, Parquet for AArch64 and Darwin (macOS) builds. This closes #29248. This closes #28018. #30015 (alexey-milovidov).
- Add cross-build for PowerPC (powerpc64le). This closes #9589. Enable support for interaction with MySQL for AArch64 and PowerPC. This closes #26301. #30010 (alexey-milovidov).
- Leave only required files in cross-compile toolchains. Include them as submodules (earlier they were downloaded as tarballs). #29974 (alexey-milovidov).
- Implemented structure-aware fuzzing approach in ClickHouse for select statement parser. #30012 (Paul).
- Turning on experimental constexpr expressions evaluator for clang to speed up template code compilation. #29668 (myrrc).
- Add ability to compile using newer version of glibc without using new symbols. #29594 (Azat Khuzhin).
- Reduce Debug build binary size by clang optimization option. #28736 (flynn).
- Now all images for CI will be placed in the separate dockerhub repo. #28656 (alesapin).
- Improve support for build with clang-13. #28046 (Sergei Semin).
- Add ability to print raw profile events to clickhouse-client (This can be useful for debugging and for testing). #30064 (Azat Khuzhin).
- Add time dependency for clickhouse-server unit (systemd and sysvinit init). #28891 (Azat Khuzhin).
- Reload stacktrace cache when symbol is reloaded. #28137 (Amos Bird).
Bug Fix
- Functions for case-insensitive search in UTF-8 strings like positionCaseInsensitiveUTF8 and countSubstringsCaseInsensitiveUTF8 might find substrings that actually do not match in very rare cases, it's fixed. #30663 (tavplubix).
- Fix reading from empty file on encrypted disk. #30494 (Vitaly Baranov).
- Fix transformation of disjunctions chain to IN (controlled by settings optimize_min_equality_disjunction_chain_length) in distributed queries with settings legacy_column_name_of_tuple_literal = 0. #28658 (Anton Popov).
- Allow using a materialized column as the sharding key in a distributed table even if insert_allow_materialized_columns=0. #28637 (Vitaly Baranov).
- Fix ORDER BY ... WITH FILL with set TO and FROM and no rows in result set. #30888 (Anton Popov).
- Fix set index not used in AND/OR expressions when there are more than two operands. This fixes #30416. #30887 (Amos Bird).
- Fix crash when projection with hashing function is materialized. This fixes #30861. The issue is similar to https://github.com/ClickHouse/ClickHouse/pull/28560 which is a lack of proper understanding of the invariant of header's emptiness. #30877 (Amos Bird).
- Fixed ambiguity when extracting auxiliary ZooKeeper name from ZooKeeper path in ReplicatedMergeTree. Previously server might fail to start with Unknown auxiliary ZooKeeper name if ZooKeeper path contains a colon. Fixes #29052. Also it was allowed to specify ZooKeeper path that does not start with slash, but now it's deprecated and creation of new tables with such path is not allowed. Slashes and colons in auxiliary ZooKeeper names are not allowed too. #30822 (tavplubix).
- Clean temporary directory when localBackup failed by some reason. #30797 (ianton-ru).
- Fixed a race condition between REPLACE/MOVE PARTITION and background merge in non-replicated MergeTree that might cause a part of moved/replaced data to remain in partition. Fixes #29327. #30717 (tavplubix).
- Fix PREWHERE with WHERE in case of always true PREWHERE. #30668 (Azat Khuzhin).
- Limit push down optimization could cause an error Cannot find column. Fixes #30438. #30562 (Nikolai Kochetov).
- Add missing parentheses for isNotNull/isNull rewrites to IS [NOT] NULL (fixes queries that have something like isNotNull(1)+isNotNull(2)). #30520 (Azat Khuzhin).
- Fix deadlock on ALTER with scalar subquery to the same table, close #30461. #30492 (Vladimir C).
- Fixed segfault which might happen if session expired during execution of REPLACE PARTITION. #30432 (tavplubix).
- Queries with condition like IN (subquery) could return incorrect result in case if aggregate projection applied. Fixed creation of sets for projections. #30310 (Amos Bird).
- Fix column alias resolution of JOIN queries when projection is enabled. This fixes #30146. #30293 (Amos Bird).
- Fix some deficiency in replaceRegexpAll function. #30292 (Memo).
- Fix ComplexKeyHashedDictionary, ComplexKeySparseHashedDictionary parsing preallocate option from layout config. #30246 (Maksim Kita).
- Fix [I]LIKE function. Closes #28661. #30244 (Nikolay Degterinsky).
- Fix crash with shortcircuit and lowcardinality in multiIf. #30243 (Raúl Marín).
- FlatDictionary, HashedDictionary fix bytes_allocated calculation for nullable attributes. #30238 (Maksim Kita).
- Allow identifiers starting with numbers in multiple joins. #30230 (Vladimir C).
- Fix reading from MergeTree with max_read_buffer_size = 0 (when the user wants to shoot himself in the foot) (can lead to exceptions Can't adjust last granule, LOGICAL_ERROR, or even data loss). #30192 (Azat Khuzhin).
- Fix pread_fake_async/pread_threadpool with min_bytes_to_use_direct_io. #30191 (Azat Khuzhin).
- Fix INSERT SELECT incorrectly filling a MATERIALIZED column based on a Nullable column. #30189 (Azat Khuzhin).
- Support nullable arguments in function initializeAggregation. #30177 (Anton Popov).
- Fix error Port is already connected for queries with GLOBAL IN and WITH TOTALS. Only for 21.9 and 21.10. #30086 (Nikolai Kochetov).
- Fix race between MOVE PARTITION and merges/mutations for MergeTree. #30074 (Azat Khuzhin).
- Dropped Memory database might reappear after server restart, it's fixed (#29795). Also added force_remove_data_recursively_on_drop setting as a workaround for Directory not empty error when dropping Ordinary database (because it's not possible to remove data leftovers manually in cloud environment). #30054 (tavplubix).
- Fix crash of sample by tuple(), closes #30004. #30016 (flynn).
- Try to close issue: #29965. #29976 (hexiaoting).
- Fix possible data-race between FileChecker and StorageLog/StorageStripeLog. #29959 (Azat Khuzhin).
- Fix data-race between LogSink::writeMarks() and LogSource in StorageLog. #29946 (Azat Khuzhin).
- Fix potential resource leak of the concurrent query limit of merge tree tables introduced in https://github.com/ClickHouse/ClickHouse/pull/19544. #29879 (Amos Bird).
- Fix system tables recreation check (fails to detect changes in enum values). #29857 (Azat Khuzhin).
- MaterializedMySQL: Fix an issue where if the connection to MySQL was lost, only parts of a transaction could be processed. #29837 (Håvard Kvålen).
- Avoid Timeout exceeded: elapsed 18446744073.709553 seconds error that might happen in extremely rare cases, presumably due to some bug in kernel. Fixes #29154. #29811 (tavplubix).
- Fix bad cast in ATTACH TABLE ... FROM 'path' query when non-string literal is used instead of path. It may lead to reading of uninitialized memory. #29790 (alexey-milovidov).
- Fix concurrent access to LowCardinality during GROUP BY (in combination with Buffer tables it may lead to troubles). #29782 (Azat Khuzhin).
- Fix incorrect GROUP BY (multiple rows with the same keys in result) in case of distributed query when shards had mixed versions <= 21.3 and >= 21.4, GROUP BY key had several columns all with fixed size, and two-level aggregation was activated (see group_by_two_level_threshold and group_by_two_level_threshold_bytes). Fixes #29580. #29735 (Nikolai Kochetov).
- Fixed incorrect behaviour of setting materialized_postgresql_tables_list at server restart. Found in #28529. #29686 (Kseniia Sumarokova).
- Condition in filter predicate could be lost after push-down optimisation. #29625 (Nikolai Kochetov).
- Fix JIT expression compilation with aliases and short-circuit expression evaluation. Closes #29403. #29574 (Maksim Kita).
- Fix rare segfault in ALTER MODIFY query when using incorrect table identifier in DEFAULT expression like x.y.z... Fixes #29184. #29573 (alesapin).
- Fix nullptr dereference for GROUP BY WITH TOTALS HAVING (when the column from HAVING wasn't selected). #29553 (Azat Khuzhin).
- Avoid deadlocks when reading and writing on Join table engine tables at the same time. #29544 (Raúl Marín).
- Fix bug in the pathStartsWith check caused by incorrect usage of std::mismatch: The behavior is undefined if the second range is shorter than the first range. #29531 (Kseniia Sumarokova).
- In ODBC bridge add retries for error Invalid cursor state. It is a retriable error. Closes #29473. #29518 (Kseniia Sumarokova).
- Fixed incorrect table name parsing on loading of Lazy database. Fixes #29456. #29476 (tavplubix).
- Fix possible Block structure mismatch for subqueries with pushed-down HAVING predicate. Fixes #29010. #29475 (Nikolai Kochetov).
- Fix Logical error Cannot capture columns in functions greatest/least. Closes #29334. #29454 (Kruglov Pavel).
- RocksDB table engine: fix race condition during multiple DB opening (and get back some tests that trigger the problem on CI). #29393 (Azat Khuzhin).
- Fix replicated access storage not shutting down cleanly when misconfigured. #29388 (Kevin Michel).
- Remove window function nth_value as it is not memory-safe. This closes #29347. #29348 (alexey-milovidov).
- Fix vertical merges of projection parts. This fixes #29253. This PR also fixes several projection merge/mutation issues introduced in https://github.com/ClickHouse/ClickHouse/pull/25165. #29337 (Amos Bird).
- Fix hanging DDL queries on Replicated database while adding a new replica. #29328 (Kevin Michel).
- Fix connection timeouts (send_timeout/receive_timeout). #29282 (Azat Khuzhin).
- Fix possible Table columns structure in ZooKeeper is different from local table structure exception while recreating or creating new replicas of ReplicatedMergeTree, when one of the table columns has default expressions with case-insensitive functions. #29266 (Anton Popov).
- Send normal Database doesn't exist error (UNKNOWN_DATABASE) to the client (via TCP) instead of Attempt to read after eof (ATTEMPT_TO_READ_AFTER_EOF). #29229 (Azat Khuzhin).
- Fix segfault while inserting into column with type LowCardinality(Nullable) in Avro input format. #29132 (Kruglov Pavel).
- Do not allow to reuse previous credentials in case of inter-server secret (Before INSERT via Buffer/Kafka to Distributed table with interserver secret configured for that cluster, may re-use previously set user for that connection). #29060 (Azat Khuzhin).
- Handle any_join_distinct_right_table_keys when join with dictionary, close #29007. #29014 (Vladimir C).
- Fix "Not found column ... in block" error, when join on alias column, close #26980. #29008 (Vladimir C).
- Fix the number of threads used in GLOBAL IN subquery (it was executed in a single thread since #19414 bugfix). #28997 (Nikolai Kochetov).
- Fix bad optimizations of ORDER BY if it contains WITH FILL. This closes #28908. This closes #26049. #28910 (alexey-milovidov).
- Fix higher-order array functions (SIGSEGV for arrayCompact/ILLEGAL_COLUMN for arrayDifference/arrayCumSumNonNegative) with consts. #28904 (Azat Khuzhin).
- Fix waiting for mutation with mutations_sync=2. #28889 (Azat Khuzhin).
- Fix queries to external databases (e.g. MySQL) with multiple columns in IN (e.g. (k,v) IN ((1, 2))). #28888 (Azat Khuzhin).
- Fix bug with LowCardinality in short-circuit function evaluation. Closes #28884. #28887 (Kruglov Pavel).
- Fix reading of subcolumns from compact parts. #28873 (Anton Popov).
- Fixed a race condition between DROP PART and REPLACE/MOVE PARTITION that might cause replicas to diverge in rare cases. #28864 (tavplubix).
- Fix expressions compilation with short circuit evaluation. #28821 (Azat Khuzhin).
- Fix extremely rare case when ReplicatedMergeTree replicas can diverge after hard reboot of all replicas. The error looks like Part ... intersects (previous|next) part ... #28817 (alesapin).
- Better check for connection usability and also catch any exception in RabbitMQ shutdown just in case. #28797 (Kseniia Sumarokova).
- Fix benign race condition in ReplicatedMergeTreeQueue. Shouldn't be visible for user, but can lead to subtle bugs. #28734 (alesapin).
- Fix possible crash for SELECT with partially created aggregate projection in case of exception. #28700 (Amos Bird).
- Fix the coredump in the creation of distributed tables, when the parameters passed in are wrong. #28686 (Zhiyong Wang).
- Add Settings.Names, Settings.Values aliases for system.processes table. #28685 (Vitaly).
- Support for S2 Geometry library: Fix the number of arguments required by s2RectAdd and s2RectContains functions. #28663 (Bharat Nallan).
- Fix invalid constant type conversion when Nullable or LowCardinality primary key is used. #28636 (Amos Bird).
- Fix "Column is not under aggregate function and not in GROUP BY" with PREWHERE (Fixes: #28461). #28502 (Azat Khuzhin).
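The pathStartsWith fix in this list concerns a prefix check that must not read past the shorter range. The same guard can be expressed as a sketch; this is an illustration in Python, not the C++ code from the pull request, and the function name and the component-wise comparison are assumptions.

```python
from pathlib import PurePosixPath


def path_starts_with(path, prefix):
    """Return True if prefix is a leading sequence of path components of path."""
    path_parts = PurePosixPath(path).parts
    prefix_parts = PurePosixPath(prefix).parts
    # Explicit length guard: a prefix longer than the path can never match.
    # In C++ this is the case where std::mismatch on two plain ranges is undefined.
    if len(prefix_parts) > len(path_parts):
        return False
    return path_parts[:len(prefix_parts)] == prefix_parts


inside = path_starts_with("/var/lib/clickhouse/user_files/a.csv", "/var/lib/clickhouse/user_files")
too_short = path_starts_with("/var/lib", "/var/lib/clickhouse")
sibling = path_starts_with("/var/library", "/var/lib")
```

Comparing whole components rather than characters also keeps a sibling directory such as /var/library from being treated as lying under /var/lib.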
-
v21.10 Changes
October 14, 2021
Backward Incompatible Change
- The following MergeTree table-level settings now do nothing: replicated_max_parallel_sends, replicated_max_parallel_sends_for_table, replicated_max_parallel_fetches, replicated_max_parallel_fetches_for_table. They never worked well and were replaced with max_replicated_fetches_network_bandwidth, max_replicated_sends_network_bandwidth and background_fetches_pool_size. #28404 (alesapin).
New Feature
- Add feature for creating user-defined functions (UDF) as lambda expressions. Syntax: CREATE FUNCTION {function_name} as ({parameters}) -> {function core}. Example: CREATE FUNCTION plus_one as (a) -> a + 1. Authors @Realist007. #27796 (Maksim Kita) #23978 (Realist007).
- Added Executable storage engine and executable table function. It enables data processing with external scripts in streaming fashion. #28102 (Maksim Kita) (ruct).
- Added ExecutablePool storage engine. Similar to Executable but it's using a pool of long running processes. #28518 (Maksim Kita).
- Add ALTER TABLE ... MATERIALIZE COLUMN query. #27038 (Vladimir Chebotarev).
- Support for partitioned write into s3 table function. #23051 (Vladimir Chebotarev).
- Support lz4 compression format (in addition to gz, bz2, xz, zstd) for data import / export. #25310 (Bharat Nallan).
- Allow positional arguments under setting enable_positional_arguments. Closes #2592. #27530 (Kseniia Sumarokova).
- Accept user settings related to file formats in SETTINGS clause in CREATE query for s3 tables. This closes #27580. #28037 (Nikita Mikhaylov).
- Allow SSL connection for RabbitMQ engine. #28365 (Kseniia Sumarokova).
- Add getServerPort function to allow getting server port. When the port is not used by the server, throw an exception. #27900 (Amos Bird).
- Add conversion functions between "snowflake id" and DateTime, DateTime64. See #27058. #27704 (jasine).
- Add function SHA512. #27830 (zhanglistar).
- Add log_queries_probability setting that allows user to write to query_log only a sample of queries. Closes #16609. #27527 (Nikolay Degterinsky).
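The conversion between a "snowflake id" and a timestamp mentioned in this list can be sketched as follows. The sketch assumes the widely used Twitter Snowflake layout (millisecond timestamp in the upper bits, 22 low bits for machine id and sequence, epoch 1288834974657 ms); whether the ClickHouse functions follow exactly this layout is an assumption here, and the Python function names are hypothetical.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # assumed custom epoch (Twitter Snowflake layout)
TIMESTAMP_SHIFT = 22              # assumed: 10 bits machine id + 12 bits sequence


def snowflake_to_datetime(snowflake_id):
    """Extract the millisecond timestamp stored in the upper bits of the id."""
    ms = (snowflake_id >> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MS
    whole = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    return whole.replace(microsecond=(ms % 1000) * 1000)


def datetime_to_snowflake(dt):
    """Build the smallest id whose timestamp part equals dt (low 22 bits are zero)."""
    ms = int(dt.timestamp()) * 1000 + dt.microsecond // 1000
    return (ms - TWITTER_EPOCH_MS) << TIMESTAMP_SHIFT


moment = datetime(2021, 8, 15, 10, 57, 56, tzinfo=timezone.utc)
sid = datetime_to_snowflake(moment)
restored = snowflake_to_datetime(sid)
```

Because the low bits carry no time information, converting a timestamp to an id and back is lossless at millisecond precision, while converting an arbitrary id to a timestamp discards the machine and sequence bits.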
Experimental Feature
- web type of disks to store readonly tables on web server in form of static files. See #23982. #25251 (Kseniia Sumarokova). This is mostly needed to facilitate testing of operation on shared storage and for easy importing of datasets. Not recommended to use before release 21.11.
- Added new commands BACKUP and RESTORE. #21945 (Vitaly Baranov). This is under development and not intended to be used in current version.
Performance Improvement
- Speed up sumIf and countIf aggregation functions. #28272 (Raúl Marín).
- Create virtual projection for minmax indices. Now, when allow_experimental_projection_optimization is enabled, queries will use minmax index instead of reading the data when possible. #26286 (Amos Bird).
- Introducing two checks in sequenceMatch and sequenceCount that allow for early exit when some deterministic part of the sequence pattern is missing from the events list. This change unlocks many queries that would previously fail due to reaching operations cap, and generally speeds up the pipeline. #27729 (Jakub Kuklis).
- Enhance primary key analysis with always monotonic information of binary functions, notably non-zero constant division. #28302 (Amos Bird).
- Make hasAll filter condition leverage bloom filter data-skipping indexes. #27984 (Braulio Valdivielso Martínez).
- Speed up data parts loading by delaying table startup process. #28313 (Amos Bird).
- Fixed possible excessive number of conditions moved from WHERE to PREWHERE (optimization controlled by settings optimize_move_to_prewhere). #28139 (lthaooo).
- Enable optimize_distributed_group_by_sharding_key by default. #28105 (Azat Khuzhin).
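The early-exit idea for sequenceMatch and sequenceCount described in this section can be illustrated with one plausible form of such a check: if the pattern requires a condition that never fires in the events, no match is possible and the expensive matching can be skipped. This is a sketch only; the actual checks in the pull request may differ, and the helper names and the event representation are assumptions.

```python
import re


def required_conditions(pattern):
    """Condition numbers referenced as (?N) in a sequenceMatch-style pattern."""
    return {int(n) for n in re.findall(r"\(\?(\d+)\)", pattern)}


def can_exit_early(pattern, events):
    """True if some (?N) in the pattern never fires, so no match is possible.

    events: list of sets, each holding the condition numbers true for that event.
    """
    seen = set().union(*events) if events else set()
    return not required_conditions(pattern) <= seen


pattern = "(?1).*(?2)(?3)"
skip = can_exit_early(pattern, [{1}, {2}, {1}])        # condition 3 never fires
must_match = can_exit_early(pattern, [{1}, {2}, {3}])  # all conditions present
```

A check like this is linear in the number of events, so it is cheap compared with running the full pattern matcher on input that cannot match.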
Improvement
- Check cluster name before creating Distributed table, do not allow to create a table with incorrect cluster name. Fixes #27832. #27927 (tavplubix).
- Add aggregate function quantileBFloat16Weighted similarly to other quantile...Weighted functions. This closes #27745. #27758 (Ivan Novitskiy).
- Allow to create dictionaries with empty attributes list. #27905 (Maksim Kita).
- Add interactive documentation in clickhouse-client about how to reset the password. This is useful in scenario when user has installed ClickHouse, set up the password and instantly forget it. See #27750. #27903 (alexey-milovidov).
- Support the case when the data is enclosed in array in JSONAsString input format. Closes #25517. #25633 (Kruglov Pavel).
- Add new column last_queue_update_exception to system.replicas table. #26843 (nvartolomei).
- Support reconnections on failover for MaterializedPostgreSQL tables. Closes #28529. #28614 (Kseniia Sumarokova).
- Generate a unique server UUID on first server start. #20089 (Bharat Nallan).
- Introduce connection_wait_timeout (default to 5 seconds, 0 - do not wait) setting for MySQL engine. #28474 (Azat Khuzhin).
- Do not allow creating MaterializedPostgreSQL with bad arguments. Closes #28423. #28430 (Kseniia Sumarokova).
- Use real tmp file instead of predefined "rows_sources" for vertical merges. This avoids generating garbage directories in tmp disks. #28299 (Amos Bird).
- Added libhdfs3_conf in server config instead of export env LIBHDFS3_CONF in clickhouse-server.service. This is for configuration of interaction with HDFS. #28268 (Zhichang Yu).
- Fix removing of parts in a Temporary state which can lead to an unexpected exception ("Part %name% doesn't exist"). Fixes #23661. #28221 (Azat Khuzhin).
- Fix zookeeper_log.address (before the first patch in this PR the address was always ::) and reduce the number of getpeername(2) calls for this column (since getpeername() is called each time an entry for zookeeper_log is added, cache this address in the zookeeper client to avoid this). #28212 (Azat Khuzhin).
- Support implicit conversions between index in operator [] and key of type Map (e.g. different Int types, String and FixedString). #28096 (Anton Popov).
- Support ON CONFLICT clause when inserting into PostgreSQL table engine or table function. Closes #27727. #28081 (Kseniia Sumarokova).
- Lower restrictions for Enum data type to allow attaching compatible data. Closes #26672. #28028 (Dmitry Novik).
- Add a setting empty_result_for_aggregation_by_constant_keys_on_empty_set to control the behavior of grouping by constant keys on empty set. This is to bring back the old behaviour of #6842. #27932 (Amos Bird).
- Added replication_wait_for_inactive_replica_timeout setting. It allows to specify how long to wait for inactive replicas to execute ALTER/OPTIMIZE/TRUNCATE query (default is 120 seconds). If replication_alter_partitions_sync is 2 and some replicas are not active for more than replication_wait_for_inactive_replica_timeout seconds, then UNFINISHED will be thrown. #27931 (tavplubix).
- Support lambda argument for APPLY column transformer which allows applying functions with more than one argument. This is for #27877. #27901 (Amos Bird).
- Enable tcp_keep_alive_timeout by default. #27882 (Azat Khuzhin).
- Improve remote query cancellation (in case the remote server terminated abnormally). #27881 (Azat Khuzhin).
- Use Multipart copy upload for large S3 objects. #27858 (ianton-ru).
- Allow symlink traversal for library dictionary path. #27815 (Kseniia Sumarokova).
- Now ALTER MODIFY COLUMN T to Nullable(T) doesn't require mutation. #27787 (victorgao).
- Don't silently ignore errors and don't count delays in ReadBufferFromS3. #27484 (Vladimir Chebotarev).
- Improve ALTER ... MATERIALIZE TTL by recalculating metadata only without actual TTL action. #27019 (lthaooo).
- Allow reading the list of custom top level domains without a new line at EOF. #28213 (Azat Khuzhin).
Bug Fix
- Fix cases, when reading compressed data from carbon-clickhouse fails with 'attempt to read after end of file'. Closes #26149. #28150 (FArthur-cmd).
- Fix checking access grants when executing GRANT WITH REPLACE statement with ON CLUSTER clause. This PR improves fix #27001. #27983 (Vitaly Baranov).
- Allow selecting with extremes = 1 from a column of the type LowCardinality(UUID). #27918 (Vitaly Baranov).
- Fix PostgreSQL-style cast (:: operator) with negative numbers. #27876 (Anton Popov).
- After #26864. Fix shutdown of NamedSessionStorage: session contexts stored in NamedSessionStorage are now destroyed before destroying the global context. #27875 (Vitaly Baranov).
- Bugfix for windowFunnel "strict" mode. This fixes #27469. #27563 (achimbab).
- Fix infinite loop while reading truncated bzip2 archive. #28543 (Azat Khuzhin).
- Fix UUID overlap in DROP TABLE for internal DDL from MaterializedMySQL. MaterializedMySQL is an experimental feature. #28533 (Azat Khuzhin).
- Fix "There is no subcolumn" error, while select from tables, which have Nested columns and scalar columns with dot in name and the same prefix as Nested (e.g. n.id UInt32, n.arr1 Array(UInt64), n.arr2 Array(UInt64)). #28531 (Anton Popov).
- Fix bug which can lead to error "Existing table metadata in ZooKeeper differs in sorting key expression." after ALTER of ReplicatedVersionedCollapsingMergeTree. Fixes #28515. #28528 (alesapin).
- Fixed possible ZooKeeper watches leak (minor issue) on background processing of distributed DDL queue. Closes #26036. #28446 (tavplubix).
- Fix missing quoting of table names in MaterializedPostgreSQL engine. Closes #28316. #28433 (Kseniia Sumarokova).
- Fix the wrong behaviour of non joined rows from nullable column. Close #27691. #28349 (vdimir).
- Fix NOT-IN index optimization when not all key columns are used. This fixes #28120. #28315 (Amos Bird).
- Fix intersecting parts due to new part had been replaced with an empty part. #28310 (Azat Khuzhin).
- Fix inconsistent result in queries with ORDER BY and Merge tables with enabled setting optimize_read_in_order. #28266 (Anton Popov).
- Fix possible read of uninitialized memory for queries with Nullable(LowCardinality) type and the setting extremes set to 1. Fixes #28165. #28205 (Nikolai Kochetov).
- Multiple small fixes for projections. See detailed description in the PR. #28178 (Amos Bird).
- Fix extremely rare segfaults on shutdown due to incorrect order of context/config reloader shutdown. #28088 (nvartolomei).
- Fix handling null value with type of Nullable(String) in function JSONExtract. This fixes #27929 and #27930. This was introduced in https://github.com/ClickHouse/ClickHouse/pull/25452 . #27939 (Amos Bird).
- Multiple fixes for the new clickhouse-keeper tool. Fix a rare bug in clickhouse-keeper when the client can receive a watch response before request-response. #28197 (alesapin). Fix incorrect behavior in clickhouse-keeper when list watches (getChildren) triggered with set requests for children. #28190 (alesapin). Fix rare case when changes of clickhouse-keeper settings may lead to lost logs and server hung. #28360 (alesapin). Fix bug in clickhouse-keeper which can lead to endless logs when rotate_logs_interval decreased. #28152 (alesapin).
Build/Testing/Packaging Improvement
- Enable Thread Fuzzer in Stress Test. Thread Fuzzer is ClickHouse feature that allows to test more permutations of thread scheduling and discover more potential issues. This closes #9813. This closes #9814. This closes #9515. This closes #9516. #27538 (alexey-milovidov).
- Add new log level test for testing environments. It is even more verbose than the default trace. #28559 (alesapin).
- Print out git status information at CMake configure stage. #28047 (Braulio Valdivielso Martínez).
- Temporarily switched ubuntu apt repository to mirror ru.archive.ubuntu.com as the default one (archive.ubuntu.com) is not responding from our CI. #28016 (Ilya Yatsishin).
- Now the following MergeTree table-level settings:
-
v21.9 Changes
September 09, 2021
Backward Incompatible Change
- Do not output trailing zeros in text representation of Decimal types. Example: 1.23 will be printed instead of 1.230000 for decimal with scale 6. This closes #15794. It may introduce slight incompatibility if your applications somehow relied on the trailing zeros. Serialization in output formats can be controlled with the setting output_format_decimal_trailing_zeros. Implementation of toString and casting to String is changed unconditionally. #27680 (alexey-milovidov).
- Do not allow to apply parametric aggregate function with -Merge combinator to aggregate function state if state was produced by aggregate function with different parameters. For example, state of fooState(42)(x) cannot be finalized with fooMerge(s) or fooMerge(123)(s), parameters must be specified explicitly like fooMerge(42)(s) and must be equal. It does not affect some special aggregate functions like quantile and sequence* that use parameters for finalization only. #26847 (tavplubix).
- Under clickhouse-local, always treat local addresses with a port as remote. #26736 (Raúl Marín).
- Fix the issue that in case of some sophisticated query with column aliases identical to the names of expressions, bad cast may happen. This fixes #25447. This fixes #26914. This fix may introduce backward incompatibility: if there are different expressions with identical names, exception will be thrown. It may break some rare cases when enable_optimize_predicate_expression is set. #26639 (alexey-milovidov).
- Now, scalar subquery always returns Nullable result if its type can be Nullable. It is needed because in case of empty subquery its result should be Null. Previously, it was possible to get error about incompatible types (type deduction does not execute scalar subquery, and it could use not-nullable type). Scalar subquery with empty result which can't be converted to Nullable (like Array or Tuple) now throws error. Fixes #25411. #26423 (Nikolai Kochetov).
- Introduce syntax for here documents. Example: SELECT $doc$ VALUE $doc$. #26671 (Maksim Kita). This change is backward incompatible if in query there are identifiers that contain $ #28768.
- Now indices can handle Nullable types, including isNull and isNotNull. #12433 and #12455 (Amos Bird) and #27250 (Azat Khuzhin). But this was done with on-disk format changes, and even though new server can read old data, old server cannot. Also, in case you have MINMAX data skipping indices, you may get "Data after mutation/merge is not byte-identical" error, since new index will have .idx2 extension while before it was .idx. That said, you should not delay updating all existing replicas in this case; otherwise, if an old replica (<21.9) downloads data from a new replica running 21.9+, it will not be able to apply the index for the downloaded part.
New Feature
- Implementation of short circuit function evaluation, closes #12587. Add settings short_circuit_function_evaluation to configure short circuit function evaluation. #23367 (Kruglov Pavel).
- Add support for INTERSECT, EXCEPT, ANY, ALL operators. #24757 (Kirill Ershov). (Kseniia Sumarokova).
- Add support for encryption at the virtual file system level (data encryption at rest) using AES-CTR algorithm. #24206 (Latysheva Alexandra). (Vitaly Baranov) #26733 #26377 #26465.
- Added natural language processing (NLP) functions for tokenization, stemming, lemmatizing and search in synonyms extensions. #24997 (Nikolay Degterinsky).
- Added integration with S2 geometry library. #24980 (Andr0901). (Nikita Mikhaylov).
- Add SQLite table engine, table function, database engine. #24194 (Arslan Gumerov). (Kseniia Sumarokova).
- Added support for custom query for MySQL, PostgreSQL, ClickHouse, JDBC, Cassandra dictionary source. Closes #1270. #26995 (Maksim Kita).
- Add shared (replicated) storage of user, roles, row policies, quotas and settings profiles through ZooKeeper. #27426 (Kevin Michel).
- Add compression for INTO OUTFILE that automatically chooses the compression algorithm. Closes #3473. #27134 (Filatenkov Artur).
- Add INSERT ... FROM INFILE similarly to SELECT ... INTO OUTFILE. #27655 (Filatenkov Artur).
- Added complex_key_range_hashed dictionary. Closes #22029. #27629 (Maksim Kita).
- Support expressions in JOIN ON section. Close #21868. #24420 (Vladimir C).
- When client connects to server, it receives information about all warnings that were already collected by server. (It can be disabled by using option --no-warnings). Add system.warnings table to collect warnings about server configuration. #26246 (Filatenkov Artur). #26282 (Filatenkov Artur).
- Allow using constant expressions from with and select in aggregate function parameters. Close #10945. #27531 (abel-cheng).
- Add tupleToNameValuePairs, a function that turns a named tuple into an array of pairs. #27505 (Braulio Valdivielso Martínez).
- Add support for bzip2 compression method for import/export. Closes #22428. #27377 (Nikolay Degterinsky).
- Added bitmapSubsetOffsetLimit(bitmap, offset, cardinality_limit) function. It creates a subset of the bitmap, limiting the results to cardinality_limit elements starting at offset. #27234 (DHBin).
- Add column default_database to system.users. #27054 (kevin wan).
- Supported cluster macros inside table functions 'cluster' and 'clusterAllReplicas'. #26913 (polyprogrammist).
- Add new functions currentRoles(), enabledRoles(), defaultRoles(). #26780 (Vitaly Baranov).
- New functions currentProfiles(), enabledProfiles(), defaultProfiles(). #26714 (Vitaly Baranov).
- Add functions that return (initial_)query_id of the current query. This closes #23682. #26410 (Alexey Boykov).
- Add REPLACE GRANT feature. #26384 (Caspian).
- EXPLAIN query now has EXPLAIN ESTIMATE ... mode that will show information about read rows, marks and parts from MergeTree tables. Closes #23941. #26131 (fastio).
- Added system.zookeeper_log table. All actions of ZooKeeper client are logged into this table. Implements #25449. #26129 (tavplubix).
- Zero-copy replication for ReplicatedMergeTree over HDFS storage. #25918 (Zhichang Yu).
- Allow to insert Nested type as array of structs in Arrow, ORC and Parquet input format. #25902 (Kruglov Pavel).
- Add a new datatype Date32 (stores data as Int32) that supports the same date range as DateTime64. Support loading Parquet date32 into ClickHouse Date32. Add new function toDate32, similar to toDate. #25774 (LiuNeng).
- Allow setting default database for users. #25268. #25687 (kevin wan).
- Add an optional parameter to MongoDB engine to accept connection string options and support SSL connection. Closes #21189. Closes #21041. #22045 (Omar Bazaraa).
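As a rough illustration of what "turns a named tuple into an array of pairs" means for tupleToNameValuePairs, the following Python sketch applies the same transformation to a Python named tuple. It only emulates the idea: the Point type and the helper name are hypothetical, and ClickHouse's own typing rules for the resulting array are not modelled.

```python
from typing import Any, List, NamedTuple, Tuple


class Point(NamedTuple):
    """Hypothetical named tuple used only for this example."""
    x: int
    y: int


def tuple_to_name_value_pairs(named: tuple) -> List[Tuple[str, Any]]:
    """Pair each field name of a named tuple with its value, in field order."""
    return list(zip(named._fields, named))


print(tuple_to_name_value_pairs(Point(x=1, y=2)))  # [('x', 1), ('y', 2)]
```

Having names and values side by side as rows of pairs makes it easier to iterate over tuple fields generically instead of addressing each element by name.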
Experimental Feature
- Added a compression codec AES_128_GCM_SIV which encrypts columns instead of compressing them. #19896 (PHO). Will be rewritten, do not use.
- Rename MaterializeMySQL to MaterializedMySQL. #26822 (tavplubix).
Performance Improvement
- Improve the performance of fast queries when max_execution_time = 0 by reducing the number of clock_gettime system calls. #27325 (filimonov).
- Specialize date time related comparison to achieve better performance. This fixes #27083. #27122 (Amos Bird).
- Share file descriptors in concurrent reads of the same files. There is no noticeable performance difference on Linux. But the number of opened files will be significantly (10..100 times) lower on typical servers and it makes operations easier. See #26214. #26768 (alexey-milovidov).
- Improve latency of short queries, that require reading from tables with large number of columns. #26371 (Anton Popov).
- Don't build sets for indices when analyzing a query. #26365 (Raúl Marín).
- Vectorize the SUM of Nullable integer types with native representation (David Manzanares, Raúl Marín). #26248 (Raúl Marín).
- Compile expressions involving columns with Enum types. #26237 (Maksim Kita).
- Compile aggregate functions groupBitOr, groupBitAnd, groupBitXor. #26161 (Maksim Kita).
- Improved memory usage with better block size prediction when reading empty DEFAULT columns. Closes #17317. #25917 (Vladimir Chebotarev).
- Reduce memory usage and number of read rows in queries with ORDER BY primary_key. #25721 (Anton Popov).
- Enable distributed_push_down_limit by default. #27104 (Azat Khuzhin).
- Make toTimeZone monotonic when the time zone is a constant value, to support partition pruning. #26261 (huangzhaowei).
Improvement
- Mark window functions as ready for general use. Remove the allow_experimental_window_functions setting. #27184 (Alexander Kuzmenkov).
- Improve compatibility with non-whole-minute timezone offsets. #27080 (Raúl Marín).
- If file descriptor in File table is regular file - allow to read multiple times from it. It allows clickhouse-local to read multiple times from stdin (with multiple SELECT queries or subqueries) if stdin is a regular file like clickhouse-local --query "SELECT * FROM table UNION ALL SELECT * FROM table" ... < file. This closes #11124. Co-authored with (alexey-milovidov). #25960 (BoloniniD).
- Remove duplicate index analysis and avoid possible invalid limit checks during projection analysis. #27742 (Amos Bird).
- Enable query parameters to be passed in the body of HTTP requests. #27706 (Hermano Lustosa).
- Disallow arrayJoin on partition expressions. #27648 (Raúl Marín).
- Log client IP address if authentication fails. #27514 (Misko Lee).
- Use bytes instead of strings for binary data in the GRPC protocol. #27431 (Vitaly Baranov).
- Send response with error message if HTTP port is not set and user tries to send HTTP request to TCP port. #27385 (Braulio Valdivielso Martínez).
- Add _CAST function for internal usage, which will not preserve type nullability, but non-internal cast will preserve according to setting cast_keep_nullable. Closes #12636. #27382 (Kseniia Sumarokova).
- Add setting log_formatted_queries to log additional formatted query into system.query_log. It's useful for normalized query analysis because functions like normalizeQuery and normalizeQueryKeepNames don't parse/format queries in order to achieve better performance. #27380 (Amos Bird).
- Add two settings max_hyperscan_regexp_length and max_hyperscan_regexp_total_length to prevent huge regexp being used in hyperscan related functions, such as multiMatchAny. #27378 (Amos Bird).
- Memory consumed by bitmap aggregate functions now is taken into account for memory limits. This closes #26555. #27252 (alexey-milovidov).
- Add 10 seconds cache for S3 proxy resolver. #27216 (ianton-ru).
- Split global mutex into individual regexp construction. This helps avoid huge regexp construction blocking other related threads. #27211 (Amos Bird).
- Support schema for PostgreSQL database engine. Closes #27166. #27198 (Kseniia Sumarokova).
- Track memory usage in clickhouse-client. #27191 (Filatenkov Artur).
- Try recording query_kind in system.query_log even when query fails to start. #27182 (Amos Bird).
- Added column replica_is_active, which maps each replica name to its active status, to the system.replicas table. Closes #27138. #27180 (Maksim Kita).
- Allow to pass query settings via server URI in Web UI. #27177 (kolsys).
- Add a new metric called MaxPushedDDLEntryID which is the maximum DDL entry id that the current node has pushed to ZooKeeper. #27174 (Fuwang Hu).
- Improved the existence condition judgment and empty string node judgment when clickhouse-keeper creates znode. #27125 (小路).
- Merge JOIN correctly handles empty set in the right. #27078 (Vladimir C).
- Now functions can be shard-level constants, which means if it's executed in the context of some distributed table, it generates a normal column, otherwise it produces a constant value. Notable functions are: hostName(), tcpPort(), version(), buildId(), uptime(), etc. #27020 (Amos Bird).
- Updated extractAllGroupsHorizontal - upper limit on the number of matches per row can be set via optional third argument. #26961 (Vasily Nemkov).
- Expose RocksDB statistics via system.rocksdb table. Read rocksdb options from ClickHouse config (rocksdb... keys). NOTE: ClickHouse does not rely on RocksDB, it is just one of the additional integration storage engines. #26821 (Azat Khuzhin).
- Less verbose internal RocksDB logs. NOTE: ClickHouse does not rely on RocksDB, it is just one of the additional integration storage engines. This closes #26252. #26789 (alexey-milovidov).
- Changing default roles affects new sessions only. #26759 (Vitaly Baranov).
- Watchdog is disabled in docker by default. Fix for not handling ctrl+c. #26757 (Mikhail f. Shiryaev).
- SET PROFILE now applies constraints too if they're set for a passed profile. #26730 (Vitaly Baranov).
- Improve handling of KILL QUERY requests. #26675 (Raúl Marín).
- mapPopulateSeries function supports Map type. #26663 (Ildus Kurbangaliev).
- Fix excessive (x2) connect attempts with skip_unavailable_shards. #26658 (Azat Khuzhin).
- Avoid hanging clickhouse-benchmark if connection fails (i.e. on EMFILE). #26656 (Azat Khuzhin).
- Allow more threads to be used by the Kafka engine. #26642 (feihengye).
- Add round-robin support for clickhouse-benchmark (it does not differ from the regular multi host/port run except for statistics report). #26607 (Azat Khuzhin).
- Executable dictionaries (executable, executable_pool) enable creation with DDL query using clickhouse-local. Closes #22355. #26510 (Maksim Kita).
- Set client query kind for mysql and postgresql compatibility protocol handlers. #26498 (anneji-dev).
- Apply LIMIT on the shards for queries like SELECT * FROM dist ORDER BY key LIMIT 10 with distributed_push_down_limit=1. Avoid running Distinct/LIMIT BY steps for queries like SELECT DISTINCT sharding_key FROM dist ORDER BY key. Now distributed_push_down_limit is respected by optimize_distributed_group_by_sharding_key optimization. #26466 (Azat Khuzhin).
- Updated protobuf to 3.17.3. Changelogs are available on https://github.com/protocolbuffers/protobuf/releases. #26424 (Ilya Yatsishin).
- Enable use_hedged_requests setting that allows to mitigate tail latencies on large clusters. #26380 (alexey-milovidov).
- Improve behaviour with non-existing host in user allowed host list. #26368 (ianton-ru).
- Add ability to set Distributed directory monitor settings via CREATE TABLE (i.e. CREATE TABLE dist (key Int) Engine=Distributed(cluster, db, table) SETTINGS monitor_batch_inserts=1 and similar). #26336 (Azat Khuzhin).
- Save server address in history URLs in web UI if it differs from the origin of web UI. This closes #26044. #26322 (alexey-milovidov).
- Add events to profile calls to sleep/sleepEachRow. #26320 (Raúl Marín).
- Allow to reuse connections of shards among different clusters. It also avoids creating new connections when using cluster table function. #26318 (Amos Bird).
- Control the execution period of clear old temporary directories by parameter with default value. #26212. #26313 (fastio).
- Add a setting function_range_max_elements_in_block to tune the safety threshold for data volume generated by function range. This closes #26303. #26305 (alexey-milovidov).
- Check hash function at table creation, not at sampling. Add a setting for MergeTree: if someone created a table with an incorrect sampling column but sampling is never used, disable this setting to start the server without an exception. #26256 (zhaoyu).
- Added output_format_avro_string_column_pattern setting to put specified String columns to Avro as string instead of default bytes. Implements #22414. #26245 (Ilya Golshtein).
- Add information about column sizes in system.columns table for Log and TinyLog tables. This closes #9001. #26241 (Nikolay Degterinsky).
- Don't throw exception when querying system.detached_parts table if there is custom disk configuration and detached directory does not exist on some disks. This closes #26078. #26236 (alexey-milovidov).
- Check for non-deterministic functions in keys, including constant expressions like now(), today(). This closes #25875. This closes #11333. #26235 (alexey-milovidov).
- Convert timestamp and timestamptz data types to DateTime64 in PostgreSQL table engine. #26234 (jasine).
- Apply aggressive IN index analysis for projections so that better projection candidate can be selected. #26218 (Amos Bird).
- Remove GLOBAL keyword for IN when scalar function is passed. In previous versions, if user specified GLOBAL IN f(x) exception was thrown. #26217 (Amos Bird).
- Add error id (like BAD_ARGUMENTS) to exception messages. This closes #25862. #26172 (alexey-milovidov).
- Fix incorrect output with --progress option for clickhouse-local. Progress bar will be cleared once it gets to 100% - same as it is done for clickhouse-client. Closes #17484. #26128 (Kseniia Sumarokova).
- Add merge_selecting_sleep_ms setting. #26120 (lthaooo).
- Remove complicated usage of Linux AIO with one block readahead and replace it with plain simple synchronous IO with O_DIRECT. In previous versions, the setting min_bytes_to_use_direct_io may not work correctly if max_threads is greater than one. Reading with direct IO (that is disabled by default for queries and enabled by default for large merges) will work in less efficient way. This closes #25997. #26003 (alexey-milovidov).
- Flush Distributed table on REPLACE TABLE query. Resolves #24566. Do not replace (or create) table on [CREATE OR] REPLACE TABLE ... AS SELECT query if insertion into new table fails. Resolves #23175. #25895 (tavplubix).
- Add views column to system.query_log containing the names of the (materialized or live) views executed by the query. Adds a new log table (system.query_views_log) that contains information about each view executed during a query. Modifies view execution: when an exception is thrown while executing a view, any view that has already started will continue running until it finishes. This used to be the behaviour under parallel_view_processing=true and now it's always the same behaviour. Dependent views now report reading progress to the context. #25714 (Raúl Marín).
- Do connection draining asynchronously upon finishing executing distributed queries. A new server setting max_threads_for_connection_collector is added, which specifies the number of workers to recycle connections in background. If the pool is full, connection will be drained synchronously but a bit different than before: it's drained after we send EOS to client, query will succeed immediately after receiving enough data, and any exception will be logged instead of throwing to the client. Added setting drain_timeout (3 seconds by default). Connection draining will disconnect upon timeout. #25674 (Amos Bird).
- Support for multiple includes in configuration. It is possible to include users configuration, remote servers configuration from multiple sources. Simply place <include /> element with from_zk, from_env or incl attribute and it will be replaced with the substitution. #24404 (nvartolomei).
- Fix multiple block insertion into distributed table with insert_distributed_one_random_shard = 1. This is a marginal feature. Mark as improvement. #23140 (Amos Bird).
- Support LowCardinality and FixedString keys/values for Map type. #21543 (hexiaoting).
- Enable reloading of local disk config. #19526 (taiyang-li).
Bug Fix
- Fix a couple of bugs that may cause replicas to diverge. #27808 (tavplubix).
- Fix a rare bug in DROP PART which can lead to the error "Unexpected merged part intersects drop range". #27807 (alesapin).
- Prevent crashes for some formats when NULL (tombstone) message was coming from Kafka. Closes #19255. #27794 (filimonov).
- Fix column filtering with union distinct in subquery. Closes #27578. #27689 (Kseniia Sumarokova).
- Fix bad type cast when functions like arrayHas are applied to arrays of LowCardinality of Nullable of different non-numeric types like DateTime and DateTime64. In previous versions bad cast occurs. In new version it will lead to exception. This closes #26330. #27682 (alexey-milovidov).
- Fix postgresql table function resulting in non-closing connections. Closes #26088. #27662 (Kseniia Sumarokova).
- Fixed another case of "Unexpected merged part ... intersecting drop range ..." error. #27656 (tavplubix).
- Fix an error with aliased column in Distributed table. #27652 (Vladimir C).
- After setting max_memory_usage* to non-zero value it was not possible to reset it back to 0 (unlimited). It's fixed. #27638 (tavplubix).
- Fixed underflow of the time value when constructing it from components. Closes #27193. #27605 (Vasily Nemkov).
- Fix crash during projection materialization when some parts contain missing columns. This fixes #27512. #27528 (Amos Bird).
- Fix metric BackgroundMessageBrokerSchedulePoolTask, maybe mistyped. #27452 (Ben).
- Fix distributed queries with zero shards and aggregation. #27427 (Azat Khuzhin).
- Compatibility when /proc/meminfo does not contain KB suffix. #27361 (Mike Kot).
- Fix incorrect result for query with row-level security, PREWHERE and LowCardinality filter. Fixes #27179. #27329 (Nikolai Kochetov).
- Fixed incorrect validation of partition id for MergeTree tables that created with old syntax. #27328 (tavplubix).
- Fix MySQL protocol when using parallel formats (CSV / TSV). #27326 (Raúl Marín).
- Fix "Cannot find column" error for queries with sampling. Was introduced in #24574. Fixes #26522. #27301 (Nikolai Kochetov).
- Fix errors like "Expected ColumnLowCardinality, gotUInt8" or "Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnLowCardinality" for some queries with LowCardinality in PREWHERE. And more importantly, fix the lack of whitespace in the error message. Fixes #23515. #27298 (Nikolai Kochetov).
- Fix distributed_group_by_no_merge = 2 with distributed_push_down_limit = 1 or optimize_distributed_group_by_sharding_key = 1 with LIMIT BY and LIMIT OFFSET. #27249 (Azat Khuzhin). These are obscure combinations of settings that no one is using.
- Fix mutation stuck on invalid partitions in non-replicated MergeTree. #27248 (Azat Khuzhin).
- In case of ambiguity, lambda functions prefer their arguments to other aliases or identifiers. #27235 (Raúl Marín).
- Fix column structure in merge join, close #27091. #27217 (Vladimir C).
- In rare cases system.detached_parts table might contain incorrect information for some parts, it's fixed. Fixes #27114. #27183 (tavplubix).
- Fix uninitialized memory in functions multiSearch* with empty array, close #27169. #27181 (Vladimir C).
- Fix synchronization in GRPCServer. This PR fixes #27024. #27064 (Vitaly Baranov).
- Fixed cache, complex_key_cache, ssd_cache, complex_key_ssd_cache configuration parsing. Options allow_read_expired_keys, max_update_queue_size, update_queue_push_timeout_milliseconds, query_wait_timeout_milliseconds were not parsed for dictionaries with non cache type. #27032 (Maksim Kita).
- Fix possible mutation stuck due to race with DROP_RANGE. #27002 (Azat Khuzhin).
- Now partition ID in queries like ALTER TABLE ... PARTITION ID xxx is validated for correctness. Fixes #25718. #26963 (alesapin).
- Fix "Unknown column name" error with multiple JOINs in some cases, close #26899. #26957 (Vladimir C).
- Fix reading of custom TLDs (stops processing with lower buffer or bigger file). #26948 (Azat Khuzhin).
- Fix error "Missing columns: 'xxx'" when DEFAULT column references other non materialized column without DEFAULT expression. Fixes #26591. #26900 (alesapin).
- Fix loading of dictionary keys in library-bridge for library dictionary source. #26834 (Kseniia Sumarokova).
- Aggregate function parameters might be lost when applying some combinators causing exceptions like "Conversion from AggregateFunction(topKArray, Array(String)) to AggregateFunction(topKArray(10), Array(String)) is not supported". It's fixed. Fixes #26196 and #26433. #26814 (tavplubix).
- Add event_time_microseconds value for REMOVE_PART in system.part_log. In previous versions it was not set. #26720 (Azat Khuzhin).
- Do not remove data on ReplicatedMergeTree table shutdown to avoid creating data to metadata inconsistency. #26716 (nvartolomei).
- Sometimes SET ROLE could work incorrectly, this PR fixes that. #26707 (Vitaly Baranov).
- Some fixes for parallel formatting (https://github.com/ClickHouse/ClickHouse/issues/26694). #26703 (Raúl Marín).
- Fix potential nullptr dereference in window functions. This fixes #25276. #26668 (Alexander Kuzmenkov).
- Fix clickhouse-client history file conversion (when upgrading from the format of 3 years old version of clickhouse-client) if file is empty. #26589 (Azat Khuzhin).
- Fix incorrect function names of groupBitmapAnd/Or/Xor (can be displayed in some occasions). This fixes. #26557 (Amos Bird).
- Update chown cmd check in clickhouse-server docker entrypoint. It fixes the bug that cluster pod restart failed (or timeout) on kubernetes. #26545 (Ky Li).
- Fix crash in RabbitMQ shutdown in case RabbitMQ setup was not started. Closes #26504. #26529 (Kseniia Sumarokova).
- Fix issues with CREATE DICTIONARY query if dictionary name or database name was quoted. Closes #26491. #26508 (Maksim Kita).
- Fix broken column name resolution after rewriting column aliases. This fixes #26432. #26475 (Amos Bird).
- Fix some fuzzed msan crash. Fixes #22517. #26428 (Nikolai Kochetov).
- Fix infinite non joined block stream in partial_merge_join, close #26325. #26374 (Vladimir C).
- Fix possible crash when login as dropped user. This PR fixes #26073. #26363 (Vitaly Baranov).
- Fix optimize_distributed_group_by_sharding_key for multiple columns (leads to incorrect result with optimize_skip_unused_shards=1/allow_nondeterministic_optimize_skip_unused_shards=1 and multiple columns in sharding key expression). #26353 (Azat Khuzhin).
- Fixed rare bug in lost replica recovery that may cause replicas to diverge. #26321 (tavplubix).
- Fix zstd decompression (for import/export in zstd framing format that is unrelated to tables data) in case there are escape sequences at the end of internal buffer. Closes #26013. #26314 (Kseniia Sumarokova).
- Fix logical error on join with totals, close #26017. #26250 (Vladimir C).
- Remove excessive newline in thread_name column in system.stack_trace table. This fixes #24124. #26210 (alexey-milovidov).
- Fix potential crash if more than one untuple expression is used. #26179 (alexey-milovidov).
- Don't throw exception in toString for Nullable Enum if Enum does not have a value for zero, close #25806. #26123 (Vladimir C).
- Fixed incorrect sequence_id in MySQL protocol packets that ClickHouse sends on exception during query execution. It might cause MySQL client to reset connection to ClickHouse server. Fixes #21184. #26051 (tavplubix).
- Fix for the case that
cutToFirstSignificantSubdomainCustom()
/cutToFirstSignificantSubdomainCustomWithWWW()
/firstSignificantSubdomainCustom()
returns incorrect type for consts, and henceoptimize_skip_unused_shards
does not work:. #26041 (Azat Khuzhin). - ๐ Fix possible mismatched header when using normal projection with prewhere. This fixes #26020. #26038 (Amos Bird).
- Fix sharding_key from column w/o function for remote() (before
select * from remote('127.1', system.one, dummy)
leads toUnknown column: dummy, there are only columns .
error). #25824 (Azat Khuzhin). - ๐ Fixed
Not found column ...
andMissing column ...
errors when selecting fromMaterializeMySQL
. Fixes #23708, #24830, #25794. #25822 (tavplubix). - Fix
optimize_skip_unused_shards_rewrite_in
for non-UInt64 types (may select incorrect shards eventually or throwCannot infer type of an empty tuple
orFunction tuple requires at least one argument
). #25798 (Azat Khuzhin).
Build/Testing/Packaging Improvement
- Stateful and stateless tests now run in random timezones. Fixes #12439. Reading String as DateTime and writing DateTime as String in Protobuf format now respects the timezone. Reading UInt16 as DateTime in Arrow and Parquet formats now treats it as Date and then converts it to DateTime with respect to the DateTime's timezone, because Date is serialized in Arrow and Parquet as UInt16. GraphiteMergeTree now respects the time zone for rounding of times. Fixes #5098. Author: @alexey-milovidov. #15408 (alesapin).
- โ
clickhouse-test
supports SQL tests with Jinja2 templates. #26579 (Vladimir C). - โ Add support for build with
clang-13
. This closes #27705. #27714 (alexey-milovidov). #27777 (Sergei Semin) - โ Add CMake options to build with or without specific CPU instruction set. This is for #17469 and #27509. #27508 (alexey-milovidov).
- Fix linking of auxiliary programs when using dynamic libraries. #26958 (Raúl Marín).
- โก๏ธ Update RocksDB to
2021-07-16
master. #26411 (alexey-milovidov).
-
v21.8 Changes
August 12, 2021
Upgrade Notes
- ๐ฒ New version is using
Map
data type for system logs tables (system.query_log
,system.query_thread_log
,system.processes
,system.opentelemetry_span_log
). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes #18698. #23934, #25773 (hexiaoting, sundy-li, Maksim Kita). If you want to downgrade from version 21.8 to older versions, you will need to clean up system tables with logs manually. Look at /var/lib/clickhouse/data/system/*_log
.
New Features
- โ Add support for a part of SQL/JSON standard. #24148 (l1tsolaiki, Kseniia Sumarokova).
- Collect common system metrics (in system.asynchronous_metrics and system.asynchronous_metric_log) on CPU usage, disk usage, memory usage, IO, network, files, load average, CPU frequencies, thermal sensors, EDAC counters, system uptime; also added metrics about the scheduling jitter and the time spent collecting the metrics. It works similarly to atop in ClickHouse and allows access to monitoring data even if you have no additional tools installed. Close #9430. #24416 (alexey-milovidov, Yegor Levankov).
- Add MaterializedPostgreSQL table engine and database engine. This database engine allows replicating a whole database or any subset of database tables. #20470 (Kseniia Sumarokova).
- โ Add new functions
leftPad()
,rightPad()
,leftPadUTF8()
,rightPadUTF8()
. #26075 (Vitaly Baranov). - โ Add the
FIRST
keyword to theADD INDEX
command to be able to add the index at the beginning of the indices list. #25904 (xjewer). - Introduce
system.data_skipping_indices
table containing information about existing data skipping indices. Close #7659. #25693 (Dmitry Novik). - โ Add
bin
/unbin
functions. #25609 (zhaoyu). - ๐ Support
Map
andUInt128
,Int128
,UInt256
,Int256
types inmapAdd
andmapSubtract
functions. #25596 (Ildus Kurbangaliev). - ๐ Support
DISTINCT ON (columns)
expression, close #25404. #25589 (Zijie Lu). - โ Add an ability to reset a custom setting to default and remove it from the table's metadata. It allows rolling back the change without knowing the system/config's default. Closes #14449. #17769 (xjewer).
- ๐ป Render pipelines as graphs in Web UI if
EXPLAIN PIPELINE graph = 1
query is submitted. #26067 (alexey-milovidov).
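As an illustration of the DISTINCT ON (columns) item above, here is a minimal Python sketch of the general idea: keep one row per distinct combination of the listed columns. The helper name distinct_on and the sample rows are hypothetical, and keeping the first row encountered is an assumption of this sketch, not a statement about which row ClickHouse returns.

```python
def distinct_on(rows, key_columns):
    """Keep the first row seen for each distinct combination of key_columns."""
    seen = set()
    result = []
    for row in rows:
        # Build a hashable key from the selected columns only.
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Hypothetical sample data: (a, b) repeats in the first and last rows.
rows = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 2, "c": 2},
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 4},
]
first_per_key = distinct_on(rows, ["a", "b"])
```

Only the columns named in key_columns take part in the comparison; the remaining columns come from whichever row was kept.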
Performance Improvements
- Compile aggregate functions. Use option
compile_aggregate_expressions
to enable it. #24789 (Maksim Kita). - ๐ Improve latency of short queries that require reading from tables with many columns. #26371 (Anton Popov).
Improvements
- ๐ฒ Use
Map
data type for system logs tables (system.query_log
,system.query_thread_log
,system.processes
,system.opentelemetry_span_log
). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes #18698. #23934, #25773 (hexiaoting, sundy-li, Maksim Kita). - For a dictionary with a complex key containing only one attribute, allow not wrapping the key expression in tuple for functions
dictGet
,dictHas
. #26130 (Maksim Kita). - Implement function
bin
/hex
fromAggregateFunction
states. #26094 (zhaoyu). - ๐ Support arguments of
UUID
type forempty
andnotEmpty
functions.UUID
is empty if it is all zeros (nil UUID). Closes #3446. #25974 (zhaoyu). - Add support for
SET SQL_SELECT_LIMIT
in MySQL protocol. Closes #17115. #25972 (Kseniia Sumarokova). - ๐ More instrumentation for network interaction: add counters for recv/send bytes; add gauges for recvs/sends. Added missing documentation. Close #5897. #25962 (alexey-milovidov).
- ๐ Add setting
optimize_move_to_prewhere_if_final
. If query hasFINAL
, the optimizationmove_to_prewhere
will be enabled only if bothoptimize_move_to_prewhere
andoptimize_move_to_prewhere_if_final
are enabled. Closes #8684. #25940 (Kseniia Sumarokova). - ๐ Allow complex quoted identifiers of JOINed tables. Close #17861. #25924 (alexey-milovidov).
- โ Add support for Unicode (e.g. Chinese, Cyrillic) components in
Nested
data types. Close #25594. #25923 (alexey-milovidov). - Allow
quantiles*
functions to work withaggregate_functions_null_for_empty
. Close #25892. #25919 (alexey-milovidov). - ๐ Allow parameters for parametric aggregate functions to be arbitrary constant expressions (e.g.,
1 + 2
), not just literals. It also allows using the query parameters (in parameterized queries like{param:UInt8}
) inside parametric aggregate functions. Closes #11607. #25910 (alexey-milovidov). - ๐ Correctly throw the exception on the attempt to parse an invalid
Date
. Closes #6481. #25909 (alexey-milovidov). - Support for multiple includes in configuration. It is possible to include users configuration, remote server configuration from multiple sources. Simply place
<include />
element withfrom_zk
,from_env
orincl
attribute, and it will be replaced with the substitution. #24404 (nvartolomei). - ๐ Support for queries with a column named
"null"
(it must be specified in back-ticks or double quotes) andON CLUSTER
. Closes #24035. #25907 (alexey-milovidov). - ๐ Support
LowCardinality
,Decimal
, andUUID
forJSONExtract
. Closes #24606. #25900 (Kseniia Sumarokova). - Convert history file from
readline
format toreplxx
format. #25888 (Azat Khuzhin). - ๐ Fix an issue which can lead to intersecting parts after
DROP PART
or background deletion of an empty part. #25884 (alesapin). - ๐ Better handling of lost parts for
ReplicatedMergeTree
tables. Fixes rare inconsistencies inReplicationQueue
. Fixes #10368. #25820 (alesapin). - ๐ Allow starting clickhouse-client with unreadable working directory. #25817 (ianton-ru).
- ๐ Fix "No available columns" error for
Merge
storage. #25801 (Azat Khuzhin). - ๐ MySQL Engine now supports the exchange of column comments between MySQL and ClickHouse. #25795 (Storozhuk Kostiantyn).
- ๐ Fix inconsistent behaviour of
GROUP BY
constant on empty set. Closes #6842. #25786 (Kseniia Sumarokova). - ๐ Cancel already running merges in partition on
DROP PARTITION
andTRUNCATE
forReplicatedMergeTree
. Resolves #17151. #25684 (tavplubix). - ๐ Support ENUM` data type for MaterializeMySQL. #25676 (Storozhuk Kostiantyn).
- ๐ Support materialized and aliased columns in JOIN, close #13274. #25634 (Vladimir C).
- ๐ Fix possible logical race condition between
ALTER TABLE ... DETACH
and background merges. #25605 (Azat Khuzhin).
- Make the NetworkReceiveElapsedMicroseconds metric correctly include the time spent waiting for data from the client to INSERT. Close #9958. #25602 (alexey-milovidov).
- Support
TRUNCATE TABLE
for S3 and HDFS. Close #25530. #25550 (Kseniia Sumarokova). - ๐ Support for dynamic reloading of config to change number of threads in pool for background jobs execution (merges, mutations, fetches). #25548 (Nikita Mikhaylov).
- ๐ Allow extracting of non-string element as string using
JSONExtract
. This is for #25414. #25452 (Amos Bird). - ๐ Support regular expression in
Database
argument forStorageMerge
. Close #776. #25064 (flynn). - ๐ป Web UI: if the value looks like a URL, automatically generate a link. #25965 (alexey-milovidov).
- Make sudo service clickhouse-server start work on systems with systemd, like CentOS 8. Close #14298. Close #17799. #25921 (alexey-milovidov).
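The rule stated above for UUID arguments of empty and notEmpty (a UUID is empty if it is all zeros, the nil UUID) can be sketched in Python with the standard uuid module. The function names below are hypothetical helpers for illustration, not ClickHouse APIs.

```python
import uuid


def uuid_is_empty(value: uuid.UUID) -> bool:
    """A UUID counts as empty when all 128 bits are zero (the nil UUID)."""
    return value.int == 0


def uuid_is_not_empty(value: uuid.UUID) -> bool:
    """Logical negation of uuid_is_empty."""
    return not uuid_is_empty(value)


nil_uuid = uuid.UUID("00000000-0000-0000-0000-000000000000")
some_uuid = uuid.UUID("61f0c404-5cb3-11e7-907b-a6006ad3dba0")
```

Here uuid_is_empty(nil_uuid) is true and uuid_is_empty(some_uuid) is false.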
Bug Fixes
- ๐ Fix incorrect
SET ROLE
in some cases. #26707 (Vitaly Baranov). - ๐ Fix potential
nullptr
dereference in window functions. Fix #25276. #26668 (Alexander Kuzmenkov). - ๐ Fix incorrect function names of
groupBitmapAnd/Or/Xor
. Fix #26557 (Amos Bird). - ๐ Fix crash in RabbitMQ shutdown in case RabbitMQ setup was not started. Closes #26504. #26529 (Kseniia Sumarokova).
- ๐ Fix issues with
CREATE DICTIONARY
query if dictionary name or database name was quoted. Closes #26491. #26508 (Maksim Kita). - ๐ Fix broken name resolution after rewriting column aliases. Fix #26432. #26475 (Amos Bird).
- ๐ Fix infinite non-joined block stream in
partial_merge_join
close #26325. #26374 (Vladimir C). - ๐ Fix possible crash when login as dropped user. Fix #26073. #26363 (Vitaly Baranov).
- Fix
optimize_distributed_group_by_sharding_key
for multiple columns (leads to incorrect result w/optimize_skip_unused_shards=1
/allow_nondeterministic_optimize_skip_unused_shards=1
and multiple columns in sharding key expression). #26353 (Azat Khuzhin).
- CAST from Date to DateTime (or DateTime64) was not using the timezone of the DateTime type. It can also affect the comparison between Date and DateTime. Inference of the common type for Date and DateTime also was not using the corresponding timezone. It affected the results of function if and array construction. Closes #24128. #24129 (Maksim Kita).
- Fixed rare bug in lost replica recovery that may cause replicas to diverge. #26321 (tavplubix).
- ๐ Fix zstd decompression in case there are escape sequences at the end of internal buffer. Closes #26013. #26314 (Kseniia Sumarokova).
- ๐ Fix logical error on join with totals, close #26017. #26250 (Vladimir C).
- Remove excessive newline in
thread_name
column insystem.stack_trace
table. Fix #24124. #26210 (alexey-milovidov).
- Fix joinGet with LowCardinality columns, close #25993. #26118 (Vladimir C).
- Fix possible crash in pointInPolygon if the setting validate_polygons is turned off. #26113 (alexey-milovidov).
- Fix an exception being thrown when iterating over a non-existing remote directory. #26087 (ianton-ru).
- ๐ Fix rare server crash because of
abort
in ZooKeeper client. Fixes #25813. #26079 (alesapin). - ๐ Fix wrong thread count estimation for right subquery join in some cases. Close #24075. #26052 (Vladimir C).
- ๐ Fixed incorrect
sequence_id
in MySQL protocol packets that ClickHouse sends on exception during query execution. It might cause MySQL client to reset connection to ClickHouse server. Fixes #21184. #26051 (tavplubix). - ๐ Fix possible mismatched header when using normal projection with
PREWHERE
. Fix #26020. #26038 (Amos Bird). - ๐ Fix formatting of type
Map
with integer keys toJSON
. #25982 (Anton Popov). - ๐ Fix possible deadlock during query profiler stack unwinding. Fix #25968. #25970 (Maksim Kita).
- ๐ Fix crash on call
dictGet()
with bad arguments. #25913 (Vitaly Baranov). - ๐ Fixed
scram-sha-256
authentication for PostgreSQL engines. Closes #24516. #25906 (Kseniia Sumarokova). - ๐ Fix extremely long backoff for background tasks when the background pool is full. Fixes #25836. #25893 (alesapin).
- ๐ Fix ARM exception handling with non default page size. Fixes #25512, #25044, #24901, #23183, #20221, #19703, #19028, #18391, #18121, #17994, #12483. #25854 (Maksim Kita).
- Fix sharding_key from column w/o function for
remote()
(beforeselect * from remote('127.1', system.one, dummy)
leads toUnknown column: dummy, there are only columns .
error). #25824 (Azat Khuzhin). - ๐ Fixed
Not found column ...
andMissing column ...
errors when selecting fromMaterializeMySQL
. Fixes #23708, #24830, #25794. #25822 (tavplubix). - Fix
optimize_skip_unused_shards_rewrite_in
for non-UInt64 types (may select incorrect shards eventually or throwCannot infer type of an empty tuple
orFunction tuple requires at least one argument
). #25798 (Azat Khuzhin). - ๐ Fix rare bug with
DROP PART
query forReplicatedMergeTree
tables which can lead to error messageUnexpected merged part intersecting drop range
. #25783 (alesapin). - ๐ Fix bug in
TTL
withGROUP BY
expression which refuses to executeTTL
after first execution in part. #25743 (alesapin). - ๐ Allow StorageMerge to access tables with aliases. Closes #6051. #25694 (Kseniia Sumarokova).
- ๐ Fix slow dict join in some cases, close #24209. #25618 (Vladimir C).
- Fix ALTER MODIFY COLUMN of columns that participate in TTL expressions. #25554 (Anton Popov).
- Fix assertion in
PREWHERE
with non-UInt8 type, close #19589. #25484 (Vladimir C). - ๐ Fix some fuzzed msan crash. Fixes #22517. #26428 (Nikolai Kochetov).
- โก๏ธ Update
chown
cmd check inclickhouse-server
docker entrypoint. It fixes error 'cluster pod restart failed (or timeout)' on kubernetes. #26545 (Ky Li).
-
v21.7 Changes
July 09, 2021
Backward Incompatible Change
- Improved performance of queries with explicitly defined large sets. Added compatibility setting legacy_column_name_of_tuple_literal. It makes sense to set it to true while doing a rolling update of a cluster from a version lower than 21.7 to any higher version. Otherwise distributed queries with explicitly defined sets in the IN clause may fail during the update. #25371 (Anton Popov).
- Forward/backward incompatible change of maximum buffer size in clickhouse-keeper (an experimental alternative to ZooKeeper). Better to do it now (before production), than later. #25421 (alesapin).
New Feature
- ๐ Support configuration in YAML format as alternative to XML. This closes #3607. #21858 (BoloniniD).
- ๐ Provides a way to restore replicated table when the data is (possibly) present, but the ZooKeeper metadata is lost. Resolves #13458. #13652 (Mike Kot).
- Support structs and maps in Arrow/Parquet/ORC and dictionaries in Arrow input/output formats. Present new setting
output_format_arrow_low_cardinality_as_dictionary
. #24341 (Kruglov Pavel). - โ Added support for
Array
type in dictionaries. #25119 (Maksim Kita). - โ Added function
bitPositionsToArray
. Closes #23792. Author [Kevin Wan] (@MaxWk). #25394 (Maksim Kita). - โ Added function
dateName
to return names like 'Friday' or 'April'. Author [Daniil Kondratyev] (@dankondr). #25372 (Maksim Kita). - โ Add
toJSONString
function to serialize columns to their JSON representations. #25164 (Amos Bird). - ๐ฒ Now
query_log
has two new columns:initial_query_start_time
,initial_query_start_time_microsecond
that record the starting time of a distributed query if any. #25022 (Amos Bird). - โ Add aggregate function
segmentLengthSum
. #24250 (flynn). - Add a new boolean setting
prefer_global_in_and_join
which defaults all IN/JOIN as GLOBAL IN/JOIN. #23434 (Amos Bird). - ๐ Support
ALTER DELETE
queries forJoin
table engine. #23260 (foolchi). - โ Add
quantileBFloat16
aggregate function as well as the correspondingquantilesBFloat16
andmedianBFloat16
. It is a very simple and fast quantile estimator with relative error not more than 0.390625%. This closes #16641. #23204 (Ivan Novitskiy).
- Implement sequenceNextNode() function useful for flow analysis. #19766 (achimbab).
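To make the bitPositionsToArray item above concrete, here is a small Python sketch of the idea behind it: list the positions of the set bits of an integer in ascending order. The function name is a hypothetical Python stand-in, and restricting the input to non-negative integers is an assumption of this sketch (it does not model how fixed-width signed types are handled).

```python
from typing import List


def bit_positions_to_array(number: int) -> List[int]:
    """Return the zero-based positions of the set bits, lowest bit first."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    positions = []
    position = 0
    while number:
        if number & 1:
            positions.append(position)
        number >>= 1
        position += 1
    return positions


# 100 is 0b1100100, so bits 2, 5 and 6 are set.
positions_of_100 = bit_positions_to_array(100)
```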
Experimental Feature
- โ Add support for virtual filesystem over HDFS. #11058 (overshov) (Kseniia Sumarokova).
- ๐ Now clickhouse-keeper (an experimental alternative to ZooKeeper) supports ZooKeeper-like
digest
ACLs. #24448 (alesapin).
Performance Improvement
- Added optimization that transforms some functions to reading of subcolumns to reduce the amount of data read. E.g., the statement col IS NULL is transformed to reading of the subcolumn col.null. The optimization can be enabled by the setting optimize_functions_to_subcolumns, which is currently off by default. #24406 (Anton Popov).
- Rewrite more columns to possible alias expressions. This may enable better optimization, such as projections. #24405 (Amos Bird).
- Index of type
bloom_filter
can be used for expressions withhasAny
function with constant arrays. This closes: #24291. #24900 (Vasily Nemkov). - โ Add exponential backoff to reschedule read attempt in case RabbitMQ queues are empty. (ClickHouse has support for importing data from RabbitMQ). Closes #24340. #24415 (Kseniia Sumarokova).
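The exponential backoff mentioned above for rescheduling reads from empty RabbitMQ queues follows a common pattern, sketched here in Python. The initial delay, the cap, and the growth factor are hypothetical values chosen for illustration; the changelog does not state the values ClickHouse uses.

```python
def backoff_delays(initial_ms, max_ms, attempts, factor=2):
    """Return the delay before each retry: grows by `factor`, capped at max_ms."""
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(min(delay, max_ms))
        delay *= factor
    return delays


# Hypothetical parameters: start at 10 ms, never wait longer than 100 ms.
schedule = backoff_delays(initial_ms=10, max_ms=100, attempts=6)
```

With these parameters the schedule is 10, 20, 40, 80, 100, 100 milliseconds. In a real consumer the delay would typically reset to the initial value as soon as a read returns data.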
Improvement
- Allow limiting bandwidth for replication. Add two Replicated*MergeTree settings: max_replicated_fetches_network_bandwidth and max_replicated_sends_network_bandwidth, which limit the maximum speed of replicated fetches/sends for a table. Add two server-wide settings (in the default user profile): max_replicated_fetches_network_bandwidth_for_server and max_replicated_sends_network_bandwidth_for_server, which limit the maximum speed of replication for all tables. The settings are not followed perfectly accurately. Turned off by default. Fixes #1821. #24573 (alesapin).
- Resource constraints and isolation for ODBC and Library bridges. Use separate clickhouse-bridge group and user for bridge processes. Set oom_score_adj so the bridges will be first subjects for OOM killer. Set maximum RSS to 1 GiB. Closes #23861. #25280 (Kseniia Sumarokova).
- Add standalone
clickhouse-keeper
symlink to the mainclickhouse
binary. Now it's possible to run coordination without the main clickhouse server. #24059 (alesapin).
- Use global settings for queries to VIEW. Fixed the behavior when queries to VIEW used local settings, which led to errors if the settings on CREATE VIEW and SELECT were different. As of now, VIEW won't use these modified settings, but you can still pass additional settings in the SETTINGS section of the CREATE VIEW query. Close #20551. #24095 (Vladimir).
- On server start, parts with incorrect partition ID would not be ever removed, but always detached. #25070. #25166 (Nikolai Kochetov).
- โฑ Increase size of background schedule pool to 128 (
background_schedule_pool_size
setting). It allows avoiding replication queue hung on slow zookeeper connection. #25072 (alesapin). - Add merge tree setting
max_parts_to_merge_at_once
which limits the number of parts that can be merged in the background at once. Doesn't affectOPTIMIZE FINAL
query. Fixes #1820. #24496 (alesapin). - ๐ Allow
NOT IN
operator to be used in partition pruning. #24894 (Amos Bird). - โ
Recognize IPv4 addresses like
127.0.1.1
as local. This is controversial and closes #23504. Michael Filimonov will test this feature. #24316 (alexey-milovidov). - ClickHouse database created with MaterializeMySQL (it is an experimental feature) now contains all column comments from the MySQL database that materialized. #25199 (Storozhuk Kostiantyn).
- Add settings (
connection_auto_close
/connection_max_tries
/connection_pool_size
) for MySQL storage engine. #24146 (Azat Khuzhin). - ๐ Improve startup time of Distributed engine. #25663 (Azat Khuzhin).
- Improvement for Distributed tables. Drop replicas from dirname for internal_replication=true (allows INSERT into Distributed with a cluster of any number of replicas; before, only 15 replicas were supported, and anything more failed with ENAMETOOLONG while creating the directory for async blocks). #25513 (Azat Khuzhin).
- Added support for Interval type for LowCardinality. It is needed for intermediate values of some expressions. Closes #21730. #25410 (Vladimir).
- Add == operator on time conditions for sequenceMatch and sequenceCount functions. For example: sequenceMatch('(?1)(?t==1)(?2)')(time, data = 1, data = 2). #25299 (Christophe Kalenzaga).
- Add settings
http_max_fields
,http_max_field_name_size
,http_max_field_value_size
. #25296 (Ivan). - โ Add support for function
if
withDecimal
andInt
types on its branches. This closes #20549. This closes #10142. #25283 (alexey-milovidov). - โก๏ธ Update prompt in
clickhouse-client
and display a message when reconnecting. This closes #10577. #25281 (alexey-milovidov). - Correct memory tracking in aggregate function
topK
. This closes #25259. #25260 (alexey-milovidov).
- Fix topLevelDomain for IDN hosts (e.g. example.рф); before, it returned an empty string for such hosts. #25103 (Azat Khuzhin).
- Detect Linux kernel version at runtime (for working nested epoll, which is required for async_socket_for_remote/use_hedged_requests; otherwise remote queries may get stuck). #25067 (Azat Khuzhin).
- For distributed query, when
optimize_skip_unused_shards=1
, allow to skip shard with condition like(sharding key) IN (one-element-tuple)
. (Tuples with many elements were supported. Tuple with single element did not work because it is parsed as literal). #24930 (Amos Bird). - ๐ Improved log messages of S3 errors, no more double whitespaces in case of empty keys and buckets. #24897 (Vladimir Chebotarev).
- Some queries require multi-pass semantic analysis. Try reusing built sets for
IN
in this case. #24874 (Amos Bird). - Respect
max_distributed_connections
forinsert_distributed_sync
(otherwise for huge clusters and sync insert it may run out ofmax_thread_pool_size
). #24754 (Azat Khuzhin). - Avoid hiding errors like
Limit for rows or bytes to read exceeded
for scalar subqueries. #24545 (nvartolomei). - ๐ Make String-to-Int parser stricter so that
toInt64('+')
will throw. #24475 (Amos Bird). - If
SSD_CACHE
is created with DDL query, it can be created only insideuser_files
directory. #24466 (Maksim Kita). - 0๏ธโฃ PostgreSQL support for specifying non default schema for insert queries. Closes #24149. #24413 (Kseniia Sumarokova).
- ๐ Fix IPv6 addresses resolving (i.e. fixes
select * from remote('[::1]', system.one)
). #24319 (Azat Khuzhin). - ๐ Fix trailing whitespaces in FROM clause with subqueries in multiline mode, and also changes the output of the queries slightly in a more human friendly way. #24151 (Azat Khuzhin).
- Improvement for Distributed tables. Add ability to split distributed batch on failures (i.e. due to memory limits, corruptions), under
distributed_directory_monitor_split_batch_on_failure
(OFF by default). #23864 (Azat Khuzhin). - ๐ Handle column name clashes for
Join
table engine. Closes #20309. #23769 (Vladimir). - Display progress for
File
table engine inclickhouse-local
and on INSERT query inclickhouse-client
when data is passed to stdin. Closes #18209. #23656 (Kseniia Sumarokova). - Bugfixes and improvements of
clickhouse-copier
. Allow copying tables with different (but compatible) schemas. Closes #9159. Added test to copy ReplacingMergeTree. Closes #22711. Support TTL on columns and Data Skipping Indices. It simply removes them to create the internal Distributed table (the underlying table will have TTL and skipping indices). Closes #19384. Allow copying MATERIALIZED and ALIAS columns. There are some cases in which it could be helpful (e.g. if this column is in PRIMARY KEY). Now it can be allowed by setting the allow_to_copy_alias_and_materialized_columns property to true in the task configuration. Closes #9177. Closes #11007. Closes #9514. Added a property allow_to_drop_target_partitions in the task configuration to drop partitions in the original table before moving helping tables. Closes #20957. Get rid of the OPTIMIZE DEDUPLICATE query. This hack was needed because ALTER TABLE MOVE PARTITION was retried many times and plain MergeTree tables don't have deduplication. Closes #17966. Write progress to the ZooKeeper node on path task_path + /status in JSON format. Closes #20955. Support for ReplicatedTables without arguments. Closes #24834. #23518 (Nikita Mikhaylov).
- Added sleep with backoff between read retries from S3. #23461 (Vladimir Chebotarev).
- ๐ Respect
insert_allow_materialized_columns
(allows materialized columns) for INSERT intoDistributed
table. #23349 (Azat Khuzhin). - โ Add ability to push down LIMIT for distributed queries. #23027 (Azat Khuzhin).
- ๐ Fix zero-copy replication with several S3 volumes (Fixes #22679). #22864 (ianton-ru).
- ๐ฒ Resolve the actual port number bound when a user requests any available port from the operating system to show it in the log message. #25569 (bnaecker).
- ๐ Fixed case, when sometimes conversion of postgres arrays resulted in String data type, not n-dimensional array, because
attndims
works incorrectly in some cases. Closes #24804. #25538 (Kseniia Sumarokova).
- Fix conversion of DateTime with timezone for MySQL, PostgreSQL, ODBC. Closes #5057. #25528 (Kseniia Sumarokova).
- ๐ Distinguish KILL MUTATION for different tables (fixes unexpected
Cancelled mutating parts
error). #25025 (Azat Khuzhin). - ๐ Allow to declare S3 disk at root of bucket (S3 virtual filesystem is an experimental feature under development). #24898 (Vladimir Chebotarev).
- Enable reading of subcolumns (e.g. components of Tuples) for distributed tables. #24472 (Anton Popov).
- A feature for MySQL compatibility protocol: make
user
function to return correct output. Closes #25697. #25697 (sundyli).
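The stricter String-to-Int parsing noted above (toInt64('+') now throws) can be illustrated with a short Python sketch of a strict parser: a lone sign with no digits is rejected instead of being silently accepted. The function name and the exact accepted grammar (optional sign followed by at least one ASCII digit) are assumptions of this sketch, and it does not model overflow behaviour.

```python
import re

# Optional sign followed by at least one ASCII digit; nothing else allowed.
_STRICT_INT = re.compile(r"[+-]?[0-9]+")


def parse_int_strict(text: str) -> int:
    """Parse a decimal integer, rejecting inputs such as '+' or '' outright."""
    if not _STRICT_INT.fullmatch(text):
        raise ValueError(f"cannot parse {text!r} as an integer")
    return int(text)


parsed = parse_int_strict("-42")
```

Calling parse_int_strict("+") raises ValueError, which mirrors the behaviour the changelog entry describes for a sign without digits.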
Bug Fix
- ๐ Improvement for backward compatibility. Use old modulo function version when used in partition key. Closes #23508. #24157 (Kseniia Sumarokova).
- ๐ Fix extremely rare bug on low-memory servers which can lead to the inability to perform merges without restart. Possibly fixes #24603. #24872 (alesapin).
- ๐ Fix extremely rare error
Tagging already tagged part
in replication queue during concurrentalter move/replace partition
. Possibly fixes #22142. #24961 (alesapin). - ๐ Fix potential crash when calculating aggregate function states by aggregation of aggregate function states of other aggregate functions (not a practical use case). See #24523. #25015 (alexey-milovidov).
- ๐ Fixed the behavior when query
SYSTEM RESTART REPLICA
orSYSTEM SYNC REPLICA
does not finish. This was detected on server with extremely low amount of RAM. #24457 (Nikita Mikhaylov). - ๐ Fix bug which can lead to ZooKeeper client hung inside clickhouse-server. #24721 (alesapin).
- ๐จ If ZooKeeper connection was lost and replica was cloned after restoring the connection, its replication queue might contain outdated entries. Fixed failed assertion when replication queue contains intersecting virtual parts. It may rarely happen if some data part was lost. Print error in log instead of terminating. #24777 (tavplubix).
- Fix lost
WHERE
condition in expression-push-down optimization of query plan (settingquery_plan_filter_push_down = 1
by default). Fixes #25368. #25370 (Nikolai Kochetov). - Fix bug which can lead to intersecting parts after merges with TTL:
Part all_40_40_0 is covered by all_40_40_1 but should be merged into all_40_41_1. This shouldn't happen often.
. #25549 (alesapin). - ๐ On ZooKeeper connection loss
ReplicatedMergeTree
table might wait for background operations to complete before trying to reconnect. It's fixed, now background operations are stopped forcefully. #25306 (tavplubix). - ๐ Fix error
Key expression contains comparison between inconvertible types
for queries withARRAY JOIN
in case if array is used in primary key. Fixes #8247. #25546 (Anton Popov). - ๐ Fix wrong totals for query
WITH TOTALS
andWITH FILL
. Fixes #20872. #25539 (Anton Popov). - ๐ Fix data race when querying
system.clusters
while reloading the cluster configuration at the same time. #25737 (Amos Bird). - ๐ Fixed
No such file or directory
error on movingDistributed
table between databases. Fixes #24971. #25667 (tavplubix). - ๐
REPLACE PARTITION
might be ignored in rare cases if the source partition was empty. It's fixed. Fixes #24869. #25665 (tavplubix). - ๐ Fixed a bug in
Replicated
database engine that might rarely cause some replica to skip enqueued DDL query. #24805 (tavplubix). - ๐ Fix null pointer dereference in
EXPLAIN AST
without query. #25631 (Nikolai Kochetov). - ๐ Fix waiting of automatic dropping of empty parts. It could lead to full filling of background pool and stuck of replication. #23315 (Anton Popov).
- ๐ Fix restore of a table stored in S3 virtual filesystem (it is an experimental feature not ready for production). #25601 (ianton-ru).
- ๐ Fix nullptr dereference in
Arrow
format when usingDecimal256
. AddDecimal256
support forArrow
format. #25531 (Kruglov Pavel). - ๐ Fix excessive underscore before the names of the preprocessed configuration files. #25431 (Vitaly Baranov).
- A fix for
clickhouse-copier
tool: Fix segfault when sharding_key is absent in task config for copier. #25419 (Nikita Mikhaylov). - ๐ Fix
REPLACE
column transformer when used in DDL by correctly quoting the formatted query. This fixes #23925. #25391 (Amos Bird).
- Fix the possibility of non-deterministic behaviour of the
quantileDeterministic
function and similar. This closes #20480. #25313 (alexey-milovidov). - ๐ Support
SimpleAggregateFunction(LowCardinality)
forSummingMergeTree
. Fixes #25134. #25300 (Nikolai Kochetov). - ๐ Fix logical error with exception message "Cannot sum Array/Tuple in min/maxMap". #25298 (Kruglov Pavel).
- ๐ Fix error
Bad cast from type DB::ColumnLowCardinality to DB::ColumnVector<char8_t>
for queries whereLowCardinality
argument was used for IN (this bug appeared in 21.6). Fixes #25187. #25290 (Nikolai Kochetov). - ๐ Fix incorrect behaviour of
joinGetOrNull
with not-nullable columns. This fixes #24261. #25288 (Amos Bird). - ๐ Fix incorrect behaviour and UBSan report in big integers. In previous versions
CAST(1e19 AS UInt128)
returned zero. #25279 (alexey-milovidov). - ๐ Fixed an error which occurred while inserting a subset of columns using CSVWithNames format. Fixes #25129. #25169 (Nikita Mikhaylov).
- Do not use table's projection for `SELECT` with `FINAL`. It is not supported yet. #25163 (Amos Bird).
- Fix possible parts loss after updating up to 21.5 in case table used `UUID` in partition key. (It is not recommended to use `UUID` in partition key). Fixes #25070. #25127 (Nikolai Kochetov).
- Fix crash in query with cross join and `joined_subquery_requires_alias = 0`. Fixes #24011. #25082 (Nikolai Kochetov).
- Fix bug with constant maps in mapContains function that led to error `empty column was returned by function mapContains`. Closes #25077. #25080 (Kruglov Pavel).
- Remove possibility to create tables with columns referencing themselves like `a UInt32 ALIAS a + 1` or `b UInt32 MATERIALIZED b`. Fixes #24910, #24292. #25059 (alesapin).
- Fix wrong result when using aggregate projection with not empty `GROUP BY` key to execute query with `GROUP BY` by empty key. #25055 (Amos Bird).
- Fix serialization of split nested messages in Protobuf format. This PR fixes #24647. #25000 (Vitaly Baranov).
- Fix limit/offset settings for distributed queries (ignore on the remote nodes). #24940 (Azat Khuzhin).
- Fix possible heap-buffer-overflow in `Arrow` format. #24922 (Kruglov Pavel).
- Fixed possible error 'Cannot read from istream at offset 0' when reading a file from DiskS3 (S3 virtual filesystem is an experimental feature under development that should not be used in production). #24885 (Pavel Kovalenko).
- Fix "Missing columns" exception when joining Distributed Materialized View. #24870 (Azat Khuzhin).
- Allow `NULL` values in postgresql compatibility protocol. Closes #22622. #24857 (Kseniia Sumarokova).
- Fix bug when exception `Mutation was killed` can be thrown to the client on mutation wait when mutation is not loaded into memory yet. #24809 (alesapin).
- Fixed bug in deserialization of random generator state which might cause some data types such as `AggregateFunction(groupArraySample(N), T)` to behave in a non-deterministic way. #24538 (tavplubix).
- Disallow building uniqXXXXStates of other aggregation states. #24523 (Raúl Marín). Then allow it back by actually eliminating the root cause of the related issue. (alexey-milovidov).
- Fix usage of tuples in `CREATE .. AS SELECT` queries. #24464 (Anton Popov).
- Fix computation of total bytes in `Buffer` table. In current ClickHouse version total_writes.bytes counter decreases too much during the buffer flush. It leads to counter overflow and totalBytes return something around 17.44 EB some time after the flush. #24450 (DimasKovas).
- Fix incorrect information about the monotonicity of toWeek function. This fixes #24422. This bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/5212 , and was exposed later by smarter partition pruner. #24446 (Amos Bird).
- When user authentication is managed by LDAP, fixed a potential deadlock that can happen during LDAP role (re)mapping when an LDAP group is mapped to a nonexistent local role. #24431 (Denis Glazachev).
- In "multipart/form-data" message consider the CRLF preceding a boundary as part of it. Fixes #23905. #24399 (Ivan).
- Fix drop partition with intersect fake parts. In rare cases there might be parts with mutation version greater than current block number. #24321 (Amos Bird).
- Fixed a bug in moving Materialized View from Ordinary to Atomic database (`RENAME TABLE` query). Now inner table is moved to new database together with Materialized View. Fixes #23926. #24309 (tavplubix).
- Allow empty HTTP headers. Fixes #23901. #24285 (Ivan).
- Correct processing of mutations (ALTER UPDATE/DELETE) in Memory tables. Closes #24274. #24275 (flynn).
- Make column LowCardinality property in JOIN output the same as in the input, close #23351, close #20315. #24061 (Vladimir).
- A fix for Kafka tables. Fix the bug in failover behavior when Engine = Kafka was not able to start consumption if the same consumer had an empty assignment previously. Closes #21118. #21267 (filimonov).
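The big-integer fix above mentions that `CAST(1e19 AS UInt128)` returned zero in previous versions. As a rough sanity check of why zero is wrong, the Python sketch below shows that 1e19 converts exactly to an integer and fits comfortably in an unsigned 128-bit range. It uses Python's arbitrary-precision integers, not ClickHouse itself, and `fits_unsigned` is a hypothetical helper name introduced only for this illustration.

```python
# Illustration only: plain Python integers, not ClickHouse's UInt128 implementation.

def fits_unsigned(value: int, bits: int) -> bool:
    """Return True if value is representable as an unsigned integer of the given width."""
    return 0 <= value < (1 << bits)

value = int(1e19)  # the double 1e19 converts exactly to 10**19

print(value == 10**19)            # True
print(value > 2**63 - 1)          # True: exceeds the signed 64-bit maximum
print(fits_unsigned(value, 128))  # True: well within the unsigned 128-bit range
```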
Build/Testing/Packaging Improvement
- Add `darwin-aarch64` (Mac M1 / Apple Silicon) builds in CI #25560 (Ivan) and put the links to the docs and website (alexey-milovidov).
- Adds cross-platform embedding of binary resources into executables. It works on Illumos. #25146 (bnaecker).
- Add join related options to stress tests to improve fuzzing. #25200 (Vladimir).
- Enable build with s3 module in osx #25217. #25218 (kevin wan).
- Add integration test cases to cover JDBC bridge. #25047 (Zhichun Wu).
- Integration tests configuration has special treatment for dictionaries. Removed remaining dictionaries manual setup. #24728 (Ilya Yatsishin).
- Add libfuzzer tests for YAMLParser class. #24480 (BoloniniD).
- Ubuntu 20.04 is now used to run integration tests, docker-compose version used to run integration tests is updated to 1.28.2. Environment variables now take effect on docker-compose. Rework test_dictionaries_all_layouts_separate_sources to allow parallel run. #20393 (Ilya Yatsishin).
- Fix TOCTOU error in installation script. #25277 (alexey-milovidov).
- Improved performance of queries with explicitly defined large sets. Added compatibility setting
-
v21.3 Changes
March 12, 2021
Backward Incompatible Change
- Now it's not allowed to create MergeTree tables in old syntax with table TTL because it's just ignored. Attach of old tables is still possible. #20282 (alesapin).
- Now all case-insensitive function names will be rewritten to their canonical representations. This is needed for projection query routing (the upcoming feature). #20174 (Amos Bird).
- Fix creation of `TTL` in cases, when its expression is a function and it is the same as `ORDER BY` key. Now it's allowed to set custom aggregation to primary key columns in `TTL` with `GROUP BY`. Backward incompatible: For primary key columns, which are not in `GROUP BY` and aren't set explicitly now is applied function `any` instead of `max`, when TTL is expired. Also if you use TTL with `WHERE` or `GROUP BY` you can see exceptions at merges, while making rolling update. #15450 (Anton Popov).
New Feature
- Add file engine settings: `engine_file_empty_if_not_exists` and `engine_file_truncate_on_insert`. #20620 (M0r64n).
- Add aggregate function `deltaSum` for summing the differences between consecutive rows. #20057 (Russ Frank).
- New `event_time_microseconds` column in `system.part_log` table. #20027 (Bharat Nallan).
- Added `timezoneOffset(datetime)` function which will give the offset from UTC in seconds. This closes #issue:19850. #19962 (keenwolf).
- Add setting `insert_shard_id` to support inserting data into a specific shard from a distributed table. #19961 (flynn).
- Function `reinterpretAs` updated to support big integers. Fixes #19691. #19858 (Maksim Kita).
- Added Server Side Encryption Customer Keys (the `x-amz-server-side-encryption-customer-(key/md5)` header) support in S3 client. See the link. Closes #19428. #19748 (Vladimir Chebotarev).
- Added `implicit_key` option for `executable` dictionary source. It allows to avoid printing the key for every record if records come in the same order as the input keys. Implements #14527. #19677 (Maksim Kita).
- Add quota type `query_selects` and `query_inserts`. #19603 (JackyWoo).
- Add function `extractTextFromHTML` #19600 (zlx19950903), (alexey-milovidov).
- Tables with `MergeTree*` engine now have two new table-level settings for query concurrency control. Setting `max_concurrent_queries` limits the number of concurrently executed queries which are related to this table. Setting `min_marks_to_honor_max_concurrent_queries` tells to apply previous setting only if query reads at least this number of marks. #19544 (Amos Bird).
- Added `file` function to read file from user_files directory as a String. This is different from the `file` table function. This implements #issue:18851. #19204 (keenwolf).
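To make the `deltaSum` entry above more concrete, here is a minimal Python sketch of summing differences between consecutive rows. It is not the ClickHouse implementation. It assumes that negative differences are ignored, which the changelog entry itself does not state, and `delta_sum` is a hypothetical name used only for this illustration.

```python
from typing import Iterable

def delta_sum(values: Iterable[float]) -> float:
    """Sum the differences between consecutive values.

    Assumption (not stated in the changelog entry): a negative difference,
    such as a counter reset, is ignored rather than subtracted.
    """
    total = 0
    previous = None
    for value in values:
        if previous is not None and value > previous:
            total += value - previous
        previous = value
    return total

print(delta_sum([1, 2, 3]))                 # 2
print(delta_sum([1, 2, 3, 0, 3, 4, 2, 3]))  # 7
```

Under that assumption the function behaves like a total for a monotonically increasing counter that is occasionally reset.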
Experimental feature
- Add experimental `Replicated` database engine. It replicates DDL queries across multiple hosts. #16193 (tavplubix).
- Introduce experimental support for window functions, enabled with `allow_experimental_window_functions = 1`. This is a preliminary, alpha-quality implementation that is not suitable for production use and will change in backward-incompatible ways in future releases. Please see the documentation for the list of supported features. #20337 (Alexander Kuzmenkov).
- Add the ability to backup/restore metadata files for DiskS3. #18377 (Pavel Kovalenko).
Performance Improvement
- Hedged requests for remote queries. When the setting `use_hedged_requests` is enabled (off by default), a query may establish connections with several different replicas. A new connection is opened if the existing connection(s) with replica(s) were not established within `hedged_connection_timeout` or no data was received within `receive_data_timeout`. The query uses the first connection which sends a non-empty progress packet (or data packet, if `allow_changing_replica_until_first_data_packet` is set); other connections are cancelled. Queries with `max_parallel_replicas > 1` are supported. #19291 (Kruglov Pavel). This allows to significantly reduce tail latencies on very large clusters.
- Added support for `PREWHERE` (and enable the corresponding optimization) when tables have row-level security expressions specified. #19576 (Denis Glazachev).
- The setting `distributed_aggregation_memory_efficient` is enabled by default. It will lower memory usage and improve performance of distributed queries. #20599 (alexey-milovidov).
- Improve performance of GROUP BY multiple fixed size keys. #20472 (alexey-milovidov).
- Improve performance of aggregate functions by more strict aliasing. #19946 (alexey-milovidov).
- Speed up reading from `Memory` tables in extreme cases (when reading speed is in order of 50 GB/sec) by simplification of pipeline and (consequently) less lock contention in pipeline scheduling. #20468 (alexey-milovidov).
- Partially reimplement HTTP server to make fewer copies of incoming and outgoing data. It gives up to 1.5 performance improvement on inserting long records over HTTP. #19516 (Ivan).
- Add `compress` setting for `Memory` tables. If it's enabled the table will use less RAM. On some machines and datasets it can also work faster on SELECT, but it is not always the case. This closes #20093. Note: there are reasons why Memory tables can work slower than MergeTree: (1) lack of compression (2) static size of blocks (3) lack of indices and prewhere... #20168 (alexey-milovidov).
- Slightly better code in aggregation. #20978 (alexey-milovidov).
- Add back `intDiv`/`modulo` specializations for better performance. This fixes #21293. The regression was introduced in https://github.com/ClickHouse/ClickHouse/pull/18145 . #21307 (Amos Bird).
- Do not squash blocks too much on INSERT SELECT if inserting into Memory table. In previous versions inefficient data representation was created in Memory table after INSERT SELECT. This closes #13052. #20169 (alexey-milovidov).
- Fix at least one case when DataType parser may have exponential complexity (found by fuzzer). This closes #20096. #20132 (alexey-milovidov).
- Parallelize SELECT with FINAL for single part with level > 0 when `do_not_merge_across_partitions_select_final` setting is 1. #19375 (Kruglov Pavel).
- Fill only requested columns when querying `system.parts` and `system.parts_columns`. Closes #19570. #21035 (Anmol Arora).
- Perform algebraic optimizations of arithmetic expressions inside `avg` aggregate function. close #20092. #20183 (flynn).
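The hedged requests entry above describes a general latency technique: start one request, and if it has not answered within a timeout, start another against a different replica and take whichever answers first. The Python `asyncio` sketch below illustrates that idea only. It is not ClickHouse's implementation; `hedged_query`, `hedge_timeout`, and the two fake replicas are hypothetical names, and real progress or data packets are replaced by a simple return value.

```python
import asyncio
from typing import Awaitable, Callable, List

async def hedged_query(replicas: List[Callable[[], Awaitable[str]]],
                       hedge_timeout: float) -> str:
    """Return the first reply, opening one more request each time hedge_timeout passes."""
    if not replicas:
        raise ValueError("at least one replica is required")
    remaining = list(replicas)
    pending = set()
    try:
        while True:
            if remaining:
                # Start a request to the next replica in the list.
                pending.add(asyncio.ensure_future(remaining.pop(0)()))
            # Keep hedging only while there are replicas left to try.
            timeout = hedge_timeout if remaining else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
            if done:
                return done.pop().result()
    finally:
        for task in pending:
            task.cancel()  # cancel the requests that lost the race

async def slow_replica() -> str:
    await asyncio.sleep(0.5)
    return "slow"

async def fast_replica() -> str:
    await asyncio.sleep(0.01)
    return "fast"

print(asyncio.run(hedged_query([slow_replica, fast_replica], hedge_timeout=0.05)))  # fast
```

Here the slow replica is tried first, the hedge timeout expires, and the second request wins, which is the tail-latency effect the entry refers to.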
Improvement
- Case-insensitive compression methods for table functions. Also fixed LZMA compression method which was checked in upper case. #21416 (Vladimir Chebotarev).
- Add two settings to delay or throw error during insertion when there are too many inactive parts. This is useful when server fails to clean up parts quickly enough. #20178 (Amos Bird).
- Provide better compatibility for mysql clients. 1. mysql jdbc 2. mycli. #21367 (Amos Bird).
- Forbid to drop a column if it's referenced by materialized view. Closes #21164. #21303 (flynn).
- MySQL dictionary source will now retry unexpected connection failures (Lost connection to MySQL server during query) which sometimes happen on SSL/TLS connections. #21237 (Alexander Kazakov).
- Usability improvement: more consistent `DateTime64` parsing: recognize the case when unix timestamp with subsecond resolution is specified as scaled integer (like `1111111111222` instead of `1111111111.222`). This closes #13194. #21053 (alexey-milovidov).
- Do only merging of sorted blocks on initiator with distributed_group_by_no_merge. #20882 (Azat Khuzhin).
- When loading config for mysql source ClickHouse will now randomize the list of replicas with the same priority to ensure the round-robin logic of picking mysql endpoint. This closes #20629. #20632 (Alexander Kazakov).
- Function 'reinterpretAs(x, Type)' renamed into 'reinterpret(x, Type)'. #20611 (Maksim Kita).
- Support vhost for RabbitMQ engine #20576. #20596 (Kseniia Sumarokova).
- Improved serialization for data types combined of Arrays and Tuples. Improved matching enum data types to protobuf enum type. Fixed serialization of the `Map` data type. Omitted values are now set by default. #20506 (Vitaly Baranov).
- Fixed race between execution of distributed DDL tasks and cleanup of DDL queue. Now DDL task cannot be removed from ZooKeeper if there are active workers. Fixes #20016. #20448 (tavplubix).
- Make FQDN and other DNS related functions work correctly in alpine images. #20336 (filimonov).
- Do not allow early constant folding of explicitly forbidden functions. #20303 (Azat Khuzhin).
- Implicit conversion from integer to Decimal type might succeed even if the integer value does not fit into Decimal type. Now it throws `ARGUMENT_OUT_OF_BOUND`. #20232 (tavplubix).
- Lockless `SYSTEM FLUSH DISTRIBUTED`. #20215 (Azat Khuzhin).
- Normalize count(constant), sum(1) to count(). This is needed for projection query routing. #20175 (Amos Bird).
- Support all native integer types in bitmap functions. #20171 (Amos Bird).
- Updated `CacheDictionary`, `ComplexCacheDictionary`, `SSDCacheDictionary`, `SSDComplexKeyDictionary` to use LRUHashMap as underlying index. #20164 (Maksim Kita).
- The setting `access_management` is now configurable on startup by providing `CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT`, defaults to disabled (`0`) which was the prior value. #20139 (Marquitos).
- Fix toDateTime64(toDate()/toDateTime()) for DateTime64 - Implement DateTime64 clamping to match DateTime behaviour. #20131 (Azat Khuzhin).
- Quota improvements: SHOW TABLES is now considered as one query in the quota calculations, not two queries. SYSTEM queries now consume quota. Fix calculation of interval's end in quota consumption. #20106 (Vitaly Baranov).
- Supports `path IN (set)` expressions for `system.zookeeper` table. #20105 (小路).
- Show full details of `MaterializeMySQL` tables in `system.tables`. #20051 (Stig Bakken).
- Fix data race in executable dictionary that was possible only on misuse (when the script returns data ignoring its input). #20045 (alexey-milovidov).
- The value of MYSQL_OPT_RECONNECT option can now be controlled by "opt_reconnect" parameter in the config section of mysql replica. #19998 (Alexander Kazakov).
- If user calls `JSONExtract` function with `Float32` type requested, allow inaccurate conversion to the result type. For example the number `0.1` in JSON is double precision and is not representable in Float32, but the user still wants to get it. Previous versions return 0 for non-Nullable type and NULL for Nullable type to indicate that conversion is imprecise. The logic was 100% correct but it was surprising to users and leading to questions. This closes #13962. #19960 (alexey-milovidov).
- Add conversion of block structure for INSERT into Distributed tables if it does not match. #19947 (Azat Khuzhin).
- Improvement for the `system.distributed_ddl_queue` table. Initialize MaxDDLEntryID to the last value after restarting. Before this PR, MaxDDLEntryID will remain zero until a new DDLTask is processed. #19924 (Amos Bird).
- Show `MaterializeMySQL` tables in `system.parts`. #19770 (Stig Bakken).
- Add separate config directive for `Buffer` profile. #19721 (Azat Khuzhin).
- Move conditions that are not related to JOIN to WHERE clause. #18720. #19685 (hexiaoting).
- Add ability to throttle INSERT into Distributed based on amount of pending bytes for async send (`bytes_to_delay_insert`/`max_delay_to_insert` and `bytes_to_throw_insert` settings for `Distributed` engine has been added). #19673 (Azat Khuzhin).
- Fix some rare cases when write errors can be ignored in destructors. #19451 (Azat Khuzhin).
- Print inline frames in stack traces for fatal errors. #19317 (Ivan).
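The `JSONExtract` entry above relies on the fact that `0.1` has no exact single-precision representation. The short Python sketch below demonstrates that fact with the standard `struct` module; it says nothing about ClickHouse internals, and `to_float32` is a hypothetical helper name used only for this illustration.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

as_float32 = to_float32(0.1)

print(as_float32 == 0.1)             # False: 0.1 is not exactly representable in Float32
print(abs(as_float32 - 0.1) < 1e-8)  # True: the nearest Float32 value is still very close
print(to_float32(0.5) == 0.5)        # True: 0.5 is a power of two and converts exactly
```

This is the kind of small rounding error that the changed behaviour accepts instead of returning 0 or NULL.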
Bug Fix
- Fix redundant reconnects to ZooKeeper and the possibility of two active sessions for a single clickhouse server. Both problems introduced in #14678. #21264 (alesapin).
- Fix error `Bad cast from type ... to DB::ColumnLowCardinality` while inserting into table with `LowCardinality` column from `Values` format. Fixes #21140 #21357 (Nikolai Kochetov).
- Fix a deadlock in `ALTER DELETE` mutations for non replicated MergeTree table engines when the predicate contains the table itself. Fixes #20558. #21477 (alesapin).
- Fix SIGSEGV for distributed queries on failures. #21434 (Azat Khuzhin).
- Now `ALTER MODIFY COLUMN` queries will correctly affect changes in partition key, skip indices, TTLs, and so on. Fixes #13675. #21334 (alesapin).
- Fix bug with `join_use_nulls` and joining `TOTALS` from subqueries. This closes #19362 and #21137. #21248 (vdimir).
- Fix crash in `EXPLAIN` for query with `UNION`. Fixes #20876, #21170. #21246 (flynn).
- Now mutations allowed only for table engines that support them (MergeTree family, Memory, MaterializedView). Other engines will report a more clear error. Fixes #21168. #21183 (alesapin).
- Fixes #21112. Fixed bug that could cause duplicates with insert query (if one of the callbacks came a little too late). #21138 (Kseniia Sumarokova).
- Fix `input_format_null_as_default` so that it takes effect when types are nullable. This fixes #21116. #21121 (Amos Bird).
- Fix bug related to cast Tuple to Map. Closes #21029. #21120 (hexiaoting).
- Fix the metadata leak when the Replicated*MergeTree with custom (non default) ZooKeeper cluster is dropped. #21119 (fastio).
- Fix type mismatch issue when using LowCardinality keys in joinGet. This fixes #21114. #21117 (Amos Bird).
- Fix default_replica_path and default_replica_name values being useless on Replicated(*)MergeTree engine when the engine needs to specify other parameters. #21060 (mxzlxy).
- Out of bound memory access was possible when formatting specifically crafted out of range value of type `DateTime64`. This closes #20494. This closes #20543. #21023 (alexey-milovidov).
- Block parallel insertions into storage join. #21009 (vdimir).
- Fixed behaviour, when `ALTER MODIFY COLUMN` created mutation, that will knowingly fail. #21007 (Anton Popov).
- Closes #9969. Fixed Brotli http compression error, which reproduced for large data sizes, slightly complicated structure and with json output format. Update Brotli to the latest version to include the "fix rare access to uninitialized data in ring-buffer". #20991 (Kseniia Sumarokova).
- Fix 'Empty task was returned from async task queue' on query cancellation. #20881 (Azat Khuzhin).
- `USE database;` query did not work when using MySQL 5.7 client to connect to ClickHouse server, it's fixed. Fixes #18926. #20878 (tavplubix).
- Fix usage of `-Distinct` combinator with `-State` combinator in aggregate functions. #20866 (Anton Popov).
- Fix subquery with union distinct and limit clause. close #20597. #20610 (flynn).
- Fixed inconsistent behavior of dictionary in case of queries where we look for absent keys in dictionary. #20578 (Nikita Mikhaylov).
- Fix the number of threads for scalar subqueries and subqueries for index (after #19007 single thread was always used). Fixes #20457, #20512. #20550 (Nikolai Kochetov).
- Fix crash which could happen if unknown packet was received from remote query (was introduced in #17868). #20547 (Azat Khuzhin).
- Add proper checks while parsing directory names for async INSERT (fixes SIGSEGV). #20498 (Azat Khuzhin).
- Fix function `transform`, which did not work properly for floating point keys. Closes #20460. #20479 (flynn).
- Fix infinite loop when propagating WITH aliases to subqueries. This fixes #20388. #20476 (Amos Bird).
- Fix abnormal server termination when http client goes away. #20464 (Azat Khuzhin).
- Fix `LOGICAL_ERROR` for `join_use_nulls=1` when JOIN contains const from SELECT. #20461 (Azat Khuzhin).
- Check if table function `view` is used in expression list and throw an error. This fixes #20342. #20350 (Amos Bird).
- Avoid invalid dereference in RANGE_HASHED() dictionary. #20345 (Azat Khuzhin).
- Fix null dereference with `join_use_nulls=1`. #20344 (Azat Khuzhin).
- Fix incorrect result of binary operations between two constant decimals of different scale. Fixes #20283. #20339 (Maksim Kita).
- Fix too frequent retries of failed background tasks for `ReplicatedMergeTree` table engines family. This could lead to too verbose logging and increased CPU load. Fixes #20203. #20335 (alesapin).
- Restrict to `DROP` or `RENAME` version column of `*CollapsingMergeTree` and `ReplacingMergeTree` table engines. #20300 (alesapin).
- Fixed the behavior when in case of broken JSON we tried to read the whole file into memory which leads to exception from the allocator. Fixes #19719. #20286 (Nikita Mikhaylov).
- Fix exception during vertical merge for `MergeTree` table engines family which don't allow to perform vertical merges. Fixes #20259. #20279 (alesapin).
- Fix rare server crash on config reload during the shutdown. Fixes #19689. #20224 (alesapin).
- Fix CTE when using in INSERT SELECT. This fixes #20187, fixes #20195. #20211 (Amos Bird).
- Fixes #19314. #20156 (Ivan).
- Fix toMinute function to handle special timezone correctly. #20149 (keenwolf).
- Fix server crash after query with `if` function with `Tuple` type of then/else branches result. `Tuple` type must contain `Array` or another complex type. Fixes #18356. #20133 (alesapin).
- The `MongoDB` table engine now establishes connection only when it's going to read data. `ATTACH TABLE` won't try to connect anymore. #20110 (Vitaly Baranov).
- Bugfix in StorageJoin. #20079 (vdimir).
- Fix the case when calculating modulo of division of negative number by small divisor, the resulting data type was not large enough to accommodate the negative result. This closes #20052. #20067 (alexey-milovidov).
- MaterializeMySQL: Fix replication for statements that update several tables. #20066 (Håvard Kvålen).
- Prevent "Connection refused" in docker during initialization script execution. #20012 (filimonov).
- `EmbeddedRocksDB` is an experimental storage. Fix the issue with lack of proper type checking. Simplified code. This closes #19967. #19972 (alexey-milovidov).
- Fix a segfault in function `fromModifiedJulianDay` when the argument type is `Nullable(T)` for any integral types other than Int32. #19959 (PHO).
- BloomFilter index crash fix. Fixes #19757. #19884 (Maksim Kita).
- Deadlock was possible if system.text_log is enabled. This fixes #19874. #19875 (alexey-milovidov).
- Fix starting the server with tables having default expressions containing dictGet(). Allow getting return type of dictGet() without loading dictionary. #19805 (Vitaly Baranov).
- Fix clickhouse-client abort exception while executing only `select`. #19790 (taiyang-li).
- Fix a bug that moving pieces to destination table may fail in case of launching multiple clickhouse-copiers. #19743 (madianjun).
- Background thread which executes `ON CLUSTER` queries might hang waiting for dropped replicated table to do something. It's fixed. #19684 (yiguolei).
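One of the fixes above concerns binary operations between decimals of different scale. As a general illustration of why scale matters in fixed-point arithmetic, the Python sketch below adds two decimals stored as an unscaled integer plus a scale by first rescaling both to a common scale. It is not ClickHouse's code: the `(value, scale)` tuple layout, the choice of the larger scale for the result, and the name `add_decimals` are assumptions made only for this example.

```python
from typing import Tuple

# Hypothetical representation: (unscaled integer, scale); (125, 2) means 1.25.
FixedDecimal = Tuple[int, int]

def add_decimals(a: FixedDecimal, b: FixedDecimal) -> FixedDecimal:
    """Add two fixed-point decimals after rescaling both to the larger scale."""
    scale = max(a[1], b[1])
    a_value = a[0] * 10 ** (scale - a[1])  # bring a to the common scale
    b_value = b[0] * 10 ** (scale - b[1])  # bring b to the common scale
    return (a_value + b_value, scale)

print(add_decimals((125, 2), (5, 1)))  # (175, 2), i.e. 1.25 + 0.5 = 1.75
```

Adding the unscaled integers directly, without the rescaling step, would give `(130, 2)` here, which is the kind of wrong result that mismatched scales produce.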
Build/Testing/Packaging Improvement
- Allow to build ClickHouse with AVX-2 enabled globally. It gives slight performance benefits on modern CPUs. Not recommended for production and will not be supported as official build for now. #20180 (alexey-milovidov).
- Fix some of the issues found by Coverity. See #19964. #20010 (alexey-milovidov).
- Allow to start up with modified binary under gdb. In previous version if you set up breakpoint in gdb before start, server will refuse to start up due to failed integrity check. #21258 (alexey-milovidov).
- Add a test for different compression methods in Kafka. #21111 (filimonov).
- Fixed port clash from test_storage_kerberized_hdfs test. #19974 (Ilya Yatsishin).
- Print `stdout` and `stderr` to log when failed to start docker in integration tests. Before this PR there was a very short error message in this case which didn't help to investigate the problems. #20631 (Vitaly Baranov).