ClickHouse SELECT Advances
Dynamic column selection (also known as a COLUMNS
expression) allows you to match some columns in a result with a re2 regular expression.
Matchers
Select all columns:
SELECT * FROM hits;
Select only specific subset of columns:
SELECT COLUMNS('URL.*ID') FROM hits;
┌─URLCategoryID─┬─URLRegionID─┐
│ 15664 │ 216 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
└───────────────┴─────────────┘
SELECT COLUMNS(WatchID, UserID) FROM hits;
┌─────────────WatchID─┬───────────────UserID─┐
│ 5365534025466744368 │ 2213399965098237040 │
│ 5287559196528052048 │ 2213399965098237040 │
│ 9057961221679572813 │ 2213399965098237040 │
│ 5520508008786474572 │ 9141107111321352513 │
Column transformers
Apply transformations for selected columns or remove some columns from selections:
APPLY: Allows you to invoke some function for each row returned by an outer table expression of a query.
SELECT * APPLY toString FROM hits;
SELECT COLUMNS('URL.*ID') APPLY toString FROM hits;
SELECT COLUMNS('URL.*ID') APPLY x -> toString(x) FROM hits;
┌─toString(URLCategoryID)─┬─toString(URLRegionID)─┐
│ 15664 │ 216 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
└─────────────────────────┴───────────────────────┘
EXCEPT: exclude one or more columns from the result.
SELECT * EXCEPT (UserID, URLRegionID) FROM hits;
SELECT COLUMNS('URL.*ID') EXCEPT URLCategoryID FROM hits;
┌─URLRegionID─┐
│ 216 │
│ 0 │
│ 0 │
│ 0 │
│ 0 │
└─────────────┘
REPLACE: Specifies one or more expression aliases
SELECT COLUMNS('URL.*ID') REPLACE (URLCategoryID * 10 AS URLCategoryID)
FROM hits;
┌─URLCategoryID─┬─URLRegionID─┐
│ 156640 │ 216 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
│ 0 │ 0 │
└───────────────┴─────────────┘
SELECT COLUMNS('URL.*ID') REPLACE (leftPad(toString(URLRegionID), 10, '*') AS URLRegionID)
FROM hits;
┌─URLCategoryID─┬─URLRegionID─┐
│ 15664 │ *******216 │
│ 0 │ *********0 │
│ 0 │ *********0 │
│ 0 │ *********0 │
│ 0 │ *********0 │
└───────────────┴─────────────┘
We can also combine them:
SELECT COLUMNS('URL.*ID') APPLY(toString) APPLY(length) APPLY(max) FROM hits;
┌─max(length(toString(URLCategoryID)))─┬─max(length(toString(URLRegionID)))─┐
│ 5 │ 5 │
└──────────────────────────────────────┴────────────────────────────────────┘
Select from multiple tables
merge()
create a temporary Merge table with structure is taken from the first table encountered that matches the regular expression.
When upgrading ClickHouse usually rename system table if schema changed in a new release: system.query_log
, system.query_log_0
, system.query_log_1
, ... The query below help querying from all them:
SELECT * FROM merge(system, '^query_log')
Series: ClickHouse on Kubernetes
ClickHouse has been both exciting and incredibly challenging based on my experience migrating and scaling from Iceberg to ClickHouse, zero to a large cluster of trillions of rows. I have had to deal with many of use cases and resolve issues. I have been trying to take notes every day for myself, although it takes time to publish them as a series of blog posts. I hope I can do so on this ClickHouse on Kubernetes series.
Dynamic column selection (also known as a `COLUMNS` expression) allows you to match some columns in a result with a re2 regular expression.
Now that you have your first ClickHouse instance on Kubernetes and are starting to use it, you need to monitoring and observing what happens on it is an important task to achieve stability.
After starting this series ClickHouse on Kubernetes, you can now configure your first single-node ClickHouse server. Let's dive into creating your first table and understanding the basic concepts behind the ClickHouse engine, its data storage, and some cool features
My favorite ClickHouse table engine is `ReplacingMergeTree`. The main reason is that it is similar to `MergeTree` but can automatically deduplicate based on columns in the `ORDER BY` clause, which is very useful.
Now you have a large single node cluster with a ReplacingMergeTree table that can deduplicate itself. This time, you need more replicated nodes to serve more data users or improve the high availability.