Clickhouse 字符串函数

1.基本字符串操作

 

1.1判断是否为空

clickhouse> select empty(''),empty(' '),notEmpty(''),notEmpty(' ');

┌─empty('')─┬─empty(' ')─┬─notEmpty('')─┬─notEmpty(' ')─┐
│         1 │          0 │            0 │             1 │
└───────────┴────────────┴──────────────┴───────────────┘

1 rows in set. Elapsed: 0.007 sec.

clickhouse的函数名严格区分大小写。

'' 表示空字符串
' ' 表示的一个空格

1.2拼接concat

clickhouse> select concat('China',' Huawei',' Shenzhen') str;
┌─str───────────────────┐
│ China Huawei Shenzhen │
└───────────────────────┘

clickhouse> select concat('China', null) str;

┌─str──┐
│ ᴺᵁᴸᴸ │
└──────┘

1.3截取:substring(s, offset, length), mid(s, offset, length), substr(s, offset, length)

三个函数的用法和含义是一样的。

clickhouse> select substring('china',1,2) str;
┌─str─┐
│ ch  │
└─────┘

1.4字符串长度

length
char_length, CHAR_LENGTH
character_length, CHARACTER_LENGTH

clickhouse> select length('China'),char_length('China'),character_length('China');

┌─length('China')─┬─char_length('China')─┬─character_length('China')─┐
│               5 │                    5 │                         5 │
└─────────────────┴──────────────────────┴───────────────────────────┘


clickhouse> select length('中国'),char_length('中国'),character_length('中国');
┌─length('中国')─┬─char_length('中国')─┬─character_length('中国')─┐
│              6 │                   2 │                        2 │
└────────────────┴─────────────────────┴──────────────────────────┘

1.5字符串重复: repeat(s, n)

clickhouse> select repeat('China ',5)

┌─repeat('China ', 5)────────────┐
│ China China China China China  │
└────────────────────────────────┘

1.6字符串反转:reverse

1.7追加:appendTrailingCharIfAbsent(s, c)

1.8获取字符串的开头结尾:endsWith(s, suffix)  startsWith(str, prefix)

1.9 字符串大小写转换:

lower, lcase
upper, ucase

1.10 字符串trim函数:trim

trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)

trimLeft(input_string)  ==ltrim(input_string)
trimRight(input_string) ==rtrim(input_string)
trimBoth(input_string)  ==trim(input_string)

1.11 字符串加密和解密:

base64Encode(s)
base64Decode(s)
tryBase64Decode(s)
CRC32(s)
CRC32IEEE(s)
CRC64(s)

1.12 转换函数:

convertCharset(s, from, to)

 1.13其他函数:

concatAssumeInjective

format(pattern, s0, s1, …)

isValidUTF8
toValidUTF8

normalizeQuery
normalizedQueryHash

2.字符串查找

2.1 字符串定位:

position(haystack, needle), locate(haystack, needle),positionCaseInsensitive
语法:
position(haystack, needle[, start_pos])
locate(haystack, needle[, start_pos])
positionCaseInsensitive(haystack, needle[, start_pos])

2.2  多个字符串定位:

multiSearchAllPositions,multiSearchAllPositionsCaseInsensitive
multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])

multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])
multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])
multiSearchAny(haystack, [needle1, needle2, …, needlen])

2.3 字符串匹配:

match(haystack, pattern)
multiMatchAny(haystack, [pattern1, pattern2, …, patternn])
multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])
multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])
multiFuzzyMatchAny(haystack, distance, [pattern1, pattern2, …, patternn])
multiFuzzyMatchAnyIndex(haystack, distance, [pattern1, pattern2, …, patternn])
multiFuzzyMatchAllIndices(haystack, distance, [pattern1, pattern2, …, patternn])

 2.4 字符串提取:

extract(haystack, pattern)
extractAll(haystack, pattern)
extractAllGroupsHorizontal
extractAllGroupsVertical
注意:extractAllGroupsHorizontal function is slower than extractAllGroupsVertical.

2.5字符串模糊查询:

like(haystack, pattern), haystack LIKE pattern operator
notLike(haystack, pattern), haystack NOT LIKE pattern operator
ilike
ilike(haystack, pattern)

2.6

ngramDistance(haystack, needle)
ngramSearch(haystack, needle)

 2.7字符串出现的次数:

countSubstrings(haystack, needle)
countSubstrings(haystack, needle[, start_pos])

类似的函数:
countSubstrings()/countSubstringsCaseInsensitive()/countSubstringsCaseInsensitiveUTF8()

该功能在clickhouse 20.12.+版本实现。

clickhouse> select countSubstrings('I love China,he love china too,she also love china','china') china_count;

 

3.字符串替换

replaceOne(haystack, pattern, replacement)
replaceAll(haystack, pattern, replacement), replace(haystack, pattern, replacement)
replaceRegexpOne(haystack, pattern, replacement)
replaceRegexpAll(haystack, pattern, replacement)

regexpQuoteMeta(s)
Predefined characters: \0, \\, |, (, ), ^, $, ., [, ], ?, *, +, {, :, -.

4.字符串切割

splitByChar(separator, s)
splitByString(separator, s)
arrayStringConcat(arr[, separator])
alphaTokens(s)
extractAllGroups(text, regexp)

5.UTF8 相关的函数:

Clickhouse> select * from system.functions where name like '%UTF%';

SELECT *
FROM system.functions
WHERE name LIKE '%UTF%'

┌─name────────────────────────────────────────┬─is_aggregate─┬─case_insensitive─┬─alias_to─┐
│ ngramSearchCaseInsensitiveUTF8              │            0 │                0 │          │
│ ngramDistanceCaseInsensitiveUTF8            │            0 │                0 │          │
│ multiSearchFirstIndexCaseInsensitiveUTF8    │            0 │                0 │          │
│ multiSearchAnyUTF8                          │            0 │                0 │          │
│ positionCaseInsensitiveUTF8                 │            0 │                0 │          │
│ positionUTF8                                │            0 │                0 │          │
│ reverseUTF8                                 │            0 │                0 │          │
│ lengthUTF8                                  │            0 │                0 │          │
│ ngramSearchUTF8                             │            0 │                0 │          │
│ randomStringUTF8                            │            0 │                0 │          │
│ javaHashUTF16LE                             │            0 │                0 │          │
│ multiSearchFirstIndexUTF8                   │            0 │                0 │          │
│ isValidUTF8                                 │            0 │                0 │          │
│ multiSearchAllPositionsCaseInsensitiveUTF8  │            0 │                0 │          │
│ multiSearchFirstPositionCaseInsensitiveUTF8 │            0 │                0 │          │
│ lowerUTF8                                   │            0 │                0 │          │
│ multiSearchAllPositionsUTF8                 │            0 │                0 │          │
│ multiSearchFirstPositionUTF8                │            0 │                0 │          │
│ substringUTF8                               │            0 │                0 │          │
│ multiSearchAnyCaseInsensitiveUTF8           │            0 │                0 │          │
│ toValidUTF8                                 │            0 │                0 │          │
│ ngramDistanceUTF8                           │            0 │                0 │          │
│ upperUTF8                                   │            0 │                0 │          │
└─────────────────────────────────────────────┴──────────────┴──────────────────┴──────────┘

23 rows in set. Elapsed: 0.016 sec. 

参考:

https://clickhouse.tech/docs/en/sql-reference/functions/string-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/string-replace-functions/

https://clickhouse.tech/docs/en/sql-reference/functions/splitting-merging-functions/

已标记关键词 清除标记
©️2020 CSDN 皮肤主题: 技术工厂 设计师:CSDN官方博客 返回首页