Maximum number of partition keys allowed in a Cassandra SELECT?


Is there a limit on the number of partition keys Cassandra accepts in a SELECT?

SELECT * FROM series_pfc_gas WHERE as_of='2024-07-25'  AND name IN ('ZTP_FLX_B_LUX_ALL', 'ZTP_FLX_B_TCP_ALL', 'ZTP_FLX_B_EED_ALL', 'NCG_FLX_B_LUX_ALL', 'NCG_FLX_B_TCP_ALL', 'NCG_FLX_B_EED_ALL', 'PEG_FLX_B_LUX_ALL', 'PEG_FLX_B_TCP_ALL', 'PEG_FLX_B_EED_ALL', 'TTF_FLX_B_LUX_ALL', 'TTF_FLX_B_TCP_ALL', 'TTF_FLX_B_EED_ALL', 'ZTP_STC_B_LUX_ALL', 'ZTP_STC_B_TCP_ALL', 'ZTP_STC_B_EED_ALL', 'NCG_STC_B_LUX_ALL', 'NCG_STC_B_TCP_ALL', 'NCG_STC_B_EED_ALL', 'PEG_STC_B_LUX_ALL', 'PEG_STC_B_TCP_ALL', 'PEG_STC_B_EED_ALL', 'TTF_STC_B_LUX_ALL', 'TTF_STC_B_TCP_ALL', 'TTF_STC_B_EED_ALL')  AND time >= '2024-01-01T00:00:00.000+01:00' AND time < '2029-01-01T00:00:00.000+01:00'

It returns the following error message:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Select query cannot be completed because it selects 24 partitions keys - more than the maximum allowed 20"

This is the table structure:

CREATE TABLE py2api.series_op_gas (
    name text,
    as_of timestamp,
    time timestamp,
    day int,
    month int,
    quarter int,
    se_year int,
    season int,
    value double,
    week int,
    wk_year int,
    year int,
    PRIMARY KEY ((name, as_of), time)
) WITH CLUSTERING ORDER BY (time ASC)
    AND additional_write_policy = '99PERCENTILE'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99PERCENTILE';

Some example rows:

 name              | as_of                           | time                            | day | month | quarter | se_year | season | value      | week | wk_year | year
-------------------+---------------------------------+---------------------------------+-----+-------+---------+---------+--------+------------+------+---------+------
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-12 05:00:00.000000+0000 |  12 |     9 |       3 |    2023 |      1 |          0 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-13 05:00:00.000000+0000 |  13 |     9 |       3 |    2023 |      1 | 4.4409e-16 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-14 05:00:00.000000+0000 |  14 |     9 |       3 |    2023 |      1 | 4.4409e-16 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-15 05:00:00.000000+0000 |  15 |     9 |       3 |    2023 |      1 |          0 |   37 |    2023 | 2023
 ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-16 05:00:00.000000+0000 |  16 |     9 |       3 |    2023 |      1 | 4.4409e-16 |   37 |    2023 | 2023

Does anyone have an idea for a different way to structure the keys in Cassandra?

cassandra datastax
1 answer

The answer depends on your Cassandra version and on the guardrails you have configured:

partition_keys_in_select_warn_threshold: -1
partition_keys_in_select_fail_threshold: -1

https://cassandra.apache.org/_/blog/Apache-Cassandra-4.1-Features-Guardrails-Framework.html

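As stated, you are asking for trouble by putting too many partition keys in an IN clause. The coordinator is responsible for gathering the data for every partition key you supply, which can put a heavy load on it. In general, I recommend issuing asynchronous queries instead. In the long run, that prevents latency problems and query timeouts as load increases.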
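
The asynchronous approach could look roughly like the following sketch. It assumes the DataStax Python driver (`cassandra-driver`) and a `session` that is already connected to the keyspace. One prepared single-partition query is sent per `name`, so the work is spread across requests instead of being gathered by one coordinator. The function name `fetch_series` and the choice to pass the table name as an argument are illustrative assumptions, not something from the question.

```python
def fetch_series(session, table, as_of, names, start, end):
    """Read one partition per name with concurrent single-partition queries.

    `session` is assumed to be a connected cassandra.cluster.Session;
    any object exposing prepare() and execute_async() behaves the same here.
    """
    # Table names cannot be bound as parameters, so `table` is interpolated.
    # Only pass trusted values for it.
    stmt = session.prepare(
        f"SELECT * FROM {table} "
        "WHERE name = ? AND as_of = ? AND time >= ? AND time < ?"
    )
    # Send every request without waiting; each call returns a future at once.
    futures = {
        name: session.execute_async(stmt, (name, as_of, start, end))
        for name in names
    }
    # result() blocks until that query completes and raises if it failed.
    return {name: list(future.result()) for name, future in futures.items()}
```

This version sends all requests at once, which is fine for a couple of dozen keys. For larger key lists, the driver's `cassandra.concurrent.execute_concurrent_with_args` helper is worth a look, since it bounds the number of in-flight requests.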

PS: As for how many keys is too many, it depends, but the number is generally low. If you must go this route, I would cap it at around 10 or 20.
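
If you do keep `IN`, one way to respect such a cap is to split the key list into fixed-size chunks and issue one query per chunk. The helper below is a minimal sketch; the name `chunked` and the default size of 20 (the limit reported in the error message) are illustrative choices.

```python
def chunked(keys, size=20):
    """Yield successive lists of at most `size` keys."""
    if size < 1:
        raise ValueError("size must be at least 1")
    keys = list(keys)
    for i in range(0, len(keys), size):
        yield keys[i:i + size]
```

For the 24 names in the question, this yields one chunk of 20 and one of 4, and each chunk would go into its own `name IN (...)` query.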
