Is there a limit on the number of Cassandra partition keys in a SELECT?
SELECT * FROM series_pfc_gas WHERE as_of='2024-07-25' AND name IN ('ZTP_FLX_B_LUX_ALL', 'ZTP_FLX_B_TCP_ALL', 'ZTP_FLX_B_EED_ALL', 'NCG_FLX_B_LUX_ALL', 'NCG_FLX_B_TCP_ALL', 'NCG_FLX_B_EED_ALL', 'PEG_FLX_B_LUX_ALL', 'PEG_FLX_B_TCP_ALL', 'PEG_FLX_B_EED_ALL', 'TTF_FLX_B_LUX_ALL', 'TTF_FLX_B_TCP_ALL', 'TTF_FLX_B_EED_ALL', 'ZTP_STC_B_LUX_ALL', 'ZTP_STC_B_TCP_ALL', 'ZTP_STC_B_EED_ALL', 'NCG_STC_B_LUX_ALL', 'NCG_STC_B_TCP_ALL', 'NCG_STC_B_EED_ALL', 'PEG_STC_B_LUX_ALL', 'PEG_STC_B_TCP_ALL', 'PEG_STC_B_EED_ALL', 'TTF_STC_B_LUX_ALL', 'TTF_STC_B_TCP_ALL', 'TTF_STC_B_EED_ALL') AND time >= '2024-01-01T00:00:00.000+01:00' AND time < '2029-01-01T00:00:00.000+01:00'
returns the following error message:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Select query cannot be completed because it selects 24 partitions keys - more than the maximum allowed 20"
This is the structure of the table:
CREATE TABLE py2api.series_op_gas (
name text,
as_of timestamp,
time timestamp,
day int,
month int,
quarter int,
se_year int,
season int,
value double,
week int,
wk_year int,
year int,
PRIMARY KEY ((name, as_of), time)
) WITH CLUSTERING ORDER BY (time ASC)
AND additional_write_policy = '99PERCENTILE'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99PERCENTILE';
Some sample rows:
name | as_of | time | day | month | quarter | se_year | season | value | week | wk_year | year
-------------------+---------------------------------+---------------------------------+-----+-------+---------+---------+--------+------------+------+---------+------
ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-12 05:00:00.000000+0000 | 12 | 9 | 3 | 2023 | 1 | 0 | 37 | 2023 | 2023
ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-13 05:00:00.000000+0000 | 13 | 9 | 3 | 2023 | 1 | 4.4409e-16 | 37 | 2023 | 2023
ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-14 05:00:00.000000+0000 | 14 | 9 | 3 | 2023 | 1 | 4.4409e-16 | 37 | 2023 | 2023
ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-15 05:00:00.000000+0000 | 15 | 9 | 3 | 2023 | 1 | 0 | 37 | 2023 | 2023
ZTP_FLX_L_LUX_PHY | 2023-09-11 00:00:00.000000+0000 | 2023-09-16 05:00:00.000000+0000 | 16 | 9 | 3 | 2023 | 1 | 4.4409e-16 | 37 | 2023 | 2023
Does anyone have an idea for a different way to structure the keys in Cassandra?
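The answer depends on the version and on the guardrails you have configured. Since Cassandra 4.1 the limit is set by these two cassandra.yaml settings, where -1 disables the check (your error message suggests the fail threshold on your cluster is 20):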
partition_keys_in_select_warn_threshold: -1
partition_keys_in_select_fail_threshold: -1
https://cassandra.apache.org/_/blog/Apache-Cassandra-4.1-Features-Guardrails-Framework.html
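That said, putting too many partition keys in an IN clause is asking for trouble. The coordinator is responsible for gathering the data for every partition key you supply, which can put a heavy load on that one node. In general, I recommend using asynchronous queries instead. In the long run this prevents latency problems and query timeouts as the load grows.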
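As a rough illustration of the asynchronous approach, here is a minimal Python sketch that issues one single-partition query per name and collects the results. It assumes a `session` object from the DataStax Python driver (`cassandra-driver`), which provides `prepare` and `execute_async`. The function name `fetch_series` and the query text are made up for this example, with table and column names taken from the question.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical query text: one partition (name, as_of) per execution.
QUERY = (
    "SELECT * FROM series_pfc_gas "
    "WHERE name = ? AND as_of = ? AND time >= ? AND time < ?"
)


def fetch_series(session: Any, names: Sequence[str], as_of: Any,
                 start: Any, end: Any) -> Dict[str, List[Any]]:
    """Run one async query per partition key and gather rows by name.

    `session` is assumed to behave like cassandra.cluster.Session.
    With a real session, `as_of`, `start` and `end` should be datetime
    objects, since they are bound to timestamp columns.
    """
    prepared = session.prepare(QUERY)

    # Send all requests first so they are in flight concurrently.
    futures = [
        (name, session.execute_async(prepared, (name, as_of, start, end)))
        for name in names
    ]

    # Then block on each future in turn and materialise its rows.
    rows_by_name: Dict[str, List[Any]] = {}
    for name, future in futures:
        rows_by_name[name] = list(future.result())
    return rows_by_name
```

Each request targets a single partition, so it can be routed to a replica that owns the data instead of making one coordinator fan out to all 24 partitions. For a very large number of keys you would probably also want to cap how many requests are in flight at once.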
PS: How many keys is too many depends on your situation, but the number is usually low. If you have to go this route, I would cap it at around 10 or 20.
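If you do stay with IN, one way to respect such a cap is to split the key list into small groups and run one query per group. The sketch below is only an illustration under assumptions: `session` is expected to behave like a `cassandra-driver` session, `fetch_in_chunks` and `chunked` are hypothetical helper names, and the default of 10 keys per query is just the lower figure suggested above, not a value taken from Cassandra.

```python
from typing import Any, List, Sequence

# Hypothetical query text; the whole IN list is bound as one list value.
IN_QUERY = (
    "SELECT * FROM series_pfc_gas "
    "WHERE as_of = ? AND name IN ? AND time >= ? AND time < ?"
)


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def fetch_in_chunks(session: Any, names: Sequence[str], as_of: Any,
                    start: Any, end: Any, max_keys: int = 10) -> List[Any]:
    """Run the IN query once per group so no query exceeds max_keys."""
    prepared = session.prepare(IN_QUERY)
    rows: List[Any] = []
    for group in chunked(names, max_keys):
        rows.extend(session.execute(prepared, (as_of, group, start, end)))
    return rows
```

With the 24 names from the question and `max_keys=10`, this runs three queries with 10, 10 and 4 partition keys, each of which stays under a fail threshold of 20.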