Is there a way to index a map type column in Cassandra?

Question

I have a users table that will hold millions of rows.

The table schema in Cassandra is as follows:

CREATE TABLE susbcriber (
    id int PRIMARY KEY,
    age_identifier text,
    alternate_mobile_identifier text,
    android_identifier text,
    batch_id text,
    circle text,
    city_identifier text,
    country text,
    country_identifier text,
    created_at text,
    deleted_at text,
    email_identifier text,
    gender_identifier text,
    ios_identifier text,
    list_master_id int,
    list_subscriber_id text,
    mobile_identifier text,
    operator text,
    partition_id text,
    raw_data map<text, text>,
    region_identifier text,
    unique_identifier text,
    updated_at text,
    web_push_identifier text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

I need to filter mainly on the 'raw_data map<text, text>' column, which holds JSON keys and values. How should I model the data so that selects and updates are fast?

I am trying to implement some bulk update operations.

Any suggestions are much appreciated.

Answer 1

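If the data is already in the map, you don't really need to keep the values in their own columns as well. And if a column is just a key of the map, it is easier in Cassandra to represent it as a clustering key rather than as a collection, like this: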

CREATE TABLE susbcriber_data (
    id int,
    key text,
    value text,
    PRIMARY KEY ((id), key));
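
Moving existing map data into this layout means writing one row per map entry. The following is a minimal Python sketch of that flattening step, not a definitive implementation. The helper name map_to_rows and the sample values are hypothetical, and the %s placeholders assume the style the DataStax Python driver uses for non-prepared statements, so adjust them for your client.

```python
from typing import Dict, Iterator, Tuple

# One execution of this statement per map entry (assumed placeholder style).
INSERT_CQL = "INSERT INTO susbcriber_data (id, key, value) VALUES (%s, %s, %s)"


def map_to_rows(subscriber_id: int, raw_data: Dict[str, str]) -> Iterator[Tuple[int, str, str]]:
    """Yield one (id, key, value) tuple per entry of the raw_data map."""
    # Sorting only makes the output deterministic; Cassandra orders rows
    # within a partition by the clustering key (key) regardless.
    for key, value in sorted(raw_data.items()):
        yield (subscriber_id, key, value)


# Hypothetical sample subscriber with two map entries.
rows = list(map_to_rows(7, {"ios_identifier": "abc", "city": "Pune"}))
# Each tuple in rows is the parameter list for one run of INSERT_CQL.
```

Because every entry becomes its own row, updating a single key touches one row in the partition and never rewrites the rest of the map.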

Then you can query by any id and key. If you need to find the ids where a specific key has a given value, add a second table:

CREATE TABLE susbcriber_data_by_value (
    id int,
    shard int,
    key text,
    value text,
    PRIMARY KEY ((key, shard), value, id));

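Then, when you insert, set shard to id % 12 or some other value that keeps your partitions from growing too large (this takes some guesswork based on your expected load). To find all ids where a key equals a given value, you then need to query all 12 shards (issue an async call to each shard and merge the results). If the cardinality of your key/value pairs is low enough, though, the sharding may not be needed. This gives you a list of ids that you can then look up. If you want to avoid that lookup, you can add the extra keys and values to this table, but your data may explode in size depending on how many keys your maps hold, and keeping everything updated will be painful.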
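
The fan-out across shards can be sketched in Python as below. This is a sketch under stated assumptions: run_query is a hypothetical callable that stands in for your driver's execute call and returns the matching ids, the %s placeholder style is assumed, and the in-memory fake_table exists only so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

NUM_SHARDS = 12  # matches the "id % 12" suggestion; tune for your load

SELECT_CQL = (
    "SELECT id FROM susbcriber_data_by_value "
    "WHERE key = %s AND shard = %s AND value = %s"
)

Params = Tuple[str, int, str]
RunQuery = Callable[[str, Params], Sequence[int]]  # hypothetical driver hook


def shard_for(subscriber_id: int) -> int:
    """Shard assigned at insert time, so one key spreads over 12 partitions."""
    return subscriber_id % NUM_SHARDS


def ids_with_value(key: str, value: str, run_query: RunQuery) -> List[int]:
    """Query every shard concurrently and merge the ids into one sorted list."""
    def one_shard(shard: int) -> Sequence[int]:
        return run_query(SELECT_CQL, (key, shard, value))

    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        per_shard = list(pool.map(one_shard, range(NUM_SHARDS)))
    return sorted(i for ids in per_shard for i in ids)


# In-memory stand-in for the table, keyed by partition (key, shard).
fake_table: Dict[Tuple[str, int], List[Tuple[str, int]]] = {}


def fake_insert(subscriber_id: int, key: str, value: str) -> None:
    partition = (key, shard_for(subscriber_id))
    fake_table.setdefault(partition, []).append((value, subscriber_id))


def fake_run_query(cql: str, params: Params) -> List[int]:
    key, shard, value = params
    return [sid for v, sid in fake_table.get((key, shard), []) if v == value]


# Hypothetical sample data: three subscribers in Pune, one in Delhi.
for sid, city in [(1, "Pune"), (13, "Pune"), (14, "Delhi"), (26, "Pune")]:
    fake_insert(sid, "city", city)

pune_ids = ids_with_value("city", "Pune", fake_run_query)
```

With a real driver you would likely use its own asynchronous execution instead of a thread pool; the merge step stays the same.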

An option that I would not recommend, but that is available, is to index the map itself, i.e.:

CREATE INDEX raw_data_idx ON susbcriber ( ENTRIES (raw_data) );

SELECT * FROM susbcriber WHERE raw_data['ios_identifier'] = 'id';

Keep in mind the issues with secondary indexes.
