I am setting up a Trino cluster managed by Amazon EMR using Terraform.
Here is my Terraform code:
resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
  name          = "hm-trino"
  release_label = "emr-7.1.0"
  applications  = ["HCatalog", "Trino"]
  master_instance_fleet {
    name                      = "Primary"
    target_on_demand_capacity = 3
    launch_specifications {
      on_demand_specification {
        allocation_strategy = "lowest-price"
      }
    }
    instance_type_configs {
      weighted_capacity = 1
      instance_type     = "r7g.xlarge"
    }
  }
  # ...
  configurations_json = <<EOF
  [
    {
      "Classification": "trino-connector-hive",
      "Properties": {
        "hive.metastore": "glue"
      }
    }
  ]
EOF
}
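To enable high availability (HA) for this Trino cluster, I have already:
- added HCatalog to applications
- set master_instance_fleet.target_on_demand_capacity = 3
- set trino-connector-hive to use glue in configurations_json

In addition, I need to enable "Use for Hive table metadata" under "AWS Glue Data Catalog settings", the option that the EMR console UI exposes.
However, I could not find anything about setting this option in https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_cluster
Any ideas?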
Basically, I needed to add a hive-site classification with the property
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
Here is the final code:
resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
  name          = "hm-trino"
  release_label = "emr-7.1.0"
  applications  = ["HCatalog", "Trino"]
  master_instance_fleet {
    name                      = "Primary"
    target_on_demand_capacity = 3
    launch_specifications {
      on_demand_specification {
        allocation_strategy = "lowest-price"
      }
    }
    instance_type_configs {
      weighted_capacity = 1
      instance_type     = "r7g.xlarge"
    }
  }
  # ...
  configurations_json = <<EOF
  [
    {
      "Classification": "hive-site",
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    },
    {
      "Classification": "trino-connector-hive",
      "Properties": {
        "hive.metastore": "glue"
      }
    }
  ]
EOF
}
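
As a possible variation on the final code above, the same two classifications could be built with Terraform's built-in jsonencode function instead of a heredoc, so that Terraform catches JSON syntax mistakes at plan time. This is only a sketch under assumptions: the local name emr_configurations is hypothetical, and the classifications and properties are copied unchanged from the final code.

```hcl
locals {
  # Hypothetical local value; the content mirrors the heredoc JSON in the final code.
  emr_configurations = [
    {
      Classification = "hive-site"
      Properties = {
        # Points the Hive metastore client at the AWS Glue Data Catalog.
        "hive.metastore.client.factory.class" = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    },
    {
      Classification = "trino-connector-hive"
      Properties = {
        # Tells the Trino Hive connector to use Glue as its metastore.
        "hive.metastore" = "glue"
      }
    }
  ]
}

resource "aws_emr_cluster" "hm_amazon_emr_cluster" {
  name          = "hm-trino"
  release_label = "emr-7.1.0"
  applications  = ["HCatalog", "Trino"]
  # ... (master_instance_fleet and the other arguments stay as in the final code)

  # jsonencode serializes the list of objects into the JSON string this argument expects.
  configurations_json = jsonencode(local.emr_configurations)
}
```

The heredoc version and this one should describe the same configuration; the difference is only in how the JSON string is produced.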