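I'm trying to write a log-based alert policy in Terraform.
I want to generate an alert in near real time whenever a specific message appears in the logs. Specifically, I want to know when a Composer DAG fails.
I successfully set up a log-based alert in the console with the following query filter: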
resource.type="cloud_composer_environment"
severity="ERROR"
log_name="projects/my_project/logs/airflow-scheduler"
resource.labels.project_id="project-id"
textPayload=~"my_dag_name"
However, I can't translate this log-based alert policy into Terraform as a "google_monitoring_alert_policy".
I tried adding the following filter condition to the Terraform google_monitoring_alert_policy:
filter = "resource.type=cloud_composer_environment AND resource.label.project_id=${var.project} AND log_name=projects/${var.project}/logs/airflow-scheduler AND severity=ERROR AND textPayload=~my_dag_name"
But when I run terraform apply, I get the following error:
build 10-Nov-2022 12:21:00 Error: Error creating AlertPolicy: googleapi: Error 400: Field alert_policy.conditions[0].condition_threshold.filter had an invalid value of "resource.type=cloud_composer_environment AND resource.labels.project_id=my_project AND log_name=projects/my_project/logs/airflow-scheduler AND severity=ERROR AND textPayload=my_dag_name": The lefthand side of each expression must be prefixed with one of {group, metadata, metric, project, resource}.
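So I have two questions:
Can "log-based" alerts be configured in Terraform?
How do I set up an alert in Terraform that filters for a specific string in the "textPayload" field of the logs?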
As far as I understand, you want to create a log-based metric.
In that case, you first need to create this log-based metric with Terraform.
Example of the metric configuration in a JSON file, logging_metrics.json:
{
  "metrics": {
    "composer_dags_tasks_bigquery_errors": {
      "name": "composer_dags_tasks_bigquery_errors",
      "filter": "severity=ERROR AND resource.type=\"cloud_composer_environment\" AND textPayload =~ \"{taskinstance.py:.*} ERROR -.*bigquery.googleapis.com/bigquery/v2/projects\"",
      "description": "Metric for Cloud Composer Bigquery tasks errors.",
      "metric_descriptor": {
        "metric_kind": "DELTA",
        "value_type": "INT64",
        "labels": [
          {
            "key": "task_id",
            "value_type": "STRING",
            "description": "Task ID of current Airflow task",
            "extractor": "EXTRACT(labels.\"task-id\")"
          },
          {
            "key": "execution_date",
            "value_type": "STRING",
            "description": "Execution date of the current Airflow task",
            "extractor": "EXTRACT(labels.\"execution-date\")"
          }
        ]
      }
    }
  }
}
This metric filters BigQuery errors in the Composer logs.
I used label extractors on the DAG task_id and the task execution_date so that the metric is unique per combination of these parameters.
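For comparison, the same metric can be sketched as a single resource written directly in HCL, without the JSON file. This is an illustrative alternative rather than the configuration used in this answer: the resource name bigquery_errors_inline is hypothetical, and var.project_id is assumed to be declared elsewhere. Use it instead of the JSON-driven version, not alongside it, since both would create a metric with the same name.

```hcl
# Sketch only: an inline equivalent of the JSON-defined metric.
# "bigquery_errors_inline" is a hypothetical resource name.
resource "google_logging_metric" "bigquery_errors_inline" {
  project     = var.project_id
  name        = "composer_dags_tasks_bigquery_errors"
  description = "Metric for Cloud Composer Bigquery tasks errors."
  filter      = "severity=ERROR AND resource.type=\"cloud_composer_environment\" AND textPayload =~ \"{taskinstance.py:.*} ERROR -.*bigquery.googleapis.com/bigquery/v2/projects\""

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"

    # Each label declared here needs a matching entry in label_extractors.
    labels {
      key         = "task_id"
      value_type  = "STRING"
      description = "Task ID of current Airflow task"
    }
    labels {
      key         = "execution_date"
      value_type  = "STRING"
      description = "Execution date of the current Airflow task"
    }
  }

  # Pull the label values out of the log entry's labels.
  label_extractors = {
    "task_id"        = "EXTRACT(labels.\"task-id\")"
    "execution_date" = "EXTRACT(labels.\"execution-date\")"
  }
}
```

Keeping the definitions in JSON, as this answer does, lets you add more metrics without touching the Terraform code.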
Load the metrics from the JSON file in a locals.tf file:
locals {
  logging_metrics = jsondecode(file("${path.module}/resource/logging_metrics.json"))["metrics"]
}
resource "google_logging_metric" "logging_metrics" {
  for_each    = local.logging_metrics
  project     = var.project_id
  name        = each.value["name"]
  filter      = each.value["filter"]
  description = each.value["description"]
  metric_descriptor {
    metric_kind = each.value["metric_descriptor"]["metric_kind"]
    value_type  = each.value["metric_descriptor"]["value_type"]
    dynamic "labels" {
      for_each = try(each.value["metric_descriptor"]["labels"], [])
      content {
        key         = try(labels.value["key"], null)
        value_type  = try(labels.value["value_type"], null)
        description = try(labels.value["description"], null)
      }
    }
  }
  label_extractors = { for label in try(each.value["metric_descriptor"]["labels"], []) : label.key => label.extractor }
}
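Because the resource uses for_each, each metric is addressed by its JSON key, for example google_logging_metric.logging_metrics["composer_dags_tasks_bigquery_errors"]. As a small sketch (the output name logging_metric_types is hypothetical), the following exposes the full metric type string for every created metric, which is the value an alert filter needs:

```hcl
# Sketch only: "logging_metric_types" is a hypothetical output name.
# User-defined log-based metrics are exposed under the
# "logging.googleapis.com/user/" prefix.
output "logging_metric_types" {
  value = {
    for key, metric in google_logging_metric.logging_metrics :
    key => "logging.googleapis.com/user/${metric.name}"
  }
}
```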
Then create the alert resource based on the log-based metric created above:
resource "google_monitoring_alert_policy" "alert_policy" {
  project      = var.project_id
  display_name = "alert_name"
  combiner     = "..."
  conditions {
    display_name = "alert_name"
    condition_threshold {
      filter = "metric.type=\"logging.googleapis.com/user/composer_dags_tasks_bigquery_errors\" AND resource.type=\"cloud_composer_environment\""
      # ...........
    }
  }
}
The alert policy resource uses the metric.type of the log-based metric created earlier.
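The elided parts of that policy depend on your needs. As one possible way to fill them in, here is a sketch that fires whenever at least one matching error is counted in a one-minute window. The resource name, display names, threshold, duration, alignment period, and aligner are all assumptions chosen for illustration, not values from this answer.

```hcl
# Sketch only: all names and numeric settings below are assumed values.
resource "google_monitoring_alert_policy" "bigquery_errors" {
  project      = var.project_id
  display_name = "Composer BigQuery task errors" # hypothetical name
  combiner     = "OR"

  conditions {
    display_name = "BigQuery task errors logged" # hypothetical name

    condition_threshold {
      # Referencing the metric resource (rather than hard-coding its name)
      # makes Terraform create the metric before the alert policy.
      filter = "metric.type=\"logging.googleapis.com/user/${google_logging_metric.logging_metrics["composer_dags_tasks_bigquery_errors"].name}\" AND resource.type=\"cloud_composer_environment\""

      comparison      = "COMPARISON_GT"
      threshold_value = 0    # assumed: alert on any error
      duration        = "0s" # assumed: no sustained-violation window

      aggregations {
        alignment_period   = "60s"         # assumed window
        per_series_aligner = "ALIGN_DELTA" # valid for DELTA metrics
      }
    }
  }
}
```

Because the metric carries task_id and execution_date labels, each distinct label combination forms its own time series and is evaluated against the threshold separately.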
If you use the condition_matched_log block, you can set up a log-based alert policy. See the documentation for more information.
resource "google_monitoring_alert_policy" "alert_policy" {
  # display_name is required by the provider; this value is a placeholder.
  display_name = "Composer DAG failure"
  documentation {
    content   = "Test Message"
    mime_type = "text/markdown"
    subject   = "Alert"
  }
  combiner = "OR"
  conditions {
    display_name = "Condition 1"
    condition_matched_log {
      filter = <<-EOT
        resource.type="cloud_composer_environment"
        severity="ERROR"
        log_name="projects/my_project/logs/airflow-scheduler"
        resource.labels.project_id="project-id"
        textPayload=~"my_dag_name"
      EOT
    }
  }
  alert_strategy {
    notification_rate_limit { # this block with 'period' parameter is mandatory for log-based alerts
      period = "86400.0s"
    }
  }
  notification_channels = [google_monitoring_notification_channel.email.name]
}
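The policy above references google_monitoring_notification_channel.email, which is not shown. A minimal sketch of such a channel could look like the following; the display name and email address are placeholders, not values from this answer.

```hcl
# Sketch only: display_name and email_address are placeholder values.
resource "google_monitoring_notification_channel" "email" {
  display_name = "Composer alerts (email)"
  type         = "email"

  labels = {
    email_address = "alerts@example.com"
  }
}
```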