How do I set up a GCP Monitoring log-based alert in Terraform?

Question (votes: 0, answers: 2)

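I am trying to write a log-based alert policy in Terraform.

I want to generate an alert in near real time whenever a specific message appears in the logs. Specifically, I want to know when a Composer DAG fails.

I successfully set up a log-based alert in the console with the following query filter: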

resource.type="cloud_composer_environment"
severity="ERROR"
log_name="projects/my_project/logs/airflow-scheduler"
resource.labels.project_id="project-id"
textPayload=~"my_dag_name"

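However, I have not been able to translate this log-based alert policy into Terraform as a "google_monitoring_alert_policy".

I tried adding the following filter to the Terraform google_monitoring_alert_policy resource: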

filter = "resource.type=cloud_composer_environment AND resource.label.project_id=${var.project} AND log_name=projects/${var.project}/logs/airflow-scheduler AND severity=ERROR AND textPayload=~my_dag_name"

But when I run terraform apply, I get the following error:

build   10-Nov-2022 12:21:00    Error: Error creating AlertPolicy: googleapi: Error 400: Field alert_policy.conditions[0].condition_threshold.filter had an invalid value of "resource.type=cloud_composer_environment AND resource.labels.project_id=my_project AND log_name=projects/my_project/logs/airflow-scheduler AND severity=ERROR AND textPayload=my_dag_name": The lefthand side of each expression must be prefixed with one of {group, metadata, metric, project, resource}.

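So I have two questions:

  1. Can a "log-based" alert be configured in Terraform?

  2. How do I set up an alert in Terraform that filters for a specific string in the "textPayload" field of the logs?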

google-cloud-platform logging terraform policy
2 Answers
2 votes

As far as I understand, you want to create a log based metric.

In that case, you first need to create this log based metric with Terraform.

Example of the metric configuration in a JSON file, logging_metrics.json:

{
    "metrics": { 
        "composer_dags_tasks_bigquery_errors": {
            "name": "composer_dags_tasks_bigquery_errors",
            "filter": "severity=ERROR AND resource.type=\"cloud_composer_environment\" AND textPayload =~ \"{taskinstance.py:.*} ERROR -.*bigquery.googleapis.com/bigquery/v2/projects\"",
            "description": "Metric for Cloud Composer Bigquery tasks errors.",
            "metric_descriptor": {
                "metric_kind": "DELTA",
                "value_type": "INT64",
                "labels": [
                    {
                        "key": "task_id",
                        "value_type": "STRING",
                        "description": "Task ID of current Airflow task",
                        "extractor": "EXTRACT(labels.\"task-id\")"
                    },
                    {
                        "key": "execution_date",
                        "value_type": "STRING",
                        "description": "Execution date of the current Airflow task",
                        "extractor": "EXTRACT(labels.\"execution-date\")"
                    }
                ]
            }
        }
    }
}

This metric filters BigQuery errors in the Composer logs. I used label extractors on the DAG task_id and the task execution_date, so that the metric is unique per combination of these parameters.
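For the DAG-failure case in the question, a single metric could also be declared directly in HCL, without the JSON file. This is only a minimal sketch: the resource name dag_failures and the metric name composer_dag_failures are hypothetical, the filter is the question's console filter joined with AND, and var.project_id is assumed to hold the project ID.

```hcl
# Sketch only: names are illustrative and not taken from the original answer.
resource "google_logging_metric" "dag_failures" {
  project     = var.project_id
  name        = "composer_dag_failures" # hypothetical metric name
  description = "Counts scheduler ERROR entries that mention my_dag_name."

  # Same conditions as the console filter in the question, joined with AND.
  filter = "resource.type=\"cloud_composer_environment\" AND severity=ERROR AND log_name=\"projects/${var.project_id}/logs/airflow-scheduler\" AND textPayload=~\"my_dag_name\""

  # A counter metric: each matching log entry increments the count.
  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}
```

The resulting metric type would be logging.googleapis.com/user/composer_dag_failures, which is the form an alert policy's condition_threshold filter accepts, since it starts with the metric prefix that the error message asks for.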

Retrieve the metrics in the locals.tf file:

locals {
  logging_metrics = jsondecode(file("${path.module}/resource/logging_metrics.json"))["metrics"]
}
resource "google_logging_metric" "logging_metrics" {
  for_each = local.logging_metrics
  project = var.project_id
  name = each.value["name"]
  filter = each.value["filter"]
  description = each.value["description"]
  metric_descriptor {
    metric_kind = each.value["metric_descriptor"]["metric_kind"]
    value_type = each.value["metric_descriptor"]["value_type"]

    dynamic "labels" {
      for_each = try(each.value["metric_descriptor"]["labels"], [])
      content {
        key = try(labels.value["key"], null)
        value_type = try(labels.value["value_type"], null)
        description = try(labels.value["description"], null)
      }
    }
  }

  label_extractors = {for label in try(each.value["metric_descriptor"]["labels"], []): label.key => label.extractor}
}

Then create the alert resource based on the previous log based metric:

resource "google_monitoring_alert_policy" "alert_policy" {
  project = var.project_id
  display_name = "alert_name"
  combiner = "..."
  conditions {
    display_name = "alert_name"
    condition_threshold {
      filter = "metric.type=\"logging.googleapis.com/user/composer_dags_tasks_bigquery_errors\" AND resource.type=\"cloud_composer_environment\""
      ...........
    }
  }
}

The alert policy resource uses the metric.type of the log based metric created earlier.
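The elided parts of that policy depend on how sensitive the alert should be. As a rough sketch, one way to fill them in is shown below. Every threshold and aggregation value here is an assumption rather than something stated in this answer, and the resource name bigquery_task_errors is hypothetical. The filter references the google_logging_metric resource from the previous step, so Terraform creates the metric before the policy.

```hcl
# Sketch only: combiner, threshold and aggregation values are assumptions.
resource "google_monitoring_alert_policy" "bigquery_task_errors" {
  project      = var.project_id
  display_name = "Composer BigQuery task errors" # illustrative name
  combiner     = "OR"                            # assumed; a single condition is used

  conditions {
    display_name = "BigQuery task errors logged"

    condition_threshold {
      # Referencing the metric resource gives an implicit dependency on it.
      filter = "metric.type=\"logging.googleapis.com/user/${google_logging_metric.logging_metrics["composer_dags_tasks_bigquery_errors"].name}\" AND resource.type=\"cloud_composer_environment\""

      # Fire as soon as at least one matching entry is counted in a window.
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "0s"

      aggregations {
        alignment_period   = "60s"       # assumed window length
        per_series_aligner = "ALIGN_SUM" # sums the DELTA counts per window
      }
    }
  }
}
```

A longer alignment_period or a non-zero duration would make the alert less noisy at the cost of slower detection.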


0 votes

If you use a condition_matched_log block, you can set up a log-based alert policy. For more information, see the documentation.

resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = "Composer DAG failure" # illustrative name; the provider requires display_name
  documentation {
    content   = "Test Message"
    mime_type = "text/markdown"
    subject   = "Alert"
  }
  combiner = "OR"
  conditions {
    display_name = "Condition 1"
    condition_matched_log {
      filter = <<-EOT
            resource.type="cloud_composer_environment"
            severity="ERROR"
            log_name="projects/my_project/logs/airflow-scheduler"
            resource.labels.project_id="project-id"
            textPayload=~"my_dag_name"
    EOT
    }
  }

  alert_strategy {
    notification_rate_limit { # this block with 'period' parameter is mandatory for log-based alerts
      period = "86400.0s"
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.name]
}
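
The policy above refers to google_monitoring_notification_channel.email, which is not defined in the snippet. A minimal sketch of such a channel might look like this; the display name and the address alerts@example.com are placeholders, not values from the answer.

```hcl
# Sketch only: display name and email address are placeholders.
resource "google_monitoring_notification_channel" "email" {
  display_name = "Composer alerts (email)"
  type         = "email"

  labels = {
    email_address = "alerts@example.com"
  }
}
```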