AWS EKS external-dns keeps deleting and recreating records


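I have an EKS cluster that uses the external-dns controller to create DNS records in Route53 for Ingresses. This worked seamlessly until recently, when it started deleting and recreating the record sets, causing the applications to go down and come back up every minute.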

Here is an example of my Ingress manifest:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress
  namespace: test
  annotations:
    external-dns.alpha.kubernetes.io/hostname: stg.test.domain.com
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/group.name: "staging-external"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  ingressClassName: alb
  rules:
  - host: "stg.test.domain.com"
    http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: test-service  # service name
            port:
              number: 80

Edit: external-dns pod logs

time="2025-01-10T08:51:45Z" level=debug msg="Refreshing zones list cache"
time="2025-01-10T08:51:45Z" level=debug msg="Considering zone: /hostedzone/<hostedzonename> (domain: domain.com.)"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/service-name"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service flux-system/notification-controller"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service flux-system/source-controller"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/metrics-server"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/aws-load-balancer-webhook-service"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service external-secrets/external-secrets-webhook"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service flux-system/webhook-receiver"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service default/external-dns"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service default/kubernetes"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/eks-extension-metrics-api"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/kube-dns"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/service-name-ingress: [app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com [] app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app4.domain.com 300 IN A alb-FQDN.amazonaws.com [{alias true}], setting ttl=300"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Refreshing zones list cache"
time="2025-01-10T08:51:46Z" level=debug msg="Considering zone: /hostedzone/<hostedzonename> (domain: domain.com.)"
time="2025-01-10T08:51:46Z" level=info msg="Applying provider record filter for domains: [domain.com. .domain.com.]"
time="2025-01-10T08:51:46Z" level=debug msg="Refreshing zones list cache"
time="2025-01-10T08:51:46Z" level=debug msg="Considering zone: /hostedzone/<hostedzoneId> (domain: domain.com.)"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1-backend.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app2.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app3.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app4.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app5.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app1.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1-backend.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app1-backend.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app2.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app2.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app3.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app3.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app4.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app4.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app5.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app5.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app3.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app3.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app2.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app2.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app3.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app2.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app1-backend.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app1.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app4.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app5.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1-backend.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1-backend.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app4.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app4.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app5.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app5.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="18 record(s) were successfully updated" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.

These actions keep repeating.

amazon-web-services kubernetes amazon-eks amazon-route53 external-dns
1 Answer

I found the cause of the problem.

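I have two nearly identical clusters (staging and production). The external-dns controllers in both use the same Route53 hosted zone, so each one can access every record in it. The logs I had not checked were those of the external-dns controller on the production cluster. That controller was the one logging the DELETE events, which caused the staging cluster to keep recreating the records.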
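To make the conflict easier to picture, here is a rough YAML-style illustration (not an actual manifest) of a managed record and its ownership TXT record in the shared zone. It assumes the TXT registry is in use and that neither cluster set `--txt-owner-id`, so both fall back to the default owner ID `default`. The record names and resource path are taken from the placeholders in the logs above.

```yaml
# Illustration only: records in the shared hosted zone as each controller sees them.
# Assumption: both clusters run external-dns with the default owner ID "default".
records:
  - name: app1.domain.com
    type: A          # alias record pointing at the staging ALB
  - name: cname-app1.domain.com
    type: TXT        # ownership marker written by the TXT registry
    value: "heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/namespace/service-name-ingress"
# Production also identifies itself as "default", so it treats these records
# as its own, finds no matching Ingress in its cluster, and deletes them.
# Staging then recreates them on its next sync.
```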

I fixed it by adding the following argument to the external-dns Deployment manifest, so that each external-dns instance only manages the records it created.

containers:
  - name: external-dns
    ## other config ...
    args:
      - --txt-owner-id=unique.staging.cluster.string.id
      ## other args ...

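The --txt-owner-id argument sets a unique owner ID, which external-dns stores in the TXT record that accompanies each DNS record it creates. Each instance then manages only the records carrying its own ID, so the two clusters no longer conflict.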
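As a minimal sketch of how this might look across both clusters, the fragments below give each external-dns instance its own owner ID. The ID strings `staging-cluster` and `production-cluster` are hypothetical placeholders, and the other flags shown (`--source`, `--provider`, `--registry`) are assumptions about a typical setup rather than the configuration actually used here.

```yaml
# Staging cluster: external-dns container args (sketch, placeholder values)
args:
  - --source=ingress
  - --provider=aws
  - --registry=txt                        # ownership tracked via TXT records
  - --txt-owner-id=staging-cluster        # hypothetical ID, unique to this cluster
---
# Production cluster: same flags, different owner ID
args:
  - --source=ingress
  - --provider=aws
  - --registry=txt
  - --txt-owner-id=production-cluster     # hypothetical ID, must differ from staging
```

One thing to watch when changing the owner ID on an existing deployment: records written earlier under a different ID may no longer be treated as owned by that instance, so it is worth checking the zone after the first sync.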

Thanks, everyone, for your time and suggestions.
