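I have an EKS cluster that uses the external-dns controller to create DNS records in Route53 for my ingresses. This worked seamlessly until recently, when it began deleting and recreating the record sets, causing the applications to go down and come back up every minute.
Here is an example of my ingress manifest: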
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress
  namespace: test
  annotations:
    external-dns.alpha.kubernetes.io/hostname: stg.test.domain.com
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/group.name: "staging-external"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  ingressClassName: alb
  rules:
    - host: "stg.test.domain.com"
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: test-service  # service name
                port:
                  number: 80
Edit: external-dns pod logs
time="2025-01-10T08:51:45Z" level=debug msg="Refreshing zones list cache"
time="2025-01-10T08:51:45Z" level=debug msg="Considering zone: /hostedzone/<hostedzonename> (domain: domain.com.)"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/service-name"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service flux-system/notification-controller"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service flux-system/source-controller"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/metrics-server"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/aws-load-balancer-webhook-service"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service external-secrets/external-secrets-webhook"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service flux-system/webhook-receiver"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service default/external-dns"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service default/kubernetes"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/eks-extension-metrics-api"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service kube-system/kube-dns"
time="2025-01-10T08:51:46Z" level=debug msg="No endpoints could be generated from service namespace/servicename"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/service-name-ingress: [app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com [] app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Endpoints generated from ingress: namespace/servicename-ingress: [app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [] app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []]"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Removing duplicate endpoint app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com []"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app1.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app2.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app3.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app3-backend.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app4.domain.com 300 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app4.domain.com 300 IN A alb-FQDN.amazonaws.com [{alias true}], setting ttl=300"
time="2025-01-10T08:51:46Z" level=debug msg="Modifying endpoint: app5.domain.com 0 IN CNAME alb-FQDN.amazonaws.com [], setting alias=true"
time="2025-01-10T08:51:46Z" level=debug msg="Refreshing zones list cache"
time="2025-01-10T08:51:46Z" level=debug msg="Considering zone: /hostedzone/<hostedzonename> (domain: domain.com.)"
time="2025-01-10T08:51:46Z" level=info msg="Applying provider record filter for domains: [domain.com. .domain.com.]"
time="2025-01-10T08:51:46Z" level=debug msg="Refreshing zones list cache"
time="2025-01-10T08:51:46Z" level=debug msg="Considering zone: /hostedzone/<hostedzoneId> (domain: domain.com.)"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1-backend.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app2.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app3.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app4.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app5.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app1.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app1-backend.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app1-backend.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app2.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app2.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app3.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app3.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app4.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app4.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding app5.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=debug msg="Adding cname-app5.domain.com. to zone domain.com. [Id: /hostedzone/<hostedzoneId>]"
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app3.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app3.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app2.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app2.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app3.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app2.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app1-backend.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app1.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app4.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE cname-app5.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1-backend.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1-backend.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app1.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app4.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app4.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app5.domain.com A" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="Desired change: CREATE app5.domain.com TXT" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
time="2025-01-10T08:51:46Z" level=info msg="18 record(s) were successfully updated" profile=default zoneID=/hostedzone/<hostedzoneId> zoneName=domain.com.
These actions repeat continuously.
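I figured out what was causing the problem.
I have two nearly identical clusters (staging and production), and the external-dns controller in each one uses the same hosted zone in Route53, so both have access to all the records there. The logs I had not checked were those of the external-dns controller on the production cluster, which was in fact logging the DELETE events that caused the staging cluster to keep recreating the records.
I fixed this by adding the following argument to the external-dns Deployment manifest, which ensures that each external-dns instance manages only the records it created: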
containers:
  - name: external-dns
    ## other config ...
    args:
      - --txt-owner-id=unique.staging.cluster.string.id
      ## other args ...
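The --txt-owner-id argument gives each external-dns instance a unique string ID, which is stored with every record that instance creates. Because each instance only manages records carrying its own ID, the two clusters no longer conflict.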
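For the fix to hold, the other cluster needs its own, different owner ID. A minimal sketch of what the production side might look like follows. The owner ID value is hypothetical, and the `--registry=txt` and `--domain-filter` lines are assumptions about the rest of the setup rather than something taken from my actual manifest.

```yaml
# Sketch only: args for the production cluster's external-dns Deployment.
containers:
  - name: external-dns
    ## other config ...
    args:
      # Assumed: the TXT registry is in use, so ownership is tracked in
      # TXT records whose value looks roughly like
      # "heritage=external-dns,external-dns/owner=<owner-id>,..."
      - --registry=txt
      # Hypothetical value; it only has to differ from the staging ID.
      - --txt-owner-id=unique.production.cluster.string.id
      # Assumed: both clusters are scoped to the same shared hosted zone.
      - --domain-filter=domain.com
      ## other args ...
```

The only part that matters for this problem is that the two `--txt-owner-id` values are different, so that neither controller treats the other's records as its own.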
Thank you all for your time and suggestions.