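I am running a Kubernetes 1.25 cluster on Amazon EKS. I deployed the Anchore application using a Helm chart. I modified the container images to pull from my AWS ECR repository instead of Docker.
Looking at the logs of one of the pods, I see that it is trying to reach the database service but cannot resolve it.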
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[MainThread] 2023-04-30T00:06:41.155167 [anchore_enterprise_manager.util.db/connect_database()] [INFO] DB attempting to connect...
[MainThread] 2023-04-30T00:06:41.156165 [anchore_enterprise_manager.util.db/connect_database()] [WARN] DB connection failed, retrying - exception: test connection failed - exception: (psycopg2.OperationalError) could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known
Here is my postgresql service:
➜ ~ k get service postgres-postgresql
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
postgres-postgresql   ClusterIP   172.20.191.83   <none>        5432/TCP   27h
➜ ~ k get endpoints postgres-postgresql
NAME                  ENDPOINTS        AGE
postgres-postgresql   10.1.0.74:5432   27h
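There is nothing in the postgres pod logs.
I have verified that the AWS security groups are wide open and allow all traffic between the cluster and the nodes. I verified that CoreDNS is running. I spun up a busybox pod and resolved the above service from it.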
➜ anchore git:(main) ✗ k exec -it busybox-pod -- nslookup postgresql.anchore.svc.cluster.local
Server: 172.20.0.10
Address: 172.20.0.10:53
Name: postgresql.anchore.svc.cluster.local
Address: 172.20.191.83
Here are the logs from the postgresql pod:
k logs postgres-postgresql-59468ff768-zhn6z
Defaulted container "postgresql" out of: postgresql, postgres-postgresql
PostgreSQL Database directory appears to contain a database; Skipping initialization
2023-04-30 14:52:22.289 UTC [1] LOG: starting PostgreSQL 14.6 (Debian 14.6-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-04-30 14:52:22.289 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-04-30 14:52:22.289 UTC [1] LOG: listening on IPv6 address "::", port 5432
2023-04-30 14:52:22.292 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-04-30 14:52:22.296 UTC [27] LOG: database system was shut down at 2023-04-30 14:52:21 UTC
2023-04-30 14:52:22.300 UTC [1] LOG: database system is ready to accept connections
I have verified that the svc selector matches the pod labels.
➜ anchore git:(main) ✗ k describe svc postgresql
Name: postgresql
Namespace: anchore
Labels: app=postgresql
app.kubernetes.io/managed-by=Helm
chart=postgresql-1.0.1
heritage=Helm
release=postgres
Annotations: meta.helm.sh/release-name: postgres
meta.helm.sh/release-namespace: anchore
Selector: app=postgresql,release=postgres
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.191.83
IPs: 172.20.191.83
Port: postgresql 5432/TCP
TargetPort: postgresql/TCP
Endpoints:
Session Affinity: None
Events: <none>
k describe pods postgres-postgresql-59468ff768-zhn6z
Name: postgres-postgresql-59468ff768-zhn6z
Namespace: anchore
Priority: 0
Service Account: default
Node: ip-10-1-0-223.us-gov-east-1.compute.internal/10.1.0.223
Start Time: Sun, 30 Apr 2023 09:52:21 -0500
Labels: app=postgresql
pod-template-hash=59468ff768
release=postgres
Annotations: <none>
Status: Running
IP: 10.1.0.95
IPs:
IP: 10.1.0.95
Controlled By: ReplicaSet/postgres-postgresql-59468ff768
Containers:
postgresql:
Container ID: containerd://4a76d4582bc4e443cd9dc93e578576f13de0194cc36ec1acff62e5e45dd0e070
Image: 247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres:14
Image ID: 247301905713.dkr.ecr.us-gov-east-1.amazonaws.com/postgres@sha256:db02f92063fb6083cb9dbf9d967ae0563d17d1e6332b6dfba6bdd7266c420ffa
Port: 5432/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 30 Apr 2023 09:52:22 -0500
Ready: True
Restart Count: 0
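The selector check described above can be sketched as a simple subset test. This is only an illustration, assuming equality-based selectors (the only kind shown in the output above); the function names `parse_selector` and `selector_matches` are hypothetical, and the label values are copied from the `describe` output.

```python
def parse_selector(selector):
    """Turn a 'k1=v1,k2=v2' selector string into a dict."""
    pairs = (item.split("=", 1) for item in selector.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


def selector_matches(selector, pod_labels):
    """Return True when every selector pair is present in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector from `k describe svc postgresql`
selector = parse_selector("app=postgresql,release=postgres")

# Labels from `k describe pods postgres-postgresql-59468ff768-zhn6z`
pod_labels = {
    "app": "postgresql",
    "pod-template-hash": "59468ff768",
    "release": "postgres",
}

print(selector_matches(selector, pod_labels))
```

Extra labels on the pod (such as `pod-template-hash`) do not prevent a match; only the pairs named in the selector have to be present.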
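I would also like to add that I am seeing readiness/liveness probe failures in several pods.
I have confirmed that no network policies are in use. No iptables. No security groups blocking traffic.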
Type Reason Age From Message
Warning BackOff 17m (x5347 over 43h) kubelet Back-off restarting failed container
Warning Unhealthy 7m26s (x13887 over 43h) kubelet Readiness probe failed: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (7) Failed to connect to localhost port 8089: Connection refused
Warning Unhealthy 2m30s (x14341 over 43h) kubelet Readiness probe failed: Get "http://10.1.1.67:8668/health": dial tcp 10.1.1.67:8668: connect: connection refused
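If anyone can point me in the right direction, it would be greatly appreciated. I have only been working with k8s for about 2 months, so I may be making an obvious mistake here. If any other output would be helpful, please let me know.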
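This error:
could not translate host name "postgresql.anchore.svc.cluster.local:5432" to address: Name or service not known
suggests to me that :5432 is being included in the hostname. You haven't shared the application configuration or how this hostname is passed in, but make sure the hostname does not contain the port.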
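As a rough illustration of the point above, here is a minimal Python sketch that separates a combined "host:port" value into the two pieces a database client normally expects as separate settings. The helper name `split_host_port` and the fallback to port 5432 are assumptions made for this example; they are not part of Anchore's configuration or of psycopg2.

```python
from urllib.parse import urlsplit

DEFAULT_PG_PORT = 5432  # assumed fallback: PostgreSQL's default port


def split_host_port(value, default_port=DEFAULT_PG_PORT):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    # Prefixing '//' makes urlsplit treat the value as a network location.
    parts = urlsplit("//" + value.strip())
    if not parts.hostname:
        raise ValueError(f"no hostname found in {value!r}")
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port


# The value from the error message, with the port embedded in the name.
host, port = split_host_port("postgresql.anchore.svc.cluster.local:5432")
print(host, port)
```

The name that `nslookup` resolved successfully in the question is the bare `postgresql.anchore.svc.cluster.local`; the string in the error message has `:5432` appended, and that whole string is what the resolver is being asked to look up.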