I have a Linux box in GCP with Docker installed, running Prometheus as a container.
I also have 2 GKE clusters (A and B) with Istio installed (used only as an ingress controller; the service mesh is not enabled).
All three live in different GCP projects. The Linux box's project is peered with cluster A's project, and cluster A's project is peered with the project containing cluster B.
Essentially, the GKE 'A' cluster sits between the Linux box and the GKE 'B' cluster, acting like a proxy.
I am trying to scrape metrics from the Prometheus server in GKE 'B'.
Inside GKE 'A', I have the following Gateway, ServiceEntry, and VirtualService:
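The Prometheus on the Linux box scrapes through cluster A's hostname. For context, the relevant scrape config looks roughly like this (a sketch; the job name is illustrative, the target is the DNS name that resolves to cluster A's internal load balancer):

```yaml
# prometheus.yml on the Linux box (sketch; job_name is an assumption)
# The target is cluster A's ingress hostname, which should proxy to B.
scrape_configs:
- job_name: gke-b-prometheus
  metrics_path: /metrics
  static_configs:
  - targets:
    - prometheus.A.infra.internal:9090
```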
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: istio-ilb
  namespace: istio-ingress
spec:
  selector:
    istio: ingress
  servers:
  # ...rest omitted
  - hosts:
    - '*'
    port:
      name: http-web
      number: 9090
      protocol: HTTP
---
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
spec:
  endpoints:
  - address: 192.XX.XX.XX # istio load balancer IP of GKE B
  hosts:
  - prometheus.B.infra.internal
  location: MESH_EXTERNAL
  ports:
  - name: http-web
    number: 9090
    protocol: HTTP
  resolution: DNS
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/target: ilb.A.infra.internal
  labels:
    app.kubernetes.io/instance: kube-prometheus
  name: kube-prometheus
  namespace: monitoring
spec:
  gateways:
  - istio-ingress/istio-ilb
  hosts:
  - prometheus.A.infra.internal
  http:
  - match:
    - uri:
        exact: /metrics
    route:
    - destination:
        host: prometheus.B.infra.internal
        port:
          number: 9090
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: prometheus.B.infra.internal
        port:
          number: 9090
In GKE 'B', I have the following configuration:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: istio-ilb
  namespace: istio-ingress
spec:
  selector:
    istio: ingress
  servers:
  # ...rest omitted
  - hosts:
    - '*'
    port:
      name: http-web
      number: 9090
      protocol: HTTP
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/target: ilb.B.infra.internal
  labels:
    app.kubernetes.io/instance: kube-prometheus
  name: kube-prometheus
  namespace: monitoring
spec:
  gateways:
  - istio-ingress/istio-ilb
  hosts:
  - prometheus.B.infra.internal
  http:
  - match:
    - uri:
        exact: /metrics
    route:
    - destination:
        host: kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local
        port:
          number: 9090
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local
        port:
          number: 9090
Endpoints in cluster B:
kube-prometheus-kube-prome-prometheus 10.31.0.89:9090,10.31.0.89:8080
I am getting the following errors in each cluster's Istio load balancer.
GKE 'A':
[2024-07-25T14:32:53.435Z] "GET /metrics HTTP/1.1" 404 - via_upstream - "-" 0 0 2 2 "192.168.31.94" "Prometheus/2.52.0" "a4cfdd83-9e23-47e7-8f35-7e0f5d6e1943" "prometheus.A.infra.internal:9090" "192.<cluster B IP>:9090" outbound|9090||prometheus.B.infra.internal 10.33.2.167:42320 10.33.2.167:9090 192.<cluster A IP>:42890 - -
GKE 'B':
[2024-07-25T14:38:27.043Z] "GET /metrics HTTP/1.1" 404 NR route_not_found - "-" 0 0 0 - "192.<cluster A IP>,192.<cluster B IP>" "Prometheus/2.52.0" "3675e168-e5c8-435c-8e18-43b07162df76" "prometheus.A.infra.internal:9090" "-" - - 10.31.0.31:9090 192.168.30.83:21654 - -
curl and nslookup from the Linux box to GKE 'A':
curl -I http://prometheus.A.infra.internal:9090/metrics
HTTP/1.1 404 Not Found
date: Thu, 25 Jul 2024 14:41:45 GMT
server: istio-envoy
x-envoy-upstream-service-time: 2
transfer-encoding: chunked
nslookup prometheus.A.infra.internal
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
prometheus.A.infra.internal canonical name = ilb.A.infra.internal.
Name: ilb.A.infra.internal
Address: 192.<cluster A IP>
From a pod in GKE 'A':
kubectl run curlpod-proxy --image=radial/busyboxplus:curl -i --tty --rm
[ root@curlpod-proxy:/ ]$ nslookup prometheus.B.infra.internal
Server: 10.32.240.10
Address 1: 10.32.240.10 kube-dns.kube-system.svc.cluster.local
Name: prometheus.B.infra.internal
Address 1: 192.<cluster B IP>
curl -s http://prometheus.B.infra.internal:9090/metrics | head -n 5
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 16205
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
Output of istioctl pc routes:
NAME VHOST NAME DOMAINS MATCH VIRTUAL SERVICE
http.80 prometheus.A.infra.internal:80 prometheus.A.infra.internal /metrics kube-prometheus.monitoring
http.80 prometheus.A.infra.internal:80 prometheus.A.infra.internal /* kube-prometheus.monitoring
http.9090 prometheus.B.infra.internal:9090 prometheus.B.infra.internal /metrics kube-prometheus.monitoring
http.9090 prometheus.B.infra.internal:9090 prometheus.B.infra.internal /* kube-prometheus.monitoring
What am I doing wrong? Is what I am trying to do even possible?
When the istio-ingress gateway returns a 404, it usually means a listener is configured for the endpoint's port but no route matches that specific request, so Istio falls back to its default route, which returns a 404.
It is important to distinguish a 404 returned by Istio from a 404 returned by the application itself. If the istio-ingressgateway returns the 404, you will see a 404 NR entry in its access log, and no traffic will reach the actual application backend (i.e. the request will not show up in the application's sidecar logs).
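That matches your GKE 'B' access log: the request arrives with authority prometheus.A.infra.internal:9090 and is rejected with 404 NR route_not_found, while B's VirtualService only matches the host prometheus.B.infra.internal, so no route applies. One way to fix this (a sketch based on your posted manifests, not a tested config) is to rewrite the authority in cluster A's VirtualService before forwarding to B:

```yaml
# Excerpt of cluster A's VirtualService (monitoring/kube-prometheus).
# The added rewrite.authority makes Envoy forward the request with the
# Host header that cluster B's VirtualService actually matches.
http:
- match:
  - uri:
      prefix: /
  rewrite:
    authority: prometheus.B.infra.internal
  route:
  - destination:
      host: prometheus.B.infra.internal
      port:
        number: 9090
```

If you keep the separate /metrics match block, add the same rewrite there as well, since each HTTPRoute entry needs its own rewrite.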
Try troubleshooting the cluster with istioctl analyze.
Also check this ingress gateway troubleshooting guide, which may help.
Refer to this Stack link, which should help you resolve the issue.