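We are using Curator service discovery in Docker and Kubernetes environments. We set the connection string using the DNS names of the containers/pods. The problem I am seeing is that it seems to resolve these names to IP addresses and keep using them. A container or pod can change its IP address, and Curator does not seem to notice the change.
The behavior I am seeing is this: I stand up a 3-node Zookeeper cluster and one or more agents. Then I roll the Zookeeper nodes one at a time, so that each of them changes its IP address. When I bounce the third Zookeeper instance, all clients lose their connection.
Is there a way to force it to always use the DNS name for connections?
Here is my compose example: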
version: '2.4'
x-zookeeper:
  &zookeeper-env
  JVMFLAGS: -Dzookeeper.4lw.commands.whitelist=ruok
  ZOO_ADMINSERVER_ENABLED: 'true'
  ZOO_STANDALONE_ENABLED: 'false'
  ZOO_SERVERS: server.1=zookeeper1:2888:3888;2181 server.2=zookeeper2:2888:3888;2181 server.3=zookeeper3:2888:3888;2181
x-agent:
  &agent-env
  ZK_CONNECTION: zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
  SERVICE_NAME: myservice
services:
  zookeeper1:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 1
  zookeeper2:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 2
  zookeeper3:
    image: artifactory.rd2.thingworx.io/zookeeper:${ZOOKEEPER_IMAGE_VERSION}
    restart: always
    ports:
      - 2181
      - 8080
    healthcheck:
      test: echo ruok | nc localhost 2181 | grep imok
      interval: 15s
    environment:
      <<: *zookeeper-env
      ZOO_MY_ID: 3
  agent1:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT1
  agent2:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT2
  agent3:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT3
  agent4:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT4
  agent5:
    image: artifactory.rd2.thingworx.io/twxdevops/discovery-tool:latest
    environment:
      <<: *agent-env
      GLOBAL_ID: AGENT5
The steps to run are:
docker-compose up -d zookeeper1 zookeeper2 zookeeper3 agent1
docker-compose rm -sf zookeeper3
docker-compose up -d agent2
docker-compose up -d zookeeper3
docker-compose rm -sf zookeeper2
docker-compose up -d agent3
docker-compose up -d zookeeper2
docker-compose rm -sf zookeeper1
docker-compose up -d agent5
docker-compose up -d zookeeper1
After I kill the last Zookeeper node, the agents get the following errors and cannot recover. You can see that it is referencing IP addresses:
Path:null finished:false header:: 5923,4 replyHeader:: 5923,8589934594,0 request:: '/services/myservice/cc1996fb-cca5-4108-bd06-567b45f594d7,F response:: #7b226e616d65223a226d7973657276696365222c226964223a2263633139393666622d636361352d343130382d626430362d353637623435663539346437222c2261646472657373223a223137322e32312e302e33222c22706f7274223a383038302c2273736c506f7274223a6e756c6c2c227061796c6f6164223a7b2240636c617373223a22636f6d2e7468696e67776f72782e646973636f766572792e7a6b2e53657276696365496e7374616e636544657461696c73222c2261747472696275746573223a7b22474c4f42414c4944223a224147454e5433227d7d2c22726567697374726174696f6e54696d65555443223a313634393739313735353936322c227365727669636554797065223a2244594e414d4943222c2275726953706563223a7b227061727473223a5b7b2276616c7565223a2261646472657373222c227661726961626c65223a747275657d2c7b2276616c7565223a223a222c227661726961626c65223a66616c73657d2c7b2276616c7565223a22706f7274222c227661726961626c65223a747275657d5d7d7d,s{4294967301,4294967301,1649791757073,1649791757073,0,0,0,144117976615550976,404,0,4294967301}
agent1_1 | 19:48:46.438 [ServiceEventWatcher-myservice] DEBUG com.thingworx.discovery.zk.ZookeeperProvider - ZooKeeper resolved addresses for service myservice: [ServiceDefinition [serviceName=myservice, host=172.21.0.7, port=8080, tags={GLOBALID=AGENT2}], ServiceDefinition [serviceName=myservice, host=172.21.0.4, port=8080, tags={GLOBALID=AGENT1}], ServiceDefinition [serviceName=myservice, host=172.21.0.3, port=8080, tags={GLOBALID=AGENT3}]]
agent1_1 | 19:48:47.070 [main-SendThread(172.21.0.5:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x200028941eb0001 for sever service-discovery-docker-tests_zookeeper2_1.service-discovery-docker-tests_default/172.21.0.5:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
agent1_1 | org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x200028941eb0001, likely server has closed socket
agent1_1 | at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
agent1_1 | at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
agent1_1 | at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275)
agent1_1 | 19:48:47.171 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
agent1_1 | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] DEBUG org.apache.zookeeper.SaslServerPrincipal - Canonicalized address to 172.21.0.9
agent1_1 | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 172.21.0.9/172.21.0.9:2181.
agent1_1 | 19:48:47.363 [main-SendThread(172.21.0.9:2181)] INFO org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
agent1_1 | 19:48:47.430 [ServiceEventWatcher-myservice] DEBUG com.thingworx.discovery.zk.ZookeeperProvider - Getting registered addresses from ZooKeeper for service myservice
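The `response::` field in the first log line above is a hex-encoded JSON record, so it can be decoded to see what the registration actually holds. The following is a minimal sketch using only the JDK; the class name `HexPayload` and the method `decode` are made up for illustration, and the fragment is copied from that log line. It suggests that the `address` stored for the service instance is itself an IP, which is separate from the address the client uses to reach the Zookeeper servers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Curator or ZooKeeper.
public class HexPayload {

    // Turns a string of hex digit pairs into the UTF-8 text it encodes.
    static String decode(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Fragment taken from the response:: field of the log line above.
        String fragment =
            "2261646472657373223a223137322e32312e302e33222c22706f7274223a38303830";
        // prints "address":"172.21.0.3","port":8080
        System.out.println(decode(fragment));
    }
}
```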
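The Zookeeper cluster itself is happy and healthy. So the main question is: is there a way to make it use DNS names rather than IP addresses? I should also mention that service discovery uses ephemeral nodes, so disconnecting and reconnecting is bad.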
This issue has a fix in Curator 5.4.0. See https://github.com/apache/curator/pull/452 and https://github.com/apache/curator/pull/425.
If you use Curator 5.4.0 or later, your problem should be resolved.
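For readers who want to see the general idea the question is asking for, here is a rough sketch of keeping host names and resolving them again on every connection attempt, written against `java.net` only. It is not Curator's or ZooKeeper's code and does not claim to reproduce what the linked pull requests change; the class `LateResolve` and its methods are hypothetical, and the connection string is the `ZK_CONNECTION` value from the compose file.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of Curator or ZooKeeper.
public class LateResolve {

    // Parses "host:port,host:port" and stores names only, without a DNS lookup.
    static List<InetSocketAddress> parse(String connectString) {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            String[] hostPort = entry.trim().split(":");
            servers.add(InetSocketAddress.createUnresolved(
                hostPort[0], Integer.parseInt(hostPort[1])));
        }
        return servers;
    }

    // Performs a new lookup from the stored name; call this per connection attempt.
    static InetSocketAddress resolveNow(InetSocketAddress server) {
        return new InetSocketAddress(server.getHostString(), server.getPort());
    }

    public static void main(String[] args) {
        String zkConnection = "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181";
        for (InetSocketAddress server : parse(zkConnection)) {
            InetSocketAddress fresh = resolveNow(server);
            String ip = fresh.isUnresolved()
                ? "not resolvable from this host"
                : fresh.getAddress().getHostAddress();
            System.out.println(server.getHostString() + " -> " + ip);
        }
    }
}
```

Because only the name is stored, a container that comes back with a new IP is picked up on the next `resolveNow` call. The JVM may still cache lookups for a while (see the `networkaddress.cache.ttl` security property), so a change is not necessarily visible immediately.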