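Our Apache ActiveMQ Artemis cluster runs in Kubernetes with the Istio proxy injected into the Artemis pods, and it logs a large number of errors indicating connection loss between the master (active) and slave (backup) instances. We also see intermittent connection loss from JBoss EAP to Artemis; the connection is restored on a subsequent call (for example, when a JMS message is sent).
A tcpdump on the active (master) instance shows many RSTs on the connection between the backup and active Artemis instances, compared with a tcpdump from an environment where the Artemis pods have no Istio proxy injected.
The protocols allowed/used by the Artemis acceptors are CORE and AMQP. The default communication ports are kept for the Artemis cluster.
The Artemis cluster is used for JMS messaging. JMS communication is performed from JBoss EAP, which runs in another pod in the same Kubernetes namespace.
Static connectors are used to form the Artemis cluster, and replication is used for data exchange. Static connectors are also used in the JBoss configuration.
The TCP and HTTP connection idleTimeout in the Istio proxy is set to infinite (for both INBOUND and OUTBOUND).
When the Istio proxy is not injected, the Artemis logs show no errors and we observe no JMS messaging problems.
Note: the ping command is not available in the containers where Artemis and JBoss are installed (in case that matters for how liveness checks are performed).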
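Because ping is not available in those containers, reachability of a broker port can be checked with a plain TCP connect from the JVM that is already present. The following is only a minimal sketch: the class name `TcpProbe` and the 2000 ms timeout are hypothetical choices, and 61616 is simply the port that appears in the logs. With a sidecar injected, the connect may be accepted by the local proxy rather than by the remote broker, so a successful result is at best a weak signal.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TcpProbe {

    /**
     * Returns true if a TCP connection to host:port can be opened within
     * timeoutMillis, and false on refusal, timeout, or name-resolution failure.
     */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers ConnectException, SocketTimeoutException, UnknownHostException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass the backup instance's DNS name and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 61616;
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
    }
}
```

It only tests that a TCP handshake completes; it does not speak the CORE protocol, so it cannot tell whether the broker itself is healthy.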
Stack traces from the active (master) Artemis pod (several fragments):
2023-04-10 20:56:19,436 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 has been detected: AMQ219011: Did not receive data from server for org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection@6243c958[ID=fef52fc9, local= /<***artemis active instance's IP***>:55842, remote=<***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616] [code=CONNECTION_TIMEDOUT]
2023-04-10 20:56:19,562 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection
java.lang.IllegalStateException: No ActiveMQChannelHandler has been found while connecting to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 from Channel with id = cf68f05a
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:954) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:840) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:822) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.openTransportConnection(ClientSessionFactoryImpl.java:1105) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1212) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1146) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.establishNewConnection(ClientSessionFactoryImpl.java:1375) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(ClientSessionFactoryImpl.java:967) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:858) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:799) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:656) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:534) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:527) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$1.run(ClientSessionFactoryImpl.java:390) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:58) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:33) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:69) ~[artemis-commons-2.27.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.27.0.jar:?]
2023-04-11 01:22:33,180 WARN [org.apache.activemq.artemis.core.server] AMQ222092: Connection to the backup node failed, removing replication now
org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ229014: Did not receive data from /127.0.0.6:55653 within the 60000ms connection TTL. The connection will now be closed.
at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread$2.run(RemotingServiceImpl.java:781) ~[artemis-server-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:58) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:33) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:69) ~[artemis-commons-2.27.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.27.0.jar:?]
2023-04-11 01:22:33,405 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection
java.lang.IllegalStateException: No ActiveMQChannelHandler has been found while connecting to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 from Channel with id = f979331a
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:954) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:840) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:822) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.openTransportConnection(ClientSessionFactoryImpl.java:1105) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1212) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1146) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.establishNewConnection(ClientSessionFactoryImpl.java:1375) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(ClientSessionFactoryImpl.java:967) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:858) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:799) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:656) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:534) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:527) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$1.run(ClientSessionFactoryImpl.java:390) ~[artemis-core-client-2.27.0.jar:2.27.0]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:58) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:33) ~[artemis-commons-2.27.0.jar:?]
at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:69) ~[artemis-commons-2.27.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) ~[artemis-commons-2.27.0.jar:?]
Stack trace from the JBoss pod:
2023-04-10 20:56:10,649 AMQ214016: Failed to create netty connection: java.lang.IllegalStateException: No ActiveMQChannelHandler has been found while connecting to <***artemis backup instance's DNS***>/<***artemis backup instance's IP***>:61616 from Channel with id = 98ac7598
at [email protected]//org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:970)
at [email protected]//org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:856)
at [email protected]//org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnector.createConnection(NettyConnector.java:838)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.openTransportConnection(ClientSessionFactoryImpl.java:1097)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.createTransportConnection(ClientSessionFactoryImpl.java:1146)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.establishNewConnection(ClientSessionFactoryImpl.java:1378)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnection(ClientSessionFactoryImpl.java:952)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.getConnectionWithRetry(ClientSessionFactoryImpl.java:841)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:779)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:638)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:525)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:518)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$100(ClientSessionFactoryImpl.java:74)
at [email protected]//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$1.run(ClientSessionFactoryImpl.java:381)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
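Actions taken: we used an EnvoyFilter (the Istio resource for customizing the proxy's handling of inbound/outbound traffic) to set the idleTimeout of INBOUND and OUTBOUND connections to infinite on both the Artemis and JBoss Istio proxies. We also set reconnect-attempts to "-1" and connection-ttl to "86400000" on the pooled-connection-factory in JBoss (standalone-full.xml).
We expected connections to Artemis to stay open for at least 24 hours, but that did not happen.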
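As a sanity check on those two values, here is a minimal sketch that builds the equivalent connection URL for a standalone Artemis core client and confirms that 86400000 ms is 24 hours. It assumes `reconnectAttempts` and `connectionTTL` are the URL parameter names matching the JBoss attributes above; the class name `ArtemisUrlSketch` and the host `artemis-0.example` are hypothetical, and no Artemis library is used here.

```java
import java.time.Duration;

public class ArtemisUrlSketch {

    /** Builds a tcp:// broker URL carrying the reconnect and TTL settings as query parameters. */
    public static String brokerUrl(String host, int port, int reconnectAttempts, Duration connectionTtl) {
        return "tcp://" + host + ":" + port
                + "?reconnectAttempts=" + reconnectAttempts
                + "&connectionTTL=" + connectionTtl.toMillis();
    }

    public static void main(String[] args) {
        // -1 means "retry forever"; the TTL is expressed in milliseconds.
        Duration ttl = Duration.ofHours(24);
        System.out.println(brokerUrl("artemis-0.example", 61616, -1, ttl));
        System.out.println("TTL in hours: " + Duration.ofMillis(86_400_000L).toHours());
    }
}
```

Note that the AMQ229014 message in the Artemis log reports a 60000 ms connection TTL, so the 24-hour value set on the JBoss side is evidently not what applied to that particular connection.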
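I would like to know what the likely root cause is, and what additional configuration should be applied to the Artemis and JBoss installations to keep the connections alive. Is this problem related to "keep alive" checks?
This problem does not appear to be caused by Artemis itself, but rather by the Kubernetes configuration used for the Artemis cluster.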