ActiveMQ Artemis primary pod restarting after change to shared-store HA option


Context

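I am trying to set up a working pair (single primary + backup) of Artemis 2.37.0 on an AKS (K8s) cluster with persistent volumes (we use an Azure storage account). We use KUBE_PING for address discovery. We have been using replication for several months, but split-brain problems occur frequently, so I want to switch to shared-store.

Current (non-working) scenario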

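After switching to the shared-store solution, I am facing this 4-step scenario:
  1. The primary is not working; the backup switches to primary mode.
  2. The primary restarts after about 30 seconds, and the backup goes back to backup mode.
  3. The primary starts; the backup goes to primary mode.
  4. The process repeats.

Expected behavior

The primary works without restarts. The backup acts as a backup.

Debugging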

I searched the primary's logs (I cannot paste 5.5k lines here) and found these entries just before the pod restarted:

2024-09-24 08:03:05,344 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock appears to be valid; double check by reading status
2024-09-24 08:03:05,344 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] getting state...
2024-09-24 08:03:05,344 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] trying to lock position: 0
2024-09-24 08:03:05,350 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] locked position: 0
2024-09-24 08:03:05,350 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] lock: sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid]
2024-09-24 08:03:05,355 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] state: L
2024-09-24 08:03:05,355 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock appears to be valid; triple check by comparing timestamp
2024-09-24 08:03:05,357 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock file /var/lib/artemis-instance/data/journal/server.lock originally locked at 2024-09-24T08:02:33.067+0000 was modified at 2024-09-24T08:02:35.181+0000
2024-09-24 08:03:05,358 WARN [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lost the lock according to the monitor, notifying listeners
2024-09-24 08:03:05,358 ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
java.io.IOException: lost lock

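At the same time, there are errors related to the netty connection, which look more like warnings that the Artemis instance has not started yet.

artemis.artemis.svc.cluster.local is the primary pod's address (if I understand correctly, netty on the primary is asking the primary itself whether it is alive).

2024-09-24 08:03:02,454 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection
java.net.UnknownHostException: artemis.artemis.svc.cluster.local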

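Question

What am I doing wrong? Am I missing an important parameter? Is there perhaps a timeout I should increase that I overlooked in the documentation?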

With replication, the same configuration works (the primary starts without a restart loop).

Configuration files

artemis-roles.properties: |
  amq = admin
  admin = admin,guest
artemis-users.properties: |
  admin = admin
  guest = guest
artemis.profile: |
  ARTEMIS_HOME='/opt/artemis'
  ARTEMIS_INSTANCE='/var/lib/artemis-instance'
  ARTEMIS_DATA_DIR='/var/lib/artemis-instance/data'
  ARTEMIS_ETC_DIR='/var/lib/artemis-instance/etc'
  ARTEMIS_OOME_DUMP='/var/lib/artemis-instance/log/oom_dump.hprof'
  ARTEMIS_INSTANCE_URI='file:/var/lib/artemis-instance/./'
  ARTEMIS_INSTANCE_ETC_URI='file:/var/lib/artemis-instance/./etc/'
  HAWTIO_ROLE='amq'
  if [ -z "$JAVA_ARGS" ]; then
    JAVA_ARGS="-XX:AutoBoxCacheMax=20000 -XX:+PrintClassHistogram -XX:+UseG1GC -XX:+UseStringDeduplication -Xms512M -Xmx2G -Dhawtio.disableProxy=true -Dhawtio.realm=activemq -Dhawtio.offline=true -Dhawtio.rolePrincipalClasses=org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal -Dhawtio.http.strictTransportSecurity=max-age=31536000;includeSubDomains;preload -Djolokia.policyLocation=${ARTEMIS_INSTANCE_ETC_URI}jolokia-access.xml -Dlog4j2.disableJmx=true "
  fi
  JAVA_ARGS="$JAVA_ARGS -Djava.net.preferIPv4Stack=true -Dipv4addr=$(hostname -i)"
  if [ "$1" = "run" ]; then
    :
  fi;
broker.xml: |
  <configuration xmlns="urn:activemq" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xi="http://www.w3.org/2001/XInclude" xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
    <core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:activemq:core ">
      <name>{{ include "artemis.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local</name>
      <persistence-enabled>true</persistence-enabled>
      <max-redelivery-records>1</max-redelivery-records>
      <paging-directory>/var/lib/artemis-instance/data/paging</paging-directory>
      <bindings-directory>/var/lib/artemis-instance/data/bindings</bindings-directory>
      <large-messages-directory>/var/lib/artemis-instance/data/large-messages</large-messages-directory>
      <id-cache-size xmlns="urn:activemq:core">20000</id-cache-size>
      <disk-scan-period>5000</disk-scan-period>
      <max-disk-usage>90</max-disk-usage>
      <critical-analyzer>true</critical-analyzer>
      <critical-analyzer-timeout>180000</critical-analyzer-timeout>
      <critical-analyzer-check-period>60000</critical-analyzer-check-period>
      <critical-analyzer-policy>SHUTDOWN</critical-analyzer-policy>
      <page-sync-timeout>512000</page-sync-timeout>
      <global-max-messages>-1</global-max-messages>
      <journal-type>ASYNCIO</journal-type>
      <journal-directory>/var/lib/artemis-instance/data/journal</journal-directory>
      <journal-datasync>true</journal-datasync>
      <journal-min-files>2</journal-min-files>
      <journal-pool-files>10</journal-pool-files>
      <journal-device-block-size>4096</journal-device-block-size>
      <journal-file-size>10M</journal-file-size>
      <journal-buffer-timeout>144000</journal-buffer-timeout>
      <journal-max-io>4096</journal-max-io>
      <xi:include href="/var/lib/artemis-instance/etc/acceptor.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/security-setting.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/cluster-connection.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/broadcast.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/address.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/address-setting.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/discovery.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/ha.xml"/>
      <xi:include href="/var/lib/artemis-instance/etc/connector.xml"/>
    </core>
  </configuration>
acceptor.xml: |
  <acceptors xmlns="urn:activemq:core">
    <acceptor name="artemis">tcp://0.0.0.0:{{ .Values.conf.protocols.netty.port }}?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;supportAdvisory=false;suppressInternalManagementObjects=false</acceptor>
    {{ if .Values.conf.protocols.amqp.enabled }}
    <acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
    {{ end }}
    {{ if .Values.conf.protocols.stomp.enabled }}
    <acceptor name="stomp">tcp://0.0.0.0:{{ .Values.conf.protocols.stomp.port }}?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
    {{ end }}
    {{ if .Values.conf.protocols.hornetq.enabled }}
    <acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
    {{ end }}
    {{ if .Values.conf.protocols.mqtt.enabled }}
    <acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
    {{ end }}
    {{ if .Values.conf.protocols.ws.enabled }}
    <acceptor name="stomp-ws-acceptor">tcp://0.0.0.0:61614?protocols=STOMP_WS</acceptor>
    {{ end }}
  </acceptors>
ha.xml: |
  <ha-policy xmlns="urn:activemq:core">
    <!-- use <replication> here when replication is enabled -->
    <shared-store>
      {{ .Values.conf.broker.ha | indent 20 }}
    </shared-store>
  </ha-policy>
cluster-connection.xml: |
  <cluster-connections xmlns="urn:activemq:core">
    <cluster-connection name="artemis">
      <address>jms</address>
      <connector-ref>{{ include "artemis.fullname" . }}</connector-ref>
      <check-period>1000</check-period>
      <connection-ttl>5000</connection-ttl>
      <min-large-message-size>50000</min-large-message-size>
      <call-timeout>120000</call-timeout>
      <retry-interval>500</retry-interval>
      <retry-interval-multiplier>1.0</retry-interval-multiplier>
      <max-retry-interval>5000</max-retry-interval>
      <initial-connect-attempts>-1</initial-connect-attempts>
      <reconnect-attempts>-1</reconnect-attempts>
      <use-duplicate-detection>true</use-duplicate-detection>
      <forward-when-no-consumers>false</forward-when-no-consumers>
      <max-hops>1</max-hops>
      <confirmation-window-size>10000000</confirmation-window-size>
      <call-failover-timeout>30000</call-failover-timeout>
      <notification-interval>1000</notification-interval>
      <notification-attempts>2</notification-attempts>
      <discovery-group-ref discovery-group-name="jgroups-discovery" />
    </cluster-connection>
  </cluster-connections>
address.xml: |
  <addresses xmlns="urn:activemq:core">
    <address name="DLQ">
      <anycast>
        <queue name="DLQ" />
      </anycast>
    </address>
    <address name="ExpiryQueue">
      <anycast>
        <queue name="ExpiryQueue" />
      </anycast>
    </address>
  </addresses>
address-setting.xml: |
  <address-settings xmlns="urn:activemq:core">
    <address-setting match="activemq.management#">
      <dead-letter-address>DLQ</dead-letter-address>
      <expiry-address>ExpiryQueue</expiry-address>
      <redelivery-delay>0</redelivery-delay>
      <max-size-bytes>-1</max-size-bytes>
      <message-counter-history-day-limit>10</message-counter-history-day-limit>
      <address-full-policy>PAGE</address-full-policy>
      <auto-create-queues>true</auto-create-queues>
      <auto-create-addresses>true</auto-create-addresses>
    </address-setting>
    <address-setting match="#">
      <dead-letter-address>DLQ</dead-letter-address>
      <expiry-address>ExpiryQueue</expiry-address>
      <redelivery-delay>0</redelivery-delay>
      <message-counter-history-day-limit>10</message-counter-history-day-limit>
      <address-full-policy>PAGE</address-full-policy>
      <auto-create-queues>true</auto-create-queues>
      <auto-create-addresses>true</auto-create-addresses>
      <auto-delete-queues>false</auto-delete-queues>
      <auto-delete-addresses>false</auto-delete-addresses>
      <page-size-bytes>10M</page-size-bytes>
      <max-size-bytes>-1</max-size-bytes>
      <max-size-messages>-1</max-size-messages>
      <max-read-page-messages>-1</max-read-page-messages>
      <max-read-page-bytes>20M</max-read-page-bytes>
      <page-limit-bytes>-1</page-limit-bytes>
      <page-limit-messages>-1</page-limit-messages>
    </address-setting>
  </address-settings>
broadcast.xml: |
  <broadcast-groups xmlns="urn:activemq:core">
    <broadcast-group name="jgroups-broadcast">
      <jgroups-file>jgroups-discovery.xml</jgroups-file>
      <jgroups-channel>activemq_broadcast_channel</jgroups-channel>
      <connector-ref>{{ include "artemis.fullname" . }}</connector-ref>
    </broadcast-group>
  </broadcast-groups>
discovery.xml: |
  <discovery-groups xmlns="urn:activemq:core" >
    <discovery-group name="jgroups-discovery">
      <jgroups-file>jgroups-discovery.xml</jgroups-file>
      <jgroups-channel>activemq_broadcast_channel</jgroups-channel>
      <refresh-timeout>30000</refresh-timeout>
    </discovery-group>
  </discovery-groups>
jgroups-discovery.xml: |
  <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:org:jgroups" xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <TCP external_addr="match-interface:eth0" bind_addr="match-interface:eth0" bind_port="7800" thread_pool.min_threads="1" />
    <org.jgroups.protocols.kubernetes.KUBE_PING primaryProtocol="https" namespace="{{ .Release.Namespace }}" labels="rl-type={{ .Values.conf.kubePing.name }}" />
    <MERGE3 max_interval="30000" min_interval="10000"/>
    <FD_SOCK start_port="9000"/>
    <FD_ALL timeout="30000" interval="5000"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="500" xmit_table_num_rows="100" xmit_table_msgs_per_row="2000" xmit_table_max_compaction_time="30000" use_mcast_xmit="false" discard_delivered_msgs="true" />
    <UNICAST3 xmit_table_num_rows="100" xmit_table_msgs_per_row="1000" xmit_table_max_compaction_time="30000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"/>
    <MFC max_credits="2M" min_threshold="0.4"/>
    <FRAG2 frag_size="60K"/>
    <pbcast.STATE_TRANSFER/>
    <COUNTER/>
  </config>
log4j2.properties: |
  monitorInterval = 5
  rootLogger = {{ .Values.conf.log_level }}, console, log_file
  logger.activemq.name=org.apache.activemq
  logger.activemq.level={{ .Values.conf.log_level }}
  logger.artemis_server.name=org.apache.activemq.artemis.core.server
  logger.artemis_server.level={{ .Values.conf.log_level }}
  logger.artemis_journal.name=org.apache.activemq.artemis.journal
  logger.artemis_journal.level={{ .Values.conf.log_level }}
  logger.artemis_utils.name=org.apache.activemq.artemis.utils
  logger.artemis_utils.level={{ .Values.conf.log_level }}
  logger.critical_analyzer.name=org.apache.activemq.artemis.utils.critical
  logger.critical_analyzer.level={{ .Values.conf.log_level }}
  logger.audit_base = OFF, audit_log_file
  logger.audit_base.name = org.apache.activemq.audit.base
  logger.audit_base.additivity = false
  logger.audit_resource = OFF, audit_log_file
  logger.audit_resource.name = org.apache.activemq.audit.resource
  logger.audit_resource.additivity = false
  logger.audit_message = OFF, audit_log_file
  logger.audit_message.name = org.apache.activemq.audit.message
  logger.audit_message.additivity = false
  logger.jetty.name=org.eclipse.jetty
  logger.jetty.level=WARN
  logger.authentication_filter.name=io.hawt.web.auth.AuthenticationFilter
  logger.authentication_filter.level=ERROR
  logger.curator.name=org.apache.curator
  logger.curator.level=WARN
  logger.zookeeper.name=org.apache.zookeeper
  logger.zookeeper.level=ERROR
  appender.console.type=Console
  appender.console.name=console
  appender.console.layout.type=PatternLayout
  appender.console.layout.pattern=%d %-5level [%logger] %msg%n
  appender.log_file.type = RollingFile
  appender.log_file.name = log_file
  appender.log_file.fileName = ${sys:artemis.instance}/log/artemis.log
  appender.log_file.filePattern = ${sys:artemis.instance}/log/artemis.log.%d{yyyy-MM-dd}
  appender.log_file.layout.type = PatternLayout
  appender.log_file.layout.pattern = %d %-5level [%logger] %msg%n
  appender.log_file.policies.type = Policies
  appender.log_file.policies.cron.type = CronTriggeringPolicy
  appender.log_file.policies.cron.schedule = 0 0 0 * * ?
  appender.log_file.policies.cron.evaluateOnStartup = true
  appender.audit_log_file.type = RollingFile
  appender.audit_log_file.name = audit_log_file
  appender.audit_log_file.fileName = ${sys:artemis.instance}/log/audit.log
  appender.audit_log_file.filePattern = ${sys:artemis.instance}/log/audit.log.%d{yyyy-MM-dd}
  appender.audit_log_file.layout.type = PatternLayout
  appender.audit_log_file.layout.pattern = %d [AUDIT](%t) %msg%n
  appender.audit_log_file.policies.type = Policies
  appender.audit_log_file.policies.cron.type = CronTriggeringPolicy
  appender.audit_log_file.policies.cron.schedule = 0 0 0 * * ?
  appender.audit_log_file.policies.cron.evaluateOnStartup = true
management.xml: |
  <management-context xmlns="http://activemq.apache.org/schema">
    <authorisation>
      <allowlist>
        <entry domain="hawtio"/>
      </allowlist>
      <default-access>
        <access method="list*" roles="amq"/>
        <access method="get*" roles="amq"/>
        <access method="is*" roles="amq"/>
        <access method="set*" roles="amq"/>
        <access method="browse*" roles="amq"/>
        <access method="count*" roles="amq"/>
        <access method="*" roles="amq"/>
      </default-access>
      <role-access>
        <match domain="org.apache.activemq.artemis">
          <access method="list*" roles="amq"/>
          <access method="get*" roles="amq"/>
          <access method="is*" roles="amq"/>
          <access method="set*" roles="amq"/>
          <access method="browse*" roles="amq"/>
          <access method="count*" roles="amq"/>
          <access method="*" roles="amq"/>
        </match>
      </role-access>
    </authorisation>
  </management-context>
bootstrap.xml: |
  {{ if .Values.conf.protocols.http.enabled }}
  <broker xmlns="http://activemq.apache.org/schema">
    <jaas-security domain="activemq"/>
    <server configuration="file:/var/lib/artemis-instance/etc/broker.xml"/>
    <web path="web" rootRedirectLocation="console">
      <binding name="artemis" uri="http://0.0.0.0:{{ .Values.conf.protocols.http.port }}">
        <app name="branding" url="activemq-branding" war="activemq-branding.war"/>
        <app name="plugin" url="artemis-plugin" war="artemis-plugin.war"/>
        <app name="console" url="console" war="console.war"/>
      </binding>
    </web>
  </broker>
  {{ end }}
jolokia-access.xml: |
  <restrict>
    <cors>
      <allow-origin>*://*</allow-origin>
      <strict-checking/>
    </cors>
  </restrict>
login.config: |
  activemq {
    org.apache.activemq.artemis.spi.core.security.jaas.PropertiesLoginModule sufficient
      debug=false
      reload=true
      org.apache.activemq.jaas.properties.user="artemis-users.properties"
      org.apache.activemq.jaas.properties.role="artemis-roles.properties";
    org.apache.activemq.artemis.spi.core.security.jaas.GuestLoginModule sufficient
      debug=false
      org.apache.activemq.jaas.guest.user="amq"
      org.apache.activemq.jaas.guest.role="amq";
  };
security-setting.xml: |
  <security-settings xmlns="urn:activemq:core">
    <security-setting match="#">
      <permission type="createNonDurableQueue" roles="amq"/>
      <permission type="deleteNonDurableQueue" roles="amq"/>
      <permission type="createDurableQueue" roles="amq"/>
      <permission type="deleteDurableQueue" roles="amq"/>
      <permission type="createAddress" roles="amq"/>
      <permission type="deleteAddress" roles="amq"/>
      <permission type="consume" roles="amq"/>
      <permission type="browse" roles="amq"/>
      <permission type="send" roles="amq"/>
      <permission type="manage" roles="amq"/>
    </security-setting>
  </security-settings>
connector.xml: |
  <connectors xmlns="urn:activemq:core">
    <connector name="{{ include "artemis.fullname" . }}">tcp://{{ include "artemis.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.conf.protocols.netty.port }}</connector>
  </connectors>

Shared-store HA block specific to the primary:
      <primary>
        <failover-on-shutdown>true</failover-on-shutdown>
        <wait-for-activation>false</wait-for-activation>
      </primary>

Shared-store HA block specific to the backup:
      <backup>
        <failover-on-shutdown>true</failover-on-shutdown>
        <allow-failback>true</allow-failback>
      </backup>

Replication HA block for the primary (used before the shared-store change):
      <primary>
        <check-for-active-server>true</check-for-active-server>
        <initial-replication-sync-timeout>600</initial-replication-sync-timeout>
      </primary>

Replication HA block for the backup (used before the shared-store change):
      <backup>
        <allow-failback>true</allow-failback>
      </backup>

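I googled for similar problems:
  • The only one that corresponds directly to the error does not apply to our scenario (we do not get a "The system cannot find the path specified" error).
  • I considered NFS issues, but those are more about suboptimal performance than about locks.
  • We did not perform any maintenance that could have caused the lock problem.
  • There are some issues directly related to Artemis.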
    

The fact that you are seeing this error:
ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
java.io.IOException: lost lock

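indicates that the shared storage device/protocol you are using either does not support proper file-locking semantics or that file locking is not configured correctly for the mount.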
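What is happening is that the primary broker starts and acquires the lock on the shared journal. When the backup broker starts, it appears to be able to acquire the lock on the shared journal as well. When the backup modifies the lock file that should be locked by the primary, the primary sees this and shuts itself down to avoid split brain.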
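As a rough illustration of what the primary's lock monitor appears to have noticed, the sketch below takes the two timestamps from the `FileLockNodeManager` DEBUG line quoted in the question and computes the gap between them. The names `LOCKED_AT`, `MODIFIED_AT` and `seconds_between` are made up for this example. The interpretation (the lock file changed after the primary took the lock, so some other writer touched it) is inferred from the log messages only, not from the broker's source.

```python
from datetime import datetime

# Timestamps copied from the "originally locked at ... was modified at ..." log line.
LOCKED_AT = "2024-09-24T08:02:33.067+0000"
MODIFIED_AT = "2024-09-24T08:02:35.181+0000"

# Format of the timestamps as they appear in that log line.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"


def seconds_between(earlier: str, later: str) -> float:
    """Return the gap in seconds between two log timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


gap = seconds_between(LOCKED_AT, MODIFIED_AT)
print(f"server.lock was modified {gap:.3f} s after the primary locked it")
```

A gap of about two seconds is consistent with a second broker writing to `server.lock` shortly after the primary locked it, which should not be possible if the exclusive lock were honored by the storage.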
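I recommend that you investigate the storage device/protocol you are using and ensure that it supports exclusive file locks across the network and that such locking is configured properly.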
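One way to check this independently of the broker is a small lock probe run from two pods that mount the same volume. The following is a minimal sketch, assuming Python 3 is available in those pods and that POSIX `fcntl` locks behave on the mount like the locks the JVM takes (I believe Java NIO file locks map to `fcntl` on Linux, but treat that as an assumption). The function names and the probe file name are hypothetical; use a scratch file, not the broker's real `server.lock`.

```python
import fcntl
import os
import time


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive lock on `path`.

    Returns the open file descriptor on success (keep it open to hold the
    lock), or None if another process already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held by someone else (EAGAIN/EACCES) or locking failed.
        os.close(fd)
        return None
    return fd


def probe(path, hold_seconds=60):
    """Acquire the lock and hold it for a while so a second pod can test it.

    Returns True if this process got the lock, False if it was refused.
    """
    fd = try_exclusive_lock(path)
    if fd is None:
        print("lock is held by another client: exclusive locking is enforced")
        return False
    print(f"lock acquired; holding it for {hold_seconds} s, run the probe from the other pod now")
    try:
        time.sleep(hold_seconds)
    finally:
        os.close(fd)  # closing the descriptor releases the lock
    return True
```

Call `probe("/var/lib/artemis-instance/data/lock-probe.tmp")` in the first pod and, while it is still holding the lock, call it again in the second pod. If both report "lock acquired" at the same time, the mount is not enforcing exclusive locks between clients, which would match the behavior described above.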
kubernetes azure-aks activemq-artemis