Context
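I am trying to set up a working pair (single primary + backup) of Artemis 2.37.0 on an AKS (K8S) cluster with persistent volumes (we use an Azure Storage Account). We use KUBE_PING for address discovery. We have been using replication for a few months, but split-brain problems happened too often, so I would like to change to shared-store.
After switching to the shared-store solution I am facing a 4-step scenario: the primary stops working, the backup changes to primary mode, a restart follows after ~30 seconds, and after the restart the backup goes back to backup mode.
I searched the primary's logs (I cannot paste 5.5k lines here) and found these entries just before the pod restart: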
2024-09-24 08:03:05,344 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock appears to be valid; double check by reading status
2024-09-24 08:03:05,344 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] getting state...
2024-09-24 08:03:05,344 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] trying to lock position: 0
2024-09-24 08:03:05,350 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] locked position: 0
2024-09-24 08:03:05,350 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] lock: sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid]
2024-09-24 08:03:05,355 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] state: L
2024-09-24 08:03:05,355 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock appears to be valid; triple check by comparing timestamp
2024-09-24 08:03:05,357 DEBUG [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lock file /var/lib/artemis-instance/data/journal/server.lock originally locked at 2024-09-24T08:02:33.067+0000 was modified at 2024-09-24T08:02:35.181+0000
2024-09-24 08:03:05,358 WARN [org.apache.activemq.artemis.core.server.impl.FileLockNodeManager] Lost the lock according to the monitor, notifying listeners
2024-09-24 08:03:05,358 ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
java.io.IOException: lost lock
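There are also netty connection errors, but they look more like warnings that the Artemis instance has not started yet. artemis.artemis.svc.cluster.local is the primary pod address (if I understand correctly, this is a check against the primary to see whether it is alive).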
2024-09-24 08:03:02,454 ERROR [org.apache.activemq.artemis.core.client] AMQ214016: Failed to create netty connection
java.net.UnknownHostException: artemis.artemis.svc.cluster.local
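Question
What am I doing wrong? Am I missing an important parameter? Is there perhaps a timeout I should increase that I missed in the documentation?
With replication the same configuration works (the primary starts without a restart loop).
Configuration files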
artemis-roles.properties: |
amq = admin
admin = admin,guest
artemis-users.properties: |
admin = admin
guest = guest
artemis.profile: |
ARTEMIS_HOME='/opt/artemis'
ARTEMIS_INSTANCE='/var/lib/artemis-instance'
ARTEMIS_DATA_DIR='/var/lib/artemis-instance/data'
ARTEMIS_ETC_DIR='/var/lib/artemis-instance/etc'
ARTEMIS_OOME_DUMP='/var/lib/artemis-instance/log/oom_dump.hprof'
ARTEMIS_INSTANCE_URI='file:/var/lib/artemis-instance/./'
ARTEMIS_INSTANCE_ETC_URI='file:/var/lib/artemis-instance/./etc/'
HAWTIO_ROLE='amq'
if [ -z "$JAVA_ARGS" ]; then
JAVA_ARGS="-XX:AutoBoxCacheMax=20000 -XX:+PrintClassHistogram -XX:+UseG1GC -XX:+UseStringDeduplication -Xms512M -Xmx2G -Dhawtio.disableProxy=true -Dhawtio.realm=activemq -Dhawtio.offline=true -Dhawtio.rolePrincipalClasses=org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal -Dhawtio.http.strictTransportSecurity=max-age=31536000;includeSubDomains;preload -Djolokia.policyLocation=${ARTEMIS_INSTANCE_ETC_URI}jolokia-access.xml -Dlog4j2.disableJmx=true "
fi
JAVA_ARGS="$JAVA_ARGS -Djava.net.preferIPv4Stack=true -Dipv4addr=$(hostname -i)"
if [ "$1" = "run" ]; then :
fi;
broker.xml: |
<configuration xmlns="urn:activemq"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xi="http://www.w3.org/2001/XInclude"
xsi:schemaLocation="urn:activemq /schema/artemis-configuration.xsd">
<core xmlns="urn:activemq:core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:activemq:core ">
<name>{{ include "artemis.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local</name>
<persistence-enabled>true</persistence-enabled>
<max-redelivery-records>1</max-redelivery-records>
<paging-directory>/var/lib/artemis-instance/data/paging</paging-directory>
<bindings-directory>/var/lib/artemis-instance/data/bindings</bindings-directory>
<large-messages-directory>/var/lib/artemis-instance/data/large-messages</large-messages-directory>
<id-cache-size xmlns="urn:activemq:core">20000</id-cache-size>
<disk-scan-period>5000</disk-scan-period>
<max-disk-usage>90</max-disk-usage>
<critical-analyzer>true</critical-analyzer>
<critical-analyzer-timeout>180000</critical-analyzer-timeout>
<critical-analyzer-check-period>60000</critical-analyzer-check-period>
<critical-analyzer-policy>SHUTDOWN</critical-analyzer-policy>
<page-sync-timeout>512000</page-sync-timeout>
<global-max-messages>-1</global-max-messages>
<journal-type>ASYNCIO</journal-type>
<journal-directory>/var/lib/artemis-instance/data/journal</journal-directory>
<journal-datasync>true</journal-datasync>
<journal-min-files>2</journal-min-files>
<journal-pool-files>10</journal-pool-files>
<journal-device-block-size>4096</journal-device-block-size>
<journal-file-size>10M</journal-file-size>
<journal-buffer-timeout>144000</journal-buffer-timeout>
<journal-max-io>4096</journal-max-io>
<xi:include href="/var/lib/artemis-instance/etc/acceptor.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/security-setting.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/cluster-connection.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/broadcast.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/address.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/address-setting.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/discovery.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/ha.xml"/>
<xi:include href="/var/lib/artemis-instance/etc/connector.xml"/>
</core>
</configuration>
acceptor.xml: |
<acceptors xmlns="urn:activemq:core">
<acceptor name="artemis">tcp://0.0.0.0:{{ .Values.conf.protocols.netty.port }}?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;amqpMinLargeMessageSize=102400;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpDuplicateDetection=true;supportAdvisory=false;suppressInternalManagementObjects=false</acceptor>
{{ if .Values.conf.protocols.amqp.enabled }}
<acceptor name="amqp">tcp://0.0.0.0:5672?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=AMQP;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;amqpMinLargeMessageSize=102400;amqpDuplicateDetection=true</acceptor>
{{ end }}
{{ if .Values.conf.protocols.stomp.enabled }}
<acceptor name="stomp">tcp://0.0.0.0:{{ .Values.conf.protocols.stomp.port }}?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=STOMP;useEpoll=true</acceptor>
{{ end }}
{{ if .Values.conf.protocols.hornetq.enabled }}
<acceptor name="hornetq">tcp://0.0.0.0:5445?anycastPrefix=jms.queue.;multicastPrefix=jms.topic.;protocols=HORNETQ,STOMP;useEpoll=true</acceptor>
{{ end }}
{{ if .Values.conf.protocols.mqtt.enabled }}
<acceptor name="mqtt">tcp://0.0.0.0:1883?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=MQTT;useEpoll=true</acceptor>
{{ end }}
{{ if .Values.conf.protocols.ws.enabled }}
<acceptor name="stomp-ws-acceptor">tcp://0.0.0.0:61614?protocols=STOMP_WS</acceptor>
{{ end }}
</acceptors>
ha.xml: |
<ha-policy xmlns="urn:activemq:core">
# <replication> when replication enabled
<shared-store>
{{ .Values.conf.broker.ha | indent 20 }}
</shared-store>
</ha-policy>
cluster-connection.xml: |
<cluster-connections xmlns="urn:activemq:core">
<cluster-connection name="artemis">
<address>jms</address>
<connector-ref>{{ include "artemis.fullname" . }}</connector-ref>
<check-period>1000</check-period>
<connection-ttl>5000</connection-ttl>
<min-large-message-size>50000</min-large-message-size>
<call-timeout>120000</call-timeout>
<retry-interval>500</retry-interval>
<retry-interval-multiplier>1.0</retry-interval-multiplier>
<max-retry-interval>5000</max-retry-interval>
<initial-connect-attempts>-1</initial-connect-attempts>
<reconnect-attempts>-1</reconnect-attempts>
<use-duplicate-detection>true</use-duplicate-detection>
<forward-when-no-consumers>false</forward-when-no-consumers>
<max-hops>1</max-hops>
<confirmation-window-size>10000000</confirmation-window-size>
<call-failover-timeout>30000</call-failover-timeout>
<notification-interval>1000</notification-interval>
<notification-attempts>2</notification-attempts>
<discovery-group-ref discovery-group-name="jgroups-discovery" />
</cluster-connection>
</cluster-connections>
address.xml: |
<addresses xmlns="urn:activemq:core">
<address name="DLQ">
<anycast>
<queue name="DLQ" />
</anycast>
</address>
<address name="ExpiryQueue">
<anycast>
<queue name="ExpiryQueue" />
</anycast>
</address>
</addresses>
address-setting.xml: |
<address-settings xmlns="urn:activemq:core">
<address-setting match="activemq.management#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<max-size-bytes>-1</max-size-bytes>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
</address-setting>
<address-setting match="#">
<dead-letter-address>DLQ</dead-letter-address>
<expiry-address>ExpiryQueue</expiry-address>
<redelivery-delay>0</redelivery-delay>
<message-counter-history-day-limit>10</message-counter-history-day-limit>
<address-full-policy>PAGE</address-full-policy>
<auto-create-queues>true</auto-create-queues>
<auto-create-addresses>true</auto-create-addresses>
<auto-delete-queues>false</auto-delete-queues>
<auto-delete-addresses>false</auto-delete-addresses>
<page-size-bytes>10M</page-size-bytes>
<max-size-bytes>-1</max-size-bytes>
<max-size-messages>-1</max-size-messages>
<max-read-page-messages>-1</max-read-page-messages>
<max-read-page-bytes>20M</max-read-page-bytes>
<page-limit-bytes>-1</page-limit-bytes>
<page-limit-messages>-1</page-limit-messages>
</address-setting>
</address-settings>
broadcast.xml: |
<broadcast-groups xmlns="urn:activemq:core">
<broadcast-group name="jgroups-broadcast">
<jgroups-file>jgroups-discovery.xml</jgroups-file>
<jgroups-channel>activemq_broadcast_channel</jgroups-channel>
<connector-ref>{{ include "artemis.fullname" . }}</connector-ref>
</broadcast-group>
</broadcast-groups>
discovery.xml: |
<discovery-groups xmlns="urn:activemq:core" >
<discovery-group name="jgroups-discovery">
<jgroups-file>jgroups-discovery.xml</jgroups-file>
<jgroups-channel>activemq_broadcast_channel</jgroups-channel>
<refresh-timeout>30000</refresh-timeout>
</discovery-group>
</discovery-groups>
jgroups-discovery.xml: |
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:org:jgroups"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
<TCP
external_addr="match-interface:eth0"
bind_addr="match-interface:eth0"
bind_port="7800"
thread_pool.min_threads="1"
/>
<org.jgroups.protocols.kubernetes.KUBE_PING
primaryProtocol="https"
namespace="{{ .Release.Namespace }}"
labels="rl-type={{ .Values.conf.kubePing.name }}"
/>
<MERGE3 max_interval="30000" min_interval="10000"/>
<FD_SOCK start_port="9000"/>
<FD_ALL timeout="30000" interval="5000"/>
<VERIFY_SUSPECT timeout="1500"/>
<BARRIER />
<pbcast.NAKACK2
xmit_interval="500"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="2000"
xmit_table_max_compaction_time="30000"
use_mcast_xmit="false"
discard_delivered_msgs="true" />
<UNICAST3
xmit_table_num_rows="100"
xmit_table_msgs_per_row="1000"
xmit_table_max_compaction_time="30000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"/>
<MFC max_credits="2M" min_threshold="0.4"/>
<FRAG2 frag_size="60K"/>
<pbcast.STATE_TRANSFER/>
<COUNTER/>
</config>
log4j2.properties: |
monitorInterval = 5
rootLogger = {{ .Values.conf.log_level }}, console, log_file
logger.activemq.name=org.apache.activemq
logger.activemq.level={{ .Values.conf.log_level }}
logger.artemis_server.name=org.apache.activemq.artemis.core.server
logger.artemis_server.level={{ .Values.conf.log_level }}
logger.artemis_journal.name=org.apache.activemq.artemis.journal
logger.artemis_journal.level={{ .Values.conf.log_level }}
logger.artemis_utils.name=org.apache.activemq.artemis.utils
logger.artemis_utils.level={{ .Values.conf.log_level }}
logger.critical_analyzer.name=org.apache.activemq.artemis.utils.critical
logger.critical_analyzer.level={{ .Values.conf.log_level }}
logger.audit_base = OFF, audit_log_file
logger.audit_base.name = org.apache.activemq.audit.base
logger.audit_base.additivity = false
logger.audit_resource = OFF, audit_log_file
logger.audit_resource.name = org.apache.activemq.audit.resource
logger.audit_resource.additivity = false
logger.audit_message = OFF, audit_log_file
logger.audit_message.name = org.apache.activemq.audit.message
logger.audit_message.additivity = false
logger.jetty.name=org.eclipse.jetty
logger.jetty.level=WARN
logger.authentication_filter.name=io.hawt.web.auth.AuthenticationFilter
logger.authentication_filter.level=ERROR
logger.curator.name=org.apache.curator
logger.curator.level=WARN
logger.zookeeper.name=org.apache.zookeeper
logger.zookeeper.level=ERROR
appender.console.type=Console
appender.console.name=console
appender.console.layout.type=PatternLayout
appender.console.layout.pattern=%d %-5level [%logger] %msg%n
appender.log_file.type = RollingFile
appender.log_file.name = log_file
appender.log_file.fileName = ${sys:artemis.instance}/log/artemis.log
appender.log_file.filePattern = ${sys:artemis.instance}/log/artemis.log.%d{yyyy-MM-dd}
appender.log_file.layout.type = PatternLayout
appender.log_file.layout.pattern = %d %-5level [%logger] %msg%n
appender.log_file.policies.type = Policies
appender.log_file.policies.cron.type = CronTriggeringPolicy
appender.log_file.policies.cron.schedule = 0 0 0 * * ?
appender.log_file.policies.cron.evaluateOnStartup = true
appender.audit_log_file.type = RollingFile
appender.audit_log_file.name = audit_log_file
appender.audit_log_file.fileName = ${sys:artemis.instance}/log/audit.log
appender.audit_log_file.filePattern = ${sys:artemis.instance}/log/audit.log.%d{yyyy-MM-dd}
appender.audit_log_file.layout.type = PatternLayout
appender.audit_log_file.layout.pattern = %d [AUDIT](%t) %msg%n
appender.audit_log_file.policies.type = Policies
appender.audit_log_file.policies.cron.type = CronTriggeringPolicy
appender.audit_log_file.policies.cron.schedule = 0 0 0 * * ?
appender.audit_log_file.policies.cron.evaluateOnStartup = true
management.xml: |
<management-context xmlns="http://activemq.apache.org/schema">
<authorisation>
<allowlist>
<entry domain="hawtio"/>
</allowlist>
<default-access>
<access method="list*" roles="amq"/>
<access method="get*" roles="amq"/>
<access method="is*" roles="amq"/>
<access method="set*" roles="amq"/>
<access method="browse*" roles="amq"/>
<access method="count*" roles="amq"/>
<access method="*" roles="amq"/>
</default-access>
<role-access>
<match domain="org.apache.activemq.artemis">
<access method="list*" roles="amq"/>
<access method="get*" roles="amq"/>
<access method="is*" roles="amq"/>
<access method="set*" roles="amq"/>
<access method="browse*" roles="amq"/>
<access method="count*" roles="amq"/>
<access method="*" roles="amq"/>
</match>
</role-access>
</authorisation>
</management-context>
bootstrap.xml: |
{{ if .Values.conf.protocols.http.enabled }}
<broker xmlns="http://activemq.apache.org/schema">
<jaas-security domain="activemq"/>
<server configuration="file:/var/lib/artemis-instance/etc/broker.xml"/>
<web path="web" rootRedirectLocation="console">
<binding name="artemis" uri="http://0.0.0.0:{{ .Values.conf.protocols.http.port }}">
<app name="branding" url="activemq-branding" war="activemq-branding.war"/>
<app name="plugin" url="artemis-plugin" war="artemis-plugin.war"/>
<app name="console" url="console" war="console.war"/>
</binding>
</web>
</broker>
{{ end }}
jolokia-access.xml: |
<restrict>
<cors>
<allow-origin>*://*</allow-origin>
<strict-checking/>
</cors>
</restrict>
login.config: |
activemq {
org.apache.activemq.artemis.spi.core.security.jaas.PropertiesLoginModule sufficient
debug=false
reload=true
org.apache.activemq.jaas.properties.user="artemis-users.properties"
org.apache.activemq.jaas.properties.role="artemis-roles.properties";
org.apache.activemq.artemis.spi.core.security.jaas.GuestLoginModule sufficient
debug=false
org.apache.activemq.jaas.guest.user="amq"
org.apache.activemq.jaas.guest.role="amq";
};
security-setting.xml: |
<security-settings xmlns="urn:activemq:core">
<security-setting match="#">
<permission type="createNonDurableQueue" roles="amq"/>
<permission type="deleteNonDurableQueue" roles="amq"/>
<permission type="createDurableQueue" roles="amq"/>
<permission type="deleteDurableQueue" roles="amq"/>
<permission type="createAddress" roles="amq"/>
<permission type="deleteAddress" roles="amq"/>
<permission type="consume" roles="amq"/>
<permission type="browse" roles="amq"/>
<permission type="send" roles="amq"/>
<permission type="manage" roles="amq"/>
</security-setting>
</security-settings>
connector.xml: |
<connectors xmlns="urn:activemq:core">
<connector name="{{ include "artemis.fullname" . }}">tcp://{{ include "artemis.fullname" . }}.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.conf.protocols.netty.port }}</connector>
</connectors>
Shared-store HA block for the primary
<primary>
<failover-on-shutdown>true</failover-on-shutdown>
<wait-for-activation>false</wait-for-activation>
</primary>
Shared-store HA block specific to the backup
<backup>
<failover-on-shutdown>true</failover-on-shutdown>
<allow-failback>true</allow-failback>
</backup>
Replication HA block for the primary (used before the shared-store change)
<primary>
<check-for-active-server>true</check-for-active-server>
<initial-replication-sync-timeout>600</initial-replication-sync-timeout>
</primary>
Replication HA block for the backup (used before the shared-store change)
<backup>
<allow-failback>true</allow-failback>
</backup>
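There are some existing questions directly related to Artemis:
The only one that corresponds directly to the error does not apply to our scenario (we get no "The system cannot find the path specified" error).
I considered the NFS issue, but that one is more about suboptimal performance than about the lock.
We did not perform any maintenance, which was the cause of the lock problem in that case.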
The fact that you are seeing this error:
ERROR [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=Lost NodeManager lock, message=NULL
java.io.IOException: lost lock
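indicates that the shared storage device/protocol you are using either does not support proper file-locking semantics or that file locking is not configured correctly on the mount.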
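
One way to check this independently of the broker is a small lock probe run against the shared volume. The following is only a minimal sketch and not Artemis's actual FileLockNodeManager logic: the class name `LockProbe`, the method `lockSurvives`, and the default file name `lock-probe.tmp` are all hypothetical. It takes an exclusive lock over the whole file (the same kind of lock that appears in the DEBUG output above), waits, and then reports whether the lock is still valid and the file's modification time is unchanged.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockProbe {

    /**
     * Locks the file, waits, then reports whether the lock is still valid
     * and the file's modification time has not changed in the meantime.
     * Returns false if another process already holds the lock.
     */
    public static boolean lockSurvives(Path file, long waitMillis)
            throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Non-blocking attempt at an exclusive lock over the whole file.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            try {
                long modifiedAtLock = Files.getLastModifiedTime(file).toMillis();
                Thread.sleep(waitMillis);
                long modifiedNow = Files.getLastModifiedTime(file).toMillis();
                return lock.isValid() && modifiedNow == modifiedAtLock;
            } finally {
                lock.release();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default; pass a path on the shared volume instead.
        Path file = Path.of(args.length > 0 ? args[0] : "lock-probe.tmp");
        System.out.println("lock survived: " + lockSurvives(file, 30_000));
    }
}
```

Running it at the same time from two pods that mount the same volume, with the same file path, should print `true` in at most one of them if the mount enforces locks across clients. If both report `true`, or a lone run reports `false`, that would be consistent with the locking problem described above, although it does not prove it.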