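We have a 3-node RabbitMQ cluster with nodes A, B, and C.
To test a failure scenario, we stopped the service on node A with the following command:
systemctl stop rabbitmq-server.service
Then, on node B, we ran:
rabbitmqctl forget_cluster_node rabbit@A
After that, when we tried to start the service on node A again, we got an error: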
[root@A ~]# systemctl start rabbitmq-server.service
Job for rabbitmq-server.service failed because the control process exited with error code. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
To get it working again, we did the following:
[root@A ~]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@A ...
Error: unable to perform an operation on node 'rabbit@A'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
* Consult server logs on node rabbit@A
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rabbit@A']
rabbit@A:
* connected to epmd (port 4369) on A
* epmd reports: node 'rabbit' not running at all
no other nodes on A
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli-672-rabbit@A'
* effective user's home directory: /var/lib/rabbitmq
* Erlang cookie hash: qLO74L2PYI4QpJ2CDyQZBw==
[root@A ~]# rabbitmqctl reset
Resetting node rabbit@A ...
17:38:54.144 [warning] Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@A-feature_flags`:
17:38:54.144 [warning] Feature flags: - list of previously disabled feature flags now marked as such: [:drop_unroutable_metric, :empty_basic_get_metric]
Error:
{{:badmap, :undefined}, [{:maps, :get, [:depends_on, :undefined, []], [file: 'maps.erl', line: 188]}, {:rabbit_feature_flags, :enable_dependencies, 2, [file: 'rabbit_feature_flags.erl', line: 1564]}, {:rabbit_feature_flags, :do_enable_locally, 1, [file: 'rabbit_feature_flags.erl', line: 1544]}, {:rabbit_feature_flags, :do_sync_feature_flags_with_node, 1, [file: 'rabbit_feature_flags.erl', line: 2174]}, {:rabbit_feature_flags, :sync_feature_flags_with_cluster1, 2, [file: 'rabbit_feature_flags.erl', line: 2144]}, {:rabbit_mnesia, :ensure_feature_flags_are_in_sync, 2, [file: 'rabbit_mnesia.erl', line: 645]}, {:rabbit_mnesia, :init_db, 3, [file: 'rabbit_mnesia.erl', line: 574]}, {:rabbit_mnesia, :init_db_and_upgrade, 4, [file: 'rabbit_mnesia.erl', line: 585]}]}
[root@A ~]# systemctl start rabbitmq-server
[root@A ~]# sudo rabbitmqctl stop_app
Stopping rabbit application on node rabbit@A ...
[root@A ~]# sudo rabbitmqctl reset
Resetting node rabbit@A ...
[root@A ~]# rabbitmqctl join_cluster rabbit@B
Clustering node rabbit@A with rabbit@B
17:39:52.964 [warning] Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@A-feature_flags`:
17:39:52.964 [warning] Feature flags: - list of previously disabled feature flags now marked as such: [:maintenance_mode_status]
17:39:53.660 [error] Failed to create a tracked connection table for node :"rabbit@A": {:node_not_running, :"rabbit@A"}
17:39:53.661 [error] Failed to create a per-vhost tracked connection table for node :"rabbit@A": {:node_not_running, :"rabbit@A"}
17:39:53.661 [error] Failed to create a per-user tracked connection table for node :"rabbit@A": {:node_not_running, :"rabbit@A"}
[root@A ~]# sudo rabbitmqctl start_app
Starting node rabbit@A ...
[root@A ~]#
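For reference, the part of the session that succeeded (from the second systemctl start onward) can be condensed into a small script. This is only a sketch of what we ran, not a verified recovery procedure: it assumes the service on A can already be started, which in our session was true only after the earlier reset attempt. The PEER and RUN variables and the run and rejoin_cluster helpers are hypothetical names introduced here, and the final cluster_status check is an addition that was not part of the original session. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the rejoin sequence from the session above.
# Dry run by default: each command is printed, not executed.
# Set RUN=1 to actually execute the commands (as root on node A).

# Hypothetical variable: any cluster member that is still running.
PEER="${PEER:-rabbit@B}"

# Print the command in dry-run mode, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

rejoin_cluster() {
    run systemctl start rabbitmq-server.service  # node must be running first
    run rabbitmqctl stop_app                     # stop the rabbit app, keep the Erlang node up
    run rabbitmqctl reset                        # wipe the node's old cluster state
    run rabbitmqctl join_cluster "$PEER"         # join via a running member
    run rabbitmqctl start_app                    # start the rabbit app again
    run rabbitmqctl cluster_status               # added check, not in the original session
}

rejoin_cluster
```

Running it unchanged prints the six commands in order; running it with RUN=1 on node A executes them.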
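The node did rejoin the cluster, but why did the first reset fail with an error? Likewise, join_cluster printed several errors, yet the node still joined the cluster: it shows as green in the management UI. Were the steps we followed correct?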