MariaDB Galera cluster setup problem

Question (votes: 0, answers: 4)

I'm trying to get a MariaDB cluster up and running, but it isn't working for me. I'm using MariaDB Galera 5.5.36 on 64-bit Red Hat ES6 machines. I installed MariaDB from the following repository:

[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/5.5-galera/rhel6-amd64/
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1

In server.conf on server 1 I have the following:

[mariadb]
log_error=/var/log/mariadb.log
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.211.133
wsrep_cluster_name='cluster'
wsrep_node_address='192.168.211.132'
wsrep_node_name='cluster1'
wsrep_sst_method=rsync

On server 2 I have:

[mariadb]
log_error=/var/log/mariadb.log
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.211.132
wsrep_cluster_name='cluster'
wsrep_node_address='192.168.211.133'
wsrep_node_name='cluster2'
wsrep_sst_method=rsync

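When I start server 1 with the following command:

sudo service mysql start --wsrep-new-cluster

it starts fine, and if I open mysql and check the wsrep status, it reports that everything is up and running. But when I then run sudo service mysql start on the second server, I get the following in the error log: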

140609 14:47:55 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140609 14:47:56 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.i5qfm2' --pid-file='/var/lib/mysql/localhost.localdomain-recover.pid'
140609 14:47:57 mysqld_safe WSREP: Recovered position 85448d73-ebe8-11e3-9c20-fbc1995fee11:0
140609 14:47:57 [Note] WSREP: wsrep_start_position var submitted: '85448d73-ebe8-11e3-9c20-fbc1995fee11:0'
140609 14:47:57 [Note] WSREP: Read nil XID from storage engines, skipping position init
140609 14:47:57 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
140609 14:47:57 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
140609 14:47:57 [Note] WSREP: CRC-32C: using hardware acceleration.
140609 14:47:57 [Note] WSREP: Found saved state: 85448d73-ebe8-11e3-9c20-fbc1995fee11:-1
140609 14:47:57 [Note] WSREP: Passing config to GCS: base_host = 192.168.211.133; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
140609 14:47:57 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
140609 14:47:57 [Note] WSREP: wsrep_sst_grab()
140609 14:47:57 [Note] WSREP: Start replication
140609 14:47:57 [Note] WSREP: Setting initial position to 85448d73-ebe8-11e3-9c20-fbc1995fee11:0
140609 14:47:57 [Note] WSREP: protonet asio version 0
140609 14:47:57 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
140609 14:47:57 [Note] WSREP: backend: asio
140609 14:47:57 [Note] WSREP: GMCast version 0
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
140609 14:47:57 [Note] WSREP: EVS version 0
140609 14:47:57 [Note] WSREP: PC version 0
140609 14:47:57 [Note] WSREP: gcomm: connecting to group 'cluster', peer '192.168.211.132:,192.168.211.134:'
140609 14:48:00 [Warning] WSREP: no nodes coming from prim view, prim not possible
140609 14:48:00 [Note] WSREP: view(view_id(NON_PRIM,0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,1) memb {
        0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,0
} joined {
} left {
} partitioned {
})
140609 14:48:01 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50775S), skipping check
140609 14:48:31 [Note] WSREP: view((empty))
140609 14:48:31 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():141
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -110 (Connection timed out)
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'cluster' at 'gcomm://192.168.211.132,192.168.211.134': -110 (Connection timed out)
140609 14:48:31 [ERROR] WSREP: gcs connect failed: Connection timed out
140609 14:48:31 [ERROR] WSREP: wsrep::connect() failed: 7
140609 14:48:31 [ERROR] Aborting

140609 14:48:31 [Note] WSREP: Service disconnected.
140609 14:48:32 [Note] WSREP: Some threads may fail to exit.
140609 14:48:32 [Note] /usr/sbin/mysqld: Shutdown complete

140609 14:48:32 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdomain.pid ended

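I don't know why the second server can't detect that the cluster is up and running. The machines can talk to each other fine: I can SSH from one to the other, and they can ping each other. I have tried deleting the Galera cache, downgrading my MariaDB Galera version, disabling SELinux, running the mysql service as a different user, verifying that the correct ports are open, running the nodes as two virtual machines on different computers with different IP addresses, and so on. Does anyone know what is going on here? I have been searching for three days trying to fix this, and none of the solutions I found work for me.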
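Ping and SSH only show that the hosts can reach each other. They say nothing about the Galera group communication port, which the log above reports as base_port = 4567. A minimal sketch of a TCP probe for that port follows. It assumes bash and the coreutils timeout command, and port_open and check_ports are hypothetical helper names, not part of MariaDB or Galera:

```shell
#!/usr/bin/env bash
# Sketch only: probe TCP ports on a peer node from the joining node.

# port_open HOST PORT -> exit status 0 if a TCP connection succeeds within 3 s.
port_open() {
    local host=$1 port=$2
    # /dev/tcp is a bash feature; timeout comes from coreutils.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# check_ports HOST PORT... -> prints one status line per port.
check_ports() {
    local host=$1 port
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port unreachable"
        fi
    done
}
```

Running check_ports 192.168.211.132 4567 from server 2 while server 1 is bootstrapped should report the port as open if no firewall is in the way. The state transfer ports (4444 for SST and 4568 for IST by default) normally accept connections only while a transfer is in progress, so a refused connection there does not by itself indicate a firewall problem.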

mysql mariadb galera
4 Answers
7 votes

Here is how I solved a similar problem.

CentOS 7 with MariaDB Galera 10.1.

On node 2 I saw this:

2016-12-27 15:40:38 140703512762624 [Warning] WSREP: no nodes coming from prim view, prim not possible

After some reading, I tried running this on node 1:

service mysql start --wsrep-new-cluster

But that failed, and in the log I found this:

2016-12-27 15:44:08 140438853814528 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .

So I edited the file /var/lib/mysql/grastate.dat and changed safe_to_bootstrap to 1.
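The manual edit described above can be sketched as a small shell function. This is an illustration rather than an official procedure: set_safe_to_bootstrap is a hypothetical helper name, it assumes GNU sed, and it keeps a .bak copy of the file first. Only run something like this on the node you are sure holds the most recent data.

```shell
#!/usr/bin/env bash
# Sketch only: flip safe_to_bootstrap to 1 in a grastate.dat file.

# set_safe_to_bootstrap FILE -> back up FILE to FILE.bak, then rewrite the flag.
set_safe_to_bootstrap() {
    local file=$1
    # Keep a copy so the original state can be restored.
    cp -p "$file" "$file.bak" || return 1
    # Replace whatever value the flag currently has with 1 (GNU sed in-place edit).
    sed -i 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' "$file"
}
```

With MariaDB stopped, the call would be set_safe_to_bootstrap /var/lib/mysql/grastate.dat, run as a user allowed to write to that file.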

Then I was able to start the primary node with:

service mysql start --wsrep-new-cluster

On the other nodes I just used:

service mysql start

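Note: this was in a demo pre-production environment. Right after getting everything working, I broke it by restarting all the servers at the same time :P, but I knew there had been no writes and the databases were in sync. If you are in production and this happens, you can use the following command to determine which node to run "new-cluster" on, which is like saying "make me the primary node".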

mysqld_safe --wsrep-recover
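The recovery run writes a "Recovered position" line of the form UUID:seqno to the error log, as in the question's log. A common approach is to bootstrap from the node with the highest seqno. The sketch below pulls that number out of a log file. recovered_seqno is a hypothetical helper name, and the pattern is an assumption based on the log line quoted in the question, with an optional colon after "position" in case the wording differs between versions:

```shell
#!/usr/bin/env bash
# Sketch only: extract the seqno from the last "Recovered position" log line.

# recovered_seqno LOGFILE -> prints the seqno, or nothing if no such line exists.
recovered_seqno() {
    # Match "WSREP: Recovered position[:] <uuid>:<seqno>" and keep only <seqno>.
    sed -nE 's/.*WSREP: Recovered position:? *[0-9a-fA-F-]+:(-?[0-9]+)[[:space:]]*$/\1/p' "$1" \
        | tail -n 1
}
```

For example, after running the recovery on each stopped node, recovered_seqno /var/log/mariadb.log on every node gives the numbers to compare. The log path here is the one from the question's configuration and may differ on your system.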

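If this is a production problem, I strongly recommend reading the following article and taking a backup with CloneZilla before issuing commands to the broken nodes!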

https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/


5 votes

The cluster must be started on the primary node with the following command:

galera_new_cluster

Once the first node is up, the other nodes in the cluster can be started successfully.


2 votes

I believe you need to list all the IPs in the wsrep_cluster_address parameter.

wsrep_cluster_address=gcomm://192.168.211.132,192.168.211.133

This should be done on both hosts. By the way, you probably want three nodes rather than two, to avoid split-brain situations.
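To keep the setting identical on every host, the line can be generated from one list of node IPs instead of being typed per machine. A minimal sketch, where cluster_address is a hypothetical helper name and the IPs are the two from the question:

```shell
#!/usr/bin/env bash
# Sketch only: build the wsrep_cluster_address line from a list of node IPs.

# cluster_address IP... -> prints the config line with all IPs comma-separated.
cluster_address() {
    # "$*" joins the arguments using the first character of IFS.
    local IFS=,
    echo "wsrep_cluster_address=gcomm://$*"
}
```

Calling cluster_address 192.168.211.132 192.168.211.133 prints the same line shown above, and adding a third IP to the call extends the list for a three-node cluster.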


0 votes

Remove wsrep_node_address from both configuration files.
