RabbitMQ Split-Brain


Readers are welcome to support the author's new books, 《深入理解Kafka:核心设计与实践原理》 and 《RabbitMQ实战指南》, and to follow the author's WeChat official account: 朱小厮的博客.


Original article: https://honeypps.com/mq/rabbitmq-network-partition-1/

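RabbitMQ 3.4.x can report network partitions that did not actually occur (in a sense, a form of split-brain). This article reproduces the behavior experimentally so that others can avoid the same pitfall.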

Preview

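Two threads on the rabbitmq-users group (Google Groups, which may require a proxy to reach from some networks) describe the split-brain behavior:
https://groups.google.com/forum/#!topic/rabbitmq-users/dt8VFhMb2zM
https://groups.google.com/forum/#!topic/rabbitmq-users/06OQkYtLJd8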

The behavior as described in the post:

Hey Folk,

i just set up a rabbitmq cluster:

Three Nodes:
Node A | Node B | Node C

All three nodes see each other (same erlang-cookie, mode: pause_minority).
 rabbitmqctl cluster_status => shows status of all nodes on every instance.

Every queue is mirrored to the other nodes.

If i shutdown Node B, the following is happening:
* Node A realizes Node B is offline.
* Node A asks Node C for Node B status.
* Node C answers: "I still have connection to Node B."
* Node A shuts down itself.
* Node C realizes some seconds later, that the connection to Node B is no more possible.

From three Nodes only one is left in case of an unexpected outage.

I would like to realize a setup where Node A and C keep the connection even if Node B goes offline.
Is there any way to do this?


Michael Klishin (the second-largest contributor to rabbitmq-server) replied:

A known issue which is partially resolve in 3.4.x releases. 26474 can be related. 

(The RabbitMQ 3.4.2 release notes list "26474 prevent false positive detection of partial partitions (since 3.4.0)", which is the false network partition detection discussed here.)

Simon MacMullen (also a rabbitmq-server contributor) replied:

So this is caused by the new partial partition detection in 3.4.x. It 
looks like it is too sensitive - C should only reply "yes" if it has 
positive confirmation that it can still talk to B, not if the connection 
just hasn't failed yet. 

This will be fixed in 3.4.2. 
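The distinction Simon draws can be illustrated with a toy model. This is only a sketch of the reasoning in his reply, not RabbitMQ's actual implementation: the `Link` states and both reply functions are hypothetical names introduced for illustration.

```python
from enum import Enum


class Link(Enum):
    """Hypothetical states of one node's connection to a peer."""
    CONFIRMED = "confirmed"    # a recent exchange with the peer succeeded
    UNVERIFIED = "unverified"  # no failure noticed yet, but nothing recent either
    DOWN = "down"              # the failure has been detected


def reply_too_sensitive(link: Link) -> bool:
    """Says "I can still reach the peer" unless the link is known to be down."""
    return link is not Link.DOWN


def reply_positive_only(link: Link) -> bool:
    """Says "I can still reach the peer" only on positive confirmation."""
    return link is Link.CONFIRMED


# Node B has just gone offline and node C has not noticed yet.
c_view_of_b = Link.UNVERIFIED
print("too sensitive:", reply_too_sensitive(c_view_of_b))
print("positive only:", reply_positive_only(c_view_of_b))
```

In this model the first function makes C vouch for B during the window in which the failure has not yet been noticed, which is what leads A to conclude that it is on the wrong side of a partial partition.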

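Hypothesis

From this we can hypothesize that RabbitMQ 3.4.0 contains a false network partition detection bug and that RabbitMQ 3.4.2 fixes it.
Verification procedure: run the same experiment on RabbitMQ 3.4.0, 3.4.1, 3.4.2 and 3.6.0. In each case, configure three nodes A, B and C as one cluster, then stop the network on C and check whether A and B falsely report a partition between each other.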


Verification

Experiment 1

RabbitMQ version: 3.4.0
Node configuration
Three nodes, A, B and C:
A: rabbit@zhuzhonghua2-fqawb
B: rabbit@hiddenzhu-8drdc
C: rabbit@hidden-local
B and C each join A's cluster (B join_cluster A; C join_cluster A).
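As a rough sketch of the join step, the helper below only builds the command lines to run on a joining node; it does not execute anything. The article states just `join_cluster`; wrapping it in `rabbitmqctl stop_app` and `rabbitmqctl start_app` is the usual sequence and is an assumption here, and `join_cluster_commands` is a hypothetical helper name.

```python
from typing import List


def join_cluster_commands(seed_node: str) -> List[List[str]]:
    """Return the rabbitmqctl invocations to run on a node that joins seed_node."""
    return [
        ["rabbitmqctl", "stop_app"],                 # stop the RabbitMQ application
        ["rabbitmqctl", "join_cluster", seed_node],  # join the seed node's cluster
        ["rabbitmqctl", "start_app"],                # start the application again
    ]


# Commands to run on B and on C so that both join A.
for argv in join_cluster_commands("rabbit@zhuzhonghua2-fqawb"):
    print(" ".join(argv))
```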

Check the cluster status (rabbitmqctl cluster_status):

Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                 'rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]
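Since the experiments all come down to comparing the `running_nodes` and `partitions` entries, a small parser can make the comparison less error-prone. The sketch below is an assumption-laden helper, not part of RabbitMQ: `parse_cluster_status` is a hypothetical name, and the regular expressions only target the Erlang term layout printed by the 3.x versions shown in this article.

```python
import re


def parse_cluster_status(output: str) -> dict:
    """Extract running_nodes and partitions from `rabbitmqctl cluster_status` text."""
    running = re.search(r"\{running_nodes,\s*\[(.*?)\]\}", output, re.DOTALL)
    running_nodes = re.findall(r"'([^']+)'", running.group(1)) if running else []

    # Only look at the text after "{partitions," so node lists elsewhere are ignored.
    tail = output.split("{partitions,", 1)[1] if "{partitions," in output else ""
    partitions = {
        node: re.findall(r"'([^']+)'", peers)
        for node, peers in re.findall(r"\{'([^']+)',\s*\[([^\]]*)\]\}", tail)
    }
    return {"running_nodes": running_nodes, "partitions": partitions}


sample = """Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                 'rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]"""

status = parse_cluster_status(sample)
print(status["running_nodes"])
print(status["partitions"])
```

For the healthy cluster above, all three nodes are listed as running and the partitions mapping is empty.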

Run service network stop on node C.
Check cluster_status on node A:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]

Check cluster_status on node A again:

Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hiddenzhu-8drdc']}]}]
Check cluster_status on node B:
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]

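Conclusion: [A and B report a partition between each other although only C's network was stopped. This is a false detection: a genuine network partition can only be detected after connectivity has been restored.]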
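The judgment made here can be written down as a simple rule: a reported partition is suspicious when neither the reporting node nor the reported peer is a node whose network was deliberately cut. The following sketch encodes that rule; `false_positive_pairs` is a hypothetical helper and the input is typed in by hand from the outputs above.

```python
from typing import Dict, List, Set, Tuple


def false_positive_pairs(
    reports: Dict[str, List[str]], cut_off: Set[str]
) -> List[Tuple[str, str]]:
    """Return (reporter, peer) pairs where neither side had its network cut."""
    return sorted(
        (reporter, peer)
        for reporter, peers in reports.items()
        for peer in peers
        if reporter not in cut_off and peer not in cut_off
    )


A = "rabbit@zhuzhonghua2-fqawb"
B = "rabbit@hiddenzhu-8drdc"
C = "rabbit@hidden-local"

# Partitions reported by A and B on 3.4.0 after stopping the network on C.
reports_3_4_0 = {A: [B], B: [A]}
print(false_positive_pairs(reports_3_4_0, cut_off={C}))
```

With the 3.4.0 data both A and B are flagged, since each blames the other while only C was disconnected.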

Run service network start on node C.
Check cluster_status on node A:

[{nodes,
     [{disc,
          ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
           'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,
     [{'rabbit@zhuzhonghua2-fqawb',
          ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc']}]}]

Check cluster_status on node B:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]

Check cluster_status on node C:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]

Experiment 2

RabbitMQ version: 3.4.1
Node configuration as above (B join_cluster A, C join_cluster A).
Check the node status:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb',
                 'rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]

Run service network stop on node C.
Check cluster_status on node A:

Cluster status of node 'rabbit@zhuzhonghua2-fqawb' ...
[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hiddenzhu-8drdc']}]}]

Check cluster_status on node B:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]

Conclusion: [Reproduced]

Run service network start on node C.
Check cluster_status on node A:

[{nodes,
     [{disc,
          ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
           'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,
     [{'rabbit@zhuzhonghua2-fqawb',
          ['rabbit@hidden-local','rabbit@hiddenzhu-8drdc']}]}]

Check cluster_status on node B:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hiddenzhu-8drdc',['rabbit@zhuzhonghua2-fqawb']}]}]

Check cluster_status on node C:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]

Experiment 3

RabbitMQ version: 3.4.2 (version 3.6.0 behaves the same)
Node configuration as above (B join_cluster A, C join_cluster A).
Check the node status:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb',
                 'rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]

Run service network stop on node C.
Check cluster_status on node A:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]

Check cluster_status on node B:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb','rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[]}]

Conclusion: [Not reproduced]

Run service network start on node C.
Check cluster_status on node A:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hiddenzhu-8drdc','rabbit@zhuzhonghua2-fqawb']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hidden-local']}]}]

Check cluster_status on node B:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@zhuzhonghua2-fqawb','rabbit@hiddenzhu-8drdc']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@zhuzhonghua2-fqawb',['rabbit@hidden-local']}]}]

Check cluster_status on node C:

[{nodes,[{disc,['rabbit@hidden-local','rabbit@hiddenzhu-8drdc',
                'rabbit@zhuzhonghua2-fqawb']}]},
 {running_nodes,['rabbit@hidden-local']},
 {cluster_name,<<"rabbit@zhuzhonghua2-fqawb">>},
 {partitions,[{'rabbit@hidden-local',['rabbit@zhuzhonghua2-fqawb']}]}]

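Conclusion

The version hypothesis is essentially confirmed. To avoid false network partition detection, anyone running RabbitMQ should upgrade and avoid versions 3.4.0 and 3.4.1.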
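The outcome of the experiments can be captured in a small lookup, for example as a pre-deployment check. This sketch reflects only the four versions tested in this article; `reproduces_false_partition` is a hypothetical helper, and it returns `None` for any version the article did not test rather than guessing.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.4.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Results observed in the experiments of this article only.
OBSERVED = {
    (3, 4, 0): True,   # false partition between A and B reproduced
    (3, 4, 1): True,   # reproduced
    (3, 4, 2): False,  # not reproduced
    (3, 6, 0): False,  # not reproduced
}


def reproduces_false_partition(version: str) -> Optional[bool]:
    """True/False for tested versions, None for versions not covered here."""
    return OBSERVED.get(parse_version(version))


for v in ("3.4.0", "3.4.1", "3.4.2", "3.6.0", "3.5.0"):
    print(v, reproduces_false_partition(v))
```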

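Network Partitions

An article on the subject (RabbitMQ 网络分区问题) introduces network partitions as follows:

RabbitMQ clusters do not tolerate network partitions particularly well. When the network partitions frequently, problems arise, the most visible being split-brain.

The official documentation puts it this way: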

RabbitMQ clusters do not tolerate network partitions well. If you are thinking of clustering across a WAN, don't. You should use federation or the shovel instead.

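It follows that a cluster should not span a WAN; federation or the shovel should be used instead.

Even on a LAN, however, network partitions cannot be ruled out entirely: a failing network device (a repeater or a network card, for example) can also cause one.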

Network partition detected

Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions. 

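When a network partition occurs, the nodes in each partition assume that every node outside their own partition is down, and operations on queues, exchanges and bindings take effect only within the current partition. With RabbitMQ's default configuration, the cluster does not recover from the effects of a partition on its own, even after the network is restored. RabbitMQ (3.1+) detects network partitions automatically and provides a configuration option for handling them: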

[
 {rabbit,
 [{tcp_listeners,[5672]},
 {cluster_partition_handling, ignore}]
 }
].

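RabbitMQ provides four settings (for details see http://blog.csdn.net/u013256816/article/details/73757884):

  1. ignore: the default. Nothing is done when a partition occurs; choose this when the network is considered reliable.
  2. autoheal: the partitions negotiate, the nodes in the partition with the fewest client connections are restarted, and the cluster is restored (favors AP in CAP terms; some state is lost).
  3. pause_if_all_down: a node pauses itself when it cannot reach any node in a configured list of nodes.
  4. pause_minority: after a partition, each node checks whether its own partition holds more than half of the cluster's total nodes; if not, the nodes in that partition are paused (favors CP; the total number of nodes should be odd).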
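The pause_minority rule in the list above reduces to a single comparison. The sketch below follows the wording used here (pause unless the partition holds more than half of all nodes); `should_pause` is a hypothetical name and this is not RabbitMQ's own code.

```python
def should_pause(partition_size: int, cluster_size: int) -> bool:
    """Pause unless this partition holds strictly more than half of the cluster."""
    return partition_size * 2 <= cluster_size


# Three-node cluster split 2/1: the pair keeps running, the single node pauses.
print(should_pause(2, 3), should_pause(1, 3))

# Four-node cluster split 2/2: neither half has a majority, so both halves pause.
print(should_pause(2, 4))
```

The 2/2 case shows why an odd number of nodes is recommended: with an even split no partition has a majority and the whole cluster pauses.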

References:
RabbitMQ official documentation
Network partitions
Split-brain
