一种解决 HAProxy 节点网络异常(sendmsg: Invalid argument, Connection timed out )的办法

问题

之前连接一个 HAProxy 前端服务时总是时不时出现 [Errno 110] Connection timed out ,并且本地 ping 服务器丢包率特别高。 到服务器上看了以后,发现 ping 127.0.0.1 的丢包率也特别高,而且 ping 命令还出现 ping: sendmsg: Invalid argument 错误:

64 bytes from 127.0.0.1: icmp_seq=150 ttl=64 time=0.050 ms
64 bytes from 127.0.0.1: icmp_seq=151 ttl=64 time=0.062 ms
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
64 bytes from 127.0.0.1: icmp_seq=158 ttl=64 time=0.962 ms
64 bytes from 127.0.0.1: icmp_seq=159 ttl=64 time=0.033 ms

查看 dmesg -H 有很多类似 net_ratelimit: 478 callbacks suppressed 的记录:

```
[  +6.555833] net_ratelimit: 478 callbacks suppressed
[Oct19 11:08] net_ratelimit: 57 callbacks suppressed
```

需要提一下的就是,这个 HAProxy 服务在一个非常大的内网(large subnets)里, 内网里很多机器都会去连接这个服务。

解决方法

修改了一下 sysctl, 加大了 ARP cache:

$ sudo sysctl -w net.ipv4.neigh.default.gc_thresh1=1024
$ sudo sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
$ sudo sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
$ sudo sysctl -p
$ sudo sysctl -a |grep net.ipv4.neigh.default.gc_thresh

为什么要修改为上面的值?

先来看看这几个配置项的含义(摘自 https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt):

neigh/default/gc_thresh1 - INTEGER
    Minimum number of entries to keep.  Garbage collector will not
    purge entries if there are fewer than this number.
    Default: 128

neigh/default/gc_thresh2 - INTEGER
    Threshold when garbage collector becomes more aggressive about
    purging entries. Entries older than 5 seconds will be cleared
    when over this number.
    Default: 512

neigh/default/gc_thresh3 - INTEGER
    Maximum number of neighbor entries allowed.  Increase this
    when using large numbers of interfaces and when communicating
    with large numbers of directly-connected peers.
    Default: 1024

ARP 相关的 简单解释就是 (详见 arp(7)):

  • net.ipv4.neigh.default.gc_thresh1: min IPV4 entries to keep in ARP cache - garbage collection never runs if this many or less entries are in cache
  • net.ipv4.neigh.default.gc_thresh2: IPV4 entries allowed in ARP cache before garbage collection will be scheduled in 5 seconds
  • net.ipv4.neigh.default.gc_thresh3: maximum IPV4 entries allowed in ARP cache; garbage collection runs when this many entries reached

然后我们通过 arp -an|wc -l 查看当前记录的 ARP 记录的数量:

$ arp -an|wc -l
1108

或者通过 ip -4 neigh show nud all | wc -l 查看当前 IPv4 的 ARP 记录的数量:

$ ip -4 neigh show nud all | wc -l
1112

可以看到上面的值比 net.ipv4.neigh.default.gc_thresh3 的默认值 1024 要大, <del>*此时就会进行 gc 操作,如果 gc 操作持续时间太久就会导致新的 ARP 记录无法被创建,进而导致 ARP 通信无法正常完成,TCP 之类的操作更加就无法完成了(有空的时候再仔细求证这个理解...)* ,</del> 所以我们要修改为更大的值。

如果上面的值特别大,可以考虑配置再大一点的值,比如:

net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 32768
net.ipv4.neigh.default.gc_thresh3 = 65536

注:上面修改的都是 IPv4 相关的配置,如果有用到 IPv6 网络的话可以把对应的配置项也修改一下。 注:如果机器性能特别好或者比较介意 gc,可以考虑把值调到非常非常大,然后禁用 gc: net.ipv4.neigh.default.gc_interval, net.ipv4.neigh.default.gc_stale_time


Comments