Findings on the TCP socket send buffer
1: When SO_SNDBUF is not set, sk->sk_sndbuf is determined by tcp_finish_connect->tcp_init_buffer_space->tcp_sndbuf_expand. The TCP stack computes a value on its own (46080 in this experiment), and sk_sndbuf becomes the smaller of that value and net.ipv4.tcp_wmem[2] (4194304)
2: Once SO_SNDBUF has been set, tcp_sndbuf_expand is no longer called, and the value is determined entirely by sock_setsockopt
2-1: Small value: value < 2304 { SOCK_MIN_SNDBUF(4608)/2 }
sk_sndbuf = 4608
2-2: Medium value: { SOCK_MIN_SNDBUF(4608)/2 } < value < net.core.wmem_max
sk_sndbuf = value*2
2-3: Large value: value > net.core.wmem_max
sk_sndbuf = net.core.wmem_max*2
Default case (SO_SNDBUF not set)
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.tcp_wmem = 4096 16384 4194304
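- Before the TCP socket calls connect, sendbuf: 16384; sk->sk_sndbuf holds the value of sysctl_tcp_wmem[1]
- After connect, sendbuf: 46080
Debugging shows that the default sendbuf size is sysctl_tcp_wmem[1], i.e. 16384
After connect reaches the server, sendbuf becomes 46080, which matches none of the configured values listed above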
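Root-cause analysis
Stage 1: tcp_init_sock initializes the socket with sk->sk_sndbuf = sysctl_tcp_wmem[1]
Stage 2: on an active connect, when the socket enters the ESTABLISHED state, the state transition calls tcp_sndbuf_expand to adjust sk_sndbuf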
The SystemTap (stp) script probe output is as follows:
tcp_v4_connect[2017/6/20,10:57:56]local=0.0.0.0:3000,remote=0.0.0.0:0 state:CLOSE,sndbubf 0:16384
tcp_v4_connect return [2017/6/20,10:57:56]local=192.168.55.178:3000,remote=180.97.33.108:80 state:SYN_SENT,sndbubf 1280:16384
tcp_input:302 return [2017/6/20,10:57:56]local=192.168.55.178:3000,remote=180.97.33.108:80 state:ESTABLISHED,sndbubf 0:16384 sndmem : 46080 permss 2304
0xffffffff815c3527 : tcp_sndbuf_expand+0x67/0x90 [kernel]
0xffffffff815c7ba8 : tcp_init_buffer_space+0x178/0x190 [kernel]
0xffffffff815cbbae : tcp_finish_connect+0x6e/0x120 [kernel]
0xffffffff815cc297 : tcp_rcv_state_process+0x637/0xf20 [kernel]
0xffffffff815d5ffb : tcp_v4_do_rcv+0x17b/0x340 [kernel]
0xffffffff815d76d9 : tcp_v4_rcv+0x799/0x9a0 [kernel]
0xffffffff815b1094 : ip_local_deliver_finish+0xb4/0x1f0 [kernel]
0xffffffff815b1379 : ip_local_deliver+0x59/0xd0 [kernel]
0xffffffff815b0d1a : ip_rcv_finish+0x8a/0x350 [kernel]
0xffffffff815b16a6 : ip_rcv+0x2b6/0x410 [kernel]
0xffffffff815700d2 : __netif_receive_skb_core+0x582/0x800 [kernel]
0xffffffff81570368 : __netif_receive_skb+0x18/0x60 [kernel]
0xffffffff815703f0 : netif_receive_skb_internal+0x40/0xc0 [kernel]
0xffffffff81571578 : napi_gro_receive+0xd8/0x130 [kernel]
0xffffffffa00472fc [e1000]
static void tcp_sndbuf_expand(struct sock *sk)
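The 46080 in the probe output is consistent with sndmem = 2 * nr_segs * per_mss, taking per_mss = 2304 from the log line ("permss 2304") and nr_segs = 10. The following is a minimal user-space sketch of that arithmetic, not the kernel function itself. The name model_sndbuf_expand and the constants are hypothetical, and nr_segs = 10 assumes the initial congestion window (TCP_INIT_CWND) is what applies on a freshly established connection.

```c
#include <assert.h>

/* Values taken from the experiment in this article; treat them as
 * assumptions on other kernels and other network paths. */
enum {
    TCP_INIT_CWND_ASSUMED = 10,      /* initial congestion window, in segments */
    PER_MSS_FROM_LOG      = 2304,    /* "permss 2304" in the probe output */
    TCP_WMEM_MAX_FROM_CFG = 4194304  /* net.ipv4.tcp_wmem[2] */
};

/* Hypothetical model of the resize step: the buffer only grows,
 * and never beyond tcp_wmem_max. */
static int model_sndbuf_expand(int cur_sndbuf, int nr_segs, int per_mss,
                               int tcp_wmem_max)
{
    assert(nr_segs > 0 && per_mss > 0);

    /* Factor 2 leaves headroom on top of one window of segments. */
    int sndmem = 2 * nr_segs * per_mss;

    if (cur_sndbuf < sndmem)
        return sndmem < tcp_wmem_max ? sndmem : tcp_wmem_max;
    return cur_sndbuf;
}
```

With the values from this experiment, model_sndbuf_expand(16384, TCP_INIT_CWND_ASSUMED, PER_MSS_FROM_LOG, TCP_WMEM_MAX_FROM_CFG) gives 2 * 10 * 2304 = 46080, which matches the sndmem value printed by the probe.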
Setting the send buffer to a small value
socklen_t sendbuflen = 0;
The output is as follows:
default,sendbuf:16384
now,sendbuf:4608
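Summary of the output: setting sendbuf to 100 did not take effect; instead the buffer ended up with a larger value, 4608
Root-cause analysis
Setting the SO_SNDBUF socket option goes through a system call that ends up in sock_setsockopt, which handles the option as follows:
It multiplies the buffer size requested by the user by 2, compares the result with SOCK_MIN_SNDBUF (4608), and keeps the larger of the two
So the small requested value (100, doubled to 200) did not take effect; what took effect was 4608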
Setting the send buffer to a medium value
System buffer settings:
net.core.wmem_max = 212992
net.ipv4.tcp_wmem = 4096 16384 4194304
Experiment: set the buffer size to 3000
/* fd is an already created TCP socket */
int sendbuflen = 0;
socklen_t len = sizeof(sendbuflen);
getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);
printf("default,sendbuf:%d\n", sendbuflen);
sendbuflen = 3000;  /* requested size in bytes */
setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, len);
getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);  /* read back the effective size */
printf("now,sendbuf:%d\n", sendbuflen);
Experiment program output:
default,sendbuf:16384
now,sendbuf:6000
Summary of the output: the requested size 3000 took effect, and sndbuf was set to 3000*2
Setting the send buffer to a large value
System buffer settings:
net.core.wmem_max = 212992
net.ipv4.tcp_wmem = 4096 16384 4194304
Experiment: set the buffer size to 230000
/* fd is an already created TCP socket */
int sendbuflen = 0;
socklen_t len = sizeof(sendbuflen);
getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);
printf("default,sendbuf:%d\n", sendbuflen);
sendbuflen = 230000;  /* requested size in bytes, above net.core.wmem_max */
setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, len);
getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void*)&sendbuflen, &len);  /* read back the effective size */
printf("now,sendbuf:%d\n", sendbuflen);
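Experiment program output:
default,sendbuf:16384
now,sendbuf:425984
Analysis of the result: the requested size 230000 (larger than the system limit 212992) ends up as a sendbuf of 212992*2
Root-cause analysis
case SO_SNDBUF:
val is the value set by the user. It is compared with sysctl_wmem_max and the smaller of the two is kept, so if the requested value exceeds sysctl_wmem_max, val becomes the system wmem maximum.
From the above: 230000 > net.core.wmem_max, so the SO_SNDBUF value is capped at net.core.wmem_max, and the final sk_sndbuf is net.core.wmem_max*2, i.e. 425984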
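The three experiments (small, medium, large) can be reproduced with one small function. This is a user-space sketch of the rule described in this article (cap at net.core.wmem_max, double, then enforce the minimum), not the kernel source. The name model_so_sndbuf is hypothetical, and the minimum of 4608 is the value observed on this machine; it may differ on other kernel builds.

```c
#include <assert.h>

/* Minimum observed in this article; the real SOCK_MIN_SNDBUF depends on
 * the kernel build, so 4608 is an assumption. */
#define SOCK_MIN_SNDBUF_ASSUMED 4608u

/* Hypothetical model of the SO_SNDBUF handling in sock_setsockopt:
 * cap the request at wmem_max, double it, then apply the minimum. */
static unsigned int model_so_sndbuf(unsigned int requested,
                                    unsigned int wmem_max)
{
    assert(wmem_max <= 0x7fffffffu);  /* keeps val * 2 from overflowing */

    unsigned int val = requested < wmem_max ? requested : wmem_max;
    unsigned int doubled = val * 2;

    return doubled > SOCK_MIN_SNDBUF_ASSUMED ? doubled
                                             : SOCK_MIN_SNDBUF_ASSUMED;
}
```

With net.core.wmem_max = 212992, model_so_sndbuf(100, 212992) gives 4608, model_so_sndbuf(3000, 212992) gives 6000, and model_so_sndbuf(230000, 212992) gives 425984, matching the three program outputs.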
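Other notes
The variable that records how much buffer the TCP socket's send queue currently occupies is sk_wmem_queued
The function that checks it against the send buffer is as follows:
static inline bool sk_stream_memory_free(const struct sock *sk)
{
	if (sk->sk_wmem_queued >= sk->sk_sndbuf)
		return false;
	return sk->sk_prot->stream_memory_free ?
		sk->sk_prot->stream_memory_free(sk) : true;
}
From the check above we can see that the send buffer is both accounted and compared in bytes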
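The boundary of that check is worth spelling out: a socket whose queued bytes are exactly equal to sk_sndbuf already counts as full. Below is a minimal user-space sketch of the first half of the check only. struct model_sock and model_memory_free are hypothetical stand-ins, and the per-protocol stream_memory_free hook is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct sock fields used by the check. */
struct model_sock {
    int wmem_queued;  /* bytes currently queued for sending */
    int sndbuf;       /* send buffer limit, in bytes */
};

/* Returns true while there is still room in the send buffer.
 * Only the byte comparison is modeled; the protocol hook is ignored. */
static bool model_memory_free(const struct model_sock *sk)
{
    assert(sk);
    return sk->wmem_queued < sk->sndbuf;
}
```

For example, with sndbuf = 46080 the function returns true for wmem_queued = 46079 and false for wmem_queued = 46080.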