Eric reported and diagnosed another possible deadlock problem with
the MPTCP diag code. The first patch address the issue, and the second
adds self-test coverage for the relevant codepath, to get lockdep help
to catch future similar issues.
Paolo Abeni (2):
mptcp: fix possible deadlock in subflow diag
selftests: mptcp: explicitly trigger the listener diag code-path
net/mptcp/diag.c | 3 +++
tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++
2 files changed, 33 insertions(+)
--
2.43.0
The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.
Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.
Signed-off-by: Paolo Abeni <[email protected]>
---
tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 60a7009ce1b5..3ab584b38566 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -81,6 +81,21 @@ chk_msk_nr()
__chk_msk_nr "grep -c token:" "$@"
}
+chk_listener_nr()
+{
+ local expected=$1
+ local msg="$2"
+
+ if [ $expected -gt 0 ] && \
+ ! mptcp_lib_kallsyms_has "mptcp_diag_dump_listeners"; then
+ printf "%-50s%s\n" "$msg - mptcp" "[ skip ]"
+ mptcp_lib_result_skip "many listener sockets"
+ else
+ __chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp"
+ fi
+ __chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
+}
+
wait_msk_nr()
{
local condition="grep -c token:"
@@ -279,5 +294,20 @@ flush_pids
chk_msk_inuse 0 "many->0"
chk_msk_cestab 0 "many->0"
+chk_listener_nr 0 "no listener sockets"
+NR_SERVERS=100
+for I in $(seq 1 $NR_SERVERS); do
+ ip netns exec $ns ./mptcp_connect -p $((I + 20001)) -l 0.0.0.0 2>&1 >/dev/null &
+ mptcp_lib_wait_local_port_listen $ns $((I + 20001))
+done
+
+chk_listener_nr $NR_SERVERS "many listener sockets"
+
+# gracefull termination
+for I in $(seq 1 $NR_SERVERS); do
+ echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 2>&1 >/dev/null
+done
+flush_pids
+
mptcp_lib_result_print_all_tap
exit $ret
--
2.43.0
Syzbot and Eric reported a lockdep splat in the subflow diag:
WARNING: possible circular locking dependency detected
6.8.0-rc4-syzkaller-00212-g40b9385dd8e6 #0 Not tainted
syz-executor.2/24141 is trying to acquire lock:
ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
ffff888045870130 (k-sk_lock-AF_INET6){+.+.}-{0:0}, at:
tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
but task is already holding lock:
ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at: spin_lock
include/linux/spinlock.h:351 [inline]
ffffc9000135e488 (&h->lhash2[i].lock){+.+.}-{2:2}, at:
inet_diag_dump_icsk+0x39f/0x1f80 net/ipv4/inet_diag.c:1038
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&h->lhash2[i].lock){+.+.}-{2:2}:
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__inet_hash+0x335/0xbe0 net/ipv4/inet_hashtables.c:743
inet_csk_listen_start+0x23a/0x320 net/ipv4/inet_connection_sock.c:1261
__inet_listen_sk+0x2a2/0x770 net/ipv4/af_inet.c:217
inet_listen+0xa3/0x110 net/ipv4/af_inet.c:239
rds_tcp_listen_init+0x3fd/0x5a0 net/rds/tcp_listen.c:316
rds_tcp_init_net+0x141/0x320 net/rds/tcp.c:577
ops_init+0x352/0x610 net/core/net_namespace.c:136
__register_pernet_operations net/core/net_namespace.c:1214 [inline]
register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1283
register_pernet_device+0x33/0x80 net/core/net_namespace.c:1370
rds_tcp_init+0x62/0xd0 net/rds/tcp.c:735
do_one_initcall+0x238/0x830 init/main.c:1236
do_initcall_level+0x157/0x210 init/main.c:1298
do_initcalls+0x3f/0x80 init/main.c:1314
kernel_init_freeable+0x42f/0x5d0 init/main.c:1551
kernel_init+0x1d/0x2a0 init/main.c:1441
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
-> #0 (k-sk_lock-AF_INET6){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
lock_sock_fast include/net/sock.h:1723 [inline]
subflow_get_info+0x166/0xd20 net/mptcp/diag.c:28
tcp_diag_put_ulp net/ipv4/tcp_diag.c:100 [inline]
tcp_diag_get_aux+0x738/0x830 net/ipv4/tcp_diag.c:137
inet_sk_diag_fill+0x10ed/0x1e00 net/ipv4/inet_diag.c:345
inet_diag_dump_icsk+0x55b/0x1f80 net/ipv4/inet_diag.c:1061
__inet_diag_dump+0x211/0x3a0 net/ipv4/inet_diag.c:1263
inet_diag_dump_compat+0x1c1/0x2d0 net/ipv4/inet_diag.c:1371
netlink_dump+0x59b/0xc80 net/netlink/af_netlink.c:2264
__netlink_dump_start+0x5df/0x790 net/netlink/af_netlink.c:2370
netlink_dump_start include/linux/netlink.h:338 [inline]
inet_diag_rcv_msg_compat+0x209/0x4c0 net/ipv4/inet_diag.c:1405
sock_diag_rcv_msg+0xe7/0x410
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
As noted by Eric we can break the lock dependency chain avoid
dumping any extended info for the mptcp subflow listener:
nothing actually useful is presented there.
Fixes: b8adb69a7d29 ("mptcp: fix lockless access in subflow ULP diag")
Reported-by: Eric Dumazet <[email protected]>
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
---
net/mptcp/diag.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c
index 6ff6f14674aa..7017dd60659d 100644
--- a/net/mptcp/diag.c
+++ b/net/mptcp/diag.c
@@ -21,6 +21,9 @@ static int subflow_get_info(struct sock *sk, struct sk_buff *skb)
bool slow;
int err;
+ if (inet_sk_state_load(sk) == TCP_LISTEN)
+ return 0;
+
start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP);
if (!start)
return -EMSGSIZE;
--
2.43.0
On Wed, 2024-02-21 at 12:46 +0100, Paolo Abeni wrote:
> The mptcp diag interface already experienced a few locking bugs
> that lockdep and appropriate coverage have detected in advance.
>
> Let's add a test-case triggering the relevant code path, to prevent
> similar issues in the future.
>
> Signed-off-by: Paolo Abeni <[email protected]>
> ---
> tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
> index 60a7009ce1b5..3ab584b38566 100755
> --- a/tools/testing/selftests/net/mptcp/diag.sh
> +++ b/tools/testing/selftests/net/mptcp/diag.sh
> @@ -81,6 +81,21 @@ chk_msk_nr()
> __chk_msk_nr "grep -c token:" "$@"
> }
>
> +chk_listener_nr()
> +{
> + local expected=$1
> + local msg="$2"
> +
> + if [ $expected -gt 0 ] && \
> + ! mptcp_lib_kallsyms_has "mptcp_diag_dump_listeners"; then
> + printf "%-50s%s\n" "$msg - mptcp" "[ skip ]"
> + mptcp_lib_result_skip "many listener sockets"
I was too hasty and this is too fragile/requires debug. A v2 with
something more reliable is needed. Given this is not going to land into
this week PR, we will probably send it with a burst with other fixes.
Thanks,
Paolo
On Wed, Feb 22, 2024 at 12:46:59PM +0100, Paolo Abeni wrote:
>l The mptcp diag interface already experienced a few locking bugs
> that lockdep and appropriate coverage have detected in advance.
>
> Let's add a test-case triggering the relevant code path, to prevent
> similar issues in the future.
>
> Signed-off-by: Paolo Abeni <[email protected]>
> ---
> tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
> index 60a7009ce1b5..3ab584b38566 100755
> --- a/tools/testing/selftests/net/mptcp/diag.sh
> +++ b/tools/testing/selftests/net/mptcp/diag.sh
> @@ -81,6 +81,21 @@ chk_msk_nr()
> __chk_msk_nr "grep -c token:" "$@"
> }
>
> +chk_listener_nr()
> +{
> + local expected=$1
> + local msg="$2"
> +
> + if [ $expected -gt 0 ] && \
> + ! mptcp_lib_kallsyms_has "mptcp_diag_dump_listeners"; then
> + printf "%-50s%s\n" "$msg - mptcp" "[ skip ]"
> + mptcp_lib_result_skip "many listener sockets"
> + else
> + __chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp"
> + fi
> + __chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
> +}
> +
> wait_msk_nr()
> {
> local condition="grep -c token:"
> @@ -279,5 +294,20 @@ flush_pids
> chk_msk_inuse 0 "many->0"
> chk_msk_cestab 0 "many->0"
>
> +chk_listener_nr 0 "no listener sockets"
> +NR_SERVERS=100
> +for I in $(seq 1 $NR_SERVERS); do
> + ip netns exec $ns ./mptcp_connect -p $((I + 20001)) -l 0.0.0.0 2>&1 >/dev/null &
> + mptcp_lib_wait_local_port_listen $ns $((I + 20001))
> +done
> +
> +chk_listener_nr $NR_SERVERS "many listener sockets"
> +
> +# gracefull termination
nit, as you plan to re-spin anyway: graceful
> +for I in $(seq 1 $NR_SERVERS); do
> + echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 2>&1 >/dev/null
I'm not sure of the validity of this but shellcheck complains that:
SC2069 (warning): To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify).
Also for the same constriction slightly earlier in this patch.
> +done
> +flush_pids
> +
> mptcp_lib_result_print_all_tap
> exit $ret
> --
> 2.43.0
>
>
Hi Simon,
Thank you for the review!
On 22/02/2024 2:13 pm, Simon Horman wrote:
> On Wed, Feb 22, 2024 at 12:46:59PM +0100, Paolo Abeni wrote:
>> l The mptcp diag interface already experienced a few locking bugs
>> that lockdep and appropriate coverage have detected in advance.
>>
>> Let's add a test-case triggering the relevant code path, to prevent
>> similar issues in the future.
>>
>> Signed-off-by: Paolo Abeni <[email protected]>
>> ---
>> tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++
>> 1 file changed, 30 insertions(+)
>>
>> diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
>> index 60a7009ce1b5..3ab584b38566 100755
>> --- a/tools/testing/selftests/net/mptcp/diag.sh
>> +++ b/tools/testing/selftests/net/mptcp/diag.sh
>> @@ -81,6 +81,21 @@ chk_msk_nr()
>> __chk_msk_nr "grep -c token:" "$@"
>> }
>>
>> +chk_listener_nr()
>> +{
>> + local expected=$1
>> + local msg="$2"
>> +
>> + if [ $expected -gt 0 ] && \
>> + ! mptcp_lib_kallsyms_has "mptcp_diag_dump_listeners"; then
>> + printf "%-50s%s\n" "$msg - mptcp" "[ skip ]"
>> + mptcp_lib_result_skip "many listener sockets"
>> + else
>> + __chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp"
>> + fi
>> + __chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
>> +}
>> +
>> wait_msk_nr()
>> {
>> local condition="grep -c token:"
>> @@ -279,5 +294,20 @@ flush_pids
>> chk_msk_inuse 0 "many->0"
>> chk_msk_cestab 0 "many->0"
>>
>> +chk_listener_nr 0 "no listener sockets"
>> +NR_SERVERS=100
>> +for I in $(seq 1 $NR_SERVERS); do
>> + ip netns exec $ns ./mptcp_connect -p $((I + 20001)) -l 0.0.0.0 2>&1 >/dev/null &
>> + mptcp_lib_wait_local_port_listen $ns $((I + 20001))
>> +done
>> +
>> +chk_listener_nr $NR_SERVERS "many listener sockets"
>> +
>> +# gracefull termination
>
> nit, as you plan to re-spin anyway: graceful
Paolo sent a new version on MPTCP ML only. When applying the new
version, I fixed this:
https://lore.kernel.org/mptcp/[email protected]/
>> +for I in $(seq 1 $NR_SERVERS); do
>> + echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 2>&1 >/dev/null
>
> I'm not sure of the validity of this but shellcheck complains that:
>
> SC2069 (warning): To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify).
>
> Also for the same constriction slightly earlier in this patch.
Good catch!
I should add shellcheck support in our CI to catch that :)
I just applied the suggested modification in our tree [1]. The new
version that will be sent to netdev will no longer have this issue.
https://github.com/multipath-tcp/mptcp_net-next/commit/31f1460d552c
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
On Thu, Feb 22, 2024 at 03:49:44PM +0100, Matthieu Baerts wrote:
> Hi Simon,
>
> Thank you for the review!
>
> On 22/02/2024 2:13 pm, Simon Horman wrote:
> > On Wed, Feb 22, 2024 at 12:46:59PM +0100, Paolo Abeni wrote:
> >> l The mptcp diag interface already experienced a few locking bugs
> >> that lockdep and appropriate coverage have detected in advance.
> >>
> >> Let's add a test-case triggering the relevant code path, to prevent
> >> similar issues in the future.
> >>
> >> Signed-off-by: Paolo Abeni <[email protected]>
> >> ---
> >> tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++
> >> 1 file changed, 30 insertions(+)
> >>
> >> diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
> >> index 60a7009ce1b5..3ab584b38566 100755
> >> --- a/tools/testing/selftests/net/mptcp/diag.sh
> >> +++ b/tools/testing/selftests/net/mptcp/diag.sh
> >> @@ -81,6 +81,21 @@ chk_msk_nr()
> >> __chk_msk_nr "grep -c token:" "$@"
> >> }
> >>
> >> +chk_listener_nr()
> >> +{
> >> + local expected=$1
> >> + local msg="$2"
> >> +
> >> + if [ $expected -gt 0 ] && \
> >> + ! mptcp_lib_kallsyms_has "mptcp_diag_dump_listeners"; then
> >> + printf "%-50s%s\n" "$msg - mptcp" "[ skip ]"
> >> + mptcp_lib_result_skip "many listener sockets"
> >> + else
> >> + __chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp"
> >> + fi
> >> + __chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
> >> +}
> >> +
> >> wait_msk_nr()
> >> {
> >> local condition="grep -c token:"
> >> @@ -279,5 +294,20 @@ flush_pids
> >> chk_msk_inuse 0 "many->0"
> >> chk_msk_cestab 0 "many->0"
> >>
> >> +chk_listener_nr 0 "no listener sockets"
> >> +NR_SERVERS=100
> >> +for I in $(seq 1 $NR_SERVERS); do
> >> + ip netns exec $ns ./mptcp_connect -p $((I + 20001)) -l 0.0.0.0 2>&1 >/dev/null &
> >> + mptcp_lib_wait_local_port_listen $ns $((I + 20001))
> >> +done
> >> +
> >> +chk_listener_nr $NR_SERVERS "many listener sockets"
> >> +
> >> +# gracefull termination
> >
> > nit, as you plan to re-spin anyway: graceful
>
> Paolo sent a new version on MPTCP ML only. When applying the new
> version, I fixed this:
>
> https://lore.kernel.org/mptcp/[email protected]/
>
> >> +for I in $(seq 1 $NR_SERVERS); do
> >> + echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 2>&1 >/dev/null
> >
> > I'm not sure of the validity of this but shellcheck complains that:
> >
> > SC2069 (warning): To redirect stdout+stderr, 2>&1 must be last (or use '{ cmd > file; } 2>&1' to clarify).
> >
> > Also for the same constriction slightly earlier in this patch.
>
> Good catch!
>
> I should add shellcheck support in our CI to catch that :)
>
> I just applied the suggested modification in our tree [1]. The new
> version that will be sent to netdev will no longer have this issue.
>
> https://github.com/multipath-tcp/mptcp_net-next/commit/31f1460d552c
Thanks Matt,
much appreciated.